github
GitHub API
Keeps: repo, release, stars delta
- 2026-05-02 ggerganov/llama.cpp b9010: b9010
<details open>

fix: CUDA device PCI bus ID de-dupe OOMing (ignoring other 3 gpus entirely) (#22533)

* fix: CUDA device PCI bus ID detection for multi-GPU de-dupe
* HIP, MUSA macros

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

</details>

**macOS/iOS:**
- [ma
github:ggerganov/llama.cpp
- 2026-05-02 ggerganov/llama.cpp b9009: b9009
<details open>

server : avoid checkpoint data host copies (#22558)

* server : avoid checkpoint data host copies
* llama : refactor llama_io_read_i

</details>

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9009/llama-b9
github:ggerganov/llama.cpp
- 2026-05-02 ggerganov/llama.cpp b9008: b9008
<details open>

ggml-virtgpu: fix circular dependency in headers (#22557)

</details>

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9008/llama-b9008-bin-macos-arm64.tar.gz)
- [macOS Apple Silicon (arm64, KleidiAI enabled)
github:ggerganov/llama.cpp
- 2026-05-02 ggerganov/llama.cpp b9006: b9006
<details open>

opencl: Adreno optimization for MoE - MxFP4 (#22301)

* MoE Mxfp4 CLC kernel added, router reorder on GPU
* Pass test-backend-ops for MoE mxfp4 Adreno CLC
* remove putenv in llama-model.cpp
* fix indent style and whitespace
* opencl: remove unnecessary headers
github:ggerganov/llama.cpp
- 2026-05-02 ggerganov/llama.cpp b9004: b9004
<details open>

sync : ggml

</details>

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9004/llama-b9004-bin-macos-arm64.tar.gz)
- [macOS Apple Silicon (arm64, KleidiAI enabled)](https://github.com/ggml-org/llama.cpp/releas
github:ggerganov/llama.cpp
- 2026-05-02 ggerganov/llama.cpp b9002: b9002
<details open>

sync : ggml

</details>

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9002/llama-b9002-bin-macos-arm64.tar.gz)
- [macOS Apple Silicon (arm64, KleidiAI enabled)](https://github.com/ggml-org/llama.cpp/releas
github:ggerganov/llama.cpp
- 2026-05-02 ggerganov/llama.cpp b9000: b9000
<details open>

hexagon: hmx flash attention (#22347)

* hmx: extract shared interleave headers and unify matmul batched
* hmx: add HMX-accelerated flash attention for prefill
* hmx: replace asm wrappers with Q6_ intrinsics in hmx-utils.h

  Switches three single-instruction help
github:ggerganov/llama.cpp