github
GitHub API. Keeps: repo, release, stars delta
- 2026-05-09 ggerganov/llama.cpp b9093: b9093
model : add sarvam_moe architecture support (#20275) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9093/llama-b9093-bin-macos-arm64.tar.gz) - [macOS Apple Silicon (arm64, KleidiAI enabled)](htt
github:ggerganov/llama.cpp
- 2026-05-09 ggerganov/llama.cpp b9090: b9090
cmake : update BoringSSL to 0.20260508.0 (#22839) **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9090/llama-b9090-bin-macos-arm64.tar.gz) - [macOS Apple Silicon (arm64, KleidiAI enabled)](https:
github:ggerganov/llama.cpp
- 2026-05-09 ggerganov/llama.cpp b9089: b9089
SYCL: reduce allocation overhead during flash attention (#22732) * SYCL: reduce allocation overhead during flash attention * tidy up whitespace * add a note about the flag * move ggml_sycl_fattn_* into fattn-buffers.hpp * refactor implementation into fattn-bu
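The b9089 note says only that allocation overhead during flash attention was reduced and that the buffer helpers moved into `fattn-buffers.hpp`; it does not describe the mechanism. One common pattern for this kind of fix is a grow-only scratch buffer reused across calls, so steady-state calls allocate nothing. The Python sketch below illustrates that general pattern only, under that assumption; `ScratchBuffer` and its methods are hypothetical names, not the SYCL implementation.

```python
class ScratchBuffer:
    """Grow-only scratch buffer reused across calls (hypothetical illustration).

    Storage is (re)allocated only when a request exceeds the current capacity;
    requests of equal or smaller size reuse the existing storage.
    """

    def __init__(self) -> None:
        self._buf = bytearray()
        self.allocations = 0  # how many times backing storage was allocated

    def get(self, nbytes: int) -> memoryview:
        if nbytes > len(self._buf):
            # Grow to exactly the requested size; real allocators often round up.
            self._buf = bytearray(nbytes)
            self.allocations += 1
        # Hand out a zero-copy view of the first nbytes.
        return memoryview(self._buf)[:nbytes]


if __name__ == "__main__":
    scratch = ScratchBuffer()
    for size in (4096, 1024, 4096, 8192, 2048):
        scratch.get(size)
    print(scratch.allocations)  # 2: one for 4096, one when growing to 8192
```

The trade-off is memory held at the high-water mark in exchange for avoiding an allocation on every call.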
github:ggerganov/llama.cpp
- 2026-05-09 ggerganov/llama.cpp b9088: b9088
[SYCL] Add BF16 support to GET_ROWS operation (#21391) Add GGML_TYPE_BF16 to the SYCL backend's GET_ROWS operation, both in supports_op and in the kernel dispatch. This fixes a performance regression where models using BF16 embedding tensors (e.g., Gemma4's per_l
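GET_ROWS gathers the rows selected by an index list, which is how token IDs are looked up in an embedding table. bfloat16 keeps the upper 16 bits of an IEEE-754 float32, so widening BF16 to F32 is a 16-bit left shift. The Python sketch below is a scalar reference for a BF16 gather that returns float32 values; it is an illustration under those assumptions, not the SYCL kernel, and all function names are hypothetical. The float32-to-BF16 helper rounds to nearest even and, for brevity, does not special-case NaN.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Round a float32 to bfloat16 bits (round-to-nearest-even; NaN not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Adding 0x7FFF plus the lowest kept bit makes exact ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_f32(h: int) -> float:
    """Widen bfloat16 bits to float32 by placing them in the upper 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


def get_rows_bf16(table: list[list[int]], ids: list[int]) -> list[list[float]]:
    """Gather rows `ids` from a BF16 table (raw 16-bit ints) as float32 values."""
    return [[bf16_bits_to_f32(h) for h in table[i]] for i in ids]


if __name__ == "__main__":
    embeddings = [[1.0, 2.0], [0.5, -3.0], [4.0, 8.0]]
    table = [[f32_to_bf16_bits(x) for x in row] for row in embeddings]
    print(get_rows_bf16(table, [2, 0]))  # [[4.0, 8.0], [1.0, 2.0]]
```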
github:ggerganov/llama.cpp
- 2026-05-09 ggerganov/llama.cpp b9087: b9087
sycl: Q5_K reorder MMVQ/dequant + Q8_0 reorder MMVQ path (#22152) * sycl: Q5_K reorder MMVQ/dequant + Q8_0 reorder MMVQ path Signed-off-by: Chun Tao <chun.tao@intel.com> * Remove duplicate definitions --------- Signed-off-by: Chun Tao <chun.tao@intel.com> Co-
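In ggml's Q8_0 format each block holds 32 int8 quants and one fp16 scale `d`, and a value is recovered as `d * q`. The entry does not spell out what "reorder" means; assuming it refers to splitting the array of blocks into one contiguous run of quants followed by one run of scales (so a kernel can stream quants without stepping over interleaved scales), the Python sketch below shows that layout change. The function names are hypothetical and the quantizer is simplified (Python's `round` uses ties-to-even), so it is not bit-exact with ggml.

```python
import struct

QK8_0 = 32  # values per Q8_0 block


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_q8_0(xs: list[float]) -> list[tuple[float, list[int]]]:
    """Array-of-blocks layout: each block is (fp16 scale d, 32 int8 quants)."""
    assert len(xs) % QK8_0 == 0, "sketch assumes a whole number of blocks"
    blocks = []
    for off in range(0, len(xs), QK8_0):
        chunk = xs[off:off + QK8_0]
        d = max(abs(x) for x in chunk) / 127.0
        inv = 1.0 / d if d else 0.0
        qs = [max(-128, min(127, round(x * inv))) for x in chunk]
        blocks.append((to_fp16(d), qs))
    return blocks


def reorder_q8_0(blocks: list[tuple[float, list[int]]]) -> tuple[list[int], list[float]]:
    """Split blocks into one contiguous quant array and one scale array."""
    qs = [q for _, block_qs in blocks for q in block_qs]
    ds = [d for d, _ in blocks]
    return qs, ds


def dequantize_reordered(qs: list[int], ds: list[float]) -> list[float]:
    """Value i uses the scale of the block it belongs to: ds[i // QK8_0]."""
    return [ds[i // QK8_0] * q for i, q in enumerate(qs)]
```

The reordered form holds the same numbers as the block form; only the memory order changes. The Q5_K half of the entry uses a more complex super-block layout and is not modeled here.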
github:ggerganov/llama.cpp
- 2026-05-09 ggerganov/llama.cpp b9085: b9085
Add flash attention MMA / Tiles to support MiMo-V2.5 (#22812) * mimo-v2.5: add flash attention mma/tiles for d_kq=192 d_v=128 * mimo-v2.5: follow (256, 256) fattn templates * mimo-v2.5: cleanup comments * mimo-v2.5: further comment cleanup * mimo-v2.5: ad
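The MiMo-V2.5 entry adds flash-attention kernels for a head shape where the query/key dimension (`d_kq = 192`) differs from the value dimension (`d_v = 128`); the attention output has the value dimension. The Python sketch below is a scalar, single-query reference for the online-softmax recurrence that flash attention is built on, written so the two head sizes are independent. It is not the MMA/tile kernels the entry adds, `attention_online` is a hypothetical name, and the default `1/sqrt(d_kq)` scale is an assumption (models may configure a different one).

```python
import math
from typing import Optional


def attention_online(q: list[float], K: list[list[float]], V: list[list[float]],
                     scale: Optional[float] = None) -> list[float]:
    """Single-query attention with a streaming (online) softmax.

    q: d_kq floats; K: n rows of d_kq floats; V: n rows of d_v floats.
    d_kq and d_v may differ. Keys are visited once while keeping a running
    max `m`, a running normalizer `l`, and an unnormalized output `acc`.
    """
    if scale is None:
        scale = 1.0 / math.sqrt(len(q))  # assumed default scaling
    m = -math.inf             # running maximum score
    l = 0.0                   # running sum of exp(score - m)
    acc = [0.0] * len(V[0])   # running sum of exp(score - m) * v
    for k_row, v_row in zip(K, V):
        s = scale * sum(a * b for a, b in zip(q, k_row))
        m_new = max(m, s)
        corr = math.exp(m - m_new)  # rescales old sums; exp(-inf) == 0.0 on the first key
        p = math.exp(s - m_new)
        l = l * corr + p
        acc = [a * corr + p * v for a, v in zip(acc, v_row)]
        m = m_new
    return [a / l for a in acc]
```

Rescaling by `corr` is what lets keys be consumed in chunks while still matching a full softmax up to rounding.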
github:ggerganov/llama.cpp
- 2026-05-09 ggerganov/llama.cpp b9084: b9084
hexagon: add HTP kernel for GGML_OP_GATED_DELTA_NET (#22837) Implement the Gated Delta Net recurrence on HVX with: - 4-row fused kernels for PP (prompt processing) path - 8-row fused kernels for TG (token generation) path, reducing K/Q/gate vector reload overhe
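The hexagon entry implements the Gated Delta Net recurrence. In the formulation from the Gated DeltaNet paper, a `d_v x d_k` state `S` is updated per token as `S_t = a_t * S_{t-1} + b_t * (v_t - a_t * S_{t-1} k_t) k_t^T` and read out as `o_t = S_t q_t`, where `a_t` is the decay gate and `b_t` the write strength. The Python sketch below is a plain scalar reference for that recurrence, not the HVX kernel; the name `gated_delta_net` and the unbatched single-head layout are assumptions, and llama.cpp's op may differ in details such as storing the gate in log space or normalizing `q` and `k`. Each row of `S` is updated from its own entries plus the shared `k_t`, which is presumably what allows rows to be processed in fused groups of 4 or 8.

```python
def gated_delta_net(q: list[list[float]], k: list[list[float]], v: list[list[float]],
                    alpha: list[float], beta: list[float]) -> list[list[float]]:
    """Reference gated delta rule for one head over T tokens.

    q, k: T rows of d_k floats; v: T rows of d_v floats;
    alpha (decay gate) and beta (write strength): T floats each.
    Returns T output rows of d_v floats. The state S (d_v x d_k) starts at zero.
    """
    d_k, d_v = len(k[0]), len(v[0])
    S = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for q_t, k_t, v_t, a, b in zip(q, k, v, alpha, beta):
        for i in range(d_v):
            row = S[i]
            # Decay this row of the state by the gate: a * S_{t-1}.
            for j in range(d_k):
                row[j] *= a
            # Delta rule: move the row toward v_t[i] along k_t by the prediction error.
            pred = sum(row[j] * k_t[j] for j in range(d_k))
            err = b * (v_t[i] - pred)
            for j in range(d_k):
                row[j] += err * k_t[j]
        # Read out o_t = S_t q_t.
        outputs.append([sum(S[i][j] * q_t[j] for j in range(d_k)) for i in range(d_v)])
    return outputs
```

With `b_t = 1` and a unit-norm `k_t`, the update overwrites what the state stores along `k_t` with `v_t`, which is the defining property of the delta rule.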
github:ggerganov/llama.cpp