github
GitHub API
Keeps: repo, release, stars delta
- 2026-05-22 ggerganov/llama.cpp b9291: b9291
<details open> SYCL: improve MoE prefill throughput (#23142) - change `k_copy_src1_to_contiguous` so that it uses a precomputed contiguous mapping where all rows "owned" by an expert are in one slice with a known start and end - switch the `O(n_as * n_routed_rows)` contraption to
AAPL github:ggerganov/llama.cpp
- 2026-05-22 ggerganov/llama.cpp b9292: b9292
<details open> perplexity : fix integer overflow (#23496) Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com> </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9292/llama-b9292-bin-macos-arm64.tar.gz) - [mac
AAPL github:ggerganov/llama.cpp
- 2026-05-22 ggerganov/llama.cpp b9290: b9290
<details open> sycl : Level Zero detection in ggml_sycl_init (#23097) * [SYCL] Centralize Level Zero detection in ggml_sycl_init * use the same wording * get back the warning </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/rel
AAPL github:ggerganov/llama.cpp
- 2026-05-22 ggerganov/llama.cpp b9289: b9289
<details open> SYCL : gated_delta_net K>1 (#23174) * sycl_gated_delta_net K>1 * editor_config </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9289/llama-b9289-bin-macos-arm64.tar.gz) - [macOS Apple Silicon (a
AAPL github:ggerganov/llama.cpp
- 2026-05-22 ggerganov/llama.cpp b9286: b9286
<details open> ggml-zendnn : add Q8_0 quantization support (#23414) * ggml-zendnn : add Q8_0 quantization support * ggml-zendnn : sync with latest ZenDNN * ggml-zendnn : address review comments for Q8_0 </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://githu
AAPL github:ggerganov/llama.cpp
- 2026-05-22 ggerganov/llama.cpp b9285: b9285
<details open> cmake : build router app only during standalone builds (#23521) Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com> </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9285/llama-b9285-bin-macos
AAPL github:ggerganov/llama.cpp
- 2026-05-22 ggerganov/llama.cpp b9284: b9284
<details open> vocab : fix HybridDNA tokenizer (#23466) * vocab : mark hybriddna k-mers to avoid BPE token collisions * improved loop --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://
AAPL github:ggerganov/llama.cpp
- 2026-05-22 ggerganov/llama.cpp b9283: b9283
<details open> cmake : add install() for impl libraries + fix apple builds (#23511) * pi : update * ci : fix ios build * ci : fix android * ci : fix apple builds * cmake : add install() for impl libraries Add install(TARGETS <target> LIBRARY) for all -impl libraries that
AAPL github:ggerganov/llama.cpp
- 2026-05-22 ggerganov/llama.cpp b9279: b9279
<details open> vulkan: fuse snake activation (mul, sin, sqr, mul, add) (#22855) * vulkan: fuse snake activation (mul, sin, sqr, mul, add) Add snake.comp shader with F32 / F16 / BF16 pipelines and ggml_vk_snake_dispatch_fused. The matcher recognizes the naive 5 op decomposition
AAPL github:ggerganov/llama.cpp
- 2026-05-22 ggerganov/llama.cpp b9277: b9277
<details open> tests : move save-load-state from examples to tests (#23336) * tests : move save-load-state from examples to tests - Move examples/save-load-state/ to tests/test-save-load-state.cpp - Remove subdirectory reference from examples/CMakeLists.txt - Add test to tests
AAPL github:ggerganov/llama.cpp
- 2026-05-22 ggerganov/llama.cpp b9276: b9276
<details open> server: expose prompt token counts in /slots endpoint (#23454) Add n_prompt_tokens, n_prompt_tokens_processed, and n_prompt_tokens_cache to the /slots JSON response. These fields are already tracked internally but were not exposed, making it impossible for client
AAPL github:ggerganov/llama.cpp
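
The b9276 note names three fields added to the `/slots` JSON response: `n_prompt_tokens`, `n_prompt_tokens_processed`, and `n_prompt_tokens_cache`. As a rough sketch of how a client might read them, the Python below parses a made-up payload and computes a per-slot cache ratio. Only the three field names come from the release note. The list-of-objects shape, the `id` key, the sample values, the helper names, and the reading of `n_prompt_tokens_cache` as "prompt tokens served from cache" are all assumptions for illustration.

```python
import json

# Hypothetical /slots payload. Only the three n_prompt_tokens* field names are
# taken from the release note; the list shape, the "id" key, and every value
# here are invented for illustration.
SAMPLE_SLOTS_JSON = """
[
  {"id": 0, "n_prompt_tokens": 512, "n_prompt_tokens_processed": 128, "n_prompt_tokens_cache": 384},
  {"id": 1, "n_prompt_tokens": 0, "n_prompt_tokens_processed": 0, "n_prompt_tokens_cache": 0}
]
"""


def cache_ratio(slot):
    """Return cached / total prompt tokens, or None when the slot has no prompt.

    Assumes n_prompt_tokens_cache counts prompt tokens reused from cache.
    """
    total = slot.get("n_prompt_tokens", 0)
    if total == 0:
        return None
    return slot.get("n_prompt_tokens_cache", 0) / total


def summarize_slots(payload):
    """Map each slot id to its cache ratio, given the raw JSON text."""
    slots = json.loads(payload)
    return {slot["id"]: cache_ratio(slot) for slot in slots}


if __name__ == "__main__":
    for slot_id, ratio in summarize_slots(SAMPLE_SLOTS_JSON).items():
        print(slot_id, "idle" if ratio is None else f"{ratio:.0%} cached")
```

In practice the payload would come from an HTTP GET against a running server's `/slots` endpoint rather than from a string constant. The actual response layout should be checked against the server documentation for the build in use.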