github
GitHub API
Keeps: repo, release, stars delta
- 2026-04-25 ggerganov/llama.cpp b8933: b8933
<details open> chat: fix handling of space in reasoning markers (#22353) * chat: fix handling of space in reasoning markers * fix tests * whitespace </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b8933/llama
github:ggerganov/llama.cpp
- 2026-04-25 ggerganov/llama.cpp b8931: b8931
<details open> CUDA: reduce MMQ stream-k overhead (#22298) * CUDA: reduce MMQ stream-k overhead * use 32 bit integers for kbc </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b8931/llama-b8931-bin-macos-arm64.t
github:ggerganov/llama.cpp
- 2026-04-25 ggerganov/llama.cpp b8929: b8929
<details open> llama-quant : default ftype param `Q5_1` --> `Q8_0` (#20828) Change the default `ftype` in `llama_model_quantize_params` from `LLAMA_FTYPE_MOSTLY_Q5_1` to `LLAMA_FTYPE_MOSTLY_Q8_0`. In case some external program naively uses the default quantization params, we s
github:ggerganov/llama.cpp
- 2026-04-25 ggerganov/llama.cpp b8927: b8927
<details open> [SYCL] Optimize Q4_0 mul_mat for Arc770, add scripts (#22291) * opt arc770 for Q4_0 * add for Q4_0 * update the script * add help script for windows * update guide * fix format issue * convert from dos to unix for format issue * fix missed -sm parameter <
github:ggerganov/llama.cpp
- 2026-04-25 ggerganov/llama.cpp b8926: b8926
<details open> ggml-webgpu: support for SSM_SCAN and disable set_rows error checking (#22327) * Implement ssm_scan * Remove blocking in graph_compute and check for set rows * Fix bindings * Update op support </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https:/
github:ggerganov/llama.cpp