github
GitHub API
Keeps: repo, release, stars delta
- 2026-06-05 ggerganov/llama.cpp b9536: b9536
<details open> opencl: improve get_rows, cpy, concat and q6_k flat gemv (#24160) * opencl: allow multiple workgroups for large rows * opencl: improve small cpy * opencl: packed concat for small input * opencl: tweak flat q6_K gemv, increase N_DST and remap threads </details>
github:ggerganov/llama.cpp
- 2026-06-05 ggerganov/llama.cpp b9535: b9535
<details open> common/chat : unify and fix LFM2/LFM2.5 tool parser (#24178) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9535/llama-b9535-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, KleidiAI enable
github:ggerganov/llama.cpp
- 2026-06-05 ggerganov/llama.cpp b9534: b9534
<details open> vulkan: add fwht support for Intel with shmem reduction (#23964) * vulkan: add fwht support for Intel with shmem reduction * don't use N as workgroup size * disable subgroup shuffle on MoltenVK AMD * disable fwht shader on Intel Windows due to driver bug </details>
github:ggerganov/llama.cpp
- 2026-06-05 ggerganov/llama.cpp b9533: b9533
<details open> model: fix build failed (#24193) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9533/llama-b9533-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, KleidiAI enabled) [DISABLED](https://github
github:ggerganov/llama.cpp
- 2026-06-05 ggerganov/llama.cpp b9531: b9531
<details open> TP: round up granularity to 128 (#24180) * TP: round up granularity to 128 * remove assert </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9531/llama-b9531-bin-macos-arm64.tar.gz) - macOS Apple
github:ggerganov/llama.cpp
- 2026-06-05 ggerganov/llama.cpp b9530: b9530
<details open> cli: fix model params not propagated (#23893) Fixes #23847 </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9530/llama-b9530-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, KleidiAI enabled
github:ggerganov/llama.cpp
- 2026-06-05 ggerganov/llama.cpp b9529: b9529
<details open> model : fix llama_model::n_gpu_layers() (#24188) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9529/llama-b9529-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, KleidiAI enabled) [DISABLED
github:ggerganov/llama.cpp
- 2026-06-05 ggerganov/llama.cpp b9528: b9528
<details open> ui: run npm install when package-lock.json is newer than node_modules (#24171) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9528/llama-b9528-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm6
github:ggerganov/llama.cpp
- 2026-06-05 vllm-project/vllm v0.22.1: v0.22.1
## Highlights This release features 8 commits from 6 contributors (1 new)! v0.22.1 is a patch release on top of v0.22.0 with targeted bug fixes plus a couple of additions: new model support for JetBrains' Mellum v2, zentorch-accelerated quantized linear inference on AMD Zen
github:vllm-project/vllm
- 2026-06-05 ggerganov/llama.cpp b9524: b9524
<details open> minor : fix lint issues (#24165) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9524/llama-b9524-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, KleidiAI enabled) [DISABLED](https://github
github:ggerganov/llama.cpp
- 2026-06-05 ggerganov/llama.cpp b9523: b9523
<details open> hparams : refactor `hparams.n_layer` (#24060) * hparams : refactor hparams.n_layer * cont : remove `n_layer_kv()`, use n_layer_all instead * cont : type consistency * pi : update SYSTEM.md * models : fix Step3.5 MTP * cont : remove duplicate switch cases *
github:ggerganov/llama.cpp
- 2026-06-05 ggerganov/llama.cpp b9522: b9522
<details open> kleidiai : dynamic chunck-based scheduling for hybrid execution (#23819) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9522/llama-b9522-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, Kle
github:ggerganov/llama.cpp
- 2026-06-05 ggerganov/llama.cpp b9521: b9521
<details open> CUDA: enroll mul_mat_vec_q_moe into pdl (#24087) * Enroll mul_mat_vec_q_moe into PDL, boosting MTP performance on BW Data collected on a B4500: Before ``` (llama.cpp) ➜ llama.cpp git:(master) ✗ python mtp-bench.py code_python pred= 192 draft= 150 acc=
github:ggerganov/llama.cpp
- 2026-06-05 ggerganov/llama.cpp b9519: b9519
<details open> sycl : port multi-column MMVQ from CUDA backend (#21845) mmvq: Port the ncols_dst optimization from ggml-cuda/mmvq.cu to SYCL. Read weights once per dispatch instead of once per column. Covers all standard quant types + reorder paths for Q4_0, Q8_0, Q3_K, Q4_K,
github:ggerganov/llama.cpp
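
The b9519 entry describes reading weights once per dispatch instead of once per column. The idea can be illustrated with a small pure-Python sketch. This is not the SYCL or CUDA kernel: the function names, the `row_reads` counter, and the toy data are all assumptions made for illustration, and quantization is ignored entirely.

```python
def matvec_per_column(weights, columns):
    """Baseline: walk every weight row again for each input column."""
    row_reads = 0  # hypothetical counter standing in for weight-memory traffic
    out = []
    for col in columns:
        result = []
        for row in weights:
            row_reads += 1  # the row is fetched once per column
            result.append(sum(w * x for w, x in zip(row, col)))
        out.append(result)
    return out, row_reads


def matvec_multi_column(weights, columns):
    """Multi-column variant: fetch each weight row once, reuse it for all columns."""
    row_reads = 0
    out = [[] for _ in columns]
    for row in weights:
        row_reads += 1  # single fetch shared by every column
        for j, col in enumerate(columns):
            out[j].append(sum(w * x for w, x in zip(row, col)))
    return out, row_reads


# Toy data: 3 weight rows, 3 input columns.
weights = [[1, 2], [3, 4], [5, 6]]
columns = [[1, 0], [0, 1], [1, 1]]

baseline, baseline_reads = matvec_per_column(weights, columns)
fused, fused_reads = matvec_multi_column(weights, columns)
```

Both functions produce the same outputs; only the number of weight-row fetches differs (rows times columns in the baseline, rows alone in the multi-column variant). Whether that translates into a speedup on real hardware depends on the backend and the quant type.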