github
GitHub API
Keeps: repo, release, stars delta
- 2026-05-29 ggerganov/llama.cpp b9415: b9415
<details open> download: add option to skip_download (#23059) * download: add option to skip_download * fix * fix 2 * if file doesn't exist, respect skip_download flag </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/
github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9414: b9414
<details open> mtmd: Add DeepSeekOCR 2 Support (#20975) * mtmd: DeepSeek-OCR 2 support, with multi-tile dynamic resolution * introduced clip_image_f32::add_viewsep * address PR review - drop redundant ggml_cpy ops in both deepseekocr versions build - drop no-op ggml_cont in
github:ggerganov/llama.cpp
- 2026-05-29 ROCm/ROCm rocm-7.2.4: ROCm 7.2.4 Release
# ROCm 7.2.4 release notes ROCm 7.2.4 is a quality release focused on performance and stability fixes for AI inference workloads on AMD Instinct GPUs. - [Release highlights](#release-highlights) - [ROCm binaries](#rocm-binaries) <a id="release-highlights"></a> ## Relea
AMD github:ROCm/ROCm
- 2026-05-29 ggerganov/llama.cpp b9413: b9413
<details open> CUDA: Check PTX version on host side to guard PDL dispatch (#23530) * CUDA: Check PTX version on host side to guard PDL dispatch Checking on `__CUDA_ARCH_LIST__` alone is insufficient for JIT, as this variable doesn't differentiate between compiling for say sm_9
github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9412: b9412
<details open> server: bump timeout to 3600s (#23842) * server: bump timeout to 3600s * nits: change wording </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9412/llama-b9412-bin-macos-arm64.tar.gz) - macOS Ap
AAPL github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9411: b9411
<details open> model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (#23346) * llama : support DeepSeek V3.2 model family (with DSA lightning indexer) * convert : handle DeepseekV32ForCausalLM architecture * ggml : support for
AAPL github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9410: b9410
<details open> llama: use f16 mask for FA to save VRAM (#23764) * llama: use f16 mask for FA * review: add llama_cast + formatting * simplify </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9410/llama-b9410-
AAPL github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9409: b9409
<details open> sync : ggml </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9409/llama-b9409-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, KleidiAI enabled) [DISABLED](https://github.com/ggml-org/llama.c
AAPL github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9406: b9406
<details open> llama: add llm_graph_input_mtp (#23643) * llama: add llm_graph_input_mtp * rename input_mtp -> input_token_embd * add TODO about mtmd embedding * cont : clean-up --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> </details> **macOS/iOS:** - [m
AAPL github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9405: b9405
<details open> app : move licences to llama-app (#23824) Signed-off-by: Adrien Gallouët <angt@huggingface.co> </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9405/llama-b9405-bin-macos-arm64.tar.gz) - macOS Ap
AAPL github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9404: b9404
<details open> cuda : disables launch_fattn PDL enrollment due to compiler bug (#23825) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9404/llama-b9404-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, Kle
AAPL github:ggerganov/llama.cpp
- 2026-05-29 vllm-project/vllm v0.22.0: v0.22.0
## Highlights This release features 459 commits from 230 contributors (63 new)! * **DeepSeek V4 maturity**: DeepSeek V4 received a major hardening pass this cycle — the model was reorganized into a dedicated `vllm/models/deepseek_v4/` package (#43004, #43039, #43073, #43077
github:vllm-project/vllm
- 2026-05-29 ggerganov/llama.cpp b9403: b9403
<details open> meta : Add missing `buffer` set in allreduce fallback !COMPUTE clear (#23480) Without this at least the vulkan backend will skip the `* 0` for !COMPUTE tensors, causing corrupt output. </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com
AAPL github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9402: b9402
<details open> hexagon: basic/generic op fusion support and RMS_NORM+MUL fusion (#23835) Updating infra to enable op fusion and using RMS_NORM+MUL as the use-case. </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/downloa
AAPL github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9401: b9401
<details open> mtmd-debug: add color and rainbow mode (#23829) * mtmd-debug: add color and rainbow mode * fix M_PI * max_dist </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9401/llama-b9401-bin-macos-arm64.
AAPL github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9400: b9400
<details open> mtmd: fix gemma 4 projector pre_norm (#23822) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9400/llama-b9400-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, KleidiAI enabled) [DISABLED](h
AAPL github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9399: b9399
<details open> opencl: move backend info printing into its own function (#23702) * opencl: move backend info print into its own function * opencl: move new log line * opencl: fix for non adreno path </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.co
AAPL github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9395: b9395
<details open> app : improve help output (#23805) Signed-off-by: Adrien Gallouët <angt@huggingface.co> </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9395/llama-b9395-bin-macos-arm64.tar.gz) - macOS Apple Sil
AAPL github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9394: b9394
<details open> mtmd: n_head_kv defaults to n_head (#23782) removed AI-generated comment </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9394/llama-b9394-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, Kl
AAPL github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9393: b9393
<details open> mtmd: fix gemma 4 audio rms norm eps (#23815) * mtmd: fix gemma 4 audio rms norm eps * Update tools/mtmd/clip.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> </detail
AAPL github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9391: b9391
<details open> arg: Add LLAMA_ARG_API_KEY_FILE environment variable for --api-key-file (#23167) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9391/llama-b9391-bin-macos-arm64.tar.gz) - macOS Apple Silicon (ar
AAPL github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9389: b9389
<details open> ggml: auto apply iGPU flag CUDA/HIP if integrated device (#23007) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9389/llama-b9389-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, KleidiAI e
AAPL github:ggerganov/llama.cpp
- 2026-05-29 ggerganov/llama.cpp b9388: b9388
<details open> mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for … (#23729) * mmvq Optim: add MMVQ_PARAMETERS_TURING(mmvq_parameter_table_id) for SM75 TURING * avoid a mismatch for JIT compilation of Turing device code for Ampere or newer Co-authored-by: J
AAPL github:ggerganov/llama.cpp