github
GitHub API. Keeps: repo, release, stars delta
- 2026-05-21 ggerganov/llama.cpp b9275: b9275
metal : optimize concat kernel and fix set kernel threads (#23411)
* metal : fix GGML_OP_SET kernel threads
* tests : extend test_cpy to support different src/dst shapes
  Extend test_cpy to support different source and destination tensor shapes for CPY operation…
- 2026-05-21 ggerganov/llama.cpp b9274: b9274
server : free draft/MTP resources on sleep to fix VRAM leak (#23461)
The destroy() function in server_context_impl only cleaned up the main model and context (via llama_init.reset()) but did not free the speculative decoder (spec), draft context (ctx_dft), or dra…
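The leak described above comes from a teardown path that releases only some of the resources a context owns. The following is a minimal Python sketch of the corrected pattern. The `Resource` and `ServerContext` classes are hypothetical stand-ins for the C++ objects, not llama.cpp types. The attribute names `spec` and `ctx_dft` simply echo the release note. The free order (dependents first) is an assumption.

```python
from typing import Optional


class Resource:
    """Hypothetical stand-in for a GPU-backed allocation; counts live instances."""

    live = 0

    def __init__(self, name: str) -> None:
        self.name = name
        Resource.live += 1

    def free(self) -> None:
        Resource.live -= 1


class ServerContext:
    """Hypothetical context owning a main model/context plus optional draft resources."""

    def __init__(self, with_draft: bool) -> None:
        self.model: Optional[Resource] = Resource("model")
        self.ctx: Optional[Resource] = Resource("ctx")
        # Draft/speculative resources exist only when speculative decoding is enabled.
        self.ctx_dft: Optional[Resource] = Resource("ctx_dft") if with_draft else None
        self.spec: Optional[Resource] = Resource("spec") if with_draft else None

    def destroy(self) -> None:
        # Release everything the context owns, dependents first (assumed order).
        # Freeing only model and ctx here would leave spec and ctx_dft allocated.
        for attr in ("spec", "ctx_dft", "ctx", "model"):
            res = getattr(self, attr)
            if res is not None:
                res.free()
                setattr(self, attr, None)  # makes a second destroy() a no-op
```

Resetting each attribute to `None` after freeing makes `destroy()` idempotent. This matters when a sleep/wake cycle can call teardown more than once.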
- 2026-05-21 ggerganov/llama.cpp b9273: b9273
server: re-inject subcommand when router spawns children under unified binary (#23442)
**macOS/iOS:** [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9273/llama-b9273-bin-macos-arm64.tar.gz), macOS Apple Sili…
- 2026-05-21 ggerganov/llama.cpp b9272: b9272
app : add batched-bench, fit-params, quantize & perplexity (#23459)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Add missing main.cpp
* …
- 2026-05-21 ggerganov/llama.cpp b9271: b9271
mtp: use inp_out_ids for skipping logit computation (#23433)
When doing a follow-up decode for the draft model, we were always computing the logits even though they are not required.
**macOS/iOS:** macOS Apple Silicon (arm64): https://github.c…
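The idea behind an output-id list is to project only the rows whose logits are actually needed. The following is a minimal Python sketch that uses plain lists instead of ggml tensors. The `compute_logits` helper and its parameters are hypothetical and do not mirror llama.cpp's `inp_out_ids` graph code. The sketch shows that an empty id list skips the output projection entirely, which is the case the release note describes for a follow-up draft decode.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(weights: Matrix, h: Sequence[float]) -> List[float]:
    # One logit per vocabulary row: dot(row, h).
    return [sum(w * x for w, x in zip(row, h)) for row in weights]


def compute_logits(hidden: Matrix, weights: Matrix, out_ids: Sequence[int]) -> Matrix:
    """Project only the hidden states whose token index appears in out_ids."""
    # An empty out_ids (no logits requested) performs no projection work at all.
    return [matvec(weights, hidden[i]) for i in out_ids]
```

Selecting the rows before the projection, rather than after it, is what saves the work. The cost scales with the number of requested outputs instead of with the batch size.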
- 2026-05-21 ggerganov/llama.cpp b9270: b9270
vocab : add Carbon-3B (HybridDNATokenizer) support (#23410)
Adds a new BPE pre-type LLAMA_VOCAB_PRE_TYPE_CARBON for the HybridDNATokenizer used by HuggingFaceBio/Carbon-{500M,3B,8B}. The base BPE is Qwen3-4B-B…
- 2026-05-21 ggerganov/llama.cpp b9267: b9267
ggml : check the right iface method before using the fallback 2d get (#23306)
Probably no backend implements only one of 2d get/set, but checking the wrong method could trip up a future backend developer trying to add 2d get/set.
**macOS/iOS:** macOS Apple…
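The bug class here is testing for one optional interface method (2d set) before calling a different one (2d get). The following is a hypothetical Python sketch of the corrected dispatch. `BackendIface`, its `get`/`get_2d`/`set_2d` fields and `tensor_get_2d` are invented for illustration and are not ggml's actual backend interface. The fallback is assumed to issue one 1D get per row.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Getter = Callable[[int, int], List[int]]  # (offset, size) -> values
Getter2D = Callable[[int, int, int, int], List[List[int]]]  # (offset, row_size, n_rows, stride)


@dataclass
class BackendIface:
    get: Getter
    get_2d: Optional[Getter2D] = None
    set_2d: Optional[Callable[..., None]] = None


def tensor_get_2d(iface: BackendIface, offset: int, row_size: int,
                  n_rows: int, stride: int) -> List[List[int]]:
    # Check get_2d itself (not set_2d) before taking the native 2D path.
    if iface.get_2d is not None:
        return iface.get_2d(offset, row_size, n_rows, stride)
    # Fallback: one 1D get per row, stepping by the row stride.
    return [iface.get(offset + r * stride, row_size) for r in range(n_rows)]
```

With the wrong check (`set_2d is not None`), a backend that implements only 2d set would reach a missing `get_2d` and fail. The corrected check routes it to the fallback instead.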
- 2026-05-21 ggerganov/llama.cpp b9264: b9264
app : show version (#23426)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
**macOS/iOS:** [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9264/llama-b9264-bin-macos-arm64.tar.gz), macOS Apple Silicon (…
- 2026-05-21 ggerganov/llama.cpp b9263: b9263
mtmd, model : merge HunyuanOCR into HunyuanVL and fix OCR vision precision (#23329)
* HunyuanOCR shares the same HF arch and vision layout as HunyuanVL but was split into a separate path that skipped the +0.1 bilinear sampler used by the HF reference.
* Collapse O…
- 2026-05-21 ggerganov/llama.cpp b9260: b9260
opencl: refactor backend initialization (#23318)
* opencl: refactor initialization
* opencl: refactor GPU identification
* opencl: rename for consistency
* opencl: cache global mem size in dev_ctx
* opencl: adjust log level
* opencl: load argsort and flash_at…
- 2026-05-21 ggerganov/llama.cpp b9259: b9259
common/speculative : fix nullptr crash in get_devices_str (#23386)
ggml_backend_dev_by_name always appends a nullptr sentinel to the devices vector. Skipping nullptr entries prevents an assertion failure in ggml_backend_dev_name.
Assisted-by: llama.cpp:local pi
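The fix above skips a trailing null entry when a device list is turned into a string. The following is a Python sketch of the same pattern, with `None` standing in for the nullptr sentinel. The `Device` type, `device_name` and this `get_devices_str` signature are hypothetical. llama.cpp's real function works on ggml backend device handles in C++.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Device:
    name: str


def device_name(dev: Optional[Device]) -> str:
    # Like a name accessor that asserts it never receives a null device.
    assert dev is not None, "null device"
    return dev.name


def get_devices_str(devices: Sequence[Optional[Device]]) -> str:
    # Skip None entries (e.g. a trailing sentinel) instead of passing them on.
    return ",".join(device_name(d) for d in devices if d is not None)
```

The filter sits in the caller, so the accessor's assertion still catches genuine misuse elsewhere.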
- 2026-05-21 ggerganov/llama.cpp b9258: b9258
mtmd : DeepSeek-OCR image processing fixes, img_tool::resize padding refactor (#23345)
* mtmd : deepseek-ocr fixes, improvements and refactoring
  - image processing changes to achieve full parity with Pillow (reference impl)
  - SAM mask casting only when flash-att…
- 2026-05-21 ggerganov/llama.cpp b9257: b9257
vulkan: optimize operations in the IM2COL shader (#22685)
* Add comments and improve the code formatting
**macOS/iOS:** macOS Apple Silicon (arm64): https://github.com/ggml-org/llama.cpp/releases…
- 2026-05-21 ggerganov/llama.cpp b9255: b9255
hexagon: HMX quantized matmul rework (#23368)
* hmx-mm: update debug logging in hmx-mm
* hmx-mm: update dequant logic to use HVX_vector_x2/4
* hmx-mm: remove non-pipelined version of the quantized matmul
  It seems that we don't really need the non-pipelined version…