github
GitHub API
Keeps: repo, release, stars delta
- 2026-06-02 ggerganov/llama.cpp b9484: b9484
<details open> opencl: use flat variants of q4_K and q6_K gemv for very large M (#24006) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9484/llama-b9484-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, Kl
github:ggerganov/llama.cpp
- 2026-06-02 ggerganov/llama.cpp b9483: b9483
<details open> hexagon: profiler output fix and script updates (#24042) * hex-ops: fix profiler output (ie remove the redundant NONEs) * hex-prof: update profiling script to support tot.usec column </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/
github:ggerganov/llama.cpp
- 2026-06-02 ggerganov/llama.cpp b9482: b9482
<details open> model: add Mellum architecture (#23966) * model: support for Mellum architecture * model: improve mellum.py formatting * model: improve mellum.py formatting once again * deps: downgrade transformers to 4.57.6 (to fix CI) * deps: remove huggingface_hub depende
github:ggerganov/llama.cpp
- 2026-06-02 ggerganov/llama.cpp b9481: b9481
<details open> model : support granite multilingual embeddings R2 (ibm-granite/granite-embedding-{97,311}m-multilingual-r2) (#22716) * Add support for the ibm-granite/granite-embedding-{97m,311m}-multilingual-r2 embedding models: * Added a version of the gpt4o tokenizer that h
github:ggerganov/llama.cpp
- 2026-06-02 ggerganov/llama.cpp b9480: b9480
<details open> StepFun 3.5 MTP (#23274) * StepFun 3.5 MTP * Simplify to single layer * Rollback core changes * fix flake8 errors * Remove scripts * modify to convention * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> *
github:ggerganov/llama.cpp
- 2026-06-02 ggerganov/llama.cpp b9479: b9479
<details open> common : fix state save in common_prompt_batch_decode (#23468) * common : fix state save in common_prompt_batch_decode This commit addresses a bug in common_prompt_batch_decode that affects the session state store/restore in completion.cpp and save-load-state.cp
github:ggerganov/llama.cpp
- 2026-06-02 ggerganov/llama.cpp b9470: b9470
<details open> hexagon: MUL_MAT, MUL_MAT_ID, FLASH_ATTN and GDN cleanup and optimizations for latest models (#23989) * hex-mm: initial support for F32 * F32 -> F32 matmuls * hex-rms-norm: fix src1 stride use in fused rms_norm_mul * hex-ops: clear spad pointers in the ops tha
github:ggerganov/llama.cpp
- 2026-06-02 ggerganov/llama.cpp b9469: b9469
<details open> hexagon: add gelu_quick (#24007) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9469/llama-b9469-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, KleidiAI enabled) [DISABLED](https://github
github:ggerganov/llama.cpp
- 2026-06-02 ggerganov/llama.cpp b9468: b9468
<details open> server: real-time reasoning interruption via control endpoint (#23971) * server: real-time reasoning interruption via control endpoint Builds on the manual reasoning budget trigger from #23949. Adds a CONTROL task that mirrors the CANCEL path on the live slot an
github:ggerganov/llama.cpp
- 2026-06-02 ggerganov/llama.cpp b9467: b9467
<details open> clean up unused variables warnings (#23975) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9467/llama-b9467-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, KleidiAI enabled) [DISABLED](htt
github:ggerganov/llama.cpp
- 2026-06-02 ggerganov/llama.cpp b9466: b9466
<details open> opencl: fix compiler warnings for non-adreno path (#23922) * opencl: fix compiler warnings for non-adreno path * opencl: fix const cast warning </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9
github:ggerganov/llama.cpp