github
GitHub API
Keeps: repo, release, stars delta
# Patch release v5.10.2

There was a serious bug in the conversion of CLIP-related models, which affected models such as sam3 and others. Please make sure to update :pray:

* Fix conversion for clip models by @zucchini-nlp (#46406)

**Full Changelog**: https://github.com/
github:huggingface/transformers - 2026-06-04
ggerganov/llama.cpp b9518: b9518
<details open>

server : disable on-device spec checkpoints (#24108)

</details>

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9518/llama-b9518-bin-macos-arm64.tar.gz)
- macOS Apple Silicon (arm64, KleidiAI enabled) [DISA
github:ggerganov/llama.cpp - 2026-06-04
ggerganov/llama.cpp b9515: b9515
<details open>

Move duplicated imatrix code into single common imatrix-loader.cpp (#22445)

* Deduplicate imatrix loading code
* Add back LLAMA_TRACE, early exit on quantize missing metadata

</details>

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](https://github.com/ggml-org
github:ggerganov/llama.cpp - 2026-06-04
ggerganov/llama.cpp b9512: b9512
<details open>

return filter to save memory (#24125)

Co-authored-by: lvyichen <lvyichen@stepfun.com>

</details>

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9512/llama-b9512-bin-macos-arm64.tar.gz)
- macOS Apple Silic
github:ggerganov/llama.cpp - 2026-06-04
ggerganov/llama.cpp b9510: b9510
<details open>

ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (#22209)

* ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128

  Optimize the inner loop of ggml_vec_dot_q4_1_q8_1_generic using WASM SIMD128 intrinsics, gated behind #ifdef __wasm_simd128__ so non-wasm
github:ggerganov/llama.cpp - 2026-06-04
ggerganov/llama.cpp b9509: b9509
<details open>

server: avoid unnecessary checkpoint restore when new tokens are present (#24110)

* server: avoid unnecessary checkpoint restore when new tokens are present

  The pos_min_thold calculation unconditionally subtracts 1 to ensure at least one token is evaluated for l
github:ggerganov/llama.cpp - 2026-06-04
ggerganov/llama.cpp b9500: b9500
<details open>

metal : reduce rset heartbeat from 500ms -> 5ms (#24074)

</details>

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9500/llama-b9500-bin-macos-arm64.tar.gz)
- macOS Apple Silicon (arm64, KleidiAI enabled) [
github:ggerganov/llama.cpp - 2026-06-04
ggerganov/llama.cpp b9499: b9499
<details open>

ggml-webgpu: FlashAttention refactor + standardize quantization support (#23834)

* Start work on flash_attn refactor
* Refactor
* Split k/v quantization
* Refactor and abstract quantization logic for flash_attn and mul_mat
* Add quantization support to tile p
github:ggerganov/llama.cpp - 2026-06-04
ggerganov/llama.cpp b9498: b9498
<details open>

ggml-cpu: extend RVV quantization vec dot to higher VLENs (#22754)

* ggml-cpu: add rvv 512b,1024b impls for iq4_xs
* ggml-cpu: refactor; add rvv 512b, 1024b impls for q6_K, i-quants
* ggml-cpu: refactor; add 512 and 1024 implementations of tq3_s, iq3_xxs, iq2_s
github:ggerganov/llama.cpp