github
GitHub API
Keeps: repo, release, stars delta
- 2026-05-27 ggerganov/llama.cpp b9371: b9371
<details open> ggml-webgpu: remove legacy constants (#23672) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9371/llama-b9371-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, KleidiAI enabled) [DISABLED](h
AAPL github:ggerganov/llama.cpp
- 2026-05-27 ggerganov/llama.cpp b9370: b9370
<details open> hexagon: add support for Q4_1 in MUL_MAT and MUL_MAT_ID (#23647) * hex-mm: add support for Q4_1 matmul/matvec, hvx-only for now * hmx-mm: add support for Q4_1 * hex-mm: use Q8_1 dynamic quantization to avoid having to compute sums in the vec_dot * hexagon: fix
AAPL github:ggerganov/llama.cpp
- 2026-05-27 ggerganov/llama.cpp b9368: b9368
<details open> vulkan: Switch MUL_MAT_VEC to 4 K per iteration for F16/32 (#22887) * vulkan: Switch MUL_MAT_VEC to 4 K per iteration for F16/32 Against mesa git, this shows a 4.8% performance improvement for tg128 on Qwen3.5-9B:BF16 on Intel BMG. Note that this breaks some te
AAPL github:ggerganov/llama.cpp
- 2026-05-27 ggerganov/llama.cpp b9369: b9369
<details open> ggml-webgpu: Fix how to dispatch WG to some ops (#23750) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9369/llama-b9369-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, KleidiAI enabled) [
AAPL github:ggerganov/llama.cpp
- 2026-05-27 ggerganov/llama.cpp b9367: b9367
<details open> vulkan: use GL_NV_cooperative_matrix_decode_vector for faster matmul (#23541) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9367/llama-b9367-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64
AAPL github:ggerganov/llama.cpp
- 2026-05-27 ggerganov/llama.cpp b9366: b9366
<details open> vulkan: add REPEAT op support for f16 to f16. (#23298) * feat: extend repeat op for vulkan * feat: add repeat_f16 vulkan pipeline * fix: ensure same dst and src types * fix: use type_size instead of data types * fix: use int16 and int32 for repeat shader op
AAPL github:ggerganov/llama.cpp
- 2026-05-27 ggerganov/llama.cpp b9365: b9365
<details open> ci : move ARM jobs to self-hosted + disable kleidiai mac release (#23780) * ci : move ARM jobs to 3rd-party runners + disable kleidiai release * cont : fix deps + fix names * ocd : fix names * cont : fix PR links </details> **macOS/iOS:** - [macOS Apple Sili
AAPL github:ggerganov/llama.cpp
- 2026-05-27 ggerganov/llama.cpp b9360: b9360
<details open> common : fix env names to all have LLAMA_ARG_ prefix (#23778) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9360/llama-b9360-bin-macos-arm64.tar.gz) - [macOS Apple Silicon (arm64, KleidiAI enab
AAPL github:ggerganov/llama.cpp
- 2026-05-27 ggerganov/llama.cpp b9357: b9357
<details open> vulkan: avoid preferring transfer queue on AMD UMA devices (#22455) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9357/llama-b9357-bin-macos-arm64.tar.gz) - [macOS Apple Silicon (arm64, KleidiA
AAPL github:ggerganov/llama.cpp
- 2026-05-27 ggerganov/llama.cpp b9354: b9354
<details open> convert: add MiniCPM5 tokenizer support (#23384) Add minicpm5 pre-tokenizer hash via convert_hf_to_gguf_update.py and implement hardcoded regex handling in llama-vocab.cpp, consistent with other BPE pre-tokenizers. Co-authored-by: zhangtao <zhangtao2@modelbest.c
AAPL github:ggerganov/llama.cpp
- 2026-05-27 ggerganov/llama.cpp b9353: b9353
<details open> server : fix the log message when using SSL (#23393) When llama-server is started with SSL key and cert, the log says that it listens on http instead of https. This patch fixes this. </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/g
AAPL github:ggerganov/llama.cpp
- 2026-05-27 NVIDIA/cutlass v4.5.1: CUTLASS 4.5.1
## CuTe DSL
* Bug fixing and improvements
  - Fixed following issues:
    https://github.com/NVIDIA/cutlass/issues/3219
    https://github.com/NVIDIA/cutlass/issues/3218
    https://github.com/NVIDIA/cutlass/issues/3212
    https://github.com/NVIDIA/cutlass/issues/3210
NVDA github:NVIDIA/cutlass