github
GitHub API
Keeps: repo, release, stars delta
- 2026-05-12 ggerganov/llama.cpp b9127
<details open>

opencl: add opt-in Adreno xmem F16xF32 GEMM for prefill (#22755)

* ggml-opencl: add Adreno xmem F16xF32 GEMM for prefill
* ggml-opencl: address Adreno xmem review comments
* ggml-opencl: align xmem gemm kernel naming

---------

Co-authored-by: Your Name <your@
github:ggerganov/llama.cpp
- 2026-05-12 ggerganov/llama.cpp b9124
<details open>

mtmd, server, common: expose modalities to /v1/models (#22952)

* mtmd, server, common: expose modalities to /v1/models
* fix build
* rename to mtmd_caps

</details>

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/d
github:ggerganov/llama.cpp
- 2026-05-12 ggerganov/llama.cpp b9123
<details open>

ggml-webgpu: Enables running gpt-oss-20b (#22906)

* Enable to run gpt-oss-20b and refactor mulmat-q
* disable test-backend-ops in ubuntu-24-webgpu

</details>

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download
github:ggerganov/llama.cpp
- 2026-05-12 ggerganov/llama.cpp b9122
<details open>

ggml-webgpu: address precision issues for multimodal (#22808)

* fix(mixed-types): use f32 for precision and update the shared memory calculation logic for f32
* fix(unary): correct the gelu, gelu quick and gelu erf functions
* fix(flash-attn-tile): fix the har
github:ggerganov/llama.cpp
- 2026-05-12 ggerganov/llama.cpp b9119
<details open>

vulkan: Fix Windows performance regression on Intel GPU BF16 workloads for Xe2 and newer (#22461)

* refactor
* Use l_warptile only when coopamt is available for BF16

</details>

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cp
github:ggerganov/llama.cpp
- 2026-05-12 ggerganov/llama.cpp b9118
<details open>

vulkan: Check shared memory size for mmq shaders (#22693)

</details>

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9118/llama-b9118-bin-macos-arm64.tar.gz)
- [macOS Apple Silicon (arm64, KleidiAI enabled)
github:ggerganov/llama.cpp
- 2026-05-12 ggerganov/llama.cpp b9116
<details open>

mtmd: add MiMo v2.5 vision (#22883)

* mimo-v2.5: vision support
* mimo-v2.5: use fused qkv for vision
* mimi-v2.5: fix f16 vision overflow
* mimo-v2.5: comment cleanups
* mimo-v2.5: Flash doesn't have mmproj more cleanup remember to use filter_tensors
* mimo
github:ggerganov/llama.cpp
- 2026-05-12 ggerganov/llama.cpp b9114
<details open>

metal : promote mul_mv/mul_mm batch divisors to function constants (#22711)

* metal : promote mul_mv/mul_mm batch divisors to function constants
* metal : take op directly in get_pipeline_mul_mv_ext

</details>

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](htt
github:ggerganov/llama.cpp