github
GitHub API
Keeps: repo, release, stars delta
- 2026-04-29 ggerganov/llama.cpp b8981: b8981
<details open> common : do not pass prompt tokens to reasoning budget sampler (#22488) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b8981/llama-b8981-bin-macos-arm64.tar.gz) - [macOS Apple Silicon (arm64, Kle
github:ggerganov/llama.cpp
- 2026-04-29 ggerganov/llama.cpp b8980: b8980
<details open> hexagon: make vmem and buffer-size configurable (#22487) * hexagon: allow host to set max vmem size We use a sane default but it's helpful to allow for an override if needed. * hexagon: add support for measuring vmem space and move pinned mmaping management to
github:ggerganov/llama.cpp
- 2026-04-29 ggerganov/llama.cpp b8979: b8979
<details open> CUDA: fuse SSM_CONV + ADD(bias) + SILU (#22478) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b8979/llama-b8979-bin-macos-arm64.tar.gz) - [macOS Apple Silicon (arm64, KleidiAI enabled)](https://
github:ggerganov/llama.cpp
- 2026-04-29 ggerganov/llama.cpp b8978: b8978
<details open> spec : discard last drafted token with low prob (#22506) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b8978/llama-b8978-bin-macos-arm64.tar.gz) - [macOS Apple Silicon (arm64, KleidiAI enabled)
github:ggerganov/llama.cpp
- 2026-04-29 ggerganov/llama.cpp b8977: b8977
<details open> sync : ggml </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b8977/llama-b8977-bin-macos-arm64.tar.gz) - [macOS Apple Silicon (arm64, KleidiAI enabled)](https://github.com/ggml-org/llama.cpp/releas
github:ggerganov/llama.cpp
- 2026-04-29 ggerganov/llama.cpp b8974: b8974
<details open> ggml-cpu : disable tiled matmul on AIX to fix page boundary segfault (#22293) * ggml-cpu : disable tiled matmul on AIX to fix page boundary segfault vec_xst operations in the tiled path crash on AIX when writing near 4KB page boundaries due to strict memory prot
github:ggerganov/llama.cpp
- 2026-04-29 ggerganov/llama.cpp b8973: b8973
<details open> ggml-cuda: refactor fusion code (#22468) * ggml-cuda: refactor fusion code * apply formatting + make env variable truthy </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b8973/llama-b8973-bin-mac
github:ggerganov/llama.cpp
- 2026-04-29 ggerganov/llama.cpp b8972: b8972
<details open> ggml-cpu: cmake: append xsmtvdotii march for SpacemiT IME (#22317) * ggml-cpu: cmake: append xsmtvdotii march for SpacemiT IME When GGML_CPU_RISCV64_SPACEMIT=ON is set, ime1_kernels.cpp contains inline asm for the vmadot family which requires the xsmtvdotii cust
github:ggerganov/llama.cpp
- 2026-04-29 ggerganov/llama.cpp b8971: b8971
<details open> ggml-webgpu: Fix bug in FlashAttention support check (#22492) * Fix flashattention support check for devices that don't support subgroups * set path to none if kv_tile doesn't fit </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggm
github:ggerganov/llama.cpp
- 2026-04-29 ggerganov/llama.cpp b8970: b8970
<details open> common: Intentionally leak logger instance to fix hanging on Windows (#22273) * Changed to leak logger singleton to prevent hanging on Windows * Fix comment * Stopped using static vector Using std::vector will cause g_col to be released before the logger thre
github:ggerganov/llama.cpp
- 2026-04-29 ggerganov/llama.cpp b8967: b8967
<details open> ggml-cuda: Repost of 21896: Blackwell native NVFP4 support (#22196) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b8967/llama-b8967-bin-macos-arm64.tar.gz) - [macOS Apple Silicon (arm64, KleidiA
github:ggerganov/llama.cpp