github
GitHub API. Keeps: repo, release, stars delta
- 2026-05-08 · ggerganov/llama.cpp b9082

Feature hexagon l2 norm (#22816)

* L2_NORM Updates
* Addressed PR Comments
* ggml-hexagon: add L2_NORM HVX kernel for Hexagon backend
* hex-unary: remove supported_unary_nc since the outer loop is the same for all unary ops

Co-authored-by: Max Kras…
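
The L2_NORM op scales each row of a tensor to unit Euclidean length. The following minimal scalar sketch in Python gives a plain reference point for what the new HVX kernel computes. It assumes the eps-clamped form y = x / max(‖x‖₂, eps). This entry does not describe the Hexagon kernel's exact epsilon handling, and `l2_norm_rows` is a hypothetical name, not a ggml function.

```python
import math


def l2_norm_rows(rows, eps=1e-12):
    """Scale each row to unit L2 norm.

    Assumption: y = x / max(||x||_2, eps). The clamp keeps an all-zero row
    from dividing by zero; it is not taken from the Hexagon kernel itself.
    """
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))  # Euclidean length of the row
        scale = 1.0 / max(norm, eps)               # clamp tiny norms to eps
        out.append([v * scale for v in row])
    return out


if __name__ == "__main__":
    print(l2_norm_rows([[3.0, 4.0], [0.0, 0.0]]))
```

A zero row stays zero because every element is multiplied by the (large but finite) clamped scale.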

github:ggerganov/llama.cpp

- 2026-05-08 · ggerganov/llama.cpp b9081

common : do not wrap raw strings in schema parser for tagged parsers (#22827)

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9081/llama-b9081-bin-macos-arm64.tar.gz)
- …

github:ggerganov/llama.cpp

- 2026-05-08 · ggerganov/llama.cpp b9080

model : support Gemma4_26B_A4B_NVFP4 (#22804)

* Gemma4_26B_A4B_NvFp4 hf checkpoint convert to gguf format fixes

Signed-off-by: ynankani <ynankani@nvidia.com>

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* …
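
NVFP4 is NVIDIA's 4-bit floating-point format. It stores FP4 (E2M1) codes in blocks of 16 that share an FP8 (E4M3) scale, with an additional per-tensor FP32 scale. The sketch below decodes one block under that description. It takes the block scale as an already-decoded float rather than parsing E4M3. It also does not reflect how this conversion lays NVFP4 data out in GGUF, which the entry does not describe. Both helper names are hypothetical.

```python
def e2m1_to_float(code):
    """Decode a 4-bit E2M1 code: 1 sign bit, 2 exponent bits (bias 1), 1 mantissa bit."""
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 1
    if exp == 0:
        mag = 0.5 * man  # subnormal range: 0 or 0.5
    else:
        mag = (2.0 ** (exp - 1)) * (1.0 + 0.5 * man)  # 1, 1.5, 2, 3, 4, 6
    return sign * mag


def dequant_nvfp4_block(codes, block_scale, tensor_scale=1.0):
    """Dequantize one 16-element block.

    Assumption: block_scale is already decoded to a float (NVFP4 stores it as
    FP8 E4M3); tensor_scale stands in for the per-tensor FP32 scale.
    """
    if len(codes) != 16:
        raise ValueError("an NVFP4 block holds 16 values")
    return [e2m1_to_float(c) * block_scale * tensor_scale for c in codes]


if __name__ == "__main__":
    print([e2m1_to_float(c) for c in range(16)])
```

The eight positive E2M1 magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4, and 6. The shared scale is what stretches that narrow grid over each block's actual range.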

github:ggerganov/llama.cpp

- 2026-05-08 · ggerganov/llama.cpp b9079

common : revert reasoning budget +inf logit bias (#22740)

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9079/llama-b9079-bin-macos-arm64.tar.gz)
- macOS Apple Silicon (arm64, KleidiAI enabled…
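
The entry does not explain why the +inf logit bias was reverted. As general background only, the sketch below shows one hazard of an infinite bias in a max-subtracted softmax. Because `inf - inf` is NaN, every probability becomes NaN, while a large finite bias still forces the token. These helpers are illustrative and are not llama.cpp's sampler code.

```python
import math


def apply_bias(logits, bias):
    """Add a per-token logit bias given as {token_index: bias}."""
    return [x + bias.get(i, 0.0) for i, x in enumerate(logits)]


def softmax(logits):
    """Softmax with max subtraction, the usual overflow guard."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # inf - inf yields nan here
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [1.0, 2.0, 0.5]
    print(softmax(apply_bias(logits, {1: math.inf})))  # every entry is nan
    print(softmax(apply_bias(logits, {1: 1e4})))       # token 1 gets all the mass
```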

github:ggerganov/llama.cpp

- 2026-05-08 · ggerganov/llama.cpp b9077

server: support Vertex AI compatible API (#22545)

* server: support Vertex AI compatible API
* a bit safer
* support other AIP_* env var
* various fixes
* if AIP_MODE is unset, do nothing
* fix test case
* fix windows build

**macOS/iOS:**
- …
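
Vertex AI custom containers receive their settings through `AIP_*` environment variables such as `AIP_MODE`, `AIP_HTTP_PORT`, `AIP_HEALTH_ROUTE`, and `AIP_PREDICT_ROUTE`. Prediction responses wrap their results in a `predictions` array. The sketch below shows the gating described in the commit list ("if AIP_MODE is unset, do nothing"). It is a hypothetical Python helper, not llama-server's implementation. The fallback port and routes are assumptions, as is treating an empty `AIP_MODE` as unset.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class VertexConfig:
    port: int
    health_route: str
    predict_route: str


def vertex_config_from_env(env: Mapping[str, str]) -> Optional[VertexConfig]:
    """Return Vertex settings, or None when AIP_MODE is unset (or empty, by assumption)."""
    if not env.get("AIP_MODE"):
        return None  # not running under Vertex AI: change nothing
    return VertexConfig(
        port=int(env.get("AIP_HTTP_PORT", "8080")),             # fallback is an assumption
        health_route=env.get("AIP_HEALTH_ROUTE", "/health"),    # fallback is an assumption
        predict_route=env.get("AIP_PREDICT_ROUTE", "/predict"), # fallback is an assumption
    )


def wrap_predictions(outputs):
    """Vertex AI prediction responses carry results under a "predictions" key."""
    return {"predictions": list(outputs)}


if __name__ == "__main__":
    print(vertex_config_from_env(os.environ))
```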

github:ggerganov/llama.cpp

- 2026-05-08 · ggerganov/llama.cpp b9076

server: (router) expose child model info from router's /v1/models (#22683)

* server: (router) expose child model info from router's /v1/models
* update docs

**macOS/iOS:**
- macOS Apple Silicon (arm64) …
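
Clients can read the router's model list the same way as any OpenAI-style `/v1/models` endpoint. The sketch below assumes the usual `{"object": "list", "data": [{"id": ...}, ...]}` shape and extracts only the ids. This entry does not list which child-model fields the router now adds, so the sketch assumes nothing about them. `model_ids` is a hypothetical helper.

```python
import json


def model_ids(body: str) -> list:
    """Return the "id" of each entry in an OpenAI-style /v1/models response body.

    Assumes the {"object": "list", "data": [...]} shape; any extra per-model
    fields are ignored rather than guessed at.
    """
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


if __name__ == "__main__":
    sample = '{"object": "list", "data": [{"id": "model-a"}, {"id": "model-b"}]}'
    print(model_ids(sample))
```

Against a live server, the body would come from an HTTP GET on the router's `/v1/models` URL, for example with `urllib.request.urlopen`. The host and port depend on how the server was started.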

github:ggerganov/llama.cpp

- 2026-05-08 · ggerganov/llama.cpp b9075

cuda: fuse snake activation (mul, sin, sqr, mul, add) (#22667)

* cuda: fuse snake activation (mul, sin, sqr, mul, add)

Add ggml_cuda_op_snake_fused with F32 / F16 / BF16 templates. The matcher recognizes the naive 5 op decomposition emitted by audio decoders (Bi…
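
The snake activation is commonly written snake(x) = x + sin²(αx)/α. That expression decomposes into the five ops named above: mul, sin, sqr, mul, add. The Python sketch below contrasts the naive chain, which materializes an intermediate after every op, with a single fused pass per element. It assumes the second multiplier is a precomputed 1/α, which the entry does not state; some decoders use 1/(α + ε) instead. It illustrates the idea only and is not the CUDA kernel.

```python
import math


def snake_naive(x, alpha, inv_alpha):
    """Five separate elementwise passes, each producing an intermediate list."""
    t = [a * v for a, v in zip(alpha, x)]        # mul: alpha * x
    t = [math.sin(v) for v in t]                 # sin
    t = [v * v for v in t]                       # sqr
    t = [ia * v for ia, v in zip(inv_alpha, t)]  # mul: (1/alpha) * sin^2
    return [v + s for v, s in zip(x, t)]         # add: x + ...


def snake_fused(x, alpha, inv_alpha):
    """One pass: the same arithmetic per element, with no intermediates stored."""
    out = []
    for v, a, ia in zip(x, alpha, inv_alpha):
        s = math.sin(a * v)
        out.append(v + ia * (s * s))  # same operation order as the naive chain
    return out


if __name__ == "__main__":
    x, alpha = [0.0, 1.0, -2.0], [1.0, 0.5, 2.0]
    inv = [1.0 / a for a in alpha]
    print(snake_naive(x, alpha, inv) == snake_fused(x, alpha, inv))
```

Because both versions perform the same floating-point operations in the same order, their results match exactly. The fused form saves four reads and writes of intermediate tensors, which is the usual motivation for fusing elementwise chains on a GPU.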

github:ggerganov/llama.cpp

- 2026-05-08 · ggerganov/llama.cpp b9073

CUDA: lower-case PCI bus id, standardize for ggml (#22820)

**macOS/iOS:**
- [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9073/llama-b9073-bin-macos-arm64.tar.gz)
- macOS Apple Silicon (arm64, KleidiAI enabled…
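
PCI bus ids are hexadecimal strings of the form domain:bus:device.function. Different tools may print the hex digits in either case, so comparing ids as raw strings can fail. The sketch below is a minimal normalizer that assumes lower-casing is the convention being standardized on. Collapsing the domain to four digits is an extra assumption, since some tools print an 8-digit domain. `normalize_pci_bus_id` is a hypothetical name.

```python
import re

# domain:bus:device.function, hex digits, already lower-cased before matching
_PCI_RE = re.compile(r"^([0-9a-f]{4,8}):([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])$")


def normalize_pci_bus_id(raw: str) -> str:
    """Return a lower-case 'dddd:bb:dd.f' id, or raise ValueError if malformed."""
    s = raw.strip().lower()
    m = _PCI_RE.match(s)
    if m is None:
        raise ValueError(f"not a PCI bus id: {raw!r}")
    domain, bus, dev, fn = m.groups()
    # Assumption: re-pad the domain to 4 hex digits so 8-digit forms compare equal.
    return f"{int(domain, 16):04x}:{bus}:{dev}.{fn}"


if __name__ == "__main__":
    print(normalize_pci_bus_id("00000000:0A:00.0"))
```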

github:ggerganov/llama.cpp

- 2026-05-08 · ggerganov/llama.cpp b9070

opencl: add q4_0 MoE GEMM for Adreno (#22731)

* Q4_0 MoE CLC pass sanity check
* release program
* opencl: fix whitespace
* opencl: remove unused cl_program
* opencl: break #if block to make it more clear
* opencl: adjust format

Co-authored-by: L…
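
In ggml's reference Q4_0 format, each block stores one fp16 scale `d` followed by 16 bytes that hold 32 4-bit quants. Low nibbles are elements 0 to 15 and high nibbles are elements 16 to 31, and each weight decodes as `(q - 8) * d`. The sketch below decodes one block and then shows the grouping step behind a MoE GEMM: tokens routed to the same expert are processed together against that expert's weights. It assumes top-1 routing and plain dequantized rows. This entry does not describe the Adreno OpenCL kernel's actual data layout, and all function names are hypothetical.

```python
import struct

QK4_0 = 32  # weights per Q4_0 block


def parse_block(block: bytes):
    """Split an 18-byte Q4_0 block into its fp16 scale and 16 packed quant bytes."""
    (d,) = struct.unpack("<e", block[:2])  # little-endian fp16 scale
    return d, block[2:2 + QK4_0 // 2]


def dequant_q4_0_block(d: float, qs: bytes) -> list:
    """Low nibbles give elements 0..15, high nibbles give 16..31; value = (q - 8) * d."""
    if len(qs) != QK4_0 // 2:
        raise ValueError("a Q4_0 block packs 32 quants into 16 bytes")
    out = [0.0] * QK4_0
    for j, byte in enumerate(qs):
        out[j] = ((byte & 0x0F) - 8) * d
        out[j + QK4_0 // 2] = ((byte >> 4) - 8) * d
    return out


def moe_matvec(tokens, routes, experts):
    """Top-1 MoE: group tokens by expert id, then apply that expert's weight rows.

    tokens: list of activation vectors; routes: expert id per token;
    experts: expert id -> list of (already dequantized) weight rows.
    """
    by_expert = {}
    for t, e in enumerate(routes):
        by_expert.setdefault(e, []).append(t)
    out = [None] * len(tokens)
    for e, idxs in by_expert.items():
        rows = experts[e]  # this expert's weights are reused for every token in the group
        for t in idxs:
            out[t] = [sum(w * x for w, x in zip(row, tokens[t])) for row in rows]
    return out


if __name__ == "__main__":
    block = struct.pack("<e", 0.5) + bytes([0x98] * 16)
    print(dequant_q4_0_block(*parse_block(block)))
```

Grouping by expert first lets each expert's weights be read once per batch rather than once per token. A real GEMM kernel would also tile the per-group products.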
github:ggerganov/llama.cpp