github
GitHub API
Keeps: repo, release, stars delta
- 2026-06-30 ggerganov/llama.cpp b9851: b9851
<details open> cuda : prevent integer truncation and overflow errors when using KQ mask strides in flash_attn_mask_to_KV_max kernel (#24945) Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com> </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/gg
github:ggerganov/llama.cpp
- 2026-06-30 ggerganov/llama.cpp b9850: b9850
<details open> model : register t_layer_inp for qwen3next (#25141) * Fix input assignment in layer processing loop Fix DFLASH for qwen-coder-next * add line break Added tensor for attention normalization in Qwen3 model. </details> **macOS/iOS:** - [macOS Apple Silicon (arm
github:ggerganov/llama.cpp
- 2026-06-30 ggerganov/llama.cpp b9849: b9849
<details open> common,server: handle bracketed IPv6 literals in URL authority (#25140) * common,server: handle bracketed IPv6 literals in URL authority Parse the [host]:port form (RFC 3986) and bracket IPv6 hosts when formatting a URL authority: listening log, proxy Host heade
github:ggerganov/llama.cpp
- 2026-06-30 ggerganov/llama.cpp b9848: b9848
<details open> CUDA: fix get_rows_back for tables with more than 65535 rows (grid-y clamp + stride) (#25103) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9848/llama-b9848-bin-macos-arm64.tar.gz) - macOS Appl
github:ggerganov/llama.cpp
- 2026-06-30 ggerganov/llama.cpp b9847: b9847
<details open> CUDA: fix Gemma E4B MTP FlashAttention (#25148) * CUDA: fix Gemma E4B MTP FlashAttention * remove unused template declaration </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9847/llama-b9847-bi
github:ggerganov/llama.cpp
- 2026-06-30 ggerganov/llama.cpp b9846: b9846
<details open> vulkan: roll bk loop in matmul for asahi linux (#24663) * vulkan: roll bk loop in matmul for asahi linux * vulkan: fix inline comment * vulkan: revert BK-loop unroll change * vulkan: edit spirv directly for asahi roll bk loop * vulkan: remove trailing whitesp
github:ggerganov/llama.cpp
- 2026-06-30 ggerganov/llama.cpp b9844: b9844
<details open> ggml-webgpu: add support for NVFP4 (#25143) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9844/llama-b9844-bin-macos-arm64.tar.gz) - macOS Apple Silicon (arm64, KleidiAI enabled) [DISABLED](htt
github:ggerganov/llama.cpp
- 2026-06-30 ggerganov/llama.cpp b9843: b9843
<details open> Revert "sched : reintroduce less synchronizations during split compute (#20793)" (#25138) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9843/llama-b9843-bin-macos-arm64.tar.gz) - macOS Apple Si
github:ggerganov/llama.cpp