
GitHub API

8 events on 2026-05-11, role: entity, 7d history, access: keyless

Keeps: repo, release, stars delta


2026-05-11 ggerganov/llama.cpp b9113: b9113
<details open> opencl: add q4_1 MoE for Adreno (#22856) * Q4_1 MoE CLC pass sanity check * remove unnecessary code * opencl: remove unnecessary asserts and reformat * opencl: fix supports_op for q4_1 moe * q4_1 moe is supported by Adreno with certain shapes --------- Co-a
github:ggerganov/llama.cpp
2026-05-11 ggerganov/llama.cpp b9112: b9112
<details open> CUDA: handle OW > 65535 in im2col (2D and 3D) (#22944) `im2col_cuda` and `im2col_3d_cuda` both dispatch with `block_nums.y = OW`. CUDA caps grid Y at 65535. Conv1d encoders on raw 16 kHz audio with T > 65535 (~ 4 s) trip the limit -- e.g. SEANet at 11 s lands at
github:ggerganov/llama.cpp
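
The b9112 note above turns on a small piece of arithmetic: with `block_nums.y = OW` and a grid Y cap of 65535, a 16 kHz signal hits the cap after roughly 4 s. The sketch below reproduces that calculation and shows one common way to stay under such a cap, splitting the Y range into chunks. The snippet is truncated before it describes the actual fix in #22944, so the chunking here is an assumption for illustration only, and `seconds_until_limit` and `split_grid_y` are hypothetical helper names, not llama.cpp functions.

```python
# Grid Y dimension cap, as stated in the release note.
CUDA_MAX_GRID_Y = 65535


def seconds_until_limit(sample_rate_hz: int, limit: int = CUDA_MAX_GRID_Y) -> float:
    """Audio duration (seconds) at which the sample count reaches the grid Y cap."""
    return limit / sample_rate_hz


def split_grid_y(ow: int, limit: int = CUDA_MAX_GRID_Y) -> list[tuple[int, int]]:
    """Split range(ow) into (offset, count) chunks with count <= limit.

    Hypothetical mitigation: each chunk could be dispatched as its own
    launch (or walked by a strided loop) so grid Y never exceeds the cap.
    """
    if ow < 0:
        raise ValueError("ow must be non-negative")
    return [(off, min(limit, ow - off)) for off in range(0, ow, limit)]


if __name__ == "__main__":
    print(f"16 kHz audio reaches the cap after {seconds_until_limit(16_000):.2f} s")
    # 150_000 is an arbitrary illustrative width, not a value from the release note.
    for offset, count in split_grid_y(150_000):
        print(f"launch with y offset {offset}, grid y = {count}")
```

Chunking keeps the kernel body unchanged apart from an added offset. A strided loop over Y inside the kernel is the usual alternative when a single launch is preferred.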
2026-05-11 ggerganov/llama.cpp b9110: b9110
<details open> docs: fix metrics endpoint description in server README (#22879) * docs: fix metrics endpoint description in server README Required model query parameter for router mode described. Removed metrics: - llamacpp:kv_cache_usage_ratio - llamacpp:kv_cache_tokens Add
github:ggerganov/llama.cpp
2026-05-11 ggerganov/llama.cpp b9109: b9109
<details open> spec : parallel drafting support (#22838) * spec : refactor * spec : drop support for incompatible vocabs * spec : update common_speculative_init() * cont : pass seq_id * cont : dedup ctx_seq_rm_type * server : sketch the ctx_dft decode loop * server : draf
github:ggerganov/llama.cpp
2026-05-11 ggerganov/llama.cpp b9106: b9106
<details open> vulkan: Support asymmetric FA in scalar/mmq/coopmat1 paths (#22589) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9106/llama-b9106-bin-macos-arm64.tar.gz) - [macOS Apple Silicon (arm64, KleidiA
github:ggerganov/llama.cpp
2026-05-11 ggerganov/llama.cpp b9105: b9105
<details open> CUDA: directly include cuda/iterator (#22936) Before, we relied on a transient import from `cub/cub.cuh`, which is bad practice to do as cub may not always expose cuda/iterator </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-or
github:ggerganov/llama.cpp
2026-05-11 ggerganov/llama.cpp b9103: b9103
<details open> vendor : update cpp-httplib to 0.44.0 (#22919) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9103/llama-b9103-bin-macos-arm64.tar.gz) - [macOS Apple Silicon (arm64, KleidiAI enabled)](https://g
github:ggerganov/llama.cpp
2026-05-11 ggerganov/llama.cpp b9102: b9102
<details open> [SYCL] Add OP im2col_3d (#22903) * add im2col_3d * format code * update the ops.md </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9102/llama-b9102-bin-macos-arm64.tar.gz) - [macOS Apple Silic
github:ggerganov/llama.cpp