github
GitHub API
Keeps: repo, release, stars delta
- 2026-05-20 ggerganov/llama.cpp b9254: b9254
<details open>
Programmatic Dependent Launch (PDL) for more performance on newer NVIDIA GPUs (Hopper+) (#22522)
* Adds initial PDL setup.
* Adds PDL barriers based on simple heuristic: place "sync" before first input pointer access, and "launch" after last write, e.g. to tenso
AAPL github:ggerganov/llama.cpp
- 2026-05-20 ggerganov/llama.cpp b9253: b9253
<details open>
app : introduce the llama unified executable (#23296)
* app : introduce the llama unified executable
  Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use serve for server
  Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Hide completion and bench,
AAPL github:ggerganov/llama.cpp
- 2026-05-20 ggerganov/llama.cpp b9251: b9251
<details open>
mtmd: fit_params now take into account mmproj (#21489)
* mtmd: fit_params now take into account mmproj
* rename alloc_compute_meta to reserve_compute_meta
* rm unused functions
* add ggml_backend_dev_t support
* add debug log
</details>

**macOS/iOS:**
- [ma
AAPL github:ggerganov/llama.cpp
# Release v5.9.0
## New Model additions
### Cohere2Moe
Command A+ is a Mixture-of-Experts (MoE) language model from Cohere that features a hybrid attention pattern combining sliding window and full attention layers. The model incorporates both shared and routed experts
COHERE github:huggingface/transformers