github
GitHub API
Keeps: repo, release, stars delta
- 2026-05-13 ggerganov/llama.cpp b9139: b9139
<details open> flush the gpu profile timestamp before the queryset is overflowed (#22995) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9139/llama-b9139-bin-macos-arm64.tar.gz) - [macOS Apple Silicon (arm64,
github:ggerganov/llama.cpp
- 2026-05-13 ggerganov/llama.cpp b9134: b9134
<details open> download: do not exit() on error (#23008) </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9134/llama-b9134-bin-macos-arm64.tar.gz) - [macOS Apple Silicon (arm64, KleidiAI enabled)](https://github
github:ggerganov/llama.cpp
# PyTorch 2.12.0 Release Notes - [Highlights](#highlights) - [Backwards Incompatible Changes](#backwards-incompatible-changes) - [Deprecations](#deprecations) - [New Features](#new-features) - [Improvements](#improvements) - [Bug fixes](#bug-fixes) - [Performance](#perfo
github:pytorch/pytorch
- 2026-05-13 ggerganov/llama.cpp b9133: b9133
<details open> server, webui: support continue generation on reasoning models (#22727) * server, webui : support continue generation on reasoning models (#22727) Remove the throw blocking assistant prefill on reasoning models and orchestrate thinking tags around the prefilled
github:ggerganov/llama.cpp
- 2026-05-13 ggerganov/llama.cpp b9131: b9131
<details open> spec : update CLI arguments for better consistency (#22964) * spec : update CLI arguments for better consistency * cont : fix CLI arg message </details> **macOS/iOS:** - [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b913
github:ggerganov/llama.cpp
- 2026-05-13 NVIDIA/cutlass v4.5.0: CUTLASS 4.5.0
## CuTe DSL * New features - New Block API `block_copy()` to simplify TMA and S2T copy. Users can ignore the details of multicast and 2CTA partitioning for TMA by using `block_copy()` and need not invoke `tma_partition()`. Users can also remove the bulk of S2T initialization to simplify S
NVDA github:NVIDIA/cutlass
# Patch release v5.8.1 This release is mainly to fix the Deepseek V4 integration!!! * [fix] Add fatal_error to ContinuousBatchingManager s
github:huggingface/transformers
- 2026-05-13 ggerganov/llama.cpp b9128: b9128
<details open> hexagon: eliminate scalar VTCM loads via HVX splat helpers (#22993) * hexagon: add hvx_vec_repl helpers and use those for splat-from-vtcm usecase * hmx-mm: optimize per-group scale handling * hmx-fa: optimize slope load from vtcm * hmx-fa: use aligned access w
github:ggerganov/llama.cpp