Semantic Scholar

1 event on 2026-05-01, role: thematic, 30d history, access: keyless

Keeps: paper title, abstract snippet

Archive source — full history has value. Use pagination to browse older records.

2026-05-01 Research paper: Parallelism Strategies and Concurrency Effects for Mixture-of-Experts Inference on GPU Systems
Query: mixture of experts inference serving
Authors: Ananya Hegde, Akshata Kumble, Ravi Gupta
Citations: 0
Mixture-of-Experts (MoE) architectures reduce inference cost by activating only a sparse subset of parameters per token. However, when these models exceed single-GPU memory,