semantic-scholar
Semantic Scholar | 1 event on 2026-05-01 | role: thematic | 30d history | access: keyless
Keeps: paper title, abstract snippet
Archive source — full history has value. Use pagination to browse older records.
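The pagination hint above can be sketched as follows. This is a minimal sketch, assuming "pagination" refers to the `offset`/`limit` parameters of the public Semantic Scholar Graph API paper search endpoint; the card itself does not say which interface it means. The helper names `fetch_page` and `iter_papers`, the requested fields, and the `max_records` cap are illustrative choices, not part of the source.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the public Semantic Scholar Graph API paper search.
API_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def fetch_page(query, offset, limit):
    """Fetch one page of search results (keyless access, so rate limits apply)."""
    params = urllib.parse.urlencode({
        "query": query,
        "offset": offset,
        "limit": limit,
        "fields": "title,abstract",  # matches what the card says it keeps
    })
    with urllib.request.urlopen(f"{API_URL}?{params}", timeout=30) as resp:
        return json.load(resp)


def iter_papers(query, limit=100, max_records=300, fetch=fetch_page):
    """Yield paper records page by page until results run out or max_records is hit."""
    offset, seen = 0, 0
    while seen < max_records:
        page = fetch(query, offset, limit)
        for paper in page.get("data", []):
            yield paper
            seen += 1
            if seen >= max_records:
                return
        # The response carries a "next" offset only while more pages remain.
        if "next" not in page:
            return
        offset = page["next"]
```

A call such as `iter_papers("mixture of experts inference serving", max_records=50)` would walk the results for the query shown in this card; passing a different `fetch` callable makes the loop usable offline or against another paginated source.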
Query: mixture of experts inference serving
Authors: Ananya Hegde, Akshata Kumble, Ravi Gupta
Citations: 0
Mixture-of-Experts (MoE) architectures reduce inference cost by activating only a sparse subset of parameters per token. However, when these models exceed single-GPU memory,