semantic-scholar
Semantic Scholar | 1 event on 2026-05-20 | role: thematic | 30d history | access: keyless
Keeps: paper title, abstract snippet
Archive source — full history has value. Use pagination to browse older records.
Query: mixture of experts inference serving
Authors: Can Hankendi, Rana Shahout, Minlan Yu, A. Coskun
Citations: 1
Large language model (LLM) inference has become a dominant workload in modern data centers, driving significant GPU utilization and energy consumption. While prior s