Sparse Models / 2025-10
Sparse Models Fail Quietly Before They Fail Loudly
Sparse mixture-of-experts (MoE) models can look healthy on loss curves while the router is already collapsing onto a few experts. Routing entropy and per-expert load need to be logged as first-class metrics, not afterthoughts.
9 min
MoE, Routing Entropy, Scaling
Outline
- Expert utilization as a health signal.
- z-loss threshold behavior.
- Matched-FLOP benchmarking.
- What to log before scaling up.
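The sketch below illustrates the kind of entropy and load metrics the outline refers to. It is plain Python and makes several assumptions. Router logits are taken to be available as one row per token and one column per expert, and each token is assumed to go to its top-k experts. The function name `router_health` and its returned keys are hypothetical, not taken from any framework. The z-loss term follows the ST-MoE router z-loss (mean squared log-sum-exp of the router logits), which may not be the exact variant the article discusses.

```python
import math
from typing import Dict, List, Sequence


def _softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def _logsumexp(logits: Sequence[float]) -> float:
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def _entropy(probs: Sequence[float]) -> float:
    # Shannon entropy in nats; zero-probability terms contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0)


def router_health(router_logits: Sequence[Sequence[float]], top_k: int = 1) -> Dict[str, object]:
    """Hypothetical per-batch router health metrics (illustrative only).

    router_logits: one row per token, one column per expert.
    Assumes each token is dispatched to its top_k highest-probability experts.
    """
    if not router_logits:
        raise ValueError("router_logits must be non-empty")
    n_tokens = len(router_logits)
    n_experts = len(router_logits[0])
    if not 1 <= top_k <= n_experts:
        raise ValueError("top_k must be between 1 and the number of experts")

    counts = [0] * n_experts
    token_entropy_sum = 0.0
    z_sum = 0.0
    for logits in router_logits:
        probs = _softmax(logits)
        token_entropy_sum += _entropy(probs)
        # Router z-loss term (ST-MoE style): squared log-sum-exp of the logits.
        z_sum += _logsumexp(logits) ** 2
        # Pick the top_k experts; ties go to the lower index because sorted() is stable.
        chosen = sorted(range(n_experts), key=lambda e: probs[e], reverse=True)[:top_k]
        for e in chosen:
            counts[e] += 1

    load = [c / (n_tokens * top_k) for c in counts]
    return {
        # Mean per-token routing entropy; low values mean confident routing.
        "mean_token_entropy": token_entropy_sum / n_tokens,
        # Entropy of the aggregate load; equals max_entropy when perfectly balanced.
        "load_entropy": _entropy(load),
        "max_entropy": math.log(n_experts),
        "load_fraction": load,
        # 1.0 means balanced; larger means the busiest expert is overloaded.
        "max_load_ratio": max(load) * n_experts,
        "dead_experts": sum(1 for c in counts if c == 0),
        "z_loss": z_sum / n_tokens,
    }


# Usage: four tokens, four experts.
balanced = [[5.0 if j == i else 0.0 for j in range(4)] for i in range(4)]
collapsed = [[5.0, 0.0, 0.0, 0.0] for _ in range(4)]
print(router_health(balanced)["max_load_ratio"])   # 1.0
print(router_health(collapsed)["dead_experts"])    # 3
```

Both toy batches have the same per-token entropy, because each token puts the same probability mass on its preferred expert. Only the load metrics tell them apart. That is one reason to log aggregate load alongside per-token entropy rather than relying on either alone.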