Mechanistic Interpretability / 2026-03
Mechanistic Interpretability Needs Negative Results
Failed grokking runs are not noise; they can expose representation capacity boundaries when paired with the right spectral and causal diagnostics.
14 min
Interpretability, Grokking, Negative Results
Outline
- Memorization versus circuit formation.
- Why non-abelian failures are informative.
- CKA and Peter-Weyl diagnostics.
- Next step: causal patching.
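
For readers who have not met the CKA diagnostic named in the outline, here is a minimal sketch of the linear variant. Comparing activation matrices from two runs (say, one that grokked and one that did not) is one plausible use. This card does not say which CKA variant or which layers the post uses, so the linear form, the function name `linear_cka`, and the random stand-in activations are all assumptions made for illustration.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (n_examples, n_features).

    Hypothetical helper for illustration; the feature widths of x and y may differ,
    but both must be computed on the same n_examples inputs in the same order.
    """
    # Center each feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by self-similarity.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Stand-in data (assumption): random matrices instead of real model activations.
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 16))

# A rotated and rescaled copy should look identical to linear CKA.
q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
rotated = 3.0 * acts @ q

same = linear_cka(acts, rotated)
unrelated = linear_cka(acts, rng.normal(size=(200, 16)))
print(f"rotated copy: {same:.3f}, independent noise: {unrelated:.3f}")
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, which is why it can compare runs whose neurons are not aligned. A low score between a failed run and a successful one is only suggestive on its own and says nothing about causal structure.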