Mani Pal

Engineer-researcher

LLM systems, CUDA kernels, inference optimization, compression, interpretability, and distributed AI infrastructure.

Transformer Internals / 2026-01

Transformer Internals as a Systems Interface

The residual stream is the real systems interface of a transformer: training, inference, interpretability, and compression all negotiate with it.

12 min read

Transformers · Residual Stream · Interpretability

Outline

  • Residual stream as shared memory.
  • Attention heads as sparse routing operations.
  • MLPs as feature-space write amplifiers.
  • Where compression perturbs the interface.
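The first and last outline items can be made concrete with a toy model. The sketch below is only an illustration. It assumes a 4-dimensional stream, hand-picked write directions `u`, `v`, `w`, and a uniform round-to-nearest quantizer standing in for compression; none of these specifics come from the outline. Components write by adding vectors to the stream and read by projecting onto a direction. Orthogonal writes do not interfere, an overlapping write leaks into a neighbour's read, and compressing a single write shifts what a downstream reader of that direction sees.

```python
import math

# Toy residual stream of width d = 4 (hypothetical). Components write by ADDING a
# vector to the stream; readers read by projecting the stream onto a direction.


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def scale(vec, s):
    return [s * x for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quantize(vec, step):
    # Uniform round-to-nearest quantizer: a stand-in for compressing one write.
    return [step * round(x / step) for x in vec]


u = [1.0, 0.0, 0.0, 0.0]  # direction written by component A (hypothetical)
v = [0.0, 1.0, 0.0, 0.0]  # orthogonal direction written by component B

stream = [0.0] * 4                     # embedding omitted: start from zeros
stream = add(stream, scale(u, 0.73))   # A stores 0.73 along u
stream = add(stream, scale(v, -1.20))  # B stores -1.20 along v

# Orthogonal writes do not interfere: a reader projecting onto u recovers 0.73.
read_a = dot(stream, u)

# A third write whose direction overlaps u leaks into A's read (crosstalk).
w = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0, 0.0]
read_a_crowded = dot(add(stream, scale(w, 0.5)), u)  # 0.73 + 0.5 / sqrt(2)

# Quantizing A's write before it lands shifts what the u-reader sees (0.75),
# while B's value along v is untouched.
compressed = add(scale(v, -1.20), quantize(scale(u, 0.73), 0.25))
read_a_compressed = dot(compressed, u)
read_b = dot(compressed, v)
```

In this toy, the compression error shows up only in readers whose direction overlaps the compressed write. In a real model, later nonlinear layers would also propagate it.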

Equation

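$$x_{l+1} = x_l + \mathrm{Attn}(\mathrm{LN}(x_l)) + \mathrm{MLP}(\mathrm{LN}(x_l))$$

Here $x_l$ is the residual stream entering layer $l$. This is the parallel-block form (as in GPT-J and PaLM), in which attention and the MLP both read the same normalized stream and add their outputs to it. The sequential pre-LN block instead feeds the MLP the stream after the attention write.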
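The update above can be sketched in pure Python. This is a minimal illustration, not a reference implementation. It assumes a single causal attention head, a ReLU MLP, and LayerNorm without learned gain or bias. The weight names (`Wq`, `Wk`, `Wv`, `Wo`, `W_in`, `W_out`), the sizes, and the random initialization are all hypothetical. Both branches read the same normalized stream, and their outputs are summed into it.

```python
import math
import random

Vec = list[float]
Mat = list[list[float]]  # row-major: len(W) = output dim, len(W[0]) = input dim


def matvec(W: Mat, v: Vec) -> Vec:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def layer_norm(v: Vec, eps: float = 1e-5) -> Vec:
    # Assumption: no learned gain/bias, to keep the sketch small.
    mu = sum(v) / len(v)
    var = sum((x - mu) ** 2 for x in v) / len(v)
    return [(x - mu) / math.sqrt(var + eps) for x in v]


def softmax(z: Vec) -> Vec:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(s - m) for s in z]
    total = sum(e)
    return [s / total for s in e]


def attn(h: list[Vec], Wq: Mat, Wk: Mat, Wv: Mat, Wo: Mat) -> list[Vec]:
    # Single-head causal self-attention: position t mixes values from positions s <= t.
    q = [matvec(Wq, ht) for ht in h]
    k = [matvec(Wk, ht) for ht in h]
    v = [matvec(Wv, ht) for ht in h]
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for t in range(len(h)):
        scores = [scale * sum(a * b for a, b in zip(q[t], k[s])) for s in range(t + 1)]
        weights = softmax(scores)
        mixed = [sum(weights[s] * v[s][i] for s in range(t + 1)) for i in range(len(v[0]))]
        out.append(matvec(Wo, mixed))  # Wo projects the head output back into the stream
    return out


def mlp(ht: Vec, W_in: Mat, W_out: Mat) -> Vec:
    # Two-layer MLP; ReLU is an assumption (GELU is common in practice).
    hidden = [max(0.0, z) for z in matvec(W_in, ht)]
    return matvec(W_out, hidden)


def parallel_block(x: list[Vec], p: dict) -> list[Vec]:
    # x_{l+1} = x_l + Attn(LN(x_l)) + MLP(LN(x_l)): both branches read the same LN(x_l).
    h = [layer_norm(xt) for xt in x]
    a = attn(h, p["Wq"], p["Wk"], p["Wv"], p["Wo"])
    return [
        [xi + ai + mi for xi, ai, mi in zip(xt, at, mlp(ht, p["W_in"], p["W_out"]))]
        for xt, at, ht in zip(x, a, h)
    ]


def rand_mat(rows: int, cols: int, rng: random.Random) -> Mat:
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, d_ff, T = 4, 8, 3  # model width, MLP width, sequence length (hypothetical)
params = {
    "Wq": rand_mat(d, d, rng), "Wk": rand_mat(d, d, rng),
    "Wv": rand_mat(d, d, rng), "Wo": rand_mat(d, d, rng),
    "W_in": rand_mat(d_ff, d, rng), "W_out": rand_mat(d, d_ff, rng),
}
x = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(T)]
y = parallel_block(x, params)
```

Setting `Wo` and `W_out` to zero makes `parallel_block` return `x` exactly. This is the identity path that every layer's writes are added onto. The causal mask also means the output at position 0 depends only on `x[0]`.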

References