Mani Pal

Engineer-researcher

LLM systems, CUDA kernels, inference optimization, compression, interpretability, and distributed AI infrastructure.

Writing

Technical essays with equations, code, and citations

These notes are written for readers who already understand modern ML systems. They emphasize mechanisms, tradeoffs, and reproducible engineering observations.

Transformer Internals · 12 min

Transformer Internals as a Systems Interface

The residual stream is the real systems interface of a transformer: training, inference, interpretability, and compression all negotiate with it.

Transformers · Residual Stream · Interpretability
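The residual-stream framing rests on one structural fact about pre-LN transformers. Every block reads a normalized view of the stream and adds its output back, so the final stream is exactly the embedding plus the sum of per-block writes. The minimal NumPy sketch below assumes random tanh blocks as hypothetical stand-ins for attention and MLP sublayers:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_layers, seq_len = 16, 4, 5

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

# Hypothetical toy blocks standing in for attention/MLP sublayers.
weights = [rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(n_layers)]

def block(stream, w):
    # Pre-LN style: read a normalized view of the stream, return an update.
    return np.tanh(layer_norm(stream) @ w)

x0 = rng.normal(size=(seq_len, d_model))  # stand-in for token embeddings
x = x0.copy()
writes = []
for w in weights:
    update = block(x, w)
    writes.append(update)
    x = x + update  # blocks never overwrite the stream; they only add to it

# The final stream decomposes exactly into embedding + per-block writes.
assert np.allclose(x, x0 + sum(writes))
```

This toy model ignores the final LayerNorm and unembedding. It only illustrates that no component can bypass the shared stream.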
Attention Mechanisms · 10 min

Attention Mechanisms Under IO Pressure

The useful mental model for modern attention kernels is not the softmax equation; it is the path data takes from HBM through SRAM into registers, and how warps are scheduled to move it.

Attention · FlashAttention · CUDA
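The IO framing depends on the online-softmax recurrence that FlashAttention-style kernels use. K and V can be consumed tile by tile while each query row keeps only a running max, a running denominator, and an unnormalized output accumulator. The full score matrix therefore never has to be written to HBM. The NumPy sketch below implements that recurrence. The tile size `block` is an illustrative parameter, and the sketch does not model shared memory, registers, or warp scheduling:

```python
import numpy as np

def naive_attention(q, k, v):
    # Reference: materializes the full (n_q, n_k) score matrix.
    s = q @ k.T / np.sqrt(q.shape[-1])
    p = np.exp(s - s.max(-1, keepdims=True))
    return (p / p.sum(-1, keepdims=True)) @ v

def tiled_attention(q, k, v, block=4):
    """Stream K/V in tiles using the online-softmax recurrence."""
    n_q, d = q.shape
    scale = 1.0 / np.sqrt(d)
    m = np.full(n_q, -np.inf)           # running row max
    denom = np.zeros(n_q)               # running softmax denominator
    acc = np.zeros((n_q, v.shape[1]))   # running unnormalized output
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = (q @ kb.T) * scale
        m_new = np.maximum(m, s.max(-1))
        correction = np.exp(m - m_new)  # rescale earlier partial results
        p = np.exp(s - m_new[:, None])
        denom = denom * correction + p.sum(-1)
        acc = acc * correction[:, None] + p @ vb
        m = m_new
    return acc / denom[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=s) for s in [(8, 16), (30, 16), (30, 16)])
assert np.allclose(tiled_attention(q, k, v, block=7), naive_attention(q, k, v))
```

At any time, only `block` scores per query row are live, which is what makes the tile fit in on-chip memory in a real kernel.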
CUDA Optimization · 15 min

CUDA Optimization Notes from an Attention Kernel

CUDA optimization is the discipline of making memory motion, register pressure, and occupancy legible enough to trade them deliberately.

CUDA · Nsight · Kernel Engineering
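One rough way to make the register-pressure/occupancy trade legible is to compute theoretical occupancy by hand. The hypothetical helper `occupancy` below assumes A100-class (compute capability 8.0) per-SM limits as defaults: 64 resident warps, 32 resident blocks, 65,536 32-bit registers allocated per warp in units of 256, and up to 164 KB of shared memory. It ignores per-block shared-memory reservations and other hardware details. Treat `cudaOccupancyMaxActiveBlocksPerMultiprocessor` or Nsight Compute as the authoritative number for a real kernel.

```python
import math

def occupancy(threads_per_block, regs_per_thread, smem_per_block=0,
              max_warps_per_sm=64, max_blocks_per_sm=32,
              regs_per_sm=65536, smem_per_sm=164 * 1024,
              reg_alloc_unit=256, warp_size=32):
    """Simplified theoretical occupancy: active warps / max warps per SM.
    Defaults are assumed A100-like limits, not queried from hardware."""
    warps_per_block = math.ceil(threads_per_block / warp_size)
    # Registers are allocated per warp, rounded up to the allocation unit.
    regs_per_warp = math.ceil(regs_per_thread * warp_size / reg_alloc_unit) * reg_alloc_unit
    limit_warps = max_warps_per_sm // warps_per_block
    limit_regs = (regs_per_sm // regs_per_warp) // warps_per_block
    limit_smem = smem_per_sm // smem_per_block if smem_per_block else max_blocks_per_sm
    blocks = min(max_blocks_per_sm, limit_warps, limit_regs, limit_smem)
    return blocks * warps_per_block / max_warps_per_sm

for regs in (32, 64, 128):
    print(regs, occupancy(256, regs))  # 32 1.0, then 64 0.5, then 128 0.25
```

At 256 threads per block, doubling registers per thread from 32 to 64 halves theoretical occupancy under these limits. Whether the extra registers buy more than the lost occupancy costs has to be measured.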
Mamba Architectures · 11 min

Mamba Architectures in Hybrid LLM Training

Hybrid SSM-attention models are best treated as architectural experiments whose evaluation must cover long-context behavior, tokenizer behavior, and deployment cost together.

Mamba · SSM · LLM Training
Sparse Models · 9 min

Sparse Models Fail Quietly Before They Fail Loudly

Sparse MoE systems can look healthy on loss curves while the router is already collapsing. Routing entropy and per-expert load metrics need to be first-class training signals.

MoE · Routing Entropy · Scaling
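One way to make routing health first-class is to log it every step next to the loss. The sketch below assumes raw router logits of shape (tokens, experts) and top-k routing. It reports two distinct signals. Mean per-token routing entropy measures how confident each token's routing is. The normalized entropy of the expert load distribution measures how balanced the assignments are. The helper `router_health` and its metric names are hypothetical choices for illustration, not a standard API:

```python
import numpy as np

def router_health(logits, top_k=1):
    """Routing diagnostics from router logits of shape (tokens, experts)."""
    n_tokens, n_experts = logits.shape
    z = np.exp(logits - logits.max(-1, keepdims=True))
    probs = z / z.sum(-1, keepdims=True)
    # Mean per-token routing entropy, normalized to [0, 1] by log(n_experts).
    token_entropy = -(probs * np.log(probs + 1e-12)).sum(-1) / np.log(n_experts)
    # Fraction of top-k assignments that land on each expert.
    top = np.argsort(-probs, axis=-1)[:, :top_k]
    load = np.bincount(top.ravel(), minlength=n_experts) / top.size
    # Normalized entropy of the load distribution: 1.0 means perfectly balanced.
    nz = load[load > 0]
    load_entropy = -(nz * np.log(nz)).sum() / np.log(n_experts)
    return {
        "mean_token_entropy": float(token_entropy.mean()),
        "load_entropy": float(load_entropy),
        "max_expert_load": float(load.max()),
        "dead_experts": int((load == 0).sum()),
    }

collapsed = np.zeros((8, 4))
collapsed[:, 0] = 10.0                         # every token prefers expert 0
balanced = np.tile(np.eye(4) * 10.0, (2, 1))   # tokens spread across experts
print(router_health(collapsed))  # load_entropy ~0, 3 dead experts
print(router_health(balanced))   # load_entropy 1.0, no dead experts
```

The two entropies can move independently. Confident, specialized routing has low per-token entropy but high load entropy. Collapse shows up as falling load entropy and dead experts, which is the failure a smooth loss curve can hide.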
Inference Systems · 13 min

Inference Systems Are Acceptance-Rate Control Problems

Speculative decoding speedup is controlled by acceptance-rate dynamics, not merely by choosing a smaller draft model.

Speculative Decoding · Serving · Latency
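Leviathan et al. (2023) give a simplified model in which each of γ draft tokens is accepted independently with probability α. Under that model, the expected number of tokens produced per target forward pass is (1 - α^(γ+1)) / (1 - α). If c is the cost of a draft step relative to a target step, the expected walltime speedup is that quantity divided by (γc + 1). The Python sketch below implements those two formulas. The function names are hypothetical, and the i.i.d. model ignores that real acceptance rates vary by position and context:

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens per target forward pass, assuming each of gamma
    draft tokens is accepted independently with probability alpha."""
    if alpha >= 1.0:
        return float(gamma + 1)
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    """Walltime speedup over plain decoding; c = draft cost / target cost."""
    return expected_tokens_per_step(alpha, gamma) / (gamma * c + 1)

# Longer drafts are not free: with alpha=0.6 and c=0.2 the best gamma is small.
best_gamma = max(range(1, 9), key=lambda g: expected_speedup(0.6, g, 0.2))
print(best_gamma)  # → 2
```

Past a point, longer drafts add draft cost faster than the acceptance rate pays it back. This suggests tuning γ against the observed α rather than fixing it.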
Mechanistic Interpretability · 14 min

Mechanistic Interpretability Needs Negative Results

Failed grokking runs are not noise; they can expose representation capacity boundaries when paired with the right spectral and causal diagnostics.

Interpretability · Grokking · Negative Results