Writing

Technical essays with equations, code, and citations

These notes are written for readers who already understand modern ML systems. They emphasize mechanisms, tradeoffs, and reproducible engineering observations.

Transformer Internals · 12 min

Transformer Internals as a Systems Interface

The residual stream is the real systems interface of a transformer: training, inference, interpretability, and compression all negotiate with it.

Transformers, Residual Stream, Interpretability

Attention Mechanisms · 10 min

Attention Mechanisms Under IO Pressure

The useful mental model for modern attention kernels is not the softmax equation; it is the path data takes through HBM, SRAM, registers, and warps.

Attention, FlashAttention, CUDA

CUDA Optimization · 15 min

CUDA Optimization Notes from an Attention Kernel

CUDA optimization is the discipline of making memory motion, register pressure, and occupancy legible enough to trade them deliberately.

CUDA, Nsight, Kernel Engineering

Mamba Architectures · 11 min

Mamba Architectures in Hybrid LLM Training

Hybrid SSM-attention models are best treated as architectural experiments whose evaluation must cover long-context behavior, tokenizer behavior, and deployment cost together.

Mamba, SSM, LLM Training

Sparse Models · 9 min