Engineer-researcher

Mani Pal

LLM systems, CUDA kernels, inference optimization, compression, interpretability, and distributed AI infrastructure.

Attention Mechanisms / 2026-01

Attention Mechanisms Under IO Pressure

The useful mental model for modern attention kernels is not the softmax equation; it is the path data takes through HBM, SRAM, registers, and warps.

10 min

Attention, FlashAttention, CUDA

softmax(QK^T)V
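
To make the contrast between the equation and the data path concrete, here is a minimal sketch in plain Python for a single query vector. It follows the formula exactly as written above (no 1/sqrt(d) scaling) and compares two ways of computing it: one that materializes every score before normalizing, and one that visits keys and values block by block while keeping only a running maximum, a running sum, and an accumulator. The second is the online-softmax idea that tiled kernels in the FlashAttention family build on. The function names and `block_size` are hypothetical, chosen for illustration; this is not a real kernel and says nothing about actual HBM or SRAM traffic.

```python
import math


def naive_attention(q, keys, values):
    """softmax(q K^T) V for one query, materializing the full score row."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def streaming_attention(q, keys, values, block_size=2):
    """Same result, reading K and V one block at a time (online softmax)."""
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim  # unnormalized weighted sum of value rows
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old max (0.0 on the first block).
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim):
                acc[j] += w * v[j]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Both functions return the same vector up to floating-point rounding. What differs is the working set: the streaming version never holds more than one block of scores at a time, which is the property that lets a kernel keep intermediate state in fast on-chip memory instead of writing the full score matrix out.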