Mani Pal

Engineer-researcher

LLM systems, CUDA kernels, inference optimization, compression, interpretability, and distributed AI infrastructure.

CUDA Optimization / 2026-02

CUDA Optimization Notes from an Attention Kernel

CUDA optimization is the discipline of making memory motion, register pressure, and occupancy legible enough to trade them deliberately.

15 min

CUDA, Nsight, Kernel Engineering

Outline

  • Tile shape selection.
  • Register accumulation and spilling.
  • Shared memory pressure.
  • Nsight metrics that changed implementation choices.

Code Block

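// One pass over the key/value tiles for a fixed tile of queries.
for (int block = 0; block < n_blocks; ++block) {
  load_kv_tile(block);      // stage this K/V tile (e.g. in shared memory)
  update_online_softmax();  // refresh the running max and sum, rescale earlier state
  accumulate_output();      // add this tile's weighted V rows to the accumulator
}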
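
The loop above leaves the three helpers abstract. As a rough host-side sketch of what the online softmax update has to do, the C++ below processes one query row tile by tile and compares against a two-pass reference. It is not the kernel from this post: the names `attention_row_tiled` and `attention_row_naive`, the precomputed `scores` array standing in for the scaled q·k products, and the row-major `values` layout are all assumptions made for illustration, and scores are assumed to be finite.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reference for ONE query row. scores[j] stands in for the
// scaled dot product q . k_j; values is n x d, row-major. Assumes finite scores.
std::vector<float> attention_row_tiled(const std::vector<float>& scores,
                                       const std::vector<float>& values,
                                       std::size_t d, std::size_t tile) {
  const std::size_t n = scores.size();
  float running_max = -std::numeric_limits<float>::infinity();
  float running_sum = 0.0f;
  std::vector<float> acc(d, 0.0f);  // unnormalized output accumulator

  for (std::size_t start = 0; start < n; start += tile) {
    // Stand-in for load_kv_tile: the tile is just an index range here.
    const std::size_t end = std::min(n, start + tile);

    // Stand-in for update_online_softmax: find the new max, then rescale
    // the sum and accumulator that were computed against the old max.
    float new_max = running_max;
    for (std::size_t j = start; j < end; ++j) new_max = std::max(new_max, scores[j]);
    const float rescale = std::exp(running_max - new_max);  // 0 on the first tile
    running_sum *= rescale;
    for (float& a : acc) a *= rescale;

    // Stand-in for accumulate_output: add this tile's weighted value rows.
    for (std::size_t j = start; j < end; ++j) {
      const float p = std::exp(scores[j] - new_max);
      running_sum += p;
      for (std::size_t c = 0; c < d; ++c) acc[c] += p * values[j * d + c];
    }
    running_max = new_max;
  }

  // Normalize once at the end; skipped when there were no keys at all.
  if (running_sum > 0.0f) {
    for (float& a : acc) a /= running_sum;
  }
  return acc;
}

// Two-pass reference: global max first, then softmax-weighted sum of values.
std::vector<float> attention_row_naive(const std::vector<float>& scores,
                                       const std::vector<float>& values,
                                       std::size_t d) {
  std::vector<float> out(d, 0.0f);
  if (scores.empty()) return out;
  const float m = *std::max_element(scores.begin(), scores.end());
  float sum = 0.0f;
  for (std::size_t j = 0; j < scores.size(); ++j) {
    const float p = std::exp(scores[j] - m);
    sum += p;
    for (std::size_t c = 0; c < d; ++c) out[c] += p * values[j * d + c];
  }
  for (float& o : out) o /= sum;
  return out;
}
```

The rescale step is why the accumulator can stay in registers across tiles: earlier contributions never need to be revisited, only multiplied by one scalar when the running max moves. In a real kernel the same bookkeeping would be done per query row within a thread or warp, which is where the register accumulation and spilling trade-offs from the outline come in.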