CUDA Optimization / 2026-02
CUDA Optimization Notes from an Attention Kernel
CUDA optimization is the discipline of making memory motion, register pressure, and occupancy legible enough to trade them deliberately.
15 min read
CUDA, Nsight, Kernel Engineering
Outline
- Tile shape selection.
- Register accumulation and spilling.
- Shared memory pressure.
- Nsight metrics that changed implementation choices.
Code Block
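// One pass over the key/value tiles; each helper stands for one stage of the kernel.
for (int block = 0; block < n_blocks; ++block) {
    load_kv_tile(block);       // bring in this block's K/V tile
    update_online_softmax();   // refresh the running max and running sum
    accumulate_output();       // add this tile's weighted values to the output
}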
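To make the three steps in the loop concrete, here is a minimal CPU sketch in C++ of the same tiled pattern for a single query row. It is a reference under stated assumptions, not the kernel these notes describe. The function name `attention_row_online`, the row-major `[n_keys x d]` layout, and the `tile` parameter are all hypothetical, and the usual `1/sqrt(d)` score scaling is left out to keep the arithmetic easy to follow.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference for one query row (assumed names and layout).
// q has d elements; k and v are row-major [n_keys x d]; tile = keys per block.
std::vector<float> attention_row_online(const std::vector<float>& q,
                                        const std::vector<float>& k,
                                        const std::vector<float>& v,
                                        std::size_t tile) {
    const std::size_t d = q.size();
    assert(d > 0 && tile > 0);
    assert(!k.empty() && k.size() == v.size() && k.size() % d == 0);
    const std::size_t n_keys = k.size() / d;
    const std::size_t n_blocks = (n_keys + tile - 1) / tile;

    const float neg_inf = -std::numeric_limits<float>::infinity();
    float running_max = neg_inf;      // largest score seen so far
    float running_sum = 0.0f;         // sum of exp(score - running_max)
    std::vector<float> acc(d, 0.0f);  // unnormalized output accumulator
    std::vector<float> scores(tile);  // scores for the current tile only

    for (std::size_t block = 0; block < n_blocks; ++block) {
        const std::size_t begin = block * tile;
        const std::size_t end = std::min(begin + tile, n_keys);

        // Step 1 (load_kv_tile): score this tile's keys and find the tile max.
        float tile_max = neg_inf;
        for (std::size_t j = begin; j < end; ++j) {
            float s = 0.0f;
            for (std::size_t c = 0; c < d; ++c) s += q[c] * k[j * d + c];
            scores[j - begin] = s;
            tile_max = std::max(tile_max, s);
        }

        // Step 2 (update_online_softmax): move to the new max and rescale
        // everything accumulated under the old one. On the first block the
        // factor is exp(-inf) = 0, which leaves the zero-initialized state at 0.
        const float new_max = std::max(running_max, tile_max);
        const float rescale = std::exp(running_max - new_max);
        running_sum *= rescale;
        for (std::size_t c = 0; c < d; ++c) acc[c] *= rescale;

        // Step 3 (accumulate_output): add this tile's weighted values.
        for (std::size_t j = begin; j < end; ++j) {
            const float p = std::exp(scores[j - begin] - new_max);
            running_sum += p;
            for (std::size_t c = 0; c < d; ++c) acc[c] += p * v[j * d + c];
        }
        running_max = new_max;
    }

    // Normalize once at the end instead of once per tile.
    for (std::size_t c = 0; c < d; ++c) acc[c] /= running_sum;
    return acc;
}
```

A useful sanity check for a sketch like this is that the result should not depend on `tile` beyond floating-point rounding, since the rescaling step exists precisely to make per-tile partial results combinable.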