Disaggregated prefill pipeline KV cache request-ID bug fix
Issue
Disaggregated prefill pipeline hang caused decode nodes to miss KV cache tensors.
Root Cause
Prefill and decode nodes used inconsistent request-ID formatting, so decode-side lookup could not locate the transferred KV cache state.
Patch
Implemented request-ID normalization at the prefill-decode boundary, refactored KV cache lookup semantics, and added targeted tests for matched and mismatched ID formats.
Technical Impact
Resolved indefinite hangs in distributed inference deployments and improved reliability for high-throughput disaggregated serving.
Engineering Complexity
High. The bug crossed request lifecycle, distributed KV cache transfer, and prefill/decode process boundaries.
Merged PR
PR #38816