Full-Stack Optimizations for Agentic Inference with NVIDIA Dynamo

Coding agents reach 85-97% KV cache hit rates on repeat calls; 4-agent swarms reach 97.2%, with an 11.7x cache read/write ratio. NVIDIA Dynamo adds an agent hints API (priority, expected output sequence length (osl), speculative prefill), prefix-aware routing, and TTL-pinned KV blocks for SGLang, vLLM, and TRT-LLM. Dynamo is open source.
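
To make the idea of prefix-aware routing concrete, here is a minimal Python sketch. It is not Dynamo's implementation or API: the `Worker` class, the `route` function, the block size, and the tie-breaking rule are all hypothetical names and choices made for illustration. It assumes each worker's cached KV blocks can be identified by chained hashes of fixed-size token blocks, and it sends a request to the worker that already holds the longest matching prefix, falling back to the least-loaded worker on ties.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Set, Tuple

BLOCK_SIZE = 4  # tokens per KV block; illustrative value, not a Dynamo default


def block_hashes(tokens: Sequence[int], block_size: int = BLOCK_SIZE) -> List[int]:
    """Hash each full block together with its parent hash, so that every
    hash identifies the entire prefix ending at that block."""
    hashes: List[int] = []
    parent = 0
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full, block_size):
        parent = hash((parent, tuple(tokens[start:start + block_size])))
        hashes.append(parent)
    return hashes


@dataclass
class Worker:
    """Hypothetical inference worker tracked by the router."""
    name: str
    active_requests: int = 0
    cached: Set[int] = field(default_factory=set)

    def cached_prefix_blocks(self, hashes: Sequence[int]) -> int:
        """Count how many leading blocks of the request are already cached."""
        count = 0
        for h in hashes:
            if h not in self.cached:
                break
            count += 1
        return count


def route(workers: List[Worker], tokens: Sequence[int]) -> Tuple[Worker, int]:
    """Pick the worker with the longest cached prefix; break ties by lower load.
    Returns the chosen worker and the number of reused blocks."""
    hashes = block_hashes(tokens)
    best = max(
        workers,
        key=lambda w: (w.cached_prefix_blocks(hashes), -w.active_requests),
    )
    reused = best.cached_prefix_blocks(hashes)
    best.cached.update(hashes)   # the worker now holds KV blocks for this prompt
    best.active_requests += 1
    return best, reused


if __name__ == "__main__":
    pool = [Worker("a"), Worker("b")]
    system_prompt = list(range(16))           # shared prefix across agent calls
    first, _ = route(pool, system_prompt)     # cold cache: nothing to reuse
    again, reused = route(pool, system_prompt + [99, 100, 101, 102])
    print(first.name, again.name, reused)
```

In this sketch a repeat call that shares the agent's system prompt lands on the same worker even when that worker is busier, which is the behavior that makes high hit rates on repeat calls possible. A production router would also weigh load more carefully and account for eviction, which this example omits.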