The Definitive Guide to AI Data Centers

PagedAttention

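A memory-management technique introduced with vLLM that stores the KV cache in fixed-size pages (blocks), much as an operating system pages virtual memory. Because a sequence's pages need not be contiguous and are allocated only as it grows, far less memory is lost to over-reservation and fragmentation.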
