KV cache · Key-Value cache
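Key and value tensors computed by each attention layer for earlier tokens, stored so that every decode step reuses them instead of recomputing them; its size grows linearly with context length and with the number of concurrent sequences, and it often dominates inference memory.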
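The growth described above can be estimated with simple arithmetic. The sketch below assumes a standard decoder-only transformer that caches one key tensor and one value tensor per layer. The function name `kv_cache_bytes` and the model configuration are hypothetical illustrations, not taken from any particular library or model. `num_kv_heads` is the number of key/value heads, which is smaller than the query-head count in models that use grouped-query attention. The estimate ignores allocator overhead, paging fragmentation, and quantized caches.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int,
    bytes_per_elem: int = 2,  # assumption: 2 bytes per element (fp16/bf16 cache)
) -> int:
    """Rough KV cache size in bytes for a decoder-only transformer (hypothetical helper)."""
    # Factor of 2: one key tensor and one value tensor per layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    # Linear in both context length and number of concurrent sequences.
    return per_token * seq_len * batch_size


GiB = 1024 ** 3

# Hypothetical model: 32 layers, 32 KV heads, head_dim 128, fp16 cache.
cfg = dict(num_layers=32, num_kv_heads=32, head_dim=128)

one_seq = kv_cache_bytes(**cfg, seq_len=4096, batch_size=1)
batch_8 = kv_cache_bytes(**cfg, seq_len=4096, batch_size=8)
long_ctx = kv_cache_bytes(**cfg, seq_len=8192, batch_size=8)

print(one_seq / GiB)   # 2.0
print(batch_8 / GiB)   # 16.0
print(long_ctx / GiB)  # 32.0
```

In this hypothetical configuration, a single 4096-token sequence needs 2 GiB of cache. Serving 8 such sequences needs 16 GiB, and doubling the context to 8192 tokens doubles that to 32 GiB. These figures can easily exceed the memory taken by the model weights themselves.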