An open-source LLM serving framework with fast structured generation and aggressive KV-cache reuse.