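Running the prefill (prompt processing) and decode (token generation) phases of LLM inference on separate GPU pools, so each pool can be sized and scaled independently. Prefill is largely compute-bound, while decode is largely memory-bandwidth-bound. Separating them keeps long prompts from stalling ongoing generation and lets each pool be provisioned for its own bottleneck, which improves overall GPU utilization.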
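To illustrate what "scales independently" means, here is a minimal capacity-sizing sketch. It assumes each pool's per-GPU throughput (tokens/s) has already been measured, and it ignores KV-cache transfer cost, batching effects, and latency targets. All names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    requests_per_s: float
    prompt_tokens: int   # average input length per request
    output_tokens: int   # average generated length per request


@dataclass
class PoolSpec:
    tokens_per_s_per_gpu: float  # hypothetical measured throughput for this phase


def gpus_needed(tokens_per_s: float, pool: PoolSpec) -> int:
    """Smallest whole number of GPUs that covers the token load."""
    return math.ceil(tokens_per_s / pool.tokens_per_s_per_gpu)


def size_pools(w: Workload, prefill: PoolSpec, decode: PoolSpec) -> tuple[int, int]:
    """Size the prefill and decode pools separately from their own loads."""
    prefill_load = w.requests_per_s * w.prompt_tokens  # input tokens/s to process
    decode_load = w.requests_per_s * w.output_tokens   # output tokens/s to generate
    return gpus_needed(prefill_load, prefill), gpus_needed(decode_load, decode)


prefill_pool = PoolSpec(tokens_per_s_per_gpu=8000)
decode_pool = PoolSpec(tokens_per_s_per_gpu=500)

print(size_pools(Workload(10, 2000, 200), prefill_pool, decode_pool))  # (3, 4)
# Doubling output length grows only the decode pool.
print(size_pools(Workload(10, 2000, 400), prefill_pool, decode_pool))  # (3, 8)
```

Because each pool's size depends only on its own phase's load, a shift toward longer outputs adds decode GPUs without over-provisioning prefill, which a single shared pool could not do.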