Continuous batching · In-flight batching
Dynamically adding new requests to, and evicting finished requests from, a running inference batch at each decoding step, so the GPU keeps doing useful work and throughput rises.
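A minimal, framework-free simulation of the scheduling idea follows. It admits waiting requests into free batch slots before every step and evicts finished requests right after it. The names `Request`, `fake_decode_step`, `continuous_batching` and `static_batching_steps` are hypothetical and do not belong to any real library. `fake_decode_step` stands in for a model forward pass. The sketch ignores prefill, KV-cache memory limits and preemption, all of which real serving systems must handle.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical request: an id and how many tokens it still needs (assumed >= 1)."""
    req_id: str
    tokens_left: int
    output: list[str] = field(default_factory=list)


def fake_decode_step(batch: list[Request]) -> None:
    """Stand-in for one forward pass: emits one token for every active request."""
    for req in batch:
        req.output.append(f"tok{len(req.output)}")
        req.tokens_left -= 1


def continuous_batching(requests: list[Request], max_batch_size: int) -> tuple[int, list[str]]:
    """Iteration-level scheduling; returns (decode steps taken, completion order)."""
    waiting = deque(requests)
    running: list[Request] = []
    finished_order: list[str] = []
    steps = 0
    while waiting or running:
        # Fill any free slots from the queue before the next decode step.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        fake_decode_step(running)
        steps += 1
        # Evict finished requests immediately so their slots are reused next step.
        still_running = []
        for req in running:
            if req.tokens_left == 0:
                finished_order.append(req.req_id)
            else:
                still_running.append(req)
        running = still_running
    return steps, finished_order


def static_batching_steps(lengths: list[int], max_batch_size: int) -> int:
    """For comparison: each fixed batch runs until its longest request finishes."""
    return sum(
        max(lengths[i : i + max_batch_size])
        for i in range(0, len(lengths), max_batch_size)
    )


if __name__ == "__main__":
    lengths = [8, 2, 2, 2, 8, 2]
    reqs = [Request(f"r{i}", n) for i, n in enumerate(lengths)]
    steps, order = continuous_batching(reqs, max_batch_size=2)
    print(steps, order)  # 14 ['r1', 'r2', 'r3', 'r0', 'r5', 'r4']
    print(static_batching_steps(lengths, max_batch_size=2))  # 18
```

With a batch size of 2, the 24 requested tokens need at least 12 steps. Refilling slots as soon as short requests finish brings the run to 14 steps. Static batching needs 18, because each batch waits for its longest member. Short requests also complete out of arrival order (`r1` before `r0`) without waiting for long ones.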