Expert parallelism · EP
Distributing a Mixture-of-Experts model's experts across GPUs, routing tokens to whichever GPU holds the chosen expert.
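The routing step in this definition can be illustrated with a minimal sketch in plain Python, with no real GPUs involved. It assumes top-1 routing (one expert per token), an expert count evenly divisible by the device count, and contiguous placement of experts on devices. The names `expert_to_device` and `dispatch` are hypothetical and exist only for this example. Real systems typically perform the exchange with an all-to-all collective rather than Python lists.

```python
from collections import defaultdict


def expert_to_device(expert_id: int, num_experts: int, num_devices: int) -> int:
    """Map an expert to the device holding it, assuming contiguous placement."""
    if num_experts % num_devices != 0:
        raise ValueError("this sketch assumes num_experts is divisible by num_devices")
    experts_per_device = num_experts // num_devices
    return expert_id // experts_per_device


def dispatch(token_expert_ids, num_experts, num_devices):
    """Group token indices by the device that holds each token's chosen expert."""
    buckets = defaultdict(list)
    for token_idx, expert_id in enumerate(token_expert_ids):
        device = expert_to_device(expert_id, num_experts, num_devices)
        buckets[device].append(token_idx)
    return dict(buckets)


# Illustrative input: 8 experts on 4 devices (2 experts per device),
# and five tokens whose router picked experts 0, 5, 3, 7, and 1.
routes = dispatch([0, 5, 3, 7, 1], num_experts=8, num_devices=4)
```

Here `routes` maps each device index to the tokens it must process: tokens 0 and 4 chose experts 0 and 1, which both live on device 0, so they are sent to the same place. In practice the number of tokens per device varies from batch to batch, which is why load balancing is a common concern with this scheme.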