Training that keeps running at a reduced GPU count when nodes fail and reabsorbs those nodes once they are restored.
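As a rough illustration of what continuing at a reduced GPU count can involve, the sketch below assumes one common policy: hold the global batch size fixed and rescale gradient accumulation whenever the number of GPUs changes. The definition does not prescribe this policy, and the function name `accumulation_steps` and the numbers used are hypothetical.

```python
def accumulation_steps(global_batch: int, per_gpu_batch: int, world_size: int) -> int:
    """Micro-batches each GPU accumulates so one optimizer step covers global_batch samples.

    Assumed policy (not taken from the definition): the global batch size
    stays constant while world_size shrinks or grows.
    """
    if world_size < 1:
        raise ValueError("at least one GPU is required")
    samples_per_micro_step = per_gpu_batch * world_size
    if global_batch % samples_per_micro_step != 0:
        raise ValueError("global_batch must be divisible by per_gpu_batch * world_size")
    return global_batch // samples_per_micro_step


# Hypothetical timeline: 8 GPUs, a 4-GPU node fails, then the node is restored.
schedule = [accumulation_steps(512, 16, n) for n in (8, 4, 8)]
print(schedule)  # [4, 8, 4]
```

With fewer GPUs, each remaining GPU accumulates more micro-batches per optimizer step, so throughput drops while the effective batch size is unchanged.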