Optimizer state
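The extra per-parameter data an optimizer keeps between training steps. Adam stores two values per parameter (running estimates of the gradient's first and second moments, commonly called momentum and variance), so weights plus optimizer state need roughly three times the memory of the weights alone. SGD with momentum stores one value per parameter, roughly doubling it.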
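As a rough illustration of the arithmetic, the sketch below estimates memory for weights plus Adam-style state. The function name `adam_memory_bytes` and its parameters are hypothetical, and it assumes the state is stored at the same precision as the weights; gradients, activations, and step counters are ignored.

```python
def adam_memory_bytes(num_params: int, bytes_per_value: int = 4) -> dict:
    """Estimate memory for weights plus Adam-style optimizer state.

    Assumption: state tensors use the same precision as the weights
    (4 bytes per value corresponds to fp32).
    """
    weights = num_params * bytes_per_value
    # Adam keeps two extra values per parameter:
    # a first-moment (momentum) and a second-moment (variance) estimate.
    optimizer_state = 2 * num_params * bytes_per_value
    return {
        "weights": weights,
        "optimizer_state": optimizer_state,
        "total": weights + optimizer_state,
    }


# Example: a model with one billion parameters in fp32.
est = adam_memory_bytes(1_000_000_000)
ratio = est["total"] / est["weights"]
print(f"weights: {est['weights'] / 1e9:.0f} GB, "
      f"state: {est['optimizer_state'] / 1e9:.0f} GB, "
      f"total is {ratio:.0f}x the weights")
```

Under these assumptions the state alone is twice the size of the weights. Mixed-precision setups that keep fp32 state alongside lower-precision weights would shift the ratio further.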