
Chapter 4.5

UPS & Energy Storage: From Ride-Through to Transient Absorption

The UPS stopped being an outage-bridge and became a transient shock-absorber the day a rack learned to swing from idle to 150 kW and back in milliseconds — so the question is no longer "how many minutes of runtime" but "how fast, how flat, and at which layer of the chip→BBU→BESS spine you kill the spike."


What you'll decide here

Whether your backup architecture is sized for the legacy problem (ride through a utility blip until the generator catches) or the AI problem (absorb a synchronized, phase-coherent GPU load step before it reaches the transformer, the generator, or the grid) — the two demand different chemistries, different placements, and different response times.
Central double-conversion UPS vs distributed rack-level BBU (OCP ORV3) vs a block-redundant/"catcher" topology — and therefore where your failure domain lives, what your stranded-capacity penalty is, and whether you can ever justify eco-mode.
Where energy storage sits along the chip→BBU→BESS mitigation spine, how much joule-per-GPU you provision on-package versus at the rack versus at the facility, and who owns the smoothing — because the layer you skip is the layer that ships the transient downstream onto something that can't absorb it.
Battery chemistry and runtime basis: VRLA vs LFP vs supercapacitor vs hybrid, sized on the real basis (EDPp ≈ 1.5× TDP, not nameplate TDP), with the cabinet-count and footprint consequences that follow.
How the transient-absorption claim is metered, acceptance-tested, and contracted — with the utility (ramp-rate limits, flicker), with tenants (provisioning ratio), and in commissioning (load-bank and step-load acceptance) — because an unproven smoothing claim is a stranded interconnection slot waiting to happen.

A relay of energy buffers catches the synchronized GPU load step before it reaches the grid — miss a handoff and it hits the substation.

For thirty years the static UPS had exactly one job, and the whole discipline was organized around it: stand between the utility and the IT load, ride through the sub-cycle disturbances and the few seconds it takes a standby generator to start and accept load, and otherwise stay invisible. Sizing was a runtime question — five minutes, ten minutes, fifteen — and topology was a reliability question — N+1 or 2N. The battery was an insurance policy you hoped never to cash. That mental model is now obsolete inside an AI hall, and the chapters that still teach it are teaching the wrong building.

The reason is the load itself. A rack of GPUs running a synchronous collective is not a steady, benign, power-factor-corrected load. It is a 100% non-linear, phase-coherent machine whose thousands of accelerators step in lockstep (idle to full and back) on the cadence of the training step. A single GB200 NVL72 rack can swing by 100 kW or more; the same synchronized swing, multiplied across a hall and a campus, becomes a multi-hundred-megawatt step that the transformer, the generator, and ultimately the grid all have to absorb. The 2025–26 NERC Level 3 alert, issued after repeated events in which ~1,500 MW of data-center load dropped on a single transmission fault, is the macro symptom of this micro behavior. The UPS, and the energy-storage stack it has grown into, is now the primary tool for shaping that load, not merely surviving its loss.

This chapter is the canonical home for the power-transient problem and its mitigation. We trace the topology forks (double-conversion vs eco-mode; central vs distributed vs catcher), the chemistry shift (VRLA→LFP, plus supercapacitors for the sub-millisecond regime), and then the spine that ties it all together: on-package capacitance → rack BBU → facility BESS, each layer catching a different timescale of the same spike. The on-die origin of the transient is engineered in Chapter 7.12; the cooling-side twin — the loss-of-flow transient that thermal-trips a 1 kW GPU in seconds — lives in Chapter 5.12; the grid-facing obligation it creates is in Chapter 4.10. Here we own the storage, the metering, and the acceptance.

The reframe: ride-through is now a sub-problem of transient absorption

Hold two timescales in mind, because they are what separate the legacy problem from the AI one. Ride-through is a rare, large, slow event: the utility drops, and storage must carry the full load for seconds-to-minutes until the generator picks up. Transient absorption is a continuous, smaller, fast event: the GPUs themselves create a load step every few seconds, and storage must inject or absorb power in milliseconds to keep the spike from propagating upstream. The legacy central UPS is built for the first and is structurally bad at the second — it sits too far from the rack, behind too much impedance, with a control loop too slow to chase a millisecond edge.

The consequence of missing this reframe is concrete. If you size only for ride-through, your storage stack does nothing about the synchronized swing — so every load step passes through to your transformer (flicker, harmonic resonance, see Chapter 4.4), to your behind-the-meter generator (which cannot follow a millisecond edge — gensets respond in seconds-to-minutes, see Chapter 4.8), and to the grid (where it becomes the utility's ramp-rate and stability problem). The modern answer is a layered stack where the fastest, smallest storage sits closest to the silicon and the slowest, largest storage sits at the facility, and ride-through becomes just the longest-timescale duty of a system whose primary job is now smoothing.

UPS topology: double-conversion, eco-mode, and when bypass is honest

The first fork is the conversion topology, and it is an efficiency-vs-protection trade. Double-conversion (VFI) rectifies incoming AC to DC and re-inverts it to a clean, isolated AC output — the load never touches raw utility power, so it is the gold standard for non-linear, disturbance-sensitive loads. The cost is a standing 3–5% conversion loss, which at gigawatt scale is tens of megawatts of pure heat you pay for continuously. Eco-mode / advanced standby runs the load on filtered utility power through a static bypass and fires the inverter only on a disturbance, pushing efficiency above 99% — but it accepts a brief transfer time and exposes the load to upstream power quality in the window before transfer.
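The loss arithmetic is worth running against your own load. A minimal sketch, assuming the loss is a flat fraction of IT load (as the 3–5% figure above is framed), a hypothetical 1 GW campus, an assumed 80% average utilization, and 1% as an upper bound for eco-mode loss; real UPS efficiency curves vary with load fraction.

```python
"""Standing conversion loss: double-conversion vs eco-mode (illustrative sketch)."""

HOURS_PER_YEAR = 8760


def standing_loss_mw(it_load_mw: float, loss_fraction: float) -> float:
    """Continuous heat (MW) from a conversion stage, treating loss as a flat
    fraction of IT load (a simplification; real efficiency varies with load)."""
    return it_load_mw * loss_fraction


def annual_loss_mwh(it_load_mw: float, loss_fraction: float,
                    avg_utilization: float = 1.0) -> float:
    """Energy lost per year at an assumed average utilization of the IT load."""
    return standing_loss_mw(it_load_mw, loss_fraction) * avg_utilization * HOURS_PER_YEAR


if __name__ == "__main__":
    it_load_mw = 1000.0  # hypothetical 1 GW campus
    cases = [("double-conversion @ 3% loss", 0.03),
             ("double-conversion @ 5% loss", 0.05),
             ("eco-mode @ 1% loss (upper bound for >99% efficiency)", 0.01)]
    for label, frac in cases:
        print(f"{label}: {standing_loss_mw(it_load_mw, frac):.0f} MW continuous, "
              f"{annual_loss_mwh(it_load_mw, frac, avg_utilization=0.8):,.0f} MWh/yr")
```

At 1 GW the 3–5% band is 30–50 MW of continuous heat, which is the "tens of megawatts" the text refers to.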

The decision used to be simple: AI halls are non-linear and disturbance-sensitive, so eco-mode is reckless and you eat the double-conversion loss. The 2026 nuance is that the storage architecture is moving downstream, into the DC busbar and the rack. Where the central AC UPS survives, it is increasingly a ride-through and isolation device, and the case against eco-mode is being reopened now that millisecond transient absorption is handled at the rack by supercaps and BBUs. The honest rule: eco-mode is acceptable only where (a) the rack layer demonstrably owns the fast transient, and (b) the upstream source is clean enough that the transfer window carries no risk the load can't tolerate. If either condition is missing, double-conversion is the price of admission. Bypass is acceptable as a maintenance and fault path, never as a steady-state efficiency dodge for an unprotected non-linear load.
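The eco-mode rule is strict enough to write as a decision function. A sketch only: the `SiteEvidence` fields are hypothetical names for conditions (a) and (b), and each should be backed by acceptance-test evidence rather than asserted.

```python
"""The eco-mode rule encoded as a decision function (illustrative)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteEvidence:
    # (a) Step-load acceptance has shown the rack layer absorbs the fast transient.
    rack_owns_fast_transient: bool
    # (b) Upstream power quality is clean enough that the load tolerates the transfer window.
    transfer_window_tolerable: bool


def ups_mode(evidence: SiteEvidence, maintenance_or_fault: bool = False) -> str:
    """Return the UPS operating mode the rule permits."""
    if maintenance_or_fault:
        return "bypass"  # bypass is a maintenance/fault path only
    if evidence.rack_owns_fast_transient and evidence.transfer_window_tolerable:
        return "eco-mode"
    return "double-conversion"  # either condition missing: pay the conversion loss


if __name__ == "__main__":
    print(ups_mode(SiteEvidence(True, False)))  # double-conversion
```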

Backup architecture fork: central UPS vs distributed rack BBU vs catcher/block-redundant

Architecture	Where storage sits	Response to fast transient	Typical runtime	Failure domain	Stranded-capacity / footprint penalty	Best fit
Central double-conversion UPS (N+1 or 2N)	Electrical room, upstream of the busway	Poor — too far, too slow for a ms GPU step	5–12 min	Large — a UPS module backs many racks	2N doubles UPS, battery, and floor; worst footprint	Mixed/legacy halls; ride-through + isolation duty
Distributed rack BBU (OCP ORV3 48 V)	In the rack / power shelf, on the 48 V busbar	Excellent — triggers in <2 ms on busbar droop	~4 min (rack BBU)	Small — failure contained to one rack/shelf	Low — storage scales 1:1 with IT; no 2N hall	OCP/hyperscale dense racks; transient absorption
Catcher / block-redundant (3N/2, 4N/3)	Shared reserve block catches a failed feed	Inherits the feed's storage; depends on layer	Per feed (5–12 min)	Block — a reserve covers N working blocks	Near-2N availability at ~N+1 capex/footprint	Large facilities trading 2N capex for utilization
Facility BESS (in front of / beside the plant)	Containerized, MV/LV-coupled at the campus	Sub-second — absorbs residual that escapes the rack	Minutes-to-hours (energy-sized)	Campus — one stack, many duties	Land + fire/thermal envelope; not IT-coupled	Smoothing, DR, generator bridge, ride-through

2026 practitioner ranges. Runtime figures are typical, not floors; the AI design driver is response time and energy per event, not minutes. Sources are in the key numbers below.

Read this table as four answers to one question — where does the joule live? — and notice that they are not mutually exclusive. The modern AI facility runs several rows at once: a facility BESS for the slow, large duties; rack BBUs for the fast, local ones; and possibly a central UPS or catcher block where legacy isolation or 2N contractual uptime is still required. The fork is not "which one" but "how the labor is divided," and the expensive mistake is provisioning the same joule twice — paying for a full 2N central UPS and rack-level BBUs and a facility BESS that all back the same load because no one drew the division-of-labor diagram.
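Drawing the division-of-labor diagram can start as a list of which provisioned layer is sized to fully own each duty, flagging duties with no owner or with several. A sketch: the duty names and the plan below are hypothetical, and a duty flagged as doubled may be intentional (for example, a contractual 2N requirement), so treat the output as a review list, not a verdict.

```python
"""Flag duties that no storage layer owns, or that several layers back in full."""
from collections import defaultdict


def division_of_labor(assignments: dict[str, list[str]],
                      duties: list[str]) -> tuple[list[str], dict[str, list[str]]]:
    """assignments maps layer -> duties it is sized to own outright."""
    owners: dict[str, list[str]] = defaultdict(list)
    for layer, owned in assignments.items():
        for duty in owned:
            owners[duty].append(layer)
    unowned = [d for d in duties if not owners.get(d)]
    doubled = {d: sorted(owners[d]) for d in duties if len(owners.get(d, [])) > 1}
    return unowned, doubled


if __name__ == "__main__":
    # Hypothetical hall where no one drew the diagram.
    plan = {
        "central 2N UPS": ["ride-through"],
        "rack BBU": ["ms transient", "ride-through"],
        "facility BESS": ["slow smoothing", "ride-through", "demand response"],
    }
    duties = ["sub-ms edge", "ms transient", "slow smoothing",
              "ride-through", "demand response"]
    unowned, doubled = division_of_labor(plan, duties)
    print("unowned:", unowned)  # unowned: ['sub-ms edge']
    print("doubled:", doubled)  # doubled: {'ride-through': ['central 2N UPS', 'facility BESS', 'rack BBU']}
```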

Central vs distributed: the failure-domain and stranded-capacity argument

The strategic shift of the era is from central to distributed storage, and it is driven by two AI-specific pressures. First, failure-domain economics: at AI scale a single central UPS module backs a large block of revenue-bearing GPUs, so its failure (or its maintenance bypass) is a large, correlated risk. Push the battery into the rack — the OCP ORV3 model, where the battery backup unit (BBU) sits on the 48 V busbar inside the power shelf — and the failure domain shrinks to a single rack. The cluster's reliability math (Chapter 12.2) increasingly favors many small, independent failure domains over a few large ones, because goodput tolerates losing a rack far better than losing a UPS block.
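The failure-domain argument reduces to blast radius: how many GPUs one storage failure takes down, and whether the job's spare pool can absorb that loss. A minimal sketch; 72 GPUs per rack follows the NVL72 naming, while the 20 racks per central UPS module and the two-rack spare pool are hypothetical.

```python
"""Blast radius of one storage failure: central UPS module vs rack BBU."""

GPUS_PER_RACK = 72  # NVL72-class rack


def blast_radius_gpus(racks_per_domain: int, gpus_per_rack: int = GPUS_PER_RACK) -> int:
    """GPUs that lose protection when one storage failure domain goes down."""
    return racks_per_domain * gpus_per_rack


def spares_absorb(event_gpus: int, spare_gpus: int) -> bool:
    """True if the job can swap in spares and keep training after the event."""
    return event_gpus <= spare_gpus


if __name__ == "__main__":
    spare_gpus = 2 * GPUS_PER_RACK       # hypothetical: two hot-spare racks
    central = blast_radius_gpus(20)      # hypothetical: one module backs 20 racks
    distributed = blast_radius_gpus(1)   # ORV3-style: failure contained to one rack
    print(central, spares_absorb(central, spare_gpus))          # 1440 False
    print(distributed, spares_absorb(distributed, spare_gpus))  # 72 True
```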

Second, stranded capacity and footprint. A 2N central UPS doubles not just the UPS modules but the battery rooms and the floor they occupy — untenable when that floor could hold revenue GPUs and when battery capex is a material line item. Distributed BBUs scale storage 1:1 with IT (you add storage exactly where and when you add compute), eliminate the dedicated 2N battery hall, and cut the standing double-conversion loss because the rack draws from the DC busbar. The published figures are striking: moving to distributed BBUs + supercapacitors can cut roughly 50% of battery capacity and shave 2–3% of double-conversion loss versus a central 2N design, while delivering better transient response because the storage is millimeters of busbar from the load. The catch is operational: thousands of distributed cells mean thousands of state-of-charge and end-of-life events to manage — the SoC-orchestration problem that the facility BESS chapter and the metering layer have to solve at fleet scale.
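The published ~50% battery and 2–3% loss figures become concrete once you fix a load and a runtime. A sketch, assuming a hypothetical 10 MW IT block, a 5-minute runtime basis, and that the quoted percentages apply directly to a central 2N baseline (defaults: ~50% and the 2.5% midpoint); a real design study would substitute measured values.

```python
"""Central 2N UPS vs distributed BBU + supercaps: applying the published deltas."""


def central_2n_battery_kwh(it_kw: float, runtime_min: float) -> float:
    """2N means two full battery strings, each sized for the whole load."""
    return 2 * it_kw * runtime_min / 60


def distributed_savings(it_kw: float, runtime_min: float,
                        battery_cut: float = 0.50, loss_cut: float = 0.025) -> dict:
    """battery_cut and loss_cut default to ~50% and the midpoint of 2-3%."""
    baseline_kwh = central_2n_battery_kwh(it_kw, runtime_min)
    return {
        "central_2n_kwh": baseline_kwh,
        "distributed_kwh": baseline_kwh * (1 - battery_cut),
        # Continuous, treating conversion loss as a fraction of IT load.
        "loss_avoided_kw": it_kw * loss_cut,
    }


if __name__ == "__main__":
    print(distributed_savings(it_kw=10_000, runtime_min=5))
```

For the assumed block this is roughly 1,667 kWh of 2N battery cut to about 833 kWh, plus about 250 kW of avoided continuous loss.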

Deep dive: how the ORV3 48 V BBU actually behaves (the millisecond budget)

The Open Rack V3 power architecture is the reference implementation of distributed transient absorption, and its trigger behavior is worth knowing precisely because it sets the response-time bar the whole stack is measured against. The 48 V busbar operates in a 47.5–50.5 V window. When a synchronized GPU load step pulls the busbar down, the BBU triggers at ~48.5 V and ramps to full power in under 2 ms, and the control target is that the busbar never drops below 46 V. In other words, the rack rides its own transient locally, without the central UPS or the grid ever seeing the full edge. Power shelves deliver ~33 kW each at ~660 A, with PSU efficiency ≥97.5% across the 30–100% load band, and a GB200-class rack carries roughly 6–8 shelves.
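The millisecond budget can be checked with a few lines of arithmetic. The sketch below confirms that ~33 kW at ~660 A implies a ~50 V bus, then estimates the energy local capacitance must supply while the BBU ramps and the capacitance needed to release it within the 48.5 V to 46 V window. It assumes a worst case in which the PSUs add nothing during the ramp and the BBU ramps linearly over the full 2 ms; the 100 kW step is hypothetical. Under those assumptions the answer lands near a farad at 48 V, which is supercapacitor territory.

```python
"""ORV3-style millisecond budget: shelf current and ramp-gap capacitance (sketch)."""

BBU_TRIGGER_V = 48.5   # BBU starts discharging here
BUSBAR_FLOOR_V = 46.0  # control target: busbar never below this
BBU_RAMP_S = 2e-3      # BBU reaches full power in under 2 ms


def shelf_current_a(shelf_kw: float, bus_v: float) -> float:
    """I = P / V for one power shelf."""
    return shelf_kw * 1000 / bus_v


def ramp_gap_energy_j(step_kw: float, ramp_s: float = BBU_RAMP_S) -> float:
    """Energy not yet covered while the BBU ramps linearly from 0 to step_kw:
    the triangle area 1/2 * dP * t (worst case: PSUs add nothing meanwhile)."""
    return 0.5 * step_kw * 1000 * ramp_s


def capacitance_needed_f(energy_j: float, v_start: float = BBU_TRIGGER_V,
                         v_floor: float = BUSBAR_FLOOR_V) -> float:
    """From E = 1/2 * C * (V1^2 - V2^2): capacitance that releases energy_j
    while sagging from v_start to v_floor."""
    return 2 * energy_j / (v_start ** 2 - v_floor ** 2)


if __name__ == "__main__":
    print(f"shelf current: {shelf_current_a(33, 50):.0f} A")  # shelf current: 660 A
    e = ramp_gap_energy_j(step_kw=100)                         # hypothetical 100 kW rack step
    print(f"ramp-gap energy: {e:.0f} J, capacitance: {capacitance_needed_f(e):.2f} F")
```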

Below the BBU's millisecond regime sits the supercapacitor, which covers the sub-millisecond edge: the very fastest di/dt that even a 2 ms BBU ramp can't catch. And below that sits on-package capacitance on the silicon itself (Chapter 7.12). The lesson: no single device absorbs "the transient." There is a relay of devices, each handing off to the next slower, larger one as the event lengthens: on-package capacitor (µs) → supercap (sub-ms) → BBU (ms) → BESS (sub-second to seconds) → generator (seconds to minutes). Leave a gap in that relay and the transient simply skips to the next layer that can catch it, which is always larger, slower, and more expensive, and sometimes that layer is the grid.
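A relay with no gaps is a coverage property you can check on paper before testing it on the floor. A sketch: each layer is given the band of event timescales it can catch, and the function reports any band no layer covers. The band edges are illustrative values chosen to match the µs / sub-ms / ms / seconds / minutes ordering above, not specifications; BESS and generator bands in particular depend on the product.

```python
"""Check the chip -> BBU -> BESS relay for timescale gaps (illustrative bands)."""
from typing import NamedTuple


class Layer(NamedTuple):
    name: str
    t_start_s: float  # fastest event timescale the layer can respond to
    t_end_s: float    # longest event it can carry


def coverage_gaps(layers: list[Layer], t_min: float, t_max: float) -> list[tuple[float, float]]:
    """Return (start, end) timescale intervals in [t_min, t_max] that no layer covers."""
    gaps, cursor = [], t_min
    for layer in sorted(layers, key=lambda l: l.t_start_s):
        if layer.t_start_s > cursor:
            gaps.append((cursor, min(layer.t_start_s, t_max)))
        cursor = max(cursor, layer.t_end_s)
        if cursor >= t_max:
            return gaps
    gaps.append((cursor, t_max))  # nothing reaches the top of the range
    return gaps


# Hypothetical bands, in seconds.
RELAY = [
    Layer("on-package capacitance", 1e-6, 1e-4),
    Layer("rack supercap", 1e-4, 1e-3),
    Layer("rack BBU", 1e-3, 240.0),  # ~4 min runtime
    Layer("facility BESS", 0.1, 3600.0),
    Layer("generator", 10.0, float("inf")),
]

if __name__ == "__main__":
    print(coverage_gaps(RELAY, 1e-6, 3600.0))  # []
    no_supercap = [l for l in RELAY if l.name != "rack supercap"]
    print(coverage_gaps(no_supercap, 1e-6, 3600.0))  # [(0.0001, 0.001)]
```

Dropping the supercap opens exactly the sub-millisecond band, which in practice means the BBU takes that edge late and hard.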

Chemistry: VRLA → LFP, and the supercapacitor for the fast edge

The chemistry fork is now largely settled in one direction, with a specialist exception. VRLA (valve-regulated lead-acid), which is cheap, heavy, short-lived (3–5 year replacement), and temperature-sensitive, is exiting AI halls. LFP (lithium iron phosphate) is the default: higher energy and power density and longer cycle life than VRLA, far better thermal stability than NMC lithium (a real fire-and-insurance consideration at MWh scale), and high C-rates that suit the short, hard discharges the AI duty cycle demands. The decision-relevant number is the discharge rate: a 12C LFP cell discharges fully in 5 minutes (1/12 of an hour), which roughly halves the cabinet count per MW versus a lower-rate design and directly recovers floor that becomes revenue GPUs. That footprint recovery, not just cycle life, is why LFP wins the AI argument even where VRLA's upfront cost is lower.
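The cabinet-count claim follows from C-rate arithmetic: a C-rate of c empties the cell in 1/c hours, and a cabinet must meet both the power and the energy of the duty. The sketch sizes cabinets for a 1 MW, 5-minute duty and takes the larger of the power-limited and energy-limited counts. The 50 kWh cabinet and the 6C comparison design are hypothetical values picked to show the mechanism, not vendor data.

```python
"""Why a high C-rate cuts LFP cabinet count for short, hard discharges (sketch)."""
import math


def full_discharge_minutes(c_rate: float) -> float:
    """A cell discharged at c_rate * capacity empties in 1 / c_rate hours."""
    return 60 / c_rate


def cabinets_needed(load_kw: float, runtime_min: float,
                    cabinet_kwh: float, c_rate: float) -> int:
    """Cabinets must cover both the power and the energy of the duty."""
    by_power = math.ceil(load_kw / (cabinet_kwh * c_rate))  # kW each cabinet can deliver
    by_energy = math.ceil(load_kw * runtime_min / 60 / cabinet_kwh)
    return max(by_power, by_energy)


if __name__ == "__main__":
    print(full_discharge_minutes(12))  # 5.0
    for c in (12, 6):  # hypothetical 12C vs 6C cabinets
        print(f"{c}C: {cabinets_needed(1000, 5, cabinet_kwh=50, c_rate=c)} cabinets per MW")
```

With these assumptions the 12C design needs 2 cabinets per MW and the 6C design needs 4: the lower-rate design is power-limited, so it must install twice the energy just to deliver the megawatt.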

The specialist exception is the supercapacitor, and it earns its place precisely because batteries can't do everything. Batteries store a lot of energy and release it over minutes; supercaps store little energy but release it in microseconds to milliseconds at very high power, across millions of cycles, and are far less temperature-sensitive than batteries. That makes them the right tool for the sub-millisecond GPU edge and for ride-through bridges where you need huge instantaneous power for a very short time, which is exactly why 800 VDC reference architectures (e.g., Eaton's, with NVIDIA and ABB) pair supercapacitors for the fast transient with batteries for the longer ride-through. The design pattern is hybrid by timescale: supercap for power-dense, short duty; LFP for energy-dense, long duty. Choosing one storage technology for both duties is the classic error: an LFP-only stack is sluggish on the fast edge, and a supercap-only stack has no runtime.
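The hybrid-by-timescale pattern falls out of energy per event (E = P × t). A sketch comparing a sub-millisecond edge with a minutes-long ride-through at the same power; the 100 kW figure, the 4-minute duration, and the 1-second crossover used to assign a technology are hypothetical choices for illustration.

```python
"""Energy per event at the same power: why supercaps and LFP split the work."""

J_PER_KWH = 3.6e6


def event_energy_j(power_kw: float, duration_s: float) -> float:
    """E = P * t for a rectangular event."""
    return power_kw * 1000 * duration_s


def assign_storage(duration_s: float, crossover_s: float = 1.0) -> str:
    """Hypothetical rule: short, power-dense events to supercaps; long ones to LFP."""
    return "supercapacitor" if duration_s < crossover_s else "LFP battery"


if __name__ == "__main__":
    for label, t in [("sub-ms GPU edge", 0.5e-3), ("4-minute ride-through", 240.0)]:
        e = event_energy_j(100, t)
        print(f"{label}: {e:,.0f} J ({e / J_PER_KWH:.3g} kWh) -> {assign_storage(t)}")
```

The two events differ by nearly six orders of magnitude in energy at the same power, which is why one device class rarely serves both well.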

The mitigation spine: chip → BBU → BESS, by timescale

This is the canonical model the rest of the guide cross-references, so state it plainly: the synchronized GPU transient is absorbed by a relay of storage layers, each owning a timescale and an energy-per-event. The on-die and on-package capacitance catches the fastest, smallest edge at the source (Chapter 7.12). The rack supercap/BBU catches the millisecond step on the 48 V (or HVDC) busbar before it leaves the rack. The facility BESS catches the residual that escapes the rack and the slower campus-scale swing, and also handles the slow duties: generator bridging, demand response, ride-through. The published division-of-labor anchors are worth committing to memory: NVIDIA's GB300 NVL72 integrates ~65 J/GPU of energy storage in the power shelves and demonstrated a 30% reduction in peak grid demand while training Megatron-LLM; the Vera Rubin generation is reported to push on-rack energy storage to ~400 J/GPU (≈6×), signaling that the on-rack layer is being deliberately over-provisioned so the grid sees an ever-flatter load.
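Joules per GPU become intuitive once converted to time: storage E bridges a power swing ΔP for E/ΔP seconds. A sketch, assuming a hypothetical 1 kW per-GPU swing and treating all stored energy as usable (in practice the usable share is limited by the allowed voltage sag).

```python
"""Converting on-rack storage (J/GPU) into rack joules and bridge time."""

GPUS_PER_RACK = 72  # NVL72


def rack_storage_j(j_per_gpu: float, gpus: int = GPUS_PER_RACK) -> float:
    return j_per_gpu * gpus


def bridge_time_ms(j_per_gpu: float, swing_w_per_gpu: float) -> float:
    """How long the stored energy covers a per-GPU swing: t = E / dP."""
    return 1000 * j_per_gpu / swing_w_per_gpu


if __name__ == "__main__":
    for name, j in [("GB300 (~65 J/GPU)", 65), ("Vera Rubin (~400 J/GPU, reported)", 400)]:
        print(f"{name}: {rack_storage_j(j):,.0f} J/rack, "
              f"{bridge_time_ms(j, swing_w_per_gpu=1000):.0f} ms at a 1 kW/GPU swing")
    print(f"ratio: {400 / 65:.2f}x")  # ratio: 6.15x
```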

The real question is who owns each timescale and who pays. Push more smoothing on-package and into the rack and you flatten the load before it ever reaches your transformer or generator — but you pay in silicon area, PSU volume (on GB300 roughly half the PSU volume is capacitance), and rack cost. Skimp on the rack layer and lean on the facility BESS and the grid, and you ship the transient downstream onto equipment that is slower and, in the grid's case, not yours to command — inviting utility-imposed ramp-rate limits, flicker charges, or an outright interconnection condition. The frontier debate, unresolved in 2026, is exactly this allocation: how much belongs on-chip vs rack vs facility vs grid, and whether utilities will mandate (and meter) a maximum ramp rate at the meter.

Skip a layer of the spine and the transient just finds the next one

The spine fails the way a relay race fails: drop the baton and the next runner is slower. If on-package capacitance is under-provisioned, the millisecond edge hits the BBU harder. If the rack BBU/supercap layer is missing or undersized, the step reaches the facility transformer and behind-the-meter generator — which physically cannot follow a millisecond edge (genset/turbine response is seconds-to-minutes; see Chapter 4.8) — and then the grid, where it becomes a flicker, resonance, and ramp-rate problem you no longer control. The 1,500 MW NERC events are what "the grid caught it" looks like at scale. Design the spine as a continuous relay with no gaps, and validate each handoff at acceptance — because the gap you leave in design is invisible until a real workload finds it.

Sizing basis: EDPp, not TDP

The single most common sizing error is using nameplate TDP as the basis. The real basis is EDPp (electrical design power, peak), which runs roughly 1.5× TDP because the peak instantaneous draw of a synchronized collective far exceeds the thermal-design steady state. Size the power chain, the storage, and the provisioning ratio to TDP and you will clip on every load step, throttle GPUs (lost goodput), or trip protection. The flip side is the oversubscription opportunity: once capacitance + BBU + BESS + intelligent power-capping are in place and demonstrably flattening the peak, you can defensibly provision the upstream chain below the naive sum of EDPp, because the storage stack guarantees the transformer and generator never see the full peak. Uptime Institute's field data frames the headroom gap bluntly: training loads run at roughly a 3% power-headroom margin while inference can need ~21%, because the transient signature differs by workload.
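The EDPp argument as arithmetic: size upstream to the measured, smoothed peak plus headroom, and express the result as a provisioning ratio against the naive EDPp sum. A sketch; the 100 racks at 120 kW TDP, the 1.15× smoothed peak, and applying the ~3% training headroom as a simple multiplier are all hypothetical assumptions, since the chapter does not define how Uptime's margin is applied.

```python
"""EDPp-based provisioning: naive sum vs measured, smoothed peak (sketch)."""

EDPP_FACTOR = 1.5  # EDPp ~ 1.5 x TDP, per the chapter's rule of thumb


def edpp_sum_kw(tdp_kw_per_rack: float, racks: int) -> float:
    return tdp_kw_per_rack * EDPP_FACTOR * racks


def upstream_kw(tdp_kw_per_rack: float, racks: int,
                peak_factor: float, headroom: float) -> float:
    """Upstream capacity = measured peak (as a multiple of TDP) plus headroom."""
    return tdp_kw_per_rack * peak_factor * racks * (1 + headroom)


def provisioning_ratio(upstream: float, tdp_kw_per_rack: float, racks: int) -> float:
    return upstream / edpp_sum_kw(tdp_kw_per_rack, racks)


if __name__ == "__main__":
    tdp, racks, headroom = 120.0, 100, 0.03  # hypothetical hall, training-style margin
    for label, peak in [("no smoothing (peak = EDPp)", EDPP_FACTOR),
                        ("proven smoothing (peak = 1.15 x TDP)", 1.15)]:
        up = upstream_kw(tdp, racks, peak, headroom)
        print(f"{label}: {up:,.0f} kW, ratio {provisioning_ratio(up, tdp, racks):.2f}")
    # Sizing to bare TDP (ratio ~0.67) without proven smoothing clips every step.
```

The defensible ratio moves from about 1.03 to about 0.79 only because the smoothed peak was measured; the number is only as good as the metering behind it.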

This is where the chapter touches money. The provisioning ratio you can defend — how far below the EDPp sum you size the grid connection — is a direct function of how much transient-absorption you have proven and metered. Under-provision storage and you must over-provision the (scarce, long-lead) grid connection. Over-provision storage and you've spent capex and floor on joules you don't use. The optimization sits between, and it can only be settled with measured data, which is why metering and acceptance are not an afterthought but the thing that unlocks the capital efficiency.

Energy-storage placement in DC-disaggregated designs

As racks cross ~200 kW and the architecture migrates from 48 V AC-DC-in-rack to disaggregated ±400 V / 800 VDC sidecar power (the full treatment is Chapter 4.7, fed by the solid-state transformer of Chapter 4.4), the storage question re-opens: where does the joule live in a DC world? The disaggregated answer moves AC-DC conversion, energy storage, and eventually the SST out of the compute rack into a dedicated sidecar power rack feeding an HVDC busbar — which frees the entire IT rack for accelerators (a ~3% efficiency and density win) and isolates battery and conversion heat from the compute. Storage in this model sits in the sidecar on the DC bus, close enough to the load to keep the fast-transient advantage of distribution while consolidating the cells for serviceability and thermal containment.

The placement fork carries real consequences. Keep storage in the compute rack (ORV3-style BBU) and you maximize transient proximity but compete with GPUs for the most expensive floor and complicate cooling. Move it to the sidecar and you recover IT-rack space and isolate the thermal/fire risk, at the cost of a slightly longer (but still very short) electrical path to the load. Move it to a facility BESS and you get scale economics, easy DR participation, and a clean fire envelope, but you're now too far away for the millisecond edge — so the sidecar/rack layer cannot be eliminated, only complemented. In practice the DC-disaggregated design lands on all three: supercaps/BBU at the rack or sidecar for the fast edge, facility BESS for the slow duties, and a DC-bus SoC-management scheme tying thousands of distributed elements into one controllable stack. The grounding and ground-fault-monitoring implications of all this DC storage are owned by Chapter 4.11.

Key numbers

Figure	What it measures	Year	Source
65 J/GPU	Energy storage integrated in GB300 NVL72 power shelves; ~half the PSU volume is capacitance	2025	NVIDIA Developer Blog (GB300 steady-power)
30%	Peak grid-demand reduction demonstrated while training Megatron-LLM with energy-enhanced power shelves	2025	NVIDIA Developer Blog (GB300 steady-power)
~400 J/GPU	Reported on-rack energy storage on Vera Rubin (~6× GB300), per BESS-for-AI guidance	2026 (roadmap)	NVIDIA / SemiAnalysis (Vera Rubin)
<2 ms	ORV3 BBU ramp to full power on 48.5 V busbar droop; busbar held ≥46 V; PSU ≥97.5% (30–100% load)	2025	OCP Open Rack V3 / ORV3 BBU spec
~1.5× TDP	EDPp (electrical design power, peak) as the real sizing basis vs nameplate TDP	2026	Uptime Institute Journal; OCP/Diablo 400
3–5% vs >99%	Double-conversion loss vs eco-mode/advanced-standby efficiency	2026	Vertiv; ScienceDirect (battery/UPS systems)
~50% / 2–3%	Battery-capacity cut and double-conversion-loss cut from distributed BBU + supercaps vs central 2N	2026	SemiAnalysis; Eaton 800 VDC reference architecture
~1,500 MW	Instantaneous data-center load lost on a single 230 kV fault (the macro symptom of the transient problem)	2026	NERC Level 3 Alert / Utility Dive

Metering, acceptance, and contracting the transient claim

A transient-absorption architecture you cannot measure is a liability, not an asset — because the entire capital case (a tighter provisioning ratio, a smaller grid connection, a tenant SLA on power quality) rests on the claim that the storage stack actually flattens the peak. So this chapter owns the metering and acceptance hooks even though the broader power-quality monitoring layer is treated downstream. Three artifacts make the claim defensible:

Sub-cycle metering at the meter and the rack. You must capture the load step, not just the average — high-rate power and busbar-voltage telemetry that shows the BBU triggering, the busbar holding ≥46 V, and the smoothed waveform the transformer actually sees. Closed-loop control (NVIDIA SMI / Redfish power caps, intelligent power steering) depends on this data and is itself part of the spine.
Step-load and discharge acceptance tests. Commissioning must prove the relay, not just the runtime: load-bank discharge for ride-through, and explicit step-load tests that inject a synchronized swing and verify each layer hands off cleanly (supercap → BBU → BESS) without a downstream excursion. UPS transfer and generator-pickup tests remain, but the new acceptance criterion is the shape of the absorbed transient. (Commissioning sequence: Chapter 4.8 for generator/island integration.)
The contract layer. The ramp-rate and flicker behavior at the point of interconnection is increasingly a utility-imposed condition and a tenant-facing SLA. The metered transient signature is what you bring to the interconnection study (Chapter 4.3) and the grid-interactive obligations (Chapter 4.10) — and what lets you turn the storage stack from a cost center into a grid-services revenue line (Chapter 15.8).
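The step-load acceptance criterion can be expressed as checks over high-rate telemetry. A sketch over synthetic samples: it verifies that the busbar stayed at or above 46 V, finds the sample where the busbar crossed the 48.5 V BBU trigger, and computes the worst windowed ramp rate seen at the meter. The sample rate, window, ramp limit, and data are hypothetical; real ramp-rate limits and their measurement windows come from the utility's interconnection terms.

```python
"""Step-load acceptance checks over sampled telemetry (synthetic data)."""
from typing import Optional


def busbar_held(volts: list[float], floor_v: float = 46.0) -> bool:
    return min(volts) >= floor_v


def trigger_sample(volts: list[float], trigger_v: float = 48.5) -> Optional[int]:
    """Index of the first sample at or below the BBU trigger, or None."""
    return next((i for i, v in enumerate(volts) if v <= trigger_v), None)


def max_ramp_kw_per_s(power_kw: list[float], sample_hz: float, window: int = 1) -> float:
    """Worst absolute change in meter power over `window` samples, in kW/s."""
    return max(abs(power_kw[i + window] - power_kw[i]) * sample_hz / window
               for i in range(len(power_kw) - window))


def step_load_acceptance(volts: list[float], meter_kw: list[float], sample_hz: float,
                         window: int, ramp_limit_kw_per_s: float) -> dict:
    ramp = max_ramp_kw_per_s(meter_kw, sample_hz, window)
    return {
        "busbar_held": busbar_held(volts),
        "bbu_trigger_sample": trigger_sample(volts),
        "max_ramp_kw_per_s": ramp,
        "ramp_ok": ramp <= ramp_limit_kw_per_s,
    }


if __name__ == "__main__":
    volts = [50.0, 49.2, 48.4, 47.1, 46.6, 47.8, 49.0]     # rack busbar, 1 kHz samples
    meter_kw = [1000, 1000, 1010, 1030, 1050, 1060, 1060]  # smoothed load at the meter
    print(step_load_acceptance(volts, meter_kw, sample_hz=1000, window=2,
                               ramp_limit_kw_per_s=25_000))
```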

The fork that sets your grid connection size

Decide early how aggressively you will smooth, because it sizes the scarcest thing you own — the interconnection. Smooth hard at the rack (over-provision on-package + BBU + supercap, à la 400 J/GPU) and the grid sees a flat-ish load, so you can defend a smaller, cheaper, faster-to-energize connection and offer the utility ramp-rate compliance — at the cost of silicon area, PSU volume, and rack capex. Smooth lightly and lean on a facility BESS and the grid, and you save rack cost but must buy (and wait years for) more grid headroom and risk a ramp-rate or flicker condition you can't meet. There is no free layer: the joule is paid for at the rack, at the facility, or — worst of all — by the grid, which will eventually charge you back for it.

The on-die and on-package origin of the transient — the di/dt the capacitance relay starts from — is engineered in Chapter 7.12. The cooling-side twin of ride-through, where loss-of-flow thermal-trips a GPU in seconds and UPS-backed pumps become mandatory, is in Chapter 5.12. The non-linear-load harmonics that the same load creates upstream of storage are in Chapter 4.4; the LV busbar and OCP power-shelf detail in Chapter 4.6; the DC-disaggregated sidecar architecture in Chapter 4.7; generator/BESS bridging and island integration in Chapter 4.8; grounding and DC ground-fault monitoring in Chapter 4.11. The grid-facing obligations this storage stack discharges — ride-through, ramp limits, frequency response toward the POI — are in Chapter 4.10 and the NERC framing in Chapter 4.3. The reliability rethink that favors small distributed failure domains is in Chapter 12.2; the grid-services revenue the stack can earn is in Chapter 15.8.