Chapter 5.5
Immersion Cooling (Single-Phase & Two-Phase)
Immersion wins the PUE and density argument on paper and loses the deployment argument in practice — single-phase is a serviceable niche with real heat-reuse appeal, two-phase is a stalled technology whose enabling fluid the chemical industry is walking away from, and direct-to-chip beat both to the 2026 rack.
What you'll decide here
- Whether to commit any white space to immersion at all in 2026 — or treat it as a contained pilot — given that direct-to-chip liquid already carries the frontier-density racks at a third of the capex and a tenth of the operational friction.
- If you do go immersion: single-phase (serviceable, supply-secure, ~1.03-1.10 PUE) versus two-phase (best-in-class PUE but PFAS-exposed, fluid-supply-broken, and insurability-gated).
- Fluid chemistry and its supply chain as a first-class procurement risk — not a consumable line item — including PFAS regulatory exposure, OEM warranty certification, and the cost of a full re-fluid.
- The serviceability tax: no hot-swap, full-server lift-out, fluid handling and drip management, and ~3x mean-time-to-repair against a fleet of 1 kW+ GPUs that fail often.
- Floor-loading and structural basis for flooded tanks and CDUs (multi-ton point loads) before you assume an existing slab can host a tank farm.
Immersion cooling is the technology that should have won and didn't. Submerge the whole board in a dielectric bath and you remove every air gap, every fan, every cold-plate interface resistance between silicon and coolant. The thermodynamics are excellent: partial PUEs at or below 1.05, rack densities that air and even direct-to-chip struggle to reach, near-silent halls, and warm fluid that is unusually clean to reuse. For a decade the cooling roadmaps drew immersion as the inevitable endpoint past the air-cooling cliff. As of 2026 that endpoint has not arrived, and the reason is not thermal but institutional: chemistry, serviceability, supply chain, and insurability. The frontier racks of 2025-2026 are cooled by direct-to-chip liquid, not by tanks. → Chapter 5.4.
This chapter weighs a technology you may correctly choose not to deploy. The two architectures that get lumped together are single-phase immersion (the fluid stays liquid, pumped past the boards) and two-phase immersion (the fluid boils on the hot components and condenses on a coil), and their 2026 fates diverged completely. Single-phase is a viable, supply-secure niche. Two-phase is a stalled technology orphaned by the exit of the one chemical maker that made its fluids. The forks ahead are immersion vs direct-to-chip at the architecture level and single- vs two-phase within immersion, and each carries downstream costs: capex, serviceability MTTR, floor loading, PFAS liability, and the insurance/fire-suppression posture that quietly gates whether a lender or insurer will let you build it at all. → Chapter 2.6, Chapter 6.5.
The two architectures, and why the distinction now decides projects
Both architectures drop servers — de-fanned, with thermal-interface material swapped for immersion-grade compound and any spinning disks replaced by flash — into a dielectric fluid that is electrically non-conductive, so live boards sit safely submerged. The difference is what the fluid does with the heat.
Single-phase immersion keeps the coolant liquid throughout. A pump circulates a synthetic hydrocarbon or fluorinated oil through the tank and across a liquid-to-liquid heat exchanger that hands the heat to the facility water loop. The fluid is benign, cheap-ish, and forgiving: it tolerates open tanks, partial fills, and routine service. Heat transfer is by forced convection, so the density ceiling is set by how much fluid you can pump and how warm you let it run — strong, but not unlimited. Operationally it behaves like a very wet rack: contained, manageable, serviceable with gloves and a drip tray.
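The pump-limited ceiling follows from the sensible-heat balance Q = m_dot * c_p * dT. The Python sketch below shows how the required flow scales with heat load and allowed temperature rise. It is a rough sketch, not a sizing tool: the density (800 kg/m³) and specific heat (2.1 kJ/kg·K) are illustrative placeholders for a hydrocarbon dielectric, not values from any datasheet, and the function name and the 100 kW tank are hypothetical.

```python
def required_flow_l_per_min(heat_kw, delta_t_k,
                            density_kg_m3=800.0,   # ASSUMPTION: illustrative value
                            cp_kj_per_kg_k=2.1):   # ASSUMPTION: illustrative value
    """Volumetric flow needed to carry heat_kw with a fluid temperature rise of delta_t_k."""
    if heat_kw < 0 or delta_t_k <= 0:
        raise ValueError("heat_kw must be >= 0 and delta_t_k must be > 0")
    # Q = m_dot * c_p * dT  ->  m_dot = Q / (c_p * dT); kW / (kJ/kg.K * K) = kg/s
    mass_flow_kg_s = heat_kw / (cp_kj_per_kg_k * delta_t_k)
    volume_flow_m3_s = mass_flow_kg_s / density_kg_m3
    return volume_flow_m3_s * 1000.0 * 60.0  # m^3/s -> L/min


if __name__ == "__main__":
    # Hypothetical 100 kW tank at three allowed temperature rises.
    for delta_t in (5.0, 10.0, 15.0):
        flow = required_flow_l_per_min(100.0, delta_t)
        print(f"dT = {delta_t:.0f} K: {flow:.0f} L/min")
```

Doubling the allowed temperature rise halves the required flow, which is the "how warm you let it run" lever in the paragraph above.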
Two-phase immersion exploits the latent heat of vaporization. The fluid is engineered to boil at a low temperature (typically ~49-60 °C) right at the hot components; boiling pulls a very high heat flux off the component surface, and the rising vapor condenses on a chilled coil at the top of a sealed tank and rains back down. Boiling heat transfer is far more aggressive than convection, which is why two-phase posts the best PUE numbers in the industry (~1.01-1.05) and the highest density headroom. But it demands a sealed, pressure-managed vessel, a fluid with an exactingly tuned boiling point, and tight vapor control. And that fluid, until 2025, came almost exclusively from one place.
| Dimension | Direct-to-chip (DLC) | Single-phase immersion | Two-phase immersion |
|---|---|---|---|
| 2026 status | Mainstream default (~55% of liquid market) | Viable niche; OEM-certified fluids emerging | Stalled — PFAS fluid supply broken |
| Cooling system PUE | ~1.05-1.15 | ~1.03-1.10 | ~1.01-1.05 (best-in-class) |
| System capex | ~$300-500/kW | ~$1,000+/kW (fluid ~15-25% of capex) | Highest; sealed-tank + fluid premium |
| Density headroom | 200+ kW/rack; carries GB200/Rubin | High; pump- and warm-run-limited | Highest; latent-heat-driven |
| Serviceability | Hot-swap-ish; per-tray service | Full lift-out; drip mgmt; ~3x MTTR | Lift-out + vapor mgmt; worst MTTR |
| Fluid risk | Water/PG25 — benign, abundant | Synthetic hydrocarbon — supply-secure | PFAS — regulated, supply-orphaned |
| OEM warranty | Standard across Tier-1 OEMs | Certified fluids only (Shell, others 2025) | Sparse; chip-vendor support thin |
| Insurability / fire | Well-understood; standard suppression | Tank = fuel-load + suppression questions | Sealed vessel + PFAS handling scrutiny |
The table is a verdict. On the two rows that won the 2010s cooling debates, PUE and density headroom, immersion still leads, two-phase most of all. On every row that determines whether a hyperscale operator can actually deploy and operate the technology at scale in 2026 (capex, serviceability, fluid supply, OEM warranty, insurability), direct-to-chip wins and two-phase loses worst. That inversion is the whole story of why immersion remains niche despite best-in-class efficiency. The market did not reject immersion on the merits of cooling; it rejected the operational and institutional friction that comes wrapped around the cooling.
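To put the PUE row in absolute terms, the Python sketch below converts the table's ranges into overhead power for a hypothetical 1 MW IT load. It assumes the simple relation overhead = IT load × (PUE - 1) and treats the table's "cooling system PUE" as the only overhead. The 1 MW load and the function names are illustrative, not from the chapter.

```python
# Cooling-system PUE ranges copied from the comparison table above.
PUE_RANGES = {
    "direct_to_chip": (1.05, 1.15),
    "single_phase_immersion": (1.03, 1.10),
    "two_phase_immersion": (1.01, 1.05),
}


def overhead_kw(it_load_kw, pue):
    """Non-IT power implied by a PUE figure, using facility power = IT load * PUE."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * (pue - 1.0)


def overhead_range_kw(it_load_kw, architecture):
    """(low, high) overhead in kW for one architecture's PUE range."""
    low_pue, high_pue = PUE_RANGES[architecture]
    return overhead_kw(it_load_kw, low_pue), overhead_kw(it_load_kw, high_pue)


if __name__ == "__main__":
    it_load_kw = 1000.0  # ASSUMPTION: hypothetical 1 MW IT load
    for name in PUE_RANGES:
        low, high = overhead_range_kw(it_load_kw, name)
        print(f"{name}: {low:.0f}-{high:.0f} kW overhead")
```

Under these assumptions the overhead is roughly 50-150 kW for direct-to-chip against roughly 10-50 kW for two-phase. That is the size of the efficiency prize to weigh against the capex, serviceability, and supply rows.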
The serviceability tax
The most underestimated cost of immersion is the labor and downtime of touching the hardware, not the fluid. A direct-to-chip rack services like a conventional rack: pull a tray, the dripless quick-disconnects seal, swap the part, reinsert. An immersion tank has no hot-swap. To replace a failed DIMM, NIC, or — most commonly — a GPU, a technician hoists the entire server out of the bath on a lift, lets the fluid drain (single-phase) or manages vapor escape (two-phase), works on a dripping board over a containment tray, and re-immerses. Practitioner data puts immersion hardware replacement at roughly 3x the mean-time-to-repair of air or cold-plate service.
That tax compounds against the failure rate of the fleet it is cooling. A GB200-class rack is 72 accelerators each dissipating ~1 kW+; at frontier scale, GPU and optics failures are a daily fleet event, not an annual one. Multiply a 3x MTTR penalty by a high failure cadence and the goodput cost — GPUs idle while a tank is opened, drained, and re-sealed — becomes the dominant operational argument against immersion for training fleets, where every node-hour lost is a job-wide straggler or a checkpoint restart. The cooling that minimizes PUE can maximize the time a failed node spends out of service. For a power-bound operator paying for every megawatt, idle accelerators in a tank are the worst kind of stranded capacity. → Chapter 5.4 on the serviceability advantage of cold plates.
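The compounding can be made concrete with a back-of-envelope Python sketch. The 3x MTTR multiplier is the figure quoted above; the fleet size, annual failure rate, and baseline repair time are hypothetical placeholders. The model counts only the hours the failed hardware is out of service and ignores job-wide effects such as stragglers and checkpoint restarts, which would make the real cost larger.

```python
IMMERSION_MTTR_MULTIPLIER = 3.0  # from the text: ~3x MTTR versus air or cold-plate service


def lost_gpu_hours_per_year(fleet_gpus, annual_failure_rate, mttr_hours,
                            gpus_idled_per_repair=1):
    """GPU-hours out of service per year from hardware repairs alone.

    gpus_idled_per_repair is a hypothetical knob for the case where lifting
    a whole server out of the tank idles every GPU in that server.
    """
    repairs_per_year = fleet_gpus * annual_failure_rate
    return repairs_per_year * mttr_hours * gpus_idled_per_repair


if __name__ == "__main__":
    fleet_gpus = 10_000          # ASSUMPTION: illustrative fleet size
    annual_failure_rate = 0.05   # ASSUMPTION: illustrative, not a measured rate
    baseline_mttr_hours = 2.0    # ASSUMPTION: illustrative cold-plate repair time

    baseline = lost_gpu_hours_per_year(fleet_gpus, annual_failure_rate,
                                       baseline_mttr_hours)
    immersion = lost_gpu_hours_per_year(
        fleet_gpus, annual_failure_rate,
        baseline_mttr_hours * IMMERSION_MTTR_MULTIPLIER)
    print(f"baseline:  {baseline:.0f} GPU-hours/year")
    print(f"immersion: {immersion:.0f} GPU-hours/year")
```

Because the loss is linear in both failure rate and MTTR, the 3x penalty carries straight through to lost GPU-hours at any failure cadence. Raising `gpus_idled_per_repair` shows how a full-server lift-out widens the gap further.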
Deep dive: single-phase serviceability, fluid logistics, and the re-fluid event
Single-phase immersion is the architecture worth engineering seriously, because its problems are tractable. The fluid is typically a synthetic hydrocarbon (e.g. GTL/PAO-based oils) or a fluorinated single-phase oil; it is non-volatile, hard to ignite (high flash point, though hydrocarbon fluids still count as a fuel load), and electrically benign. A tank holds anywhere from a few hundred to a few thousand liters, and at $12-50/L the fluid charge alone is 15-25% of system capex: a tank can carry tens of thousands of dollars of fluid before a single server goes in.
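The fluid-charge arithmetic is simple enough to sketch in Python. The price band and capex share are the ranges quoted above; the 1,000 L tank volume is a hypothetical value inside the stated "hundreds to a few thousand liters" range, and the function names are illustrative.

```python
PRICE_PER_L_USD = (12.0, 50.0)        # from the text: $12-50/L
FLUID_SHARE_OF_CAPEX = (0.15, 0.25)   # from the text: 15-25% of system capex


def fluid_charge_usd(volume_l):
    """(low, high) cost in USD of one full fluid charge for a tank of volume_l liters."""
    low_price, high_price = PRICE_PER_L_USD
    return volume_l * low_price, volume_l * high_price


def implied_system_capex_usd(fluid_cost_usd, fluid_share):
    """System capex implied if the fluid charge is fluid_share of the total."""
    if not 0.0 < fluid_share <= 1.0:
        raise ValueError("fluid_share must be in (0, 1]")
    return fluid_cost_usd / fluid_share


if __name__ == "__main__":
    tank_volume_l = 1000.0  # ASSUMPTION: hypothetical tank size
    low, high = fluid_charge_usd(tank_volume_l)
    print(f"fluid charge: ${low:,.0f} to ${high:,.0f}")
    for share in FLUID_SHARE_OF_CAPEX:
        capex = implied_system_capex_usd(high, share)
        print(f"at {share:.0%} share, implied system capex: ${capex:,.0f}")
```

The same calculation prices a re-fluid event: a full drain and recharge costs roughly one fluid charge again, before disposal costs.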
Logistics dominate the operating model. Servers must be de-fanned and de-spun (no HDDs, no air movers), optics and connectors must be immersion-rated, and the fluid wicks into cable jackets, labels, and porous materials — so the bill of materials is constrained to immersion-compatible parts. Every lift-out displaces fluid that must be captured, filtered, and returned; over years the fluid degrades, absorbs contaminants, and must be periodically analyzed and topped up. The re-fluid event — draining, disposing of, and recharging a tank, whether for a fluid change, a leak, or a chemistry migration — is the immersion equivalent of the DLC commissioning flush, but with a fluid that is expensive to buy and, increasingly, expensive to dispose of under tightening chemical-disclosure regimes.
The 2025 development that improved single-phase's standing was OEM certification: Shell became a chip-maker-certified immersion fluid provider, and Tier-1 OEMs (Dell, HPE, Lenovo) extended warranty coverage to immersion provided a certified fluid and validated install are used. Non-certified fluid voids the warranty — which makes fluid selection a procurement decision coupled to your hardware support contract, not a free choice.
Floor loading and the structural basis
Immersion inverts the floor-loading conversation. Direct-to-chip adds only ~15-25 kg per rack, and most 800-1,200 kg/m² raised floors absorb it without a structural look. A flooded immersion tank is a different object: a tank full of dense dielectric fluid plus submerged servers can reach multi-ton point loads, and the CDU/filtration skid adds more. A tank of up to ~3 tons, bearing on its own footprint, can exceed an ~800 kg/m² slab rating, which means immersion frequently forces a slab-on-grade design or structural reinforcement. That decision belongs at scoping, not at install. A hall that pencils out for air or DLC may simply be unable to host a tank farm without re-pouring concrete.
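A first-pass screen for the slab question divides tank mass by bearing footprint and compares the result with the floor rating. The Python sketch below uses the ~3 ton and ~800 kg/m² figures from the paragraph above. The 2.0 m² footprint is a hypothetical value, "tons" is read as metric tonnes, and the function names are illustrative. It is a screening check only and does not replace a structural engineer's treatment of point loads and load spreading.

```python
def distributed_load_kg_m2(total_mass_kg, footprint_m2):
    """Average floor load if the mass bears evenly on the given footprint."""
    if footprint_m2 <= 0:
        raise ValueError("footprint_m2 must be positive")
    return total_mass_kg / footprint_m2


def exceeds_rating(total_mass_kg, footprint_m2, rating_kg_m2):
    """True if the average load over the footprint is above the floor rating."""
    return distributed_load_kg_m2(total_mass_kg, footprint_m2) > rating_kg_m2


def min_footprint_m2(total_mass_kg, rating_kg_m2):
    """Smallest bearing area that keeps the average load at or below the rating."""
    if rating_kg_m2 <= 0:
        raise ValueError("rating_kg_m2 must be positive")
    return total_mass_kg / rating_kg_m2


if __name__ == "__main__":
    tank_mass_kg = 3000.0   # from the text: up to ~3 tons per tank (read as metric)
    rating_kg_m2 = 800.0    # from the text: ~800 kg/m2 rating
    footprint_m2 = 2.0      # ASSUMPTION: hypothetical tank footprint

    load = distributed_load_kg_m2(tank_mass_kg, footprint_m2)
    print(f"average load: {load:.0f} kg/m2")
    print(f"exceeds rating: {exceeds_rating(tank_mass_kg, footprint_m2, rating_kg_m2)}")
    print(f"footprint needed: {min_footprint_m2(tank_mass_kg, rating_kg_m2):.2f} m2")
```

With these inputs the average load is well above the rating unless the tank's weight is spread over several square meters, which is why the question belongs at scoping.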
This is why immersion is overwhelmingly a greenfield, purpose-built decision rather than a retrofit one. The combination of multi-ton point loads, low horizontal tanks that consume floor area differently than vertical racks, the fluid storage and handling infrastructure, and the suppression/containment requirements rarely fits a brownfield hall designed for vertical air-cooled cabinets. The retrofit path past the cooling cliff runs through rear-door heat exchangers and direct-to-chip, not tanks. → Chapter 5.4.
Where immersion still earns its keep: heat reuse and extreme density
Immersion is niche, not dead, and there are real workloads where it remains the right answer. Heat reuse is its strongest case: single-phase immersion delivers a clean, warm, single-stream fluid that is unusually well-suited to feeding a heat pump and a district-heating offtake. Because the whole board sits in the fluid, there is no parasitic air load to dilute the return temperature, and the heat comes out as one coherent stream rather than split between a liquid loop and a residual air loop the way a DLC+RDHx rack does. Where a district-heating network or an industrial offtake exists, immersion's heat-reuse quality can flip its economics. → Chapter 5.9 (engineering) and Chapter 15.5 (economics/district heating).
The other durable niches are extreme-density and harsh-environment deployments: cryptocurrency mining (where serviceability matters little and density-per-dollar dominates), space- or dust-constrained edge sites where a sealed tank beats trying to filter and condition air, and specialized HPC where the thermal headroom past direct-to-chip is genuinely needed today. As rack densities climb toward the 600 kW Kyber/Rubin-Ultra generation and beyond, where even direct-to-chip is approaching the limits of single-phase cold-plate coolant, immersion's thermal headroom (latent-heat headroom in particular) may re-enter the mainstream conversation, but only if a supply-secure, insurable two-phase fluid (or a sufficiently aggressive single-phase design) is available. The bet immersion is making is that the density ramp eventually outruns what cold plates can do. → Chapter 5.1 (the density wall) and Chapter 16.2 (subsystem roadmap).
The insurability and fire-safety gate
A constraint that rarely appears in cooling-technology comparisons but routinely decides them is whether the facility is insurable and code-compliant. A large open tank of dielectric fluid is, to a fire authority and an insurer, a fuel load and an unusual suppression problem — even when the fluid's flash point is high. Conventional rack-based suppression (clean-agent or pre-action sprinkler) does not map cleanly onto a tank farm; immersion changes the fire model, the containment requirement, and the spill-response plan. Two-phase adds a sealed pressure vessel and a PFAS-handling regime on top.
The consequence is that immersion can fail a project not on engineering but on diligence: an insurer (FM Global-class) or lender declines to underwrite a novel cooling architecture with thin actuarial history, or a local authority having jurisdiction will not sign off on the suppression scheme. This gate is often the real reason a board-approved immersion pilot never scales — the technology works, but the risk-transfer and life-safety paperwork does not close. Engage the insurer and the AHJ before committing white space, not after. → Chapter 6.5 (fire detection, suppression & life-safety) and Chapter 2.6 (insurance & risk transfer).