Chapter 5.4

Direct-to-Chip Liquid Cooling (DLC) — The 2026 Default

Direct-to-chip liquid cooling stopped being a choice in 2026. Once a rack draws past the air ceiling, the only open decisions are single- vs two-phase, how you plumb the rack, and how tightly you budget flow, delta-T, and pressure. Each of those forks sets a downstream serviceability, reliability, and capex bill you live with for the asset's life.

POWER-BOUND · DENSITY-RAMP · GOODPUT

What you'll decide here

Single-phase vs two-phase cold plates — and why single-phase is the 2026 default, with two-phase parked behind the PFAS reckoning until chip flux forces the question.
How you plumb the rack: in-rack manifolds with blind-mate/floating-tray couplings vs flexible-hose dripless quick disconnects — the serviceability-vs-reliability fork that governs every node swap for the next five years.
The per-chip and per-rack thermal budget — flow per kW, coolant delta-T, and the pressure-drop allowance across cold plate plus manifold plus QDs — because every milli-degree of cold-plate thermal resistance you spend you must buy back in pump head or flow.
Coolant chemistry — PG25 vs other glycol blends vs deionized water — and the material-compatibility, biocide, and freeze-protection consequences that follow the choice into the secondary loop.
What deliberately stays on air (NICs, DIMMs, PSUs, optics, VRMs) and how the residual ~10–20% air load is captured — because a hall that forgets the air tail strands the very racks it cooled.

By 2026 the argument about whether AI racks need liquid is over. A GB200 NVL72 rack draws roughly 132 kW; the practical air-cooling ceiling is around 41 kW per rack (Chapter 5.2). No amount of colder supply air or smarter containment closes that gap; it is a discontinuity, the cooling cliff of Chapter 5.1, and direct-to-chip liquid cooling (DLC) is the answer on the far side of it. Rear-door heat exchangers and air-assisted liquid (Chapter 5.3) bridge the 50–100 kW band for brownfields that cannot get facility water to the rack, but they are a bridge, not a destination. For any greenfield hosting frontier training or next-generation dense inference, DLC is the design basis before steel is cut.

This chapter is about the decisions that remain once DLC is assumed. They are not abstract. Choose two-phase and you inherit a PFAS supply-chain and liability problem. Choose flexible-hose quick disconnects over blind-mate manifolds and you trade factory-integrated reliability for field serviceability — and a different leak-risk profile. Budget the cold-plate delta-T too tight and you over-spec the CDU and the pumps; budget it too loose and you throttle the GPUs. Each fork carries a downstream cost, and they are cheaper to see before they are poured into a slab.

Cold-plate architectures: single-phase vs two-phase

A direct-to-chip cold plate is a sealed metal block — typically copper, sometimes copper-on-aluminum — pressed onto the die package through a thermal interface material (TIM), with coolant forced through internal microchannels or skived fins directly over the hot silicon. The heat path is short and the thermal resistance low: a few hundredths of a °C per watt from junction to coolant, which is what makes 1.0–2.3 kW per GPU package tractable. The first fork is whether the coolant changes phase inside the plate.

Single-phase cold plates keep the coolant liquid throughout. A water/glycol mix enters, warms by a bounded delta-T (commonly 7–12 °C), and leaves still liquid. Heat removal scales with mass flow times specific heat times delta-T — pump harder or run a wider delta-T to carry more watts. It is mechanically simple, the fluids are benign (water-based, non-PFAS), the pressure regime is well understood, and it maps cleanly onto the CDU-and-secondary-loop architecture of Chapter 5.6. The penalty is that water has finite heat capacity, so very high heat fluxes demand high flow and therefore pump energy and pressure drop.

Two-phase cold plates exploit the latent heat of vaporization: a low-boiling-point dielectric enters as liquid, boils inside the plate, and leaves as a vapor-liquid mixture. Because latent heat dwarfs sensible heat, two-phase moves enormous heat flux at low flow and holds a nearly isothermal plate surface, which makes it thermodynamically the superior answer for the 1.5–2.3 kW packages on the roadmap. The catch is the working fluid. The engineered dielectrics that boil at convenient temperatures are predominantly fluorochemicals (the PFAS family), and that is now a regulatory and legal liability rather than a footnote.

Why single-phase is the 2026 default

The fork resolved in single-phase's favor not because two-phase is worse thermally — it is better — but because the fluid supply chain broke. 3M's exit from PFAS manufacturing (winding down Novec/fluorinated fluids by end-2025), tightening US and EU PFAS restrictions, and multi-billion-dollar liability exposure stalled two-phase exactly as AI demand spiked. Single-phase direct-to-chip is the practical default — roughly 55% of the liquid-cooling market in 2026 (DCD/Schneider, IDTechEx) — because it is cheaper to deploy, uses benign water-based coolant, and aligns with the silicon roadmap for the next 5–10 years. Two-phase is parked in pilots and a few large early deployments, waiting for the moment chip-level flux exceeds what single-phase can carry — and for a non-PFAS working fluid that does not yet exist at scale. Bet single-phase for anything you commission in 2026; keep two-phase on the watch list, not the BOM. → immersion's parallel two-phase stall in Chapter 5.5.

Single-phase vs two-phase direct-to-chip — the cold-plate fork

Axis	Single-phase DLC	Two-phase DLC
Heat-transfer mode	Sensible heat; coolant stays liquid	Latent heat; coolant boils in the plate
Working fluid	Water/glycol (PG25 typical); benign, non-PFAS	Engineered dielectric — predominantly PFAS
Flow demand	Higher; ~1.2–2.0 L/min per kW	Much lower; latent heat does the work
Plate surface temp	Rises across the plate (delta-T 7–12 °C)	Near-isothermal at the boiling point
Pressure regime	Well-characterized; ~35 kPa per plate target	Two-phase flow instability risk; harder to control
2026 status	~55% of liquid market; the default	Stalled on PFAS; pilots only
Primary risk	Pump energy/flow at very high flux	Fluid supply chain, regulation, liability

Design bands are 2026-current practitioner figures (DCD/Schneider, OCP, Dober, SemiAnalysis). Two-phase figures are pilot/early-deployment, not mature production.

Getting coolant from the rack inlet to 72 cold plates and back, while letting a technician swap a failed tray in minutes without draining the rack, is the mechanical heart of DLC — and the second major fork. Every NVL72-class rack carries a vertical pair of manifolds (a supply and a return rail, the in-rack analogue of a busbar) running the rack height. Each compute tray taps the rails through couplings. The decision is what kind of coupling, and it trades serviceability against reliability and leak risk.

Blind-mate / floating-tray couplings are integrated into the tray and the manifold so that sliding the tray home automatically engages the fluid connection — no hose to route, no fitting to hand-torque. The 'floating' geometry absorbs the mechanical tolerance stack so the connection self-aligns. This is the factory-integration path: couplings are validated at L10/L11 integration (Chapter 5.13 on the mechanical side; rack integration in Part 7), and field service becomes a slide-out/slide-in operation. The cost is rigidity — the rack and tray geometry are co-designed and far less forgiving of field improvisation.

Flexible-hose dripless quick disconnects (UQDs) put a short hose with a dry-break coupling between the tray and the manifold. The technician physically connects two halves; the dry-break valve seals both sides on disconnect so the spill is a few drops, not a stream. This is more serviceable in the messy reality of a live hall and tolerant of tolerance stack-up, but every manual connection is a potential leak point and a human-error surface, and the hoses add pressure drop and clutter. OCP has standardized UQD form factors precisely to make these field-mateable and second-source-able.

The connection count is the leak-risk surface

An NVL72 rack has on the order of 150–200 fluid connections: two per cold plate across the supply/return rails, plus manifold-to-CDU drops. Reliability is multiplicative: a per-connection leak probability that looks negligible in isolation becomes a near-certainty somewhere in the fleet once it is compounded across hundreds of mates per rack and thousands of racks. This is why the blind-mate-vs-UQD fork is a fleet-reliability question, not a convenience one. Blind-mate reduces the number of field mates (the connection is made at integration and rarely broken); UQDs increase serviceability but multiply field-mate events over the asset life. Whatever you pick, leak detection at the rack and row (negative-pressure loops, conductivity/optical sensors, drip trays tied to the BMS) is not optional. → reliability, leak detection, and commissioning in Chapter 5.11.
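The multiplicative effect can be sketched with the standard "at least one failure" formula, 1 − (1 − p)^n. This is a minimal illustration that assumes connections leak independently. The per-connection probability and the fleet size are hypothetical placeholders, not figures from this chapter; only the connection count comes from the 150–200 range above.

```python
def p_any_leak(p_per_connection: float, connections: int) -> float:
    """Probability that at least one of `connections` independent joints leaks."""
    return 1.0 - (1.0 - p_per_connection) ** connections

# Hypothetical inputs: the per-connection probability is NOT from the chapter.
P_CONN = 1e-4          # assumed chance that one connection leaks over some period
CONNS_PER_RACK = 175   # midpoint of the 150-200 connections quoted in the text
RACKS = 1000           # illustrative fleet size

rack_risk = p_any_leak(P_CONN, CONNS_PER_RACK)
fleet_risk = p_any_leak(P_CONN, CONNS_PER_RACK * RACKS)

print(f"chance of a leak in one rack:      {rack_risk:.2%}")
print(f"chance of a leak somewhere in fleet: {fleet_risk:.4%}")
```

Under these assumed inputs a single rack looks safe while the fleet does not, which is the point of treating coupling choice as a fleet-level decision. Real joints are not fully independent (a bad batch of seals fails together), so treat the output as an order-of-magnitude guide.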

Per-chip and per-rack thermal design

The thermal budget is where strategy becomes arithmetic, and it is governed by one conservation equation: the heat a loop carries equals mass flow times specific heat times the coolant temperature rise (Q = ṁ · cp · ΔT). Everything in DLC design is a negotiation among the three terms on the right — and against a pressure-drop ceiling that the pumps and CDU must overcome.

Flow per kW. The industry rule of thumb is roughly 1.2–2.0 L/min of coolant per kW of heat removed, with PG25 and a target delta-T in the 7.5–12 °C band. OCP's NVL72 guidance is more specific: about 1.5 L/min per kW guarantees a ≤10 °C coolant rise with every Blackwell GPU pinned at ~1 kW TDP. Roll that up and an NVL72 rack needs on the order of 700+ L/min at the manifold, with CDUs sized to ~750–800 L/min at design pressure. Run a wider delta-T and you cut the flow (and pump energy) for the same watts — but you raise the return-water temperature the heat-rejection plant must handle and you push the warmest cold plates closer to the throttle line.
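The flow-per-kW band follows directly from Q = ṁ · cp · ΔT. The sketch below shows the arithmetic under assumed PG25 properties (density about 1.02 kg/L, specific heat about 3.9 kJ/kg·K); those two values are approximations introduced here, not figures from this chapter, and they shift with temperature and inhibitor package.

```python
# Assumed PG25 properties near operating temperature (approximate, illustrative).
RHO_KG_PER_L = 1.02      # density
CP_J_PER_KG_K = 3900.0   # specific heat

def flow_lpm_per_kw(delta_t_c: float) -> float:
    """Volumetric flow (L/min) needed to carry 1 kW at a given coolant rise."""
    mass_flow_kg_s = 1000.0 / (CP_J_PER_KG_K * delta_t_c)  # Q = m_dot * cp * dT
    return mass_flow_kg_s / RHO_KG_PER_L * 60.0

def coolant_rise_c(lpm_per_kw: float) -> float:
    """Coolant temperature rise (deg C) for a given flow per kW of heat."""
    mass_flow_kg_s = lpm_per_kw * RHO_KG_PER_L / 60.0
    return 1000.0 / (mass_flow_kg_s * CP_J_PER_KG_K)

for dt in (7.5, 10.0, 12.0):
    print(f"dT = {dt:4.1f} C -> {flow_lpm_per_kw(dt):.2f} L/min per kW")
print(f"1.5 L/min per kW -> {coolant_rise_c(1.5):.1f} C rise")
```

With these assumed properties, the 7.5–12 °C band maps to roughly 2.0 down to about 1.26 L/min per kW, and 1.5 L/min per kW lands at about a 10 °C rise, in line with the rule of thumb quoted above.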

Delta-T is the master tradeoff. A wider coolant delta-T is a gift to the facility: it means less flow, smaller pipes, lower pump energy, and warmer return water that free-cooling and heat-reuse plants love (Chapter 5.7, Chapter 5.9). But the GB200 envelope is unforgiving: coolant inlet must sit around 20–25 °C, and excursions outside that window throttle the GPUs by up to ~50%. The delta-T you can run is bounded by the inlet temperature plus the cold plate's thermal resistance: spend the budget on a wide delta-T and the last plate in a series path may sit too warm. Design teams therefore favor parallel manifold paths so every cold plate sees near-inlet coolant, and reserve series only where pressure budget forces it.
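A rough way to see why the last plate on a series path runs warmer is to compare the hottest plate on a parallel and a series path that carry the same total flow and heat, so the path outlet temperature is identical. This is a simplified sketch: it references the plate to its mean coolant temperature, uses an assumed thermal resistance of 0.03 °C/W (one reading of "a few hundredths of a °C per watt"), and ignores the fact that thermal resistance itself changes with flow.

```python
def worst_plate_temp_c(inlet_c, total_rise_c, plates, power_w, r_th_c_per_w, series):
    """Estimate the hottest plate temperature on a path of equal heat loads.

    total_rise_c is the coolant rise from path inlet to path outlet; it is the
    same for series and parallel when total flow and total heat are equal.
    """
    if series:
        per_plate_rise = total_rise_c / plates            # full flow through each plate
        last_inlet = inlet_c + (plates - 1) * per_plate_rise
    else:
        per_plate_rise = total_rise_c                     # flow split, full rise per plate
        last_inlet = inlet_c
    mean_coolant = last_inlet + per_plate_rise / 2.0
    return mean_coolant + power_w * r_th_c_per_w

# Illustrative inputs: 25 C inlet, 10 C path rise, 1 kW per plate, assumed R_th.
args = dict(inlet_c=25.0, total_rise_c=10.0, power_w=1000.0, r_th_c_per_w=0.03)
parallel = worst_plate_temp_c(plates=2, series=False, **args)
series2 = worst_plate_temp_c(plates=2, series=True, **args)
series4 = worst_plate_temp_c(plates=4, series=True, **args)
print(f"parallel: {parallel:.2f} C, 2 in series: {series2:.2f} C, 4 in series: {series4:.2f} C")
```

The gap grows with the number of plates chained in series and with the path delta-T, which is why a wide delta-T and a series path compound each other.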

Pressure-drop budget. Pump head is divided among the cold plate (~35 kPa at the ~2.5 L/min nominal per-module flow), the in-rack manifold and headers, the quick disconnects, and the secondary loop back to the CDU. Rack-level manifold pressure drop across a 72-plate parallel circuit with headers and QDs typically runs 1.5–2.5 bar. Every fitting and every hose adds pressure drop, and every reduction in channel size buys lower thermal resistance at the price of still more. Pressure drop is pump energy, which shows up in PUE. The cold-plate designer's lever, finer microchannels for lower thermal resistance, is exactly the lever that raises pressure drop, so the per-chip design is a thermal-resistance-vs-pressure-drop optimization, not a free lunch.
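To put the pressure budget in energy terms, ideal hydraulic power is pressure drop times volumetric flow. The sketch below applies that to the rack-level figures quoted above (1.5–2.5 bar at about 700 L/min). The pump efficiency is an assumed placeholder, not a figure from this chapter, and the result covers only the in-rack share of the loop.

```python
def hydraulic_power_w(delta_p_bar: float, flow_lpm: float) -> float:
    """Ideal hydraulic power (W): pressure drop in Pa times flow in m^3/s."""
    delta_p_pa = delta_p_bar * 1e5
    flow_m3_s = flow_lpm / 60_000.0   # 1 L/min = 1/60000 m^3/s
    return delta_p_pa * flow_m3_s

PUMP_EFFICIENCY = 0.5   # assumed wire-to-water efficiency, illustrative only
RACK_FLOW_LPM = 700.0   # rack-level manifold flow quoted in the text

for dp_bar in (1.5, 2.0, 2.5):
    hyd_w = hydraulic_power_w(dp_bar, RACK_FLOW_LPM)
    elec_w = hyd_w / PUMP_EFFICIENCY
    print(f"{dp_bar:.1f} bar -> {hyd_w:,.0f} W hydraulic, ~{elec_w:,.0f} W electrical")
```

Because power scales linearly with pressure drop at fixed flow, each extra fitting or finer channel converts directly into pump watts.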

Key figure	What it measures	Year / source
~55%	Single-phase direct-to-chip share of the liquid-cooling market in 2026 (the 2026 default)	2026, DCD / Schneider Electric / IDTechEx
~1.5 L/min/kW	OCP flow rule for ≤10 °C coolant rise at ~1 kW/GPU TDP (range ~1.2–2.0 L/min/kW)	2025, OCP NVL72 guidance / Dober PG25
20–25 °C	GB200 NVL72 coolant inlet window; deviation throttles GPUs up to ~50%	2025, NVIDIA OCP / Introl
~700+ L/min	Rack-level manifold flow for NVL72; CDU sized ~750–800 L/min at design pressure	2025, QCT / Amphenol / ToneCooling
~35 kPa	Per-module cold-plate pressure-drop target at ~2.5 L/min nominal flow	2025, OCP / cold-plate vendor specs
~115 / ~17 kW	NVL72 heat split: removed by liquid vs left on air per rack (~132 kW total)	2025, NVIDIA OCP / Vertiv 360AI reference design
$300–500/kW	Direct-to-chip system capex vs ~$1,000+/kW for immersion	2026, Gottog / Introl synthesis
up to 3x	In-silicon microfluidic cooling vs cold plates (≤65% lower peak temp rise); the forward pointer	2025, Microsoft Research / Tom's Hardware

Deep dive: coolant selection — why PG25, and the consequences of the choice

The secondary-loop coolant is a chemistry decision with mechanical consequences that ripple from the cold plate to the CDU. The dominant single-phase choice in 2026 is PG25 — a 25% propylene glycol / 75% water blend, usually pre-mixed with corrosion inhibitors and biocide. Why this specific blend? It is a balance of four competing properties.

Heat transfer. Pure water has the best specific heat and lowest viscosity — thermodynamically you would run water if you could. Glycol degrades both: it raises viscosity (more pump energy, more pressure drop) and lowers specific heat (more flow for the same watts). So you want the least glycol that buys the protection you need — hence 25%, not 50%.
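The specific-heat side of that penalty can be estimated from volumetric heat capacity (density times specific heat). The property values below are rough room-temperature approximations introduced for illustration, not figures from this chapter. The sketch also ignores viscosity, which adds a separate pressure-drop penalty on top of the flow penalty.

```python
# Rough, illustrative fluid properties (assumptions; check the supplier datasheet).
FLUIDS = {
    "water": {"rho_kg_per_l": 1.00, "cp_j_per_kg_k": 4180.0},
    "PG25":  {"rho_kg_per_l": 1.02, "cp_j_per_kg_k": 3900.0},
    "PG50":  {"rho_kg_per_l": 1.04, "cp_j_per_kg_k": 3550.0},
}

def volumetric_heat_capacity(fluid: str) -> float:
    """Heat carried per litre per kelvin, in J/(L*K)."""
    props = FLUIDS[fluid]
    return props["rho_kg_per_l"] * props["cp_j_per_kg_k"]

def flow_penalty_vs_water(fluid: str) -> float:
    """Extra volumetric flow needed versus water for the same watts and delta-T."""
    return volumetric_heat_capacity("water") / volumetric_heat_capacity(fluid) - 1.0

for name in FLUIDS:
    print(f"{name:5s}: {flow_penalty_vs_water(name):+.1%} flow vs water")
```

Under these assumed properties the 25% blend costs a few percent more flow than water while the 50% blend costs more than twice that, which is the arithmetic behind choosing the least glycol that still buys the protection.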

Freeze and biofouling protection. The glycol you do add buys freeze protection for outdoor loop sections and dry coolers in cold climates, and propylene glycol (vs ethylene) is chosen for low toxicity — a leak near electronics and people is less hazardous. Biocide is non-negotiable: warm water in a closed loop is an ideal medium for biofilm that fouls microchannels and spikes pressure drop. PG25's typical operating window is a 7.5–12 °C delta-T at 1.25–2.0 L/min/kW.

Material compatibility. The loop is a mixed-metal system — copper cold plates, stainless or brass fittings, aluminum heat exchangers, EPDM/elastomer seals. Galvanic corrosion and incompatible elastomers are the silent killers: the wrong inhibitor package or an unmanaged pH lets dissolved copper plate out on aluminum and seals swell or embrittle. This is why deionized water alone is rarely run bare in production despite its thermal appeal — it is aggressive to some metals and offers no biocide or freeze margin. The coolant is therefore a managed fluid: filtration, periodic chemistry sampling, and inhibitor top-ups are an operational line item, not a fill-and-forget. → fluid chemistry management lives with the CDU in Chapter 5.6.

What stays on air — and how it is handled

DLC is not 'liquid cooling.' It is liquid cooling of the high-flux components — the GPUs, the CPUs/Grace dies, the NVLink/NVSwitch silicon, and increasingly the high-power VRMs — leaving a residual air load that a hall ignores at its peril. On an NVL72 rack, roughly 115 kW is removed by liquid and ~17 kW remains on air — about 13% of the rack. That tail is everything not worth a cold plate: NICs and optical transceivers, DIMMs, power-supply units, lower-power voltage regulators, the BMC, and miscellaneous board components. Optics in particular are a growing concern — pluggable transceiver power is climbing, and the optics sit at the rack's air-cooled edge precisely where airflow is now sparse.

The fork here is how you capture the air tail. Three patterns dominate.
In-rack air-to-liquid: a small rear-door or in-chassis air-to-liquid heat exchanger rejects the residual air load back into the same liquid loop, so the rack exhausts neutral air and the hall needs no separate air plant. This is the cleanest answer, and the one that makes a 'zero-air-to-room' rack possible.
Hybrid containment: the hall keeps a reduced CRAH/in-row air system sized only for the ~10–20% air tail, with hot/cold-aisle containment. This is simpler to retrofit but reintroduces an air plant and its PUE.
Facility air: simplest and worst. Let the tail dump into the room and handle it with the building's air system, acceptable only at low rack counts.

The decision matters because a hall that sizes liquid for 115 kW and forgets the 17 kW air tail will thermally throttle on the optics and DIMMs while the GPUs run cold, stranding the rack it just spent $300–500/kW to liquid-cool. → containment strategy for hybrid halls in Chapter 5.3; the optics thermal problem resurfaces in Chapter 8.10.
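The air-tail arithmetic is simple enough to sketch. The example uses the NVL72 split quoted above (132 kW total, 115 kW to liquid). The hall size of 100 racks is a hypothetical input chosen only to show how the residual load accumulates.

```python
RACK_KW = 132.0      # NVL72 total rack power, from the text
LIQUID_KW = 115.0    # heat removed by cold plates, from the text
RACKS_IN_HALL = 100  # hypothetical hall size, for illustration only

def air_tail_kw(rack_kw: float, liquid_kw: float) -> float:
    """Residual heat per rack that must be captured on the air side."""
    return rack_kw - liquid_kw

def air_fraction(rack_kw: float, liquid_kw: float) -> float:
    """Share of rack power left on air."""
    return air_tail_kw(rack_kw, liquid_kw) / rack_kw

tail_kw = air_tail_kw(RACK_KW, LIQUID_KW)
fraction = air_fraction(RACK_KW, LIQUID_KW)
hall_air_kw = tail_kw * RACKS_IN_HALL

print(f"air tail per rack: {tail_kw:.0f} kW ({fraction:.1%} of rack power)")
print(f"air load for {RACKS_IN_HALL} racks: {hall_air_kw:,.0f} kW")
```

Whichever capture pattern is chosen, this residual figure is the load it has to be sized for, whether it returns to the liquid loop or to a room air plant.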

What gets liquid, what stays on air — and why

Component	Cooling path	Rationale
GPU / accelerator package	Liquid (cold plate)	Highest flux (1.0–2.3 kW); over the air cliff
CPU / Grace die	Liquid (cold plate)	High flux; co-located on the tray
NVSwitch / NVLink silicon	Liquid (cold plate)	Dense interconnect silicon, significant draw
High-power VRMs	Liquid (increasingly)	Power-delivery losses now warrant a plate
NICs / optical transceivers	Air	Lower flux but rising; sit at the rack edge
DIMMs / memory	Air (often)	Distributed, lower flux; hard to cold-plate
PSUs / BMC / misc	Air	Modest power; not worth the plumbing

Representative NVL72-class split; exact allocation varies by vendor and generation. Air tail ~10–20% of rack power.

Forward pointer: in-silicon microfluidics

Cold plates have a hard physical limit: no matter how good the plate, heat must still conduct from the junction, through the package, across the TIM, and into the plate before the coolant ever sees it. That stacked thermal resistance — and especially the TIM — is what caps the flux a cold plate can handle. The next step removes the intermediary entirely: etch the coolant channels into the silicon itself.

In-chip (direct-to-silicon) microfluidics routes coolant through microscopic channels — each roughly a hair's width — cut directly into the die or the backside of the package, so liquid flows over the hotspots inside the chip rather than across an external plate. Microsoft's 2025 prototype, using AI-designed, leaf-vein-inspired channel networks, reported up to 3x better heat removal than state-of-the-art cold plates and up to 65% lower peak temperature rise. The rationale is pure thermal resistance: collapsing the conduction path from junction to coolant is the only way to keep pace with 3D-stacked dies and the 2–3 kW packages on the horizon, where a cold plate simply runs out of room. It is not a 2026 production technology — it is a roadmap signal. The full consolidated cooling roadmap, including immersion's role and the 600 kW–1 MW rack generation, lives in Chapter 16.2.

DLC sits in the middle of Part 5's cooling stack. The density wall that forces it is in Chapter 5.1; the air regime it replaces in Chapter 5.2; the RDHx/AALC bridge for brownfields in Chapter 5.3; immersion's parallel single-/two-phase story and the PFAS reckoning in Chapter 5.5. The secondary loop that feeds these cold plates — CDUs, fluid chemistry, dew-point margin — is Chapter 5.6; the facility water loop and warm-water strategy Chapter 5.7; heat rejection Chapter 5.8; heat reuse Chapter 5.9; retrofitting air halls to liquid Chapter 5.10; reliability, leak detection and commissioning Chapter 5.11; and the mechanical/pressure-system engineering of the piping Chapter 5.13. The archetype decision that made DLC mandatory is framed in Chapter 1.1; the in-silicon microfluidics roadmap in Chapter 16.2; and the optics thermal tail in Chapter 8.10.