Chapter 5.4
Direct-to-Chip Liquid Cooling (DLC) — The 2026 Default
Direct-to-chip liquid cooling stopped being a choice in 2026. Once a rack draws past the air ceiling, the only open decisions are single- vs two-phase, how you plumb the rack, and how tightly you budget flow, delta-T, and pressure. Each of those forks sets a downstream serviceability, reliability, and capex bill you live with for the asset's life.
What you'll decide here
- Single-phase vs two-phase cold plates — and why single-phase is the 2026 default, with two-phase parked behind the PFAS reckoning until chip flux forces the question.
- How you plumb the rack: in-rack manifolds with blind-mate/floating-tray couplings vs flexible-hose dripless quick disconnects — the serviceability-vs-reliability fork that governs every node swap for the next five years.
- The per-chip and per-rack thermal budget: flow per kW, coolant delta-T, and the pressure-drop allowance across cold plate plus manifold plus QDs. Every hundredth of a °C/W of cold-plate thermal resistance you spend must be bought back in pump head or flow.
- Coolant chemistry — PG25 vs other glycol blends vs deionized water — and the material-compatibility, biocide, and freeze-protection consequences that follow the choice into the secondary loop.
- What deliberately stays on air (NICs, DIMMs, PSUs, optics, lower-power VRMs) and how the residual ~10–20% air load is captured, because a hall that forgets the air tail strands the very racks it cooled.
By 2026 the argument about whether AI racks need liquid is over. A GB200 NVL72 rack draws roughly 132 kW; the practical air-cooling ceiling is around 41 kW per rack (Chapter 5.2). No amount of colder supply air or smarter containment closes that gap; it is a discontinuity, the cooling cliff of Chapter 5.1, and direct-to-chip liquid cooling (DLC) is the answer on the far side of it. Rear-door heat exchangers and air-assisted liquid (Chapter 5.3) bridge the 50–100 kW band for brownfields that cannot get facility water to the rack, but they are a bridge, not a destination. For any greenfield hosting frontier training or next-generation dense inference, DLC is the design basis before steel is cut.
This chapter is about the decisions that remain once DLC is assumed. They are not abstract. Choose two-phase and you inherit a PFAS supply-chain and liability problem. Choose flexible-hose quick disconnects over blind-mate manifolds and you trade factory-integrated reliability for field serviceability — and a different leak-risk profile. Budget the cold-plate delta-T too tight and you over-spec the CDU and the pumps; budget it too loose and you throttle the GPUs. Each fork carries a downstream cost, and they are cheaper to see before they are poured into a slab.
Cold-plate architectures: single-phase vs two-phase
A direct-to-chip cold plate is a sealed metal block — typically copper, sometimes copper-on-aluminum — pressed onto the die package through a thermal interface material (TIM), with coolant forced through internal microchannels or skived fins directly over the hot silicon. The heat path is short and the thermal resistance low: a few hundredths of a °C per watt from junction to coolant, which is what makes 1.0–2.3 kW per GPU package tractable. The first fork is whether the coolant changes phase inside the plate.
Single-phase cold plates keep the coolant liquid throughout. A water/glycol mix enters, warms by a bounded delta-T (commonly 7–12 °C), and leaves still liquid. Heat removal scales with mass flow times specific heat times delta-T — pump harder or run a wider delta-T to carry more watts. It is mechanically simple, the fluids are benign (water-based, non-PFAS), the pressure regime is well understood, and it maps cleanly onto the CDU-and-secondary-loop architecture of Chapter 5.6. The penalty is that water has finite heat capacity, so very high heat fluxes demand high flow and therefore pump energy and pressure drop.
Two-phase cold plates exploit the latent heat of vaporization: a low-boiling-point dielectric enters as liquid, boils inside the plate, and leaves as a vapor-liquid mixture. Because latent heat dwarfs sensible heat, two-phase moves enormous heat flux at low flow and with a nearly isothermal plate surface, which makes it thermodynamically the superior answer for the 1.5–2.3 kW packages on the roadmap. The catch is the working fluid. The engineered dielectrics that boil at convenient temperatures are predominantly fluorochemicals (the PFAS family), and that is now a regulatory and liability problem rather than a footnote.
| Axis | Single-phase DLC | Two-phase DLC |
|---|---|---|
| Heat-transfer mode | Sensible heat; coolant stays liquid | Latent heat; coolant boils in the plate |
| Working fluid | Water/glycol (PG25 typical); benign, non-PFAS | Engineered dielectric — predominantly PFAS |
| Flow demand | Higher; ~1.2–2.0 L/min per kW | Much lower; latent heat does the work |
| Plate surface temp | Rises across the plate (delta-T 7–12 °C) | Near-isothermal at the boiling point |
| Pressure regime | Well-characterized; ~35 kPa per plate target | Two-phase flow instability risk; harder to control |
| 2026 status | ~55% of the liquid-cooling market; the default | Stalled on PFAS; pilots only |
| Primary risk | Pump energy/flow at very high flux | Fluid supply chain, regulation, liability |
In-rack plumbing: manifolds, blind-mate, and quick disconnects
Getting coolant from the rack inlet to 72 cold plates and back, while letting a technician swap a failed tray in minutes without draining the rack, is the mechanical heart of DLC — and the second major fork. Every NVL72-class rack carries a vertical pair of manifolds (a supply and a return rail, the in-rack analogue of a busbar) running the rack height. Each compute tray taps the rails through couplings. The decision is what kind of coupling, and it trades serviceability against reliability and leak risk.
Blind-mate / floating-tray couplings are integrated into the tray and the manifold so that sliding the tray home automatically engages the fluid connection — no hose to route, no fitting to hand-torque. The 'floating' geometry absorbs the mechanical tolerance stack so the connection self-aligns. This is the factory-integration path: couplings are validated at L10/L11 integration (Chapter 5.13 on the mechanical side; rack integration in Part 7), and field service becomes a slide-out/slide-in operation. The cost is rigidity — the rack and tray geometry are co-designed and far less forgiving of field improvisation.
Flexible-hose dripless quick disconnects (UQDs) put a short hose with a dry-break coupling between the tray and the manifold. The technician physically connects two halves; the dry-break valve seals both sides on disconnect so the spill is a few drops, not a stream. This is more serviceable in the messy reality of a live hall and more forgiving of tolerance stack-up, but every manual connection is a potential leak point and a human-error surface, and the hoses add pressure drop and clutter. OCP has standardized UQD form factors precisely to make these couplings field-mateable and second-sourceable.
Per-chip and per-rack thermal design
The thermal budget is where strategy becomes arithmetic, and it is governed by one conservation equation: the heat a loop carries equals mass flow times specific heat times the coolant temperature rise (Q = ṁ · cp · ΔT). Everything in DLC design is a negotiation among the three terms on the right — and against a pressure-drop ceiling that the pumps and CDU must overcome.
Flow per kW. The industry rule of thumb is roughly 1.2–2.0 L/min of coolant per kW of heat removed, with PG25 and a target delta-T in the 7.5–12 °C band. OCP's NVL72 guidance is more specific: about 1.5 L/min per kW guarantees a ≤10 °C coolant rise with every Blackwell GPU pinned at ~1 kW TDP. Roll that up over the ~115 kW liquid-cooled share and an NVL72 rack needs on the order of 170–200 L/min at the manifold, which agrees with 72 plates at ~2.5 L/min each; a CDU sized to ~750–800 L/min at design pressure therefore serves about four such racks, not one. Run a wider delta-T and you cut the flow (and pump energy) for the same watts, but you raise the return-water temperature the heat-rejection plant must handle and you push the warmest cold plates closer to the throttle line.
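The 1.5 L/min per kW figure can be sanity-checked directly from Q = ṁ · cp · ΔT. The sketch below is a minimal Python illustration, not a sizing tool: the PG25 density (1.02 kg/L) and specific heat (3.93 kJ/kg·K) are assumed round values for a warm 25% propylene glycol blend, and the function names are illustrative.

```python
# Assumed round-number PG25 properties near operating temperature
# (illustrative values, not taken from a vendor datasheet).
PG25_DENSITY_KG_PER_L = 1.02
PG25_CP_J_PER_KG_K = 3930.0


def coolant_rise_c(heat_kw: float, flow_lpm: float) -> float:
    """Coolant temperature rise (deg C) for a heat load and volumetric flow."""
    mass_flow_kg_s = flow_lpm * PG25_DENSITY_KG_PER_L / 60.0
    return heat_kw * 1000.0 / (mass_flow_kg_s * PG25_CP_J_PER_KG_K)


def flow_lpm_per_kw(delta_t_c: float) -> float:
    """Flow (L/min) needed per kW of heat to hold a target coolant rise."""
    mass_flow_kg_s = 1000.0 / (PG25_CP_J_PER_KG_K * delta_t_c)
    return mass_flow_kg_s / PG25_DENSITY_KG_PER_L * 60.0


if __name__ == "__main__":
    print(f"Rise at 1.5 L/min per kW: {coolant_rise_c(1.0, 1.5):.1f} C")
    for dt in (7.5, 10.0, 12.0):
        print(f"delta-T {dt:>4} C -> {flow_lpm_per_kw(dt):.2f} L/min per kW")
```

Under these assumed properties, 1.5 L/min per kW yields a rise of about 10 °C, and the 7.5–12 °C band maps to roughly 2.0 down to 1.25 L/min per kW.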
Delta-T is the master tradeoff. A wider coolant delta-T is a gift to the facility: it means less flow, smaller pipes, lower pump energy, and warmer return water that free-cooling and heat-reuse plants love (Chapter 5.7, Chapter 5.9). But the GB200 envelope is unforgiving: coolant inlet sits around 20–25 °C, and deviation outside that window can throttle the GPUs by up to ~50%. The delta-T you can run is bounded by the inlet temperature plus the temperature rise across the cold plate's thermal resistance: spend the budget on a wide delta-T and the last plate in a series path may sit too warm. Design teams therefore favor parallel manifold paths so every cold plate sees near-inlet coolant, and reserve series only where pressure budget forces it.
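The parallel-vs-series argument can be made concrete with a very simple thermal model: worst-case junction temperature is roughly the local coolant outlet temperature plus package power times junction-to-coolant thermal resistance. The following is a simplified sketch under stated assumptions; the 0.03 °C/W resistance (within the "few hundredths" range quoted earlier), the 1 kW package, and the function name are illustrative, and real plates need vendor data.

```python
def series_junction_temps_c(
    inlet_c: float,
    rise_per_plate_c: float,
    r_th_c_per_w: float,
    power_w: float,
    plates_in_series: int,
) -> list[float]:
    """Estimate worst-case junction temperature for each plate in a series path.

    Simplified model: each plate warms the coolant by rise_per_plate_c, and the
    junction sits r_th * power above that plate's coolant outlet temperature.
    """
    temps = []
    for k in range(plates_in_series):
        outlet_c = inlet_c + (k + 1) * rise_per_plate_c
        temps.append(outlet_c + r_th_c_per_w * power_w)
    return temps


if __name__ == "__main__":
    # Parallel feed: every plate is "first" and sees inlet coolant.
    print("parallel:", series_junction_temps_c(25.0, 10.0, 0.03, 1000.0, 1))
    # Two plates in series: the second starts from the first plate's outlet.
    print("series:  ", series_junction_temps_c(25.0, 10.0, 0.03, 1000.0, 2))
```

In this model the second plate of a series pair runs hotter than a parallel-fed plate by exactly the coolant rise across the first plate, which is the margin a parallel manifold preserves.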
Pressure-drop budget. Pump head is divided among the cold plate (~35 kPa at the ~2.5 L/min nominal per-module flow), the in-rack manifold and headers, the quick disconnects, and the secondary loop back to the CDU. Rack-level manifold pressure drop across a 72-plate parallel circuit with headers and QDs typically runs 1.5–2.5 bar. Every fitting and every hose adds pressure drop, and pressure drop is pump energy, which shows up in PUE. The cold-plate designer's lever, finer microchannels for lower thermal resistance, is exactly the lever that raises pressure drop, so the per-chip design is a thermal-resistance-vs-pressure-drop optimization, not a free lunch.
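To see how pressure drop turns into pump energy, recall that hydraulic power is pressure drop times volumetric flow, divided by pump efficiency to get electrical input. The sketch below works per kW of heat so it stays independent of rack size; the 50% wire-to-water efficiency is an assumed placeholder, and the function name is illustrative.

```python
def pump_power_w(pressure_drop_kpa: float, flow_lpm: float, efficiency: float) -> float:
    """Electrical pump power (W) to push flow_lpm through pressure_drop_kpa."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    flow_m3_s = flow_lpm / 1000.0 / 60.0                   # L/min -> m^3/s
    hydraulic_w = pressure_drop_kpa * 1000.0 * flow_m3_s   # Pa * m^3/s = W
    return hydraulic_w / efficiency


if __name__ == "__main__":
    # 1.5 L/min per kW of heat, across the 1.5-2.5 bar rack-level range.
    for bar in (1.5, 2.0, 2.5):
        watts = pump_power_w(bar * 100.0, 1.5, 0.5)
        print(f"{bar} bar -> {watts:.1f} W of pump power per kW of heat")
```

Under these assumptions the rack-level drop alone costs on the order of 1% of the heat load in pump power, before the secondary loop back to the CDU is counted.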
Deep dive: coolant selection — why PG25, and the consequences of the choice
The secondary-loop coolant is a chemistry decision with mechanical consequences that ripple from the cold plate to the CDU. The dominant single-phase choice in 2026 is PG25 — a 25% propylene glycol / 75% water blend, usually pre-mixed with corrosion inhibitors and biocide. Why this specific blend? It is a balance of four competing properties.
Heat transfer. Pure water has the best specific heat and lowest viscosity — thermodynamically you would run water if you could. Glycol degrades both: it raises viscosity (more pump energy, more pressure drop) and lowers specific heat (more flow for the same watts). So you want the least glycol that buys the protection you need — hence 25%, not 50%.
Freeze and biofouling protection. The glycol you do add buys freeze protection for outdoor loop sections and dry coolers in cold climates, and propylene glycol (vs ethylene) is chosen for low toxicity — a leak near electronics and people is less hazardous. Biocide is non-negotiable: warm water in a closed loop is an ideal medium for biofilm that fouls microchannels and spikes pressure drop. PG25's typical operating window is a 7.5–12 °C delta-T at 1.25–2.0 L/min/kW.
Material compatibility. The loop is a mixed-metal system of copper cold plates, stainless or brass fittings, aluminum heat exchangers, and EPDM/elastomer seals. Galvanic corrosion and incompatible elastomers are the silent killers: the wrong inhibitor package or an unmanaged pH lets dissolved copper plate out on aluminum and lets seals swell or embrittle. This is why deionized water alone is rarely run bare in production despite its thermal appeal; it is aggressive to some metals and offers no biocide or freeze margin. The coolant is therefore a managed fluid: filtration, periodic chemistry sampling, and inhibitor top-ups are an operational line item, not fill-and-forget. → fluid chemistry management lives with the CDU in Chapter 5.6.
What stays on air — and how it is handled
DLC is not 'liquid cooling.' It is liquid cooling of the high-flux components — the GPUs, the CPUs/Grace dies, the NVLink/NVSwitch silicon, and increasingly the high-power VRMs — leaving a residual air load that a hall ignores at its peril. On an NVL72 rack, roughly 115 kW is removed by liquid and ~17 kW remains on air — about 13% of the rack. That tail is everything not worth a cold plate: NICs and optical transceivers, DIMMs, power-supply units, lower-power voltage regulators, the BMC, and miscellaneous board components. Optics in particular are a growing concern — pluggable transceiver power is climbing, and the optics sit at the rack's air-cooled edge precisely where airflow is now sparse.
The fork here is how you capture the air tail. Three patterns dominate.
- In-rack air-to-liquid: a small rear-door or in-chassis air-to-liquid heat exchanger rejects the residual air load back into the same liquid loop, so the rack exhausts neutral air and the hall needs no separate air plant. This is the cleanest answer, and the one that makes a 'zero-air-to-room' rack possible.
- Hybrid containment: the hall keeps a reduced CRAH/in-row air system sized only for the ~10–20% air tail, with hot/cold-aisle containment, which is simpler to retrofit but reintroduces an air plant and its PUE.
- Facility air: simplest and worst. Let the tail dump into the room and handle it with the building's air system, acceptable only at low rack counts.
The decision matters because a hall that sizes liquid for 115 kW and forgets the 17 kW air tail will thermally throttle on the optics and DIMMs while the GPUs run cold, stranding the rack it just spent $300–500/kW to liquid-cool. → containment strategy for hybrid halls in Chapter 5.3; the optics thermal problem resurfaces in Chapter 8.10.
| Component | Cooling path | Rationale |
|---|---|---|
| GPU / accelerator package | Liquid (cold plate) | Highest flux (1.0–2.3 kW); over the air cliff |
| CPU / Grace die | Liquid (cold plate) | High flux; co-located on the tray |
| NVSwitch / NVLink silicon | Liquid (cold plate) | Dense interconnect silicon, significant draw |
| High-power VRMs | Liquid (increasingly) | Power-delivery losses now warrant a plate |
| NICs / optical transceivers | Air | Lower flux but rising; sit at the rack edge |
| DIMMs / memory | Air (often) | Distributed, lower flux; hard to cold-plate |
| PSUs / BMC / misc | Air | Modest power; not worth the plumbing |
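The air-tail arithmetic is simple enough to carry into a hall-level budget. The sketch below splits each rack into liquid and air loads using the NVL72 figures quoted above (132 kW total, ~17 kW on air); the 64-rack hall and the names `HallLoad` and `hall_load` are hypothetical, chosen only to illustrate the roll-up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HallLoad:
    liquid_kw: float
    air_kw: float

    @property
    def air_fraction(self) -> float:
        """Share of total heat that stays on air."""
        return self.air_kw / (self.liquid_kw + self.air_kw)


def hall_load(racks: int, rack_kw: float, air_kw_per_rack: float) -> HallLoad:
    """Split total hall heat into the liquid loop and the residual air tail."""
    if air_kw_per_rack > rack_kw:
        raise ValueError("air load cannot exceed rack load")
    return HallLoad(
        liquid_kw=racks * (rack_kw - air_kw_per_rack),
        air_kw=racks * air_kw_per_rack,
    )


if __name__ == "__main__":
    # Hypothetical 64-rack hall of NVL72-class racks.
    load = hall_load(racks=64, rack_kw=132.0, air_kw_per_rack=17.0)
    print(f"liquid: {load.liquid_kw:.0f} kW, air: {load.air_kw:.0f} kW "
          f"({load.air_fraction:.1%} on air)")
```

For the hypothetical 64-rack hall this leaves about 1.1 MW on air, which is the load the in-rack exchanger, reduced CRAH system, or building air must actually be sized for.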
Forward pointer: in-silicon microfluidics
Cold plates have a hard physical limit: no matter how good the plate, heat must still conduct from the junction, through the package, across the TIM, and into the plate before the coolant ever sees it. That stacked thermal resistance — and especially the TIM — is what caps the flux a cold plate can handle. The next step removes the intermediary entirely: etch the coolant channels into the silicon itself.
In-chip (direct-to-silicon) microfluidics routes coolant through microscopic channels — each roughly a hair's width — cut directly into the die or the backside of the package, so liquid flows over the hotspots inside the chip rather than across an external plate. Microsoft's 2025 prototype, using AI-designed, leaf-vein-inspired channel networks, reported up to 3x better heat removal than state-of-the-art cold plates and up to 65% lower peak temperature rise. The rationale is pure thermal resistance: collapsing the conduction path from junction to coolant is the only way to keep pace with 3D-stacked dies and the 2–3 kW packages on the horizon, where a cold plate simply runs out of room. It is not a 2026 production technology — it is a roadmap signal. The full consolidated cooling roadmap, including immersion's role and the 600 kW–1 MW rack generation, lives in Chapter 16.2.