Chapter 8.10
CPO, Fiber Plant & Structured Cabling
The link is a pluggable transceiver, the fiber plant is a structured-cabling system, and the next decade's question is whether the optics stay on the faceplate or move onto the package — a fork that trades the worst power problem in the building against the worst serviceability problem in the building.
What you'll decide here
- Whether to keep optics on the faceplate (pluggable/LPO/LRO) or move them onto the switch package (co-packaged optics, CPO) — the decision that trades a multi-megawatt fabric-power saving against the loss of hot-swap serviceability and a single-vendor optical engine.
- Single-mode (OS2) versus multimode (OM3/OM4/OM5) as the structural fiber choice — because OS2 trunked once survives the 800G → 1.6T → 3.2T ramp on transceiver swaps alone, while multimode caps out and forces a re-pull.
- The MPO/MTP connector and base-N trunking system (Base-8 vs Base-12 vs Base-16/MPO-16) that pins your fiber count, your polarity scheme, and whether breakout matches the transceiver's lane map without a conversion cassette.
- The end-to-end channel loss budget — connector count, mated-pair loss, fiber attenuation — against the tight 800G/1.6T budgets that leave almost no margin for a sloppy install.
- Whether the cabling is structured (patch panels, MACs, documented polarity) or point-to-point (faceplate-to-faceplate jumpers) — the choice that decides install velocity, fault isolation, and whether the next density step is a re-cable or a re-patch.
By this point in Part 8 the fabric topology is decided, the switch silicon is selected, and the link budget is understood (Chapter 8.9). What remains is the most physical layer of all: the glass that carries the photons, the connectors that join it, and the increasingly fraught question of where the laser lives. An AI back-end fabric at 100k-GPU scale is a structured-cabling project measured in tens of thousands of fiber-miles and millions of mated connections — and at 800G and 1.6T per link, the optics that drive it have become one of the largest single line items of power and cost in the network, large enough that the industry is now willing to re-architect the switch itself to shrink them.
This chapter works through three coupled choices. First, co-packaged optics (CPO): the move to integrate optical engines onto the switch package, the power and signal-integrity economics that justify it, and the serviceability tax it imposes. Second, the fiber plant: single-mode versus multimode, the reach classes, and the channel architecture that the whole ramp is trunked against. Third, structured cabling at scale: MPO trunking, base-N systems, polarity, and the loss budget that an install either lands inside or fails acceptance against. The CPO roadmap to 2030 is consolidated in Chapter 16.2; the install-velocity and field-execution side of cabling-at-scale lives in Chapter 7.15.
Co-packaged optics: why the laser is moving onto the package
The case for CPO is built from a chain of physical facts, each of which gets worse as lane rates climb from 100G to 200G to 400G per lane. The electrical signal from a switch ASIC to a faceplate pluggable traverses package balls, PCB traces, a connector, and the module's own substrate, a path whose insertion loss grows steeply with frequency. At 200G/lane (PAM4 at ~106 GBaud) that path is already marginal. It is held together by a power-hungry DSP in every pluggable whose job is to equalize and retime the signal and, in some designs, add a further layer of forward error correction to pull it back into shape. The DSP is the problem. It is the single largest power consumer in the module, and there is one per link.
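A quick check of the "~106 GBaud" figure, and of why the electrical path gets harder with each lane-rate step. This is a minimal sketch assuming the IEEE line rates that include FEC overhead (106.25 Gb/s for a 100G lane, 212.5 Gb/s for a 200G lane) and PAM4's 2 bits per symbol. Treating the Nyquist frequency (half the symbol rate) as the band the channel must pass is a standard first-order approximation, not a full channel model.

```python
# Symbol rate and Nyquist frequency for PAM4 electrical lanes.
# Line rates include FEC/encoding overhead (assumed IEEE-style values).
PAM4_BITS_PER_SYMBOL = 2

LANE_LINE_RATES_GBPS = {
    "100G/lane": 106.25,
    "200G/lane": 212.5,
}

def symbol_rate_gbaud(line_rate_gbps: float,
                      bits_per_symbol: int = PAM4_BITS_PER_SYMBOL) -> float:
    """Symbol rate in GBaud = line rate / bits carried per symbol."""
    return line_rate_gbps / bits_per_symbol

def nyquist_ghz(gbaud: float) -> float:
    """Nyquist frequency in GHz: half the symbol rate."""
    return gbaud / 2

for name, rate in LANE_LINE_RATES_GBPS.items():
    baud = symbol_rate_gbaud(rate)
    print(f"{name}: {baud:.3f} GBd, Nyquist ~{nyquist_ghz(baud):.1f} GHz")
```

Doubling the lane rate at fixed PAM4 doubles the Nyquist frequency, which is why a PCB-and-connector path that was tolerable at 100G/lane becomes marginal at 200G/lane.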
CPO attacks the path itself. By placing the optical engine on the same substrate as the switch ASIC, the electrical reach collapses from tens of centimeters to a few millimeters. The signal no longer needs a heroic DSP to survive the journey, so the per-link power falls dramatically and the signal integrity rises. NVIDIA's silicon-photonics switches quote 3.5x better power efficiency, 63x better signal integrity, and 10x better network resiliency at scale, achieved in part with 4x fewer lasers than an equivalent pluggable build. Broadcom's Tomahawk 6 'Davisson' — the first 102.4 Tb/s Ethernet switch with CPO, shipping in 2026 — quotes a ~70% reduction in optical interconnect power versus pluggables, consistent with the same ~3.5x figure. At fleet scale these are not rounding errors: a fabric power line item that runs to several megawatts on pluggables is the thing CPO is built to delete, and in a power-bound facility (Chapter 8.1) every megawatt returned to compute is revenue.
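The fleet-scale claim is simple multiplication. The sketch below assumes 15 W per retimed 800G pluggable (inside the ~14–17 W range in the placement table that follows) and 5.4 W per 800G of CPO (the Bailly-class figure in the same table). The 400,000-port count is a hypothetical stand-in for a 100k-GPU multi-tier fabric, not a measured design. The sketch also converts a "percent lower power" claim into an efficiency multiple, which shows a ~70% reduction (about 3.3x) sitting close to the ~3.5x figure.

```python
# Back-of-envelope fleet optics power: retimed pluggables vs CPO.
PLUGGABLE_W_PER_800G = 15.0   # assumed, within the ~14-17 W retimed range
CPO_W_PER_800G = 5.4          # Bailly-class figure from the placement table

def optics_power_mw(num_800g_ports: int, watts_per_port: float) -> float:
    """Total optics power in megawatts."""
    return num_800g_ports * watts_per_port / 1e6

def reduction_to_ratio(fractional_reduction: float) -> float:
    """Convert an 'X% lower power' claim into an efficiency multiple."""
    return 1.0 / (1.0 - fractional_reduction)

ports = 400_000  # hypothetical 800G optical port count for a 100k-GPU fabric
pluggable = optics_power_mw(ports, PLUGGABLE_W_PER_800G)
cpo = optics_power_mw(ports, CPO_W_PER_800G)
print(f"pluggable {pluggable:.2f} MW, CPO {cpo:.2f} MW, saved {pluggable - cpo:.2f} MW")
print(f"70% reduction = {reduction_to_ratio(0.70):.2f}x efficiency")
```

With these inputs the pluggable build draws about 6.0 MW of optics power and the CPO build about 2.2 MW, a saving of roughly 3.8 MW. Scale the port count to your own fabric before drawing conclusions.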
The reason CPO becomes mandatory rather than merely attractive is radix. A 102.4 Tb/s switch needs 64 ports of 1.6T — or 128 of 800G. There is not enough faceplate to mount that many pluggable cages, and even if there were, the aggregate DSP power and the PCB-trace loss at 200G/lane make the faceplate path unbuildable at the top of the fabric. CPO is therefore a top-of-fabric, highest-pressure-link technology first: in 2026 it is roughly 0.5% of AI-data-center optical modules, concentrated exactly where the power wall is hardest, and it spreads outward only as the ecosystem matures.
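The radix arithmetic that forces the issue, as a sketch: a 102.4 Tb/s ASIC divided into front-panel ports and 200G electrical lanes, plus the per-switch optics power the faceplate would have to supply and cool. The per-port wattages are the same assumed values as above (a point inside the table's pluggable range, and the Bailly-class CPO figure), and the sketch assumes optics power scales linearly with bandwidth; the results are illustrative, not a measured system.

```python
# Port, lane, and per-switch optics-power arithmetic for a 102.4 Tb/s switch.
SWITCH_GBPS = 102_400
LANE_GBPS = 200                 # 200G/lane electrical SerDes
PLUGGABLE_W_PER_800G = 15.0     # assumed, within the ~14-17 W retimed range
CPO_W_PER_800G = 5.4            # Bailly-class figure from the placement table

def ports(switch_gbps: int, port_gbps: int) -> int:
    """How many ports of a given speed the ASIC bandwidth fills exactly."""
    if switch_gbps % port_gbps:
        raise ValueError("port speed must divide the switch bandwidth")
    return switch_gbps // port_gbps

for port_gbps in (1600, 800):
    n = ports(SWITCH_GBPS, port_gbps)
    print(f"{port_gbps}G: {n} ports x {port_gbps // LANE_GBPS} lanes each")

print(f"total 200G lanes: {ports(SWITCH_GBPS, LANE_GBPS)}")

# Assumes optics power scales linearly with bandwidth, so counting in
# 800G units covers a 1.6T build as well.
n800 = ports(SWITCH_GBPS, 800)
print(f"optics per switch: pluggable ~{n800 * PLUGGABLE_W_PER_800G:.0f} W, "
      f"CPO ~{n800 * CPO_W_PER_800G:.0f} W")
```

On a pluggable design every one of those 512 lanes has to cross the package-to-faceplate path. That electrical problem is what CPO removes.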
| Placement | Per-800G power | Electrical reach to optics | Serviceability | Supply chain | Where it fits |
|---|---|---|---|---|---|
| Pluggable + DSP (retimed) | ~14–17 W | ~10–30 cm PCB + connector | Hot-swap, seconds, multi-vendor | Mature, commoditized, second-source | Default today; scale-out at 400G/800G |
| LPO (linear, no DSP) | ~7–8.5 W | Same path, but no DSP retiming — relies on host SerDes | Hot-swap, but interop is host-dependent | Emerging; reach/interop limits | Short-reach links where host SerDes is clean |
| LRO (half-retimed) | Between LPO and DSP | Same path; DSP retimes the transmit side only | Hot-swap | Emerging compromise | Bridge where full LPO interop is risky |
| CPO (on-package) | Under ~6 W (≈5.4 W Broadcom Bailly class) | A few millimeters on a shared substrate | Not faceplate-swappable; FRU = optical sub-assembly / external laser | Single-vendor switch+optics; nascent | Top-of-fabric switches at ≥102.4 Tb/s; 1.6T+ links |
The table is a power-versus-serviceability gradient: moving down the rows buys lower power at the price of operational risk. A retimed pluggable is the safest operational choice (anyone can swap it, anyone can second-source it) and the most expensive in power. CPO is the inverse: it wins the power war decisively and loses the serviceability war, because the optics are now soldered into a system you cannot service with a screwdriver and a spare module from inventory. The two middle rows, LPO and LRO, exist precisely because operators want the power saving without the full serviceability tax: they strip the DSP (or half of it) while keeping the faceplate form factor. They are the hedge, not the destination.
The serviceability tax — the real cost of CPO
The reason CPO did not arrive years earlier, despite the power math always favoring it, is a single operational fact: lasers fail, and faceplate pluggables are designed to be replaced when they do. A failed pluggable in a live fabric is a two-minute hot-swap by a technician who never touches the switch. A failed optical engine soldered onto a switch package is, naively, a switch replacement — pull the whole line card, lose the ports, re-cable, re-acceptance-test. That is operationally unacceptable at scale, and it is the reason CPO's adoption is gated by serviceability engineering, not by photonics.
The industry's answer is to keep the highest-failure-rate component, the laser, off the package and at the front panel. The External Laser Small Form-factor Pluggable (ELSFP) approach mounts the lasers in front-panel modules that run at controlled temperature and remain hot-swappable, feeding light into the on-package optical engines over fiber. The laser, statistically the part most likely to die, stays a field-replaceable unit; the on-package modulators and photodetectors, which fail far less often, stay soldered. NVIDIA's Quantum-X Photonics goes further with a detachable optical sub-assembly: a three-engine OSA that can be disconnected from the package and replaced as a field-replaceable unit, restoring something close to the pluggable repair model without the pluggable power. The consequence the operator inherits: a CPO fabric's spare-parts strategy, RMA flow, and technician training differ from a pluggable fabric's, and the optical engine is single-vendor with the switch. You are not buying a module from a competitive market; you are buying a system.
Fiber plant: single-mode vs multimode, and the reach classes
Beneath the transceiver decision sits a more durable one: what glass you pull. Fiber outlives several generations of optics, so the fiber-plant choice is a structural decision that the density ramp must be designed against — get it wrong and the next speed step is a re-pull, not a transceiver swap. The fork is single-mode versus multimode.
Multimode (OM3/OM4/OM5) uses a wide core and cheap VCSEL transceivers, and it was the data-center default for a decade because short-reach links were cheaper to light. But multimode's modal-dispersion physics caps the supported reach as lane rates climb. OM4 carried 10G-SR to ~400 m but 100G-SR4 only to ~100 m, and 800G-class SR8 holds ~100 m at best (the lower-cost VR variants drop to ~50 m); OM5 with 200G VCSELs (not in volume as of late 2025) stretches that only modestly. Multimode is a depreciating asset: each speed generation shortens or caps the distance it can carry, and at some point the link you need exceeds the reach the installed glass supports.
Single-mode (OS2) uses a narrow core and (historically more expensive) transceivers built on edge-emitting lasers (DFB/EML, or CW lasers feeding silicon-photonic modulators) rather than VCSELs. Its reach is governed by attenuation and chromatic dispersion, not modal dispersion, so it carries 800G, 1.6T, and 3.2T over the same fiber. The structural consequence is decisive: OS2 trunked once for 800G today will carry 1.6T and 3.2T on transceiver upgrades alone, with no physical re-cabling. As transceiver-cost gaps narrow and DR/FR/LR single-mode optics commoditize, the industry consensus for AI new-builds has flipped: single-mode OS2 is now the recommended structural fiber for any plant that expects to ride the density ramp. You pay a little more per transceiver today to avoid pulling fiber twice across a 100k-GPU hall.
| Class | Fiber | Nominal reach | Typical use in an AI fabric |
|---|---|---|---|
| SR (multimode) | OM4 / OM5 | tens of m (shrinks with speed) | Legacy intra-rack / very short runs; declining for new AI builds |
| DR (single-mode) | OS2 | ~500 m | Intra-hall / row-to-row back-end fabric; the AI workhorse |
| FR (single-mode) | OS2 | ~2 km | Building-to-building within a campus |
| LR (single-mode) | OS2 | ~10 km | Campus spine, cross-building spine links |
| Coherent ZR/ZR+ | OS2 | ~tens to hundreds of km | Scale-across / DCI between campuses → see Chapter 8.8 |
The reach classes map cleanly onto the fabric hierarchy. Inside the rack, copper still wins where it can — passive DAC to ~1–2 m, active copper to a few meters (Chapter 8.9) — and the worst-case NVL72 in-rack span is short enough to keep copper viable for scale-up today. Beyond that, the moment a link leaves the rack for the back-end fabric it is single-mode DR over OS2 for the typical intra-hall run, FR/LR for campus spine, and coherent ZR/ZR+ for the inter-campus links that belong to scale-across (Chapter 8.8). The fiber plant is therefore a layered system: copper at the bottom for the cheapest, shortest, highest-volume links; single-mode glass everywhere a photon must travel more than a few meters.
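The layering reads naturally as a lookup from link length to medium. A minimal sketch, assuming the nominal reaches from the reach-class table (DR ~500 m, FR ~2 km, LR ~10 km) and assumed copper cutoffs of 2 m for passive DAC and 5 m for active copper (the text says only "a few meters"). A real design would also subtract margin for patching and slack, which this ignores.

```python
# Map a link length onto the shortest-reach medium that covers it.
# Copper cutoffs are assumptions; optical reaches are nominal table values.
REACH_CLASSES = [
    (2, "passive DAC (copper)"),
    (5, "active copper"),        # "a few meters": assumed 5 m
    (500, "DR over OS2"),
    (2_000, "FR over OS2"),
    (10_000, "LR over OS2"),
]

def pick_medium(length_m: float) -> str:
    """Return the first (shortest-reach) class whose nominal reach covers length_m."""
    if length_m <= 0:
        raise ValueError("length must be positive")
    for max_m, medium in REACH_CLASSES:
        if length_m <= max_m:
            return medium
    return "coherent ZR/ZR+ (scale-across, see Chapter 8.8)"

for length in (1.5, 3, 120, 1_500, 8_000, 50_000):
    print(f"{length} m -> {pick_medium(length)}")
```

For example, a 120 m row-to-row run lands on DR over OS2, and a 1.5 km building-to-building run lands on FR.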
Structured cabling at scale: MPO, base-N, and polarity
An 800G or 1.6T link is not one fiber; it is a parallel ribbon of them. An 800G-DR8 link runs 8 transmit and 8 receive lanes — 16 fibers — terminated in a multi-fiber push-on connector (MPO/MTP). At 100k-GPU scale the fabric is a sea of these multi-fiber connectors, and the way they are grouped into trunks, the polarity scheme that keeps transmit aligned to receive, and the connector count in each channel become the load-bearing engineering of the whole physical layer.
The base-N question is which fiber granularity the trunk is built around. Base-8 (MPO-12 with 8 fibers populated, or MPO-8) matches the 8-fiber structure of 4-lane SR4/DR4 transceivers with no stranded fibers and no conversion cassette, and an 8-lane, 16-fiber SR8/DR8 port maps onto exactly two Base-8 groups, so 400G-DR4 links and 2×400G breakouts of 800G land directly on the trunk. Base-12, the legacy default, leaves fibers stranded (four of every twelve dark behind a 4-lane optic) or requires conversion modules when feeding 4- and 8-lane optics. MPO-16 (Base-16) carries 16 fibers in one connector, matching an 800G-SR8/DR8 link end-to-end in a single ferrule, and at 200G/lane the same MPO-16 carries 1.6T, which is why MPO-16 has become the forward-looking choice for AI plants planning the 1.6T → 3.2T ramp. The consequence of getting base-N wrong is stranded fiber, mandatory conversion cassettes (each adding a mated pair of loss), and a trunk whose breakout does not match the transceiver: re-work measured in days across a hall.
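The stranding arithmetic behind base-N is a ceiling division. This sketch assumes a 4-lane parallel optic uses 8 fibers and an 8-lane optic uses 16, and counts dark fibers per connector when the trunk is patched straight to the optic with no conversion cassette. It is an illustration, not a cabling-vendor planning tool.

```python
import math

def connectors_and_stranded(fibers_per_port: int, base: int) -> tuple[int, int]:
    """Connectors needed per transceiver port and fibers left dark, assuming
    the trunk is patched straight to the optic (no conversion cassette)."""
    connectors = math.ceil(fibers_per_port / base)
    stranded = connectors * base - fibers_per_port
    return connectors, stranded

# Fibers per transceiver port: Tx + Rx for each lane.
OPTICS = {"4-lane (SR4/DR4)": 8, "8-lane (SR8/DR8)": 16}
BASES = (8, 12, 16)

for name, fibers in OPTICS.items():
    for base in BASES:
        c, s = connectors_and_stranded(fibers, base)
        print(f"{name} on Base-{base}: {c} connector(s), "
              f"{s} stranded ({s / (c * base):.0%})")
```

Base-8 and Base-16 strand nothing behind 8-lane optics, while Base-12 leaves a third of its fibers dark. Base-16 in turn strands half its fibers behind a 4-lane optic, so it pays off only when the plant is built around 8-lane ports.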
Polarity — ensuring every transmit fiber lands on the far-end receive — is the silent source of acceptance failures. The TIA-defined Method A/B/C polarity schemes each require a disciplined, documented choice of trunk type, cassette type, and patch-cord type; mix two methods in one channel and links go dark in ways that are tedious to isolate at scale. This is the strongest argument for structured cabling over point-to-point: a structured plant fixes one polarity method, documents it, and uses keyed components so a 100k-link fabric is repeatable and auditable rather than a per-link debugging exercise.
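Polarity is composition of fiber-position maps, which makes the "mix two methods and links go dark" failure easy to see. This sketch models the three MPO-12 cable types as position maps (Type A straight-through, Type B reversed, Type C pair-flipped) and assumes the common 4-lane parallel lane map (transmit on positions 1–4, receive on 9–12). It ignores adapter key orientation, and the channel recipes tested are illustrative; TIA-568.3 remains the normative source for which components each Method prescribes.

```python
# MPO-12 polarity as composition of fiber-position maps (positions 1..12).
N = 12

def type_a(p: int) -> int:
    """Type A, straight-through: position p lands on p."""
    return p

def type_b(p: int) -> int:
    """Type B, reversed: position p lands on N + 1 - p (1 -> 12)."""
    return N + 1 - p

def type_c(p: int) -> int:
    """Type C, pair-flipped: 1 <-> 2, 3 <-> 4, ..."""
    return p + 1 if p % 2 else p - 1

CABLE = {"A": type_a, "B": type_b, "C": type_c}

# Assumed 4-lane parallel-optic lane map: Tx on 1-4, Rx on 9-12, 5-8 dark.
TX = {1, 2, 3, 4}
RX = {9, 10, 11, 12}

def far_end(p: int, segments: list[str]) -> int:
    """Follow one fiber position through each cable segment in order."""
    for seg in segments:
        p = CABLE[seg](p)
    return p

def channel_ok(segments: list[str]) -> bool:
    """True if every transmit position lands on a receive position."""
    return all(far_end(p, segments) in RX for p in TX)

for recipe in (["B"], ["A", "A", "B"], ["B", "B", "B"], ["A", "B", "B"], ["C"]):
    print("-".join(recipe), "OK" if channel_ok(recipe) else "DARK")
```

The failing A-B-B recipe shows one way a mixed channel fails: two reversals cancel, so every transmitter faces another transmitter and the link stays dark even though each component is individually correct.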
Deep dive: the channel loss budget at 800G/1.6T — why a sloppy install fails acceptance
Every optical link has a power budget: the transmitter launches a certain power, the receiver needs a certain minimum, and the difference is the loss budget the channel must fit inside. At lower speeds that budget was generous and installs were forgiving. At 800G and 1.6T it is tight, and the install has almost no margin to waste.
The channel loss has three contributors: fiber attenuation (small for single-mode over intra-hall distances, a few tenths of a dB per km), connector/mated-pair loss (the dominant term: each MPO mated pair costs a fraction of a dB, and a structured channel with patch panels at both ends can stack four or more mated pairs), and splice loss where present. For an 800G multimode SR8 channel the total budget is roughly 1.7–1.8 dB, and a channel with several MPO mated pairs of even modestly out-of-spec loss eats that budget before fiber attenuation is even counted. Single-mode DR8 has more headroom on attenuation but is still connector-loss-dominated at scale.
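The budget check itself is a sum of the three contributors. A minimal sketch, assuming OM4 attenuation of 3.0 dB/km at 850 nm, a 1.8 dB budget (the top of the SR8 range above), and illustrative per-pair losses of 0.35 dB for a low-loss MPO and 0.5 dB for a standard-grade one; splice loss defaults to an assumed 0.1 dB each. All of these are placeholders for the values in your transceiver and component datasheets.

```python
from dataclasses import dataclass

@dataclass
class Channel:
    length_m: float      # end-to-end fiber length
    mated_pairs: int     # connector mated pairs in the channel
    splices: int = 0

def channel_loss_db(ch: Channel, fiber_db_per_km: float, pair_db: float,
                    splice_db: float = 0.1) -> float:
    """Insertion loss (dB) = fiber attenuation + mated-pair loss + splice loss."""
    fiber = (ch.length_m / 1000) * fiber_db_per_km
    return fiber + ch.mated_pairs * pair_db + ch.splices * splice_db

def margin_db(budget_db: float, loss_db: float) -> float:
    """Positive margin passes; negative margin fails acceptance."""
    return budget_db - loss_db

BUDGET_DB = 1.8       # assumed SR8 channel budget
OM4_DB_PER_KM = 3.0   # assumed OM4 attenuation at 850 nm

run = Channel(length_m=50, mated_pairs=4)  # patch panels at both ends
for label, pair in (("low-loss MPO", 0.35), ("standard MPO", 0.5)):
    loss = channel_loss_db(run, OM4_DB_PER_KM, pair)
    m = margin_db(BUDGET_DB, loss)
    verdict = "PASS" if m >= 0 else "FAIL"
    print(f"{label}: loss {loss:.2f} dB, margin {m:+.2f} dB -> {verdict}")
```

The fiber itself contributes only 0.15 dB in this example; the four mated pairs decide pass or fail.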
The operational consequences are concrete. First, connector count is a design variable: every patch panel you add for serviceability costs a mated pair of budget, so structured-cabling architects trade serviceability against loss explicitly. Second, cleanliness is not optional: a single contaminated ferrule can blow the entire budget, which is why AI-scale installs mandate inspect-and-clean on every connection and 100% channel testing (insertion loss, and increasingly OTDR) as an acceptance gate. Third, the loss budget is a real reason CPO and short-reach links are attractive: fewer, shorter, cleaner connections inside a tighter budget. A back-end fabric that fails its loss-budget acceptance test does not throw clean errors. FEC masks the rising pre-FEC bit-error rate until marginal links start producing uncorrectable codewords, dropped packets, and link flaps, which silently erode goodput until a collective stalls. The cabling acceptance test is, in effect, a goodput gate. Field execution and the install-velocity interface are engineered in Chapter 7.15.
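Treating connector count as a design variable means solving the same loss sum for the number of mated pairs. This sketch reuses assumed figures (1.8 dB budget, 50 m of OM4 at 3.0 dB/km, 0.35 or 0.5 dB per mated pair) and adds an optional reserve for cleaning and aging whose size is purely an assumption to tune. It returns the largest whole number of mated pairs the budget can absorb, which caps how many patch fields serviceability can buy.

```python
import math

def max_mated_pairs(budget_db: float, length_m: float, fiber_db_per_km: float,
                    pair_db: float, reserve_db: float = 0.0) -> int:
    """Largest whole number of mated pairs that keeps the channel inside budget."""
    available = budget_db - reserve_db - (length_m / 1000) * fiber_db_per_km
    if available < 0:
        return 0
    # Small epsilon guards against float results like 3.9999999999999996
    # flooring one pair too low.
    return math.floor(available / pair_db + 1e-9)

for pair in (0.35, 0.5):
    for reserve in (0.0, 0.3):
        n = max_mated_pairs(1.8, 50, 3.0, pair, reserve)
        print(f"pair {pair} dB, reserve {reserve} dB -> at most {n} mated pairs")
```

On these assumptions a standard-grade MPO channel cannot afford a fourth mated pair, and with a 0.3 dB reserve it affords only two, so panel placement has to be budgeted rather than assumed.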
Structured vs point-to-point: the install-velocity fork
The last decision is whether to cable structured or point-to-point, and it is really a decision about install velocity, fault isolation, and how the next density step gets executed. Point-to-point runs a jumper directly from one transceiver faceplate to another — fewer connectors (lower loss), but every move/add/change is a re-pull, fault isolation means tracing a specific jumper through a congested tray, and there is no patch field to re-patch a density change. Structured cabling lands trunks on patch panels (MDA/HDA cross-connects), with short equipment cords from panel to switch — more connectors (more loss budget consumed) in exchange for a documented, repeatable, auditable plant where a density step is a re-patch, a fault is isolated at a panel, and the install can be pre-terminated and pre-tested as factory trunks.
At AI scale the structured choice usually wins, for a reason specific to the density ramp: the trunk infrastructure (the expensive, slow-to-install, high-fiber-count backbone) is generation-agnostic single-mode glass that survives the 800G → 1.6T → 3.2T transitions untouched, while the cheap, fast equipment cords and transceivers turn over each generation. Structured cabling is how you decouple the irreversible substrate (the trunk plant) from the reversible fit-out (the optics) — the same reversible-versus-irreversible discipline that governs the slab and the power chain (Chapter 1.1). You pay the loss-budget and connector-count price up front to buy a plant you can re-patch instead of re-pull. The countervailing cost — every mated pair eats the tight 800G/1.6T budget — is exactly why the loss-budget discipline above is non-negotiable, and why hyperscale AI plants pre-terminate, factory-test, and inspect-and-clean every connection as an acceptance gate.