
Chapter 3.6

Fiber, Latency & Network Connectivity (Secondary Screen)

Fiber is a secondary screen for a training campus and a primary one for an inference fleet. For both, the binding number is the same physics: roughly 5 microseconds of one-way delay per kilometer of glass. That number decides whether two campuses can train as one machine and whether a metro can serve a sub-50 ms request, and route diligence decides whether a site that is perfect on power and land is in fact functionally stranded.

GOODPUT · POWER-BOUND · DENSITY-RAMP

What you'll decide here

Whether this site is training-shaped (fiber is a secondary screen — diversity and a long-haul path to peers matter, raw proximity does not) or inference-shaped (fiber is a primary screen — the latency budget to users is a pass/fail gate).
The inter-campus latency and bandwidth budget your distributed-training topology can absorb — and therefore the maximum great-circle distance between halls that still trains as one synchronous machine versus one that forces async/hierarchical methods.
Whether to buy lit waves, lease dark fiber (IRU), or build new conduit — a fork that trades time-to-light against control, cost, and a multi-year permitting tail on any greenfield path.
The route-diversity standard you will not compromise on: how many physically diverse paths, how many strands per path, and the documented separation distance that makes the second path real rather than a shared-conduit illusion.
Which connectivity facts must be verified before the site clears diligence (carrier count, conduit ownership, IXP/landing-station distance, middle-mile gaps) versus which can be engineered later. The stakes are high: in some markets, up to half of candidate sites now fail on fiber diligence rather than power.

Network connectivity sits oddly in the 2026 siting hierarchy. For a frontier training campus, power and land come first by a wide margin (→ Chapter 3.1); fiber is a secondary screen — necessary, diligenced hard, but rarely the reason a site wins or loses. For an inference fleet chasing a latency SLO, the ranking inverts: the distance from your slab to your users, measured in milliseconds, is the primary screen, ahead of power cost. Same physical asset, opposite weight, because the workload changed (→ Chapter 1.1). This chapter treats fiber as exactly that: a screen whose weight you set from the archetype, then diligence to a standard that catches the failure modes before they strand a site.

We start from the one number that governs everything — the propagation delay of light in glass — and walk it forward into three forks: can two campuses train as one machine (the distributed-training latency wall), can a metro serve a sub-50 ms request (inference proximity), and how do you actually procure the path (lit vs dark vs new-build, and the middle-mile/make-ready/permitting tail that decides whether your power-perfect site is reachable at all). The recurring 2026 lesson is that fiber has quietly become a deal-killer: pre-deal route diligence now eliminates up to half of candidate sites in many markets — not for lack of power or land, but because they cannot meet the route-diversity and delivery-timeline bar AI-grade tenants demand (Global Data Center Hub, 2025).

The one number: 5 microseconds per kilometer

Light in a vacuum covers a kilometer in 3.34 microseconds. Light in single-mode fiber is slower — the glass core has a refractive index near 1.47, so the signal propagates at roughly two-thirds of c. The industry rule of thumb is ~4.9 microseconds of one-way delay per kilometer of fiber, rounded to 5 us/km, or ~10 us per kilometer round-trip (M2 Optics; MapYourTech, 2025). This is not an engineering tunable. You cannot buy it down with better optics, a faster switch ASIC, or a fatter pipe. It is set by the speed of light and the index of the glass, and it is the floor under every latency budget in this chapter.
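The floor can be reproduced from first principles. The following is a minimal sketch, assuming only the refractive index of ~1.47 quoted above; the function name is illustrative, not from any library.

```python
# Speed of light in vacuum in km/s (exact by SI definition).
C_KM_PER_S = 299_792.458


def one_way_delay_us_per_km(refractive_index: float) -> float:
    """Propagation delay per kilometer, in microseconds, for a medium
    with the given refractive index (1.0 = vacuum)."""
    return refractive_index / C_KM_PER_S * 1e6


vacuum = one_way_delay_us_per_km(1.0)
fiber = one_way_delay_us_per_km(1.47)  # index quoted in the text

print(f"vacuum: {vacuum:.2f} us/km")            # vacuum: 3.34 us/km
print(f"fiber:  {fiber:.2f} us/km one-way")     # fiber:  4.90 us/km one-way
print(f"fiber:  {2 * fiber:.2f} us/km round-trip")
```

The only input is the index of the glass, which is why no equipment upgrade moves the result.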

Three consequences fall straight out of that floor. First, geography is latency: 1,000 km of fiber adds ~10 ms round-trip before a single packet is switched, queued, or serialized. Second, the route, not the great-circle distance, is what you pay for: fiber follows rights-of-way (rail, road, pipeline), so a 600 km straight line is often 800–900 km of glass, and that extra 30–50% of distance is latency you did not plan for. Third, the floor is unforgiving at both ends of the workload spectrum: a synchronous training collective that crosses 100 km of fiber adds ~1 ms of round-trip to every cross-site exchange in its step time, and an inference request that crosses 1,500 km of long-haul fiber spends ~15 ms of its ~50 ms budget on round-trip propagation before the model has done any work. Hold the 5 us/km number in mind for the rest of the chapter; every fork below is a different way of spending against it.
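The arithmetic behind these three consequences is short enough to sketch. This assumes the 5 us/km rule of thumb and a route factor between 1.3 and 1.5; the helper names are hypothetical.

```python
US_PER_KM_ONE_WAY = 5.0  # rule of thumb from the text


def rtt_ms(route_km: float) -> float:
    """Round-trip propagation floor in milliseconds for a fiber route."""
    return 2 * route_km * US_PER_KM_ONE_WAY / 1000.0


def route_km(great_circle_km: float, route_factor: float) -> float:
    """Fiber route length given straight-line distance and a detour factor."""
    return great_circle_km * route_factor


# Geography is latency: 1,000 km of glass.
print(rtt_ms(1000))  # 10.0

# The route, not the straight line: 600 km great-circle at 1.3x to 1.5x.
for factor in (1.0, 1.3, 1.5):
    km = route_km(600, factor)
    print(f"factor {factor}: {km:.0f} km of glass, {rtt_ms(km):.1f} ms RTT")

# Both ends of the workload spectrum.
print(rtt_ms(100))   # 1.0  (training exchange across 100 km)
print(rtt_ms(1500))  # 15.0 (inference request across 1,500 km)
```

These are propagation-only floors; switching, queuing, and serialization add on top.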

Switching adds microseconds; distance adds milliseconds

Inside a campus, the latency you fight is in the switches and the NICs: InfiniBand at ~1–2 us, tuned RoCEv2 at ~1.5–2.5 us per hop (SemiAnalysis / NVIDIA, 2025; see the key numbers below). Those are microseconds, and they are the right unit for the scale-up and scale-out fabrics covered in Chapter 8.4. The moment a link leaves the campus, the unit changes by three orders of magnitude: a 50 km metro hop is ~250 us one-way of pure propagation, a 1,000 km long-haul is ~5 ms. When you cross the fence, distance dominates and nothing you do to the equipment closes the gap. That regime change (microseconds inside, milliseconds outside) is why inter-campus and inter-region connectivity is a fundamentally different design problem from intra-campus fabric.

Fork 1 — Can two campuses train as one machine?

The defining infrastructure story of 2025–2026 is that the largest training runs outgrew a single campus's power envelope. When you cannot energize a gigawatt under one roof, you split the run across campuses and stitch them together with fiber — which turns the 5 us/km floor into a hard design constraint. Google ran Gemini Ultra across multiple data centers, and by 2026 was wiring four campuses around Omaha/Council Bluffs into a single GW-scale training cluster with massive private fiber lines so they function as one supercomputer (DCD; SemiAnalysis, 2025). This is the GOODPUT thread made physical: every microsecond of inter-site delay that lands on the critical path of a synchronous gradient exchange is goodput you do not get back.

The fork is how tightly coupled the cross-site training is, and it sorts into three regimes by tolerance for inter-site delay and bandwidth. Synchronous data-parallel across sites exchanges full gradients every step; it is the most demanding, wants terabits of aggregate inter-site bandwidth (private DWDM links), and degrades fast past a few tens of kilometers because the all-reduce now spans the WAN. Hierarchical / async SGD (DiLoCo and its successors) synchronizes intermittently — every tens or hundreds of steps — and exchanges 100–500x less data, which is what makes a multi-hundred-kilometer or cross-region split viable at all (Google DeepMind DiLoCo; Epoch AI, 2025). Fully decentralized over the public internet is the loosest regime, trading convergence efficiency for the ability to pool stranded capacity anywhere. The engineering of these methods lives in Chapter 8.8; here the siting consequence is what matters: the training method you choose sets the maximum distance between your halls, and the distance you can secure between candidate sites bounds the methods you can run.
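To see why the 100–500x reduction changes what is buildable, here is a back-of-envelope sketch of average inter-site bandwidth. The 500 GB payload and 2 s window are hypothetical placeholders, not figures from the sources, and the model ignores all-reduce algorithm factors, overlap with compute, and protocol overhead.

```python
def required_gbps(payload_gigabytes: float, window_s: float,
                  reduction: float = 1.0) -> float:
    """Average inter-site bandwidth in Gbit/s needed to move one
    synchronization payload within the given time window.

    reduction: factor by which intermittent sync cuts the data exchanged
    (1.0 = full synchronous data-parallel; the text quotes 100-500x).
    """
    gigabits = payload_gigabytes * 8
    return gigabits / window_s / reduction


PAYLOAD_GB = 500.0  # hypothetical gradient payload per exchange
WINDOW_S = 2.0      # hypothetical time available for the exchange

sync = required_gbps(PAYLOAD_GB, WINDOW_S)
low = required_gbps(PAYLOAD_GB, WINDOW_S, reduction=100)
high = required_gbps(PAYLOAD_GB, WINDOW_S, reduction=500)

print(f"synchronous DP:    {sync / 1000:.1f} Tbit/s")
print(f"intermittent sync: {high:.0f} to {low:.0f} Gbit/s")
```

Under these placeholder inputs the synchronous case lands in terabit territory, while the intermittent case fits in a handful of ordinary wavelengths, which is the siting point the paragraph makes.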

Inter-campus distance regimes for distributed training

Regime	Route distance	One-way propagation floor	Training method that fits	Siting consequence
Single campus / adjacent halls	under ~2 km	under ~10 us	Full synchronous DP/TP/PP — one fabric	Treat as one machine; fiber is intra-campus structured cabling
Metro multi-campus	~2–80 km	~10–400 us	Synchronous DP across sites with terabit private DWDM	Power-pool a metro; demands diverse dark fiber + DWDM build
Regional (intra-grid)	~80–500 km	~0.4–2.5 ms	Hierarchical / streaming DiLoCo (intermittent sync)	Stitch campuses on one interconnection footprint; async required
Cross-region / cross-grid	~500–3,000+ km	~2.5–15+ ms	Async / decentralized; gradient-compressed sync	Only loosely-coupled methods survive; convergence cost is the toll

One-way propagation is ~5 us per route-km; route distance typically runs ~1.3–1.5x the great-circle distance. The one-way figures are propagation-only floors, before switching, queuing, and serialization; double them for round-trip. Method tolerances are 2025–2026 practitioner ranges (SemiAnalysis; Google DeepMind DiLoCo; Epoch AI).
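The table can be read as a lookup from route distance to regime. The sketch below encodes its rows directly; the boundaries are the table's approximate values, so treat results near a boundary as a judgment call rather than a hard cutoff.

```python
US_PER_KM_ONE_WAY = 5.0  # rule of thumb from the text

# (upper bound on route km, regime, training method that fits)
REGIMES = [
    (2.0, "single campus / adjacent halls", "full synchronous DP/TP/PP"),
    (80.0, "metro multi-campus", "synchronous DP over private DWDM"),
    (500.0, "regional (intra-grid)", "hierarchical / streaming DiLoCo"),
    (float("inf"), "cross-region / cross-grid", "async / decentralized"),
]


def classify(route_km: float) -> tuple[str, str, float]:
    """Return (regime, method, one-way propagation floor in us)."""
    floor_us = route_km * US_PER_KM_ONE_WAY
    for upper_km, regime, method in REGIMES:
        if route_km <= upper_km:
            return regime, method, floor_us
    raise ValueError("route_km must be a non-negative number")


for km in (1, 50, 400, 1200):
    regime, method, floor_us = classify(km)
    print(f"{km:>5} km: {regime}; {method}; floor {floor_us:.0f} us one-way")
```

Feed it route distance, not great-circle distance; multiply the straight-line figure by 1.3–1.5 first if only the map distance is known.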

The distances in the table are a budget you spend against. If your power strategy forces a metro power-pool — four campuses inside an 80 km radius because that is where the megawatts are (POWER-BOUND) — you have implicitly committed to building or leasing diverse dark fiber between them and lighting it with DWDM to get the terabits a synchronous run needs. If the only stranded gigawatts are 400 km apart, you have implicitly committed to an asynchronous training method and accepted its convergence penalty. The fiber decision and the power decision are not independent screens you score separately; they constrain each other, and the campus topology that results is the joint solution. The cross-region case is where the toll bites hardest: past ~500 km the 5 us/km floor alone puts you above the delay where synchronous all-reduce stays efficient, so you are paying for either gradient compression overhead or slower convergence — a real cost in GPU-hours, not a free lunch.

Fork 2 — Can this metro serve the request in time?

For inference, the screen inverts. The user is waiting, and the experience is governed by an end-to-end budget: outbound network transit + queuing + the model's own compute + return transit. The perceptual anchors are well established: responses under ~100 ms feel instantaneous, and direct-manipulation interactions (drag, voice turn-taking) want the network round-trip well under that so the budget is left for the model (HumAI; practitioner benchmarks, 2025). The standard target for latency-sensitive serving is sub-50 ms end-to-end, which after you reserve time for compute leaves only single-digit to low tens of milliseconds for the network. At 5 us/km, that is a hard radius around the user. A 30 ms model call wrapped in 30 ms of metro round-trip is a 60 ms feature; the same call wrapped in 200 ms of intercontinental round-trip plus a ~300 ms cold start is a 530 ms feature that fails the SLO regardless of how fast the GPU is.
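The budget can be turned into a radius. This is a minimal sketch under the 5 us/km rule; the 300 ms cold start is the value that makes the text's 530 ms total add up, and the 5 ms queuing allowance in the last example is a hypothetical input.

```python
US_PER_KM_ONE_WAY = 5.0  # rule of thumb from the text


def end_to_end_ms(network_rtt_ms: float, compute_ms: float,
                  queue_ms: float = 0.0, cold_start_ms: float = 0.0) -> float:
    """Total user-visible latency: network round-trip plus server-side time."""
    return network_rtt_ms + compute_ms + queue_ms + cold_start_ms


def max_route_km(slo_ms: float, compute_ms: float, queue_ms: float = 0.0) -> float:
    """Upper bound on fiber route distance to the user, counting propagation
    only. Real radii are smaller once switching and last-mile access are added."""
    network_budget_ms = slo_ms - compute_ms - queue_ms
    if network_budget_ms <= 0:
        return 0.0
    return network_budget_ms * 1000.0 / (2 * US_PER_KM_ONE_WAY)


print(end_to_end_ms(30, 30))                       # 60.0
print(end_to_end_ms(200, 30, cold_start_ms=300))   # 530.0
print(max_route_km(slo_ms=50, compute_ms=30, queue_ms=5))  # 1500.0
```

The radius shrinks linearly with every millisecond the model or the queue consumes, which is why the compute reservation has to be fixed before the map is drawn.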

The siting consequence is that an inference fleet is a geography problem before it is a power problem (→ Chapter 1.3, Chapter 1.5). You site in or near the metros where your users are, accept power that may cost 2–4x the stranded-rural rate, and let the latency budget — not the energy bill — pick the county. The fork inside the fork is centralized regional vs. distributed metro/edge: a few large regional sites minimize cost and operational sprawl but put a floor under tail latency for distant users; a denser mesh of metro/edge nodes hits the budget everywhere but multiplies operational overhead and strands capacity at low utilization. Most operators land on a hybrid — large regional cores for the bulk of traffic, a thin edge tier for the latency-critical slice — and the line between them is drawn by the SLO, not by instinct.

Set the fiber screen's weight from the archetype, before you score sites

The most common siting error in connectivity is applying the wrong weight. A team scoping a training campus that over-indexes on metro proximity throws away cheap-power sites for a latency advantage the workload does not value — pre-training is indifferent to user proximity. A team scoping an inference fleet that treats fiber as a secondary checkbox ships a fleet that misses its SLO in half the markets it serves, then pays to retrofit edge nodes it should have sited from the start. Decide which side of the archetype fork you are on, set the fiber screen's weight accordingly (secondary for training, primary for inference), and only then run the scoring matrix in Chapter 3.13. The weight is the decision; the score is downstream of it.

Fork 3 — Lit waves, dark fiber, or new build

Once the archetype has set the screen's weight and the topology has set the distances, the procurement question is concrete: how do you actually acquire the path? The fork has three positions, trading time-to-light against control and unit cost. Lit services — buying managed wavelengths or Ethernet circuits from a carrier — are the fastest to turn up and the least capital-intensive, but you inherit the carrier's routing, latency, and capacity ceiling, and you pay a recurring premium for capacity you do not control. Dark fiber (IRU) — leasing unlit strands on a long-term indefeasible-right-of-use and lighting them with your own DWDM — gives full control over bandwidth, routing, and the latency path, which is why hyperscalers and infrastructure investors have moved decisively toward owning or co-investing in dark assets for AI-grade links (Landgate; DCConnect, 2025). New build — putting your own conduit in the ground — gives total control and a future-proof strand count, at the price of the longest lead time and the full weight of permitting and make-ready.

The downstream cost that decides this fork is almost always time, not money. A lit circuit lands in weeks. A dark IRU lands in months once the route exists. A new build lands in years if it has to cross a middle-mile gap, because new conduit drags the entire permitting and make-ready tail behind it — and that tail can dominate the project schedule (DENSITY-RAMP: the ramp the fiber must serve arrives on a fixed clock). For an AI campus, the right answer is frequently a blend: lit capacity for day-one connectivity and management/out-of-band traffic, dark fiber owned or IRU'd on the diverse routes that carry the heavy east-west and DCI load, and new build reserved for the one critical gap no existing route fills. The structured-cabling and DWDM/OTN engineering that lights these paths is detailed in Chapter 8.10; this chapter's job is to get the right strands to the right place on the right schedule.

Connectivity procurement fork — lit vs dark vs new-build

Option	Time-to-light	Control over route & latency	Cost profile	Best fit
Lit waves / managed circuits	Weeks	Low — carrier owns routing & ceiling	Opex; recurring premium	Day-one connectivity, OOB/management, bridge capacity
Dark fiber — IRU lease	Months (if route exists)	High — you light it, you pick the path	Capex on optics + long-term IRU	Heavy DCI / east-west; latency-critical diverse routes
Dark fiber — owned	Months–years	Maximal — own the asset	Highest capex; strategic	Hyperscale-class, durable, multi-decade footprint
New conduit build	Years (permitting-bound)	Total — bespoke route & strand count	~$150k/mile + make-ready + permits	Closing a middle-mile gap with no diverse alternative

Time-to-light and the $/mile figure are 2025 practitioner ranges (Global Data Center Hub; Landgate; Netrality). Dual diverse new-build routinely runs ~$150k per mile to construct, before the permitting/make-ready tail.
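One way to apply the table is to filter options by the slowest time-to-light the schedule can tolerate. The sketch below encodes the table's coarse weeks/months/years buckets as an ordered enum; the class and function names are illustrative, and the worst-case bucket is used where the table gives a range.

```python
from dataclasses import dataclass
from enum import IntEnum


class TimeToLight(IntEnum):
    WEEKS = 1
    MONTHS = 2
    YEARS = 3


@dataclass(frozen=True)
class Option:
    name: str
    worst_case: TimeToLight  # slowest bucket listed in the table
    control: str


OPTIONS = [
    Option("lit waves / managed circuits", TimeToLight.WEEKS, "low"),
    Option("dark fiber, IRU lease (route exists)", TimeToLight.MONTHS, "high"),
    Option("dark fiber, owned", TimeToLight.YEARS, "maximal"),
    Option("new conduit build", TimeToLight.YEARS, "total"),
]


def options_within(deadline: TimeToLight) -> list[str]:
    """Names of procurement options whose worst case meets the deadline."""
    return [o.name for o in OPTIONS if o.worst_case <= deadline]


print(options_within(TimeToLight.MONTHS))
```

A blended plan falls out naturally: take the fastest option for day one, then layer in the higher-control options as their timelines allow.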

Route diligence: diversity, ownership, and the diversity illusion

Whatever the procurement choice, the diligence standard is the same, and it is stricter in 2026 than most underwriting models assume. Sites without at least two physically diverse paths, each with sufficient strand count, are increasingly treated as functionally stranded regardless of how good the power and land are — and the bar practitioners quote is on the order of 24–48 fiber pairs per path with documented physical route separation (Global Data Center Hub, 2025). The reason fiber became a deal-killer is that a single path is a single point of failure: one backhoe, one rail-corridor fire, one severed conduit, and a campus that cost billions and waited years for power goes dark. Diversity is not a luxury tier; it is the entry ticket.
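The entry-ticket test can be written as a simple threshold check. This sketch assumes the paths passed in have already been verified as physically diverse, and it uses 24 pairs (the low end of the quoted 24–48 range) as a default; both the class and the default are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiberPath:
    name: str
    fiber_pairs: int


def clears_diversity_bar(paths: list[FiberPath],
                         min_paths: int = 2,
                         min_pairs: int = 24) -> bool:
    """True if at least min_paths paths each carry at least min_pairs pairs.

    Assumes the caller has already confirmed the paths are physically diverse.
    """
    qualifying = [p for p in paths if p.fiber_pairs >= min_pairs]
    return len(qualifying) >= min_paths


site_a = [FiberPath("north route", 48), FiberPath("south route", 24)]
site_b = [FiberPath("only route", 96)]
site_c = [FiberPath("north route", 48), FiberPath("south route", 12)]

print(clears_diversity_bar(site_a))  # True
print(clears_diversity_bar(site_b))  # False: a single path, however fat
print(clears_diversity_bar(site_c))  # False: second path is too thin
```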

The trap that catches careful teams is the diversity illusion — two circuits sold as diverse that share a conduit, a bridge crossing, a building entrance, or a single piece of make-ready for part of their run. On a map they look redundant; in the ground they fail together. Real diligence requires the carrier to produce the physical route maps and prove a minimum separation distance along the entire path, including the last hundred meters into the building (diverse entrance facilities on opposite sides of the structure). Conduit ownership is the second buried fact: leasing capacity on someone else's conduit means your latency, your upgrade path, and your repair SLA are all someone else's decisions. Hyperscalers own or co-invest in conduit precisely to remove that dependency. The diligence checklist is short but unforgiving: how many physically diverse paths, who owns each conduit, what is the documented separation, how many strands are dark and available for growth, and where are the shared-risk segments that quietly couple the 'diverse' routes.
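Hunting the diversity illusion amounts to intersecting the physical segments of each route. The sketch below assumes the carrier's route maps have been reduced to lists of segment identifiers; the identifiers are made up for illustration.

```python
def shared_risk_segments(route_a: list[str], route_b: list[str]) -> list[str]:
    """Physical segments (conduits, crossings, entrances) common to both routes.

    An empty result is necessary for real diversity, not sufficient:
    separation distance between distinct segments still needs checking.
    """
    return sorted(set(route_a) & set(route_b))


# Hypothetical segment IDs extracted from carrier route maps.
route_a = ["pop-east", "rail-corridor-7", "river-bridge-2", "entrance-north"]
route_b = ["pop-west", "county-road-12", "river-bridge-2", "entrance-south"]

shared = shared_risk_segments(route_a, route_b)
print(shared)  # ['river-bridge-2']
```

Here the two circuits look diverse end to end but share one bridge crossing, so a single incident on that bridge takes out both.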

Deep dive: the middle-mile gap and the make-ready tail that strands a site

The most expensive connectivity surprise is not the absence of long-haul fiber — it is the middle-mile gap: the stretch between the long-haul backbone (which probably runs along the nearest interstate or rail line) and your slab out in the cheap-power exurb. The backbone is rich; the last 10–40 km to a rural campus often is not. Closing that gap means new conduit, and new conduit means make-ready — the slow, jurisdiction-by-jurisdiction work of preparing utility poles, getting attachment agreements, securing rights-of-way and easements, boring under roads and railways, and crossing water. Make-ready is governed by pole owners and local permitting offices, not by your construction schedule, and it routinely runs months to years. Dual diverse routes compound it: you are doing the whole permitting dance twice, on separated alignments, and dual diverse new-build runs on the order of $150k per mile to construct before the permitting cost (Global Data Center Hub; Netrality, 2025).
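The construction figure scales directly with the gap. This rough sketch applies the ~$150k per mile figure to the 10–40 km gap quoted above. It treats the figure as an all-in construction rate for the dual diverse build, which is an assumption since the source does not say whether it is per path, and it excludes permitting and make-ready as the text does.

```python
KM_PER_MILE = 1.609344
COST_PER_MILE_USD = 150_000  # construction only, per the text


def construction_cost_usd(gap_km: float,
                          cost_per_mile: float = COST_PER_MILE_USD) -> float:
    """Construction cost to close a middle-mile gap of the given length."""
    return gap_km / KM_PER_MILE * cost_per_mile


for gap_km in (10, 40):
    cost = construction_cost_usd(gap_km)
    print(f"{gap_km} km gap: ~${cost / 1e6:.2f}M before permitting and make-ready")
```

The dollar figure is modest next to a campus budget, which supports the text's point that the binding cost of the middle mile is schedule rather than money.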

This is why fiber is now eliminating up to half of candidate sites in some markets at the diligence stage. A site can be perfect on power, land, water, and tax, and still be functionally unreachable because the middle-mile build to bring AI-grade diverse fiber to it will not finish before the GPUs are obsolete. The mitigation is to run fiber diligence early and in parallel with the power and land screens — not after a site has cleared everything else — and to treat the make-ready timeline as a long pole on the integrated master schedule (→ Chapter 3.2 frames speed-to-power the same way). The state-backed open-access middle-mile networks now appearing (e.g. multi-thousand-mile wholesale dark-fiber builds pairing IRU terms with renewable-power siting) exist precisely to close this gap and unstrand otherwise-attractive exurban sites.

Proximity to IXPs, cable landings, and backbones

Beyond raw diverse fiber, three categories of network anchor sharpen the screen — and their weight, again, is set by the archetype. Internet exchange points (IXPs) and carrier-neutral facilities are where networks peer; proximity to a major IXP shortens the path to the broader internet and to enterprise/eyeball networks, which matters enormously for an inference fleet serving the public and very little for a training campus that talks mostly to its own peer campuses over private fiber. Cable landing stations — where subsea cables make landfall — are the gateways for intercontinental traffic; a site near a landing station has a shorter, lower-latency path to other continents, which is decisive for sovereign or multi-region inference and for cross-region distributed training, and a non-issue for a single-region training run. Nordic and Iberian campuses lean on subsea landings exactly this way (→ Chapter 3.13). Long-haul backbone proximity is the one anchor that matters to everyone, because it is what closes the middle-mile gap cheaply: a site that sits on or near an existing backbone route can reach diverse fiber and DWDM/OTN capacity without years of new build.

The decision here is about which anchor you optimize for. Optimize for IXP proximity and you bias toward established metros — where power is expensive and constrained. Optimize for backbone proximity and stranded power and you bias toward exurban sites on a fiber corridor — the sweet spot for a training campus. Optimize for landing-station proximity and you bias toward coastal and specific subsea-rich geographies, accepting their power and land realities. There is no site that maximizes all three; the archetype tells you which to weight, and the others become diligence checkboxes rather than tiebreakers.

~4.9 us/km

one-way propagation delay in single-mode fiber (index ~1.47); ~5 us/km rule of thumb, ~10 us/km round-trip

2025 · M2 Optics; MapYourTech (5-microsecond rule)

~10 ms

round-trip propagation floor added by 1,000 km of fiber, before any switching or queuing

2025 · M2 Optics; fiber-latency physics

~1–2.5 us

per-hop fabric latency: InfiniBand ~1–2 us, tuned RoCEv2 ~1.5–2.5 us (microseconds, vs milliseconds for WAN)

2025 · SemiAnalysis / NVIDIA

100–500x

less inter-site data exchanged by DiLoCo-class intermittent sync vs full synchronous DP — what makes cross-region training viable

2025 · Google DeepMind DiLoCo; Epoch AI

sub-50 ms

end-to-end target for latency-sensitive inference; ~100 ms is the human 'instantaneous' threshold

2026 · HumAI; practitioner edge-inference benchmarks

~50%

of candidate sites eliminated at the fiber-diligence stage in many markets — for route diversity, not power

2025 · Global Data Center Hub (fiber-as-bottleneck)

~$150k/mile

construction cost for dual diverse new-build fiber routes, before the permitting and make-ready tail

2025 · Global Data Center Hub; Netrality

24–48 pairs

fiber pairs per path, on physically diverse routes, below which AI-grade sites are viewed as functionally stranded

2025 · Global Data Center Hub

The connectivity screen, as a sequence

Pulling the forks together, the connectivity screen is a short ordered sequence, and the order matters because each step gates the next.

(1) Set the weight from the archetype (secondary for training, primary for inference) before any site is scored.
(2) Derive the latency and bandwidth budget from the topology: the inter-campus distance the training method can absorb, or the metro radius the inference SLO allows.
(3) Diligence the diverse paths early, in parallel with power and land, demanding physical route maps, conduit ownership, separation distance, and strand counts, and hunting the diversity illusion.
(4) Pick the procurement fork (lit, dark, or new build) against the time-to-light the schedule can tolerate, treating any middle-mile make-ready as a long pole.
(5) Score the network anchors (backbone, IXP, landing station) by the weight the archetype assigned, as tiebreakers rather than gates.

Done in that order, fiber rarely surprises a project. Done out of order — scoring anchors before setting the weight, or running fiber diligence only after a site has cleared everything else — it becomes the thing that strands a power-perfect campus or ships an inference fleet that misses its SLO. The screen is cheap to run early and very expensive to discover late.

The archetype that sets this screen's weight is established in Chapter 1.1, with the inference and edge proximity drivers in Chapter 1.3 and Chapter 1.5. This chapter is a siting screen: its place in the reordered hierarchy is set in Chapter 3.1, its parallel-with-power timing mirrors Chapter 3.2, and its scoring feeds the playbook in Chapter 3.13. The engineering this strategy hands off to lives in Part 8: the cross-campus distributed-training fabric and async/hierarchical methods in Chapter 8.8, the scale-out transport and protocols in Chapter 8.4, and the DWDM/OTN, fiber plant, and structured cabling that lights every path in Chapter 8.10. The goodput logic behind why inter-site microseconds matter is in Chapter 9.4 (checkpointing), and the geographic-failover view is in Chapter 12.3.