
Chapter 2.1

Program & Project Management: The Integrated Master Schedule & Critical Path

An AI data center is not built on the critical path the construction industry knows — it is built on a power-and-silicon critical path where a single transformer slot or interconnection date can strand a billion dollars of GPUs, so the schedule, not the design, is the asset you are actually managing.

POWER-BOUND · DENSITY-RAMP · GOODPUT

What you'll decide here

Which single milestone you are managing the whole program toward — time-to-first-train (or first-token) — and therefore which of the parallel tracks (power, building, IT) you treat as the governing critical path versus the ones you keep off it with float.
Whether you order the long-lead items (HV transformers, GSUs, switchgear, turbines, the GPU allocation) on a P50 schedule or a P90 schedule — because the gap between those two dates is measured in quarters of revenue, and the deposit goes out before the design is frozen.
How you run the facility track and the cluster track as two schedules that must be bound by explicit integration milestones — the powered-shell handoff, energization, water-on, and the burn-in gate — rather than one monolithic Gantt that hides the seam where most slip happens.
Which project-controls discipline (earned value, milestone-deposit cash curve, change-order and claims process) you stand up on day one, because owner controls retrofitted onto a hot project become a forensic exercise, not a steering tool.
What your stage-gate governance actually gates — which irreversible commitments (the interconnection deposit, the transformer PO, the GPU slot reservation) are released at which board approval, and where the assumptions-and-decisions register records what you bet and who owns the bet.

Three tracks at different speeds; the ~128-week transformer, not the building or the GPUs, sets the date.

Part 1 decided what to build and whether the economics close. This chapter is where the abstraction ends and the calendar begins. An AI data center is a program with a deadline that is set not by the owner's ambition but by physics and supply chains: the day the cluster can take its first synchronous training step, or serve its first revenue token. Everything upstream of that day is a race, and everything about how you run the race is a sequence of decisions whose consequences are denominated in time. Because the asset depreciates on a 2–3 year economic clock, time converts directly into money. → the depreciation clock that prices every lost month is in Chapter 1.8.

This chapter applies that frame to schedule. We lay out the phase-gate lifecycle and reframe the build as a time-to-first-train race; we construct the Integrated Master Schedule (IMS) and locate the critical path across three tracks that move at different speeds; we quantify schedule risk with Monte Carlo and the P50/P90 dates the long poles force on you; we install the owner's project controls — earned value, milestone deposits, change orders and claims; we bind the facility and cluster schedules with integration milestones; and we close on the stage-gate governance and the assumptions/decisions register that records what the program is actually betting. The recurring theme: in a power-bound, allocation-constrained market, the schedule is the project, and the long poles are not the ones a traditional general contractor watches.

The lifecycle and the phase-gate model

A data-center program moves through a recognizable sequence — scope and design basis → site control and entitlement → interconnection and power → procurement → construction → commissioning → go-live → operations — and the mature way to govern it is a phase-gate (stage-gate) model: each phase ends in a gate where capital is released, assumptions are tested, and the program either advances, holds, or kills. The point of the gate is not ceremony. It is to make the irreversible commitments explicit and to put a named owner and a dated decision on each one before the money leaves. → the reversible-vs-irreversible discipline this inherits is set in Chapter 1.1.

What makes the AI build different from a 2018 enterprise data center is that the gates are no longer evenly spaced. In a power-bound market the early gates — interconnection and long-lead procurement — release the commitments that set the finish date, while the late gates (fit-out, commissioning) govern execution against a clock that was effectively fixed eighteen months earlier. The construction industry's instinct is to gate on design maturity; the AI program's reality is that you must gate on power certainty and allocation certainty long before the design is mature, or you arrive at a finished building with no megawatts and no GPUs. The phase-gate model has to be re-weighted accordingly: front-load the gates that release time-critical deposits, and accept that you are committing capital against assumptions you have not yet fully retired.

The governing milestone: time-to-first-train (or first-token)

Pick the single milestone the entire program is steered toward and make every track subordinate to it. For a training-shaped facility that is time-to-first-train: the date a synchronous job can run across the full fabric at goodput. For an inference-shaped facility it is time-to-first-token at SLA. This milestone is not the same as "construction complete" or "ready-for-service" (RFS). Those are facility milestones, and after the building is done the cluster still needs weeks of burn-in, fabric validation, and a reference run before it earns anything. Owners who manage to RFS instead of to first-train routinely discover a six-to-ten-week cluster bring-up tail they never put on the schedule. The revenue clock starts at first useful work, not at the certificate of occupancy. → the bring-up tail is engineered in Chapter 13.10.

Building the Integrated Master Schedule across three tracks

The Integrated Master Schedule is the single time-logic network that ties every deliverable, dependency, and milestone into one critical-path-method (CPM) model. The mistake that defines failed AI programs is running it as one undifferentiated Gantt. An AI data center is really three schedules braided together, each governed by a different physics and a different supplier ecosystem, each with its own critical path:

The power track — interconnection studies and agreements, the utility's grid upgrades, the substation, HV/GSU and medium-voltage transformers, switchgear, and (increasingly) on-site or behind-the-meter generation as a bridge. This track is dominated by lead times the owner cannot compress: large power transformers at roughly 128 weeks and generator step-up units at ~144 weeks (Wood Mackenzie Q2 2025 survey), and large-load grid interconnection at ~3–7+ years end-to-end. It is almost always the governing critical path.
The building track — entitlement and permits (the air permit is a recurring long pole where on-site gas is involved), earthworks, shell, mechanical/electrical/plumbing, and the cooling plant. A shell-and-core AI hall can be built in 12–18 months — fast relative to the power track, which is exactly why building is rarely the binding constraint.
The IT / cluster track — the GPU allocation (a slot, not a purchase, negotiated quarters ahead), CoWoS/HBM-gated accelerator delivery, network fabric, storage, structured cabling, then rack-and-stack, fabric validation, burn-in, and the reference run. This track is gated by allocation, not by the owner's cash. → the allocation game lives in Chapter 2.3; the HBM constraint behind it in Chapter 7.6.

The IMS exists to expose the float between these tracks and the integration milestones where they must meet. Float is the schedule's shock absorber: the building track usually carries weeks-to-months of float against the power track, and the discipline is to spend that float deliberately — sequencing the fit-out to land just-in-time against energization — rather than letting it evaporate into early-but-idle completion. The cardinal sin is letting the slowest long pole (a transformer) consume all the float silently while the team celebrates the building track finishing early on a slab that has no power.
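The float logic described here can be sketched as a standard CPM forward and backward pass. This is a minimal illustration, not a scheduling tool: the activity names, durations, and dependencies below are hypothetical placeholders chosen only to echo the three tracks, and a real IMS would carry thousands of activities with calendars and lags.

```python
from collections import defaultdict

# Hypothetical toy network (durations in weeks, illustrative only).
# name: (duration_weeks, [predecessor names])
ACTIVITIES = {
    "transformer_po_to_delivery": (128, []),
    "substation_build": (60, []),
    "energization": (4, ["transformer_po_to_delivery", "substation_build"]),
    "shell_and_mep": (70, []),
    "powered_shell_handoff": (2, ["shell_and_mep"]),
    "rack_and_stack": (8, ["powered_shell_handoff", "energization"]),
    "burn_in_and_reference_run": (8, ["rack_and_stack"]),
}


def cpm(activities):
    """Return {name: (ES, EF, LS, LF, total_float)} for a finish-to-start network."""
    # Topological order via depth-first search: predecessors come first.
    order, visited = [], set()

    def visit(name):
        if name in visited:
            return
        for pred in activities[name][1]:
            visit(pred)
        visited.add(name)
        order.append(name)

    for name in activities:
        visit(name)

    # Forward pass: earliest start is the latest early finish of any predecessor.
    es, ef = {}, {}
    for name in order:
        duration, preds = activities[name]
        es[name] = max((ef[p] for p in preds), default=0)
        ef[name] = es[name] + duration
    project_finish = max(ef.values())

    # Backward pass: latest finish is the earliest late start of any successor.
    successors = defaultdict(list)
    for name, (_, preds) in activities.items():
        for pred in preds:
            successors[pred].append(name)
    ls, lf = {}, {}
    for name in reversed(order):
        lf[name] = min((ls[s] for s in successors[name]), default=project_finish)
        ls[name] = lf[name] - activities[name][0]

    return {n: (es[n], ef[n], ls[n], lf[n], ls[n] - es[n]) for n in order}


schedule = cpm(ACTIVITIES)
for name, (_, _, _, _, total_float) in schedule.items():
    print(f"{name:30s} float = {total_float} wk")
```

With these assumed durations the transformer chain carries zero float and the shell carries 60 weeks, which is the pattern the text describes: the building track finishes early, and the float it holds only has value if it is spent deliberately against energization.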

The three tracks: critical path, long poles, and float behavior

Track	Governs	Typical long pole(s)	Indicative duration	Float vs the program critical path
Power	Megawatts at the rack, on a firm date	Interconnection (3–7+ yr); HV/GSU transformers (~128–144 wk); HV switchgear (45–80 wk)	3–7+ years to firm grid power; 18–36 mo for a BTM-gas bridge	Usually zero — this IS the critical path
Building	A weather-tight, plumbed, code-compliant hall	Air permit (where on-site gas); cooling plant; long-span steel	12–18 months shell-to-MEP-complete	Positive — finishes ahead; spend the float just-in-time to energization
IT / cluster	A validated cluster doing useful work	GPU allocation slot; CoWoS/HBM-gated delivery; the fabric	Allocation negotiated 2–4 quarters ahead; 6–10 wk bring-up after install	Bounded by the powered-shell handoff; the bring-up tail is often un-scheduled

Lead times are 2025–2026 practitioner ranges (Wood Mackenzie Q2 2025 transformer survey; Build.inc; SemiAnalysis; ISO/RTO filings). Durations are indicative; every site differs.

The table is a sequencing problem, not an inventory. The power track sets the date; the building track must finish into that date with just enough float to absorb a slipped transformer; the cluster track cannot start meaningful integration until the powered-shell handoff, and then carries a bring-up tail that the inexperienced owner forgets to schedule. The IMS's whole job is to make those three truths visible at once so that effort and capital flow to whichever track is currently binding — which, in 2026, is almost always power.

Schedule risk analysis: Monte Carlo, P50/P90, and the long poles

A deterministic CPM schedule produces a single finish date, and that date is a fiction: it is the result you get only if every activity lands on its point estimate, and across a whole network that essentially never happens. The mature program runs a quantitative schedule risk analysis (QSRA): assign a duration distribution to each activity (typically a three-point estimate: optimistic, most likely, pessimistic), model the correlations (a transformer delay and a switchgear delay are not independent, because they share a strained supply chain), and run a Monte Carlo simulation over the network a few thousand times. The output is not a date but a distribution, and the two numbers that matter are the P50 (the date you have a coin-flip chance of beating) and the P90 (the date you are 90% confident of beating).

The gap between P50 and P90 is dominated by a handful of long poles with long right-tails: the HV/GSU transformer, the grid interconnection energization date, the air permit where on-site generation is in scope, and the GPU/HBM allocation. These are not normally distributed — they are long-right-tailed, because the failure modes (a transformer factory slot slips a quarter, an interconnection study restudy adds a year, an air-permit challenge adds eighteen months) move the date a lot, not a little. A schedule whose P50–P90 spread is six months is telling you that one of these poles can eat two quarters of revenue, and the deposit on that pole goes out the door before the design is frozen.
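A QSRA of this kind can be sketched in a few lines with the standard library. Everything numeric below is an assumption made for illustration: the three-point ranges, the shared "supply-chain shock" used as a crude way to correlate the transformer and switchgear, and the simplified three-track network. It is not calibrated to any real program.

```python
import random


def simulate_finish_weeks(n_runs=5000, seed=7):
    """Monte Carlo over a toy three-track network; returns finish times in weeks."""
    rng = random.Random(seed)
    finishes = []
    for _ in range(n_runs):
        # Assumed shared shock: one draw stretches both power long poles,
        # so their delays are correlated rather than independent.
        shock = rng.triangular(0.0, 0.35, 0.05)  # (low, high, mode)
        transformer = rng.triangular(110, 200, 128) * (1 + shock)
        switchgear = rng.triangular(45, 100, 60) * (1 + shock)
        energize = rng.triangular(3, 10, 4)
        building = rng.triangular(52, 90, 65)
        bring_up = rng.triangular(6, 14, 8)

        power_ready = max(transformer, switchgear) + energize
        powered_shell = building + 2
        finishes.append(max(power_ready, powered_shell) + bring_up)
    return finishes


def percentile(samples, p):
    """Nearest-rank style percentile on a sorted copy of the samples."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(p / 100 * len(ordered)))
    return ordered[index]


finishes = simulate_finish_weeks()
p50 = percentile(finishes, 50)
p90 = percentile(finishes, 90)
print(f"P50 = {p50:.0f} wk, P90 = {p90:.0f} wk, spread = {p90 - p50:.0f} wk")
```

The right-skewed triangular ranges are what push the P90 away from the P50. Widening the pessimistic end of a single long pole is a quick way to see how one tail comes to dominate the spread.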

Order to the P90, schedule the team to the P50

The most expensive scheduling error in the 2026 build-out is buying the long poles on the P50 date. If you place the transformer PO and the interconnection deposit timed to the median schedule, you have a coin-flip chance of arriving with a finished hall and no power. The asset that strands is not the transformer; it is the GPU fleet, which depreciates over 2–3 years whether or not it is energized. The rule: commit the long-lead, hard-to-expedite items against the P90 date (order early enough that even a P90 delivery lands before the need date, hold buffer, reserve slots), and run the controllable construction/fit-out work to the P50 so the cheap-to-accelerate tracks can flex to meet whichever pole lands last. Of the ~12 GW of US capacity targeted for 2026, only about one-third was actively under construction by early 2026, with the rest exposed to multi-quarter slippage (industry tracking, 2026). That pattern is consistent with a population of programs that ordered to the P50. → interconnection-queue mechanics in Chapter 3.2; the long-lead register in Chapter 2.3.
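One way to read "order to the P90" is as simple date arithmetic: back off from the need date by the P90 lead time instead of the median. The sketch below assumes a hypothetical energization need date and a hypothetical 160-week P90 lead time; only the ~128-week median comes from the figures cited in this chapter.

```python
from datetime import date, timedelta


def latest_order_date(need_date, lead_time_weeks):
    """Latest PO date that still delivers by need_date at the given lead time."""
    return need_date - timedelta(weeks=lead_time_weeks)


need = date(2028, 6, 1)        # hypothetical date the transformer must be on site
P50_LEAD_WEEKS = 128           # median lead time cited in the chapter
P90_LEAD_WEEKS = 160           # assumed pessimistic lead time, illustrative only

p50_order = latest_order_date(need, P50_LEAD_WEEKS)
p90_order = latest_order_date(need, P90_LEAD_WEEKS)
buffer_weeks = (p50_order - p90_order).days // 7

print(f"Order by {p90_order} (P90) instead of {p50_order} (P50): {buffer_weeks} weeks earlier")
```

The buffer is the price of the hedge: the deposit goes out that many weeks sooner, against a design that is that much less mature.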

~128 wk

large power transformer lead time (~144 wk GSU); up to ~5 yr in constrained markets — the schedule-dominating long pole

2025 · Wood Mackenzie Q2 2025 survey / pv magazine

3–7+ yr

large-load grid interconnection, application to energization; up to ~10 yr in the worst queues

2025 · ERCOT / PJM filings synthesis

12–18 mo

AI data-center shell-to-MEP-complete construction — fast vs the power track, so rarely the binding constraint

2026 · Archdesk / Mastt build-lifecycle guides

~1/3

of the ~12 GW US capacity targeted for 2026 actively under construction by early 2026; the rest exposed to slippage

2026 · Industry construction tracking

10–14 wk

Level-5 integrated systems testing for a liquid-cooled AI hall (vs 4–6 wk air-cooled) — the un-compressible commissioning tail

2026 · Construct & Commission / 2026 outlook synthesis

~1 failure / 512 GPUs / week

best-in-class fleet failure rate after burn-in; new clusters fail far more often for the first 3–4 weeks, which is the bring-up tail

2025 · SemiAnalysis (100k H100 clusters)

~$10–12B

annual revenue per GW of AI capacity — so ~200 MW landing 6 months early is worth ~$1–1.2B; the schedule's dollar value (contested — single-source)

2025 · SemiAnalysis (onsite gas economics)

20%

non-refundable interconnection study deposit common in PJM-scale queues — capital committed before the design is frozen

2025 · PJM queue synthesis

Owner's project controls: earned value, deposits, change and claims

A schedule you cannot measure against is a wish. Project controls is the owner-side discipline that turns the IMS into a steering instrument: a cost-and-schedule baseline, periodic measurement of progress against it, and a forecast that updates honestly. The backbone is earned value management (EVM) — comparing the budgeted cost of work performed (BCWP/EV) against the budgeted cost of work scheduled (BCWS/PV) and the actual cost (ACWP/AC), to derive a schedule performance index (SPI) and cost performance index (CPI). The value of EVM on an AI build is not the acronyms; it is that it forces physical-percent-complete discipline and produces an estimate-at-completion early enough to act on, instead of a surprise at the end.
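The indices named above reduce to a few ratios. The sketch below uses the standard definitions (SPI = EV/PV, CPI = EV/AC) and one common estimate-at-completion formula, EAC = BAC/CPI, which assumes the cost efficiency to date persists. Other EAC formulas exist, and the input figures are hypothetical.

```python
def evm_indices(pv, ev, ac, bac):
    """Basic earned-value metrics.

    pv  -- planned value (BCWS): budgeted cost of work scheduled to date
    ev  -- earned value (BCWP): budgeted cost of work actually performed
    ac  -- actual cost (ACWP) of the work performed
    bac -- budget at completion
    """
    spi = ev / pv
    cpi = ev / ac
    return {
        "SV": ev - pv,      # schedule variance (negative = behind)
        "CV": ev - ac,      # cost variance (negative = over budget)
        "SPI": spi,
        "CPI": cpi,
        "EAC": bac / cpi,   # assumes current cost efficiency continues
    }


# Hypothetical status, in $M: 400 planned, 320 earned, 380 spent, 1000 budget.
status = evm_indices(pv=400, ev=320, ac=380, bac=1000)
print({k: round(v, 3) for k, v in status.items()})
```

An SPI of 0.8 here says only 80% of the planned work has been earned. That number is only as honest as the physical-percent-complete behind `ev`, which is the discipline the paragraph above is arguing for.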

But EVM was built for labor-and-materials projects, and an AI data center's cost is dominated by a few enormous milestone-deposit equipment orders — the transformer, the switchgear, the turbines, the GPU allocation — paid against vendor manufacturing milestones, not against installed progress. This breaks naive EVM: booking the full PO value as "earned" on deposit overstates progress; booking nothing until delivery understates it for two years. The owner's controls function has to track a commitment/cash curve alongside the EVM curve — when each deposit is contractually due, what it secures (a factory slot, a queue position), and what its forfeiture costs if the program pivots. On AI builds the deposit schedule, not the construction draw, is the dominant near-term cash event. → deposit and slot-reservation instruments in Chapter 2.3; the contract that governs them in Chapter 2.4.
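A commitment/cash curve of the kind described can be kept as a small dated ledger. The sketch below is illustrative only: the `Deposit` fields, the items, amounts, dates, and refundable fractions are all hypothetical, and real forfeiture terms come from the contracts, not from a single fraction.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Deposit:
    item: str
    due: date
    amount_musd: float          # deposit size, $M (hypothetical)
    secures: str                # what the deposit buys: a slot, a queue position
    refundable_fraction: float  # assumed share recoverable if the program pivots


DEPOSITS = [
    Deposit("interconnection study", date(2026, 3, 1), 20.0, "queue position", 0.0),
    Deposit("HV transformer PO", date(2026, 6, 1), 15.0, "factory slot", 0.5),
    Deposit("GPU allocation", date(2027, 1, 15), 100.0, "delivery slot", 0.25),
]


def committed_by(deposits, as_of):
    """Cumulative cash committed on or before as_of, in $M."""
    return sum(d.amount_musd for d in deposits if d.due <= as_of)


def forfeiture_exposure(deposits, as_of):
    """Cash that would be lost if the program pivoted on as_of, in $M."""
    return sum(
        d.amount_musd * (1 - d.refundable_fraction)
        for d in deposits
        if d.due <= as_of
    )


for checkpoint in (date(2026, 4, 1), date(2026, 12, 31), date(2027, 6, 30)):
    print(checkpoint, committed_by(DEPOSITS, checkpoint), forfeiture_exposure(DEPOSITS, checkpoint))
```

Reporting this curve next to the EVM curve keeps the two questions separate: how much work has been earned, and how much cash is already at risk.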

Change-order and claims management is the other half. AI programs change scope mid-flight more than any other large construction class — a GPU-generation jump (NVL72 to a denser successor) mid-design re-rates the cooling plant, the floor loading, and the busway; an interconnection re-study moves the energization date and cascades into the fit-out sequence. Each change is a fork with a schedule and cost consequence, and the owner who has not stood up a disciplined change-control board on day one ends up litigating those consequences as claims at the end. The cheap move is a tight baseline plus a fast, well-documented change process; the expensive move is a loose baseline that turns every density surprise into a dispute.

Owner's controls: the steering instruments and what they catch

Instrument	What it measures	What it catches early	AI-specific twist
Earned value (SPI/CPI)	Performed vs scheduled vs actual cost	Slip and overrun, via a real estimate-at-completion	Distorted by milestone-deposit equipment — needs physical-% rigor
Commitment / cash curve	When each deposit is due and what it secures	Forfeiture exposure if the program pivots	Deposits (transformer, GPU slot) dwarf the construction draw early
Critical-path & float report	Which track is binding; float remaining	Float being silently consumed by a long pole	Three braided tracks — must report per-track, not one number
Change-control board	Scope deltas, priced with schedule impact	Density/generation pivots before they become claims	GPU-gen jumps re-rate cooling/floor/power mid-design
Risk register & QSRA refresh	P50/P90 movement as risks retire or fire	A long pole's tail materializing	Long poles are correlated — model them jointly

The owner-side project-controls stack for an AI build. EVM indices follow standard AACE/PMI definitions; the deposit-curve overlay is the AI-specific addition.

The facility-vs-cluster two-track schedule and its integration milestones

The single most under-managed seam in an AI build is the boundary between the facility (the powered, cooled shell, delivered by the construction and MEP world) and the cluster (the GPUs, fabric, and software, delivered by the IT and platform world). These are two organizations, two cultures, two schedules, and two definitions of "done" — and the project lives or dies in how cleanly they are bound. The right structure is an explicit two-track schedule with a small set of named integration milestones where the tracks hand off, each with an unambiguous entry/exit gate and an owner. → the powered-shell delivery model that creates this seam is in Chapter 2.2.

The integration milestones that bind the two tracks, in order:

Powered-shell handoff. The facility delivers a hall with conditioned space, structural floor capacity, and the power and cooling distribution stubbed to the white space — but not yet energized to the rack. This is the contractual seam between base-building and IT fit-out, and the cleanest place to split scope and risk.
Energization (power-on). Medium-voltage power live to the in-row PDUs/busway, UPS and any on-site generation commissioned (L3/L4). Until this gate the cluster track cannot draw load; it is the most common place for the power track's slip to surface as a cluster-track delay. → electrical acceptance in Chapter 13.3.
Water-on / cooling-ready. The facility cooling loop and CDUs flushed, leak-checked, balanced, and proven to spec. This is non-negotiable before energizing liquid-cooled racks, because a coolant inlet out of spec can throttle the GPUs by up to 50%. → CDU commissioning in Chapter 13.5.
Integrated systems test (L5 IST). The facility proves it holds load and rides through faults under simulated full IT load. For a liquid-cooled AI hall this runs 10–14 weeks, against 4–6 for air — hydraulic balancing and staged thermal load tests across thousands of connections cannot be compressed. → IST in Chapter 13.6.
Cluster burn-in and the reference run. Now the IT track owns the clock: node diagnostics, fabric BER validation, burn-in (new clusters fail far more often for the first 3–4 weeks), and a reference training/inference run at goodput. This is first-train. → burn-in in Chapter 13.8; cluster-scale validation in Chapter 13.9.

The reason to make these milestones explicit rather than implicit is that the seam is where finger-pointing lives. When the building is "done" but the cluster is not earning, the question is always whose milestone slipped — and a program with named integration gates and per-gate owners answers it in a stand-up, while a program with one Gantt answers it in a claim.
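The named gates and per-gate owners can be represented as an ordered list that answers "whose milestone is binding" directly. This is a minimal sketch: the owner labels are hypothetical, and treating the gates as strictly sequential is a simplifying assumption based on the order given above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gate:
    name: str
    owner: str          # hypothetical owning organization or role
    passed: bool = False


def integration_gates() -> List[Gate]:
    """The five integration milestones, in the order listed in this section."""
    return [
        Gate("powered_shell_handoff", "facility"),
        Gate("energization", "facility"),
        Gate("water_on", "facility"),
        Gate("l5_ist", "commissioning"),
        Gate("burn_in_and_reference_run", "cluster"),
    ]


def binding_gate(gates: List[Gate]) -> Optional[Gate]:
    """First gate not yet passed, or None once first-train is reached."""
    for gate in gates:
        if not gate.passed:
            return gate
    return None


def pass_gate(gates: List[Gate], name: str) -> None:
    """Mark a gate passed; refuse to skip ahead of an open earlier gate."""
    current = binding_gate(gates)
    if current is None or current.name != name:
        expected = current.name if current else "none (all gates passed)"
        raise ValueError(f"cannot pass {name!r}; binding gate is {expected}")
    current.passed = True


gates = integration_gates()
pass_gate(gates, "powered_shell_handoff")
print(binding_gate(gates))  # the gate, and owner, now holding the program
```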

Deep dive: why the cluster bring-up tail is the schedule everyone forgets

Construction-world schedules end at ready-for-service. AI revenue does not start there — it starts at first useful work, and the gap between the two is a cluster bring-up tail that is routinely missing from the owner's IMS. The tail has hard, un-compressible content. After racks are powered and water flows, the fabric must be validated (an InfiniBand bit-error-rate sweep against a ~1e-12 threshold, per-port, across tens of thousands of links), nodes must be diagnosed and the inevitable dead-on-arrival GPUs and HBM swapped, and the cluster must burn in: new clusters fail far more than mature ones for the first 3–4 weeks, and a single failed GPU restarts a synchronous job from its last checkpoint. Only after the fleet settles toward the best-in-class failure rate (~1 failure per 512 GPUs per week) does a reference run demonstrate goodput.

The consequence of omitting this tail is a 6–10-week phantom delay between "building done" and "cluster earning" that the owner did not budget — six to ten weeks during which the GPU fleet depreciates and earns nothing. On a 200 MW hall at ~$10–12B/GW/yr, that tail is on the order of $200–500M of foregone revenue if it is a surprise instead of a plan. The fix is structural: put burn-in and the reference run on the IMS as critical-path activities, staff them, and manage time-to-first-train as the finish line — not ready-for-service. → the goodput target that defines a successful bring-up is in Chapter 13.9; the checkpoint math behind training's restart cost in Chapter 9.4.
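The foregone-revenue range follows from straightforward proration. The sketch below uses only the figures quoted in the text (200 MW, 6–10 weeks, ~$10–12B per GW per year) and assumes revenue accrues evenly over a 52-week year. Note that the chapter itself flags the revenue-per-GW figure as contested and single-source.

```python
WEEKS_PER_YEAR = 52


def foregone_revenue_usd(hall_mw, tail_weeks, revenue_per_gw_year_usd):
    """Revenue not earned while the cluster sits in an unplanned bring-up tail."""
    hall_gw = hall_mw / 1000
    return hall_gw * revenue_per_gw_year_usd * (tail_weeks / WEEKS_PER_YEAR)


low = foregone_revenue_usd(hall_mw=200, tail_weeks=6, revenue_per_gw_year_usd=10e9)
high = foregone_revenue_usd(hall_mw=200, tail_weeks=10, revenue_per_gw_year_usd=12e9)
print(f"${low / 1e6:.0f}M to ${high / 1e6:.0f}M")
```

The bounds work out to roughly $230M and $460M, which is where the "$200–500M" order-of-magnitude figure comes from.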

Stage-gate governance, board approvals, and the assumptions register

The phase-gate model only protects the program if the gates actually gate something irreversible. The governance question is therefore concrete: at which board approval is each one-way-door commitment released? The interconnection-study deposit (often 20% and non-refundable in a PJM-scale queue) is committed before any building exists; the HV transformer PO commits a factory slot 128 weeks out; the GPU allocation reservation commits a slot quarters ahead of silicon that is itself CoWoS/HBM-gated. Each of these is capital released against assumptions that have not been fully retired — which is exactly why the gate exists: to force the board to look at the assumption, name its owner, and accept the bet on the record.

The artifact that makes this auditable is the assumptions-and-decisions register — the schedule-and-commercial analogue of the design-basis document from scoping. It records, for every load-bearing assumption (the energization date, the transformer delivery date, the GPU-generation the cooling plant is sized for, the contracted-vs-merchant power split the financing assumes), what was assumed, who owns it, when it must be confirmed or it becomes a risk, and which downstream commitments depend on it. When a long pole's tail fires — a transformer slips a quarter — the register is what tells you, in minutes, which downstream dates and deposits move and who has to be told. Without it, the same event becomes a forensic reconstruction conducted under deposition.
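The register's core query, "which downstream dates and deposits move when this assumption fails", is a graph traversal. The sketch below is one possible shape for it, not a prescribed schema: the field names, entries, owners, and dates are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Assumption:
    key: str
    statement: str
    owner: str
    confirm_by: date                  # after this date, unconfirmed means risk
    confirmed: bool = False
    dependents: list = field(default_factory=list)  # keys that depend on this one


REGISTER = {a.key: a for a in [
    Assumption("transformer_delivery", "HV transformer delivered on the PO date",
               "power lead", date(2026, 9, 1), dependents=["energization_date"]),
    Assumption("energization_date", "MV power live to the hall on plan",
               "power lead", date(2027, 6, 1),
               dependents=["fit_out_sequence", "gpu_delivery_slot"]),
    Assumption("fit_out_sequence", "Fit-out lands just-in-time to energization",
               "construction lead", date(2027, 9, 1)),
    Assumption("gpu_delivery_slot", "Reserved GPU slot matches the energized hall",
               "IT lead", date(2027, 1, 15)),
]}


def blast_radius(register, key):
    """Every downstream entry reachable from key, in breadth-first order."""
    affected, queue = [], deque(register[key].dependents)
    while queue:
        current = queue.popleft()
        if current in affected:
            continue
        affected.append(current)
        queue.extend(register[current].dependents)
    return affected


def unconfirmed_past_deadline(register, today):
    """Assumptions that should have been confirmed by now and were not."""
    return [a.key for a in register.values()
            if not a.confirmed and a.confirm_by < today]


print(blast_radius(REGISTER, "transformer_delivery"))
print(unconfirmed_past_deadline(REGISTER, date(2026, 10, 1)))
```

Attaching `owner` to each entry is what turns the traversal into a notification list: the same walk that finds the dates that move also finds who has to be told.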

Govern the schedule like a balance sheet of bets

The most useful mental shift for an AI-program steering committee is to stop treating the schedule as a plan and start treating it as a portfolio of dated bets, each with an owner, a confirmation deadline, and a downstream blast radius. The transformer date is a bet. The interconnection energization date is a bet. The GPU-generation the hall is plumbed for is a bet. The contracted-power split the lenders underwrote is a bet. The assumptions register is the ledger of those bets; the stage gates are where you decide which to hedge (order to P90, buffer the slot, plumb for the next density) and which to ride (the reversible ones). A program run this way slips gracefully — it knows which bet failed and what moves — while a program run as a single Gantt discovers its failures all at once, at the seam, as a claim.

This chapter is the program-management spine for all of Part 2. The delivery model that creates the facility-vs-cluster seam — EPC vs design-build vs powered-shell-plus-fit-out, and the owner's-representative/commissioning-agent roles — is in Chapter 2.2. The long-lead register that feeds the power and IT tracks (transformers, switchgear, turbines, GPU/HBM allocation) and slot-reservation contracting are in Chapter 2.3; the upstream HBM constraint behind GPU allocation is in Chapter 7.6. The contract stack that prices schedule risk — liquidated damages, milestone deposits, interconnection agreements — is in Chapter 2.4, and the project-finance draw/deposit mechanics in Chapter 2.5; schedule-delay insurance (builder's risk, delay-in-startup) in Chapter 2.6. The interconnection long pole is engineered in Chapter 3.2 and the energy-supply bridge in Chapter 3.4. The integration milestones map directly onto the commissioning program: fundamentals in Chapter 13.1, electrical acceptance in Chapter 13.3, cooling in Chapter 13.5, L5 IST in Chapter 13.6, burn-in in Chapter 13.8, and go-live/handover in Chapter 13.10. The dollar value of every saved month traces back to the ROI clock in Chapter 1.8; the dated forecast register that the assumptions register binds to is Appendix D.