Part 1
Strategy, Workload Archetypes & Economics
8 chapters
The Archetype Decision Framework: Workload Is the Master Variable
The workload you intend to run is the single upstream decision that deterministically sets density, cooling, fabric, redundancy, and siting — pick it wrong and you do not have an inefficient data center, you have the wrong one.
Training Data Centers: Synchronous, Dense, Checkpointable
A training data center is one synchronous supercomputer whose every step runs at the speed of its slowest GPU — so you design it to maximize goodput per megawatt, not availability, and you make the building dense, liquid-cooled, non-blocking, and checkpointable before you cut steel.
Inference Data Centers: Bursty, Distributed, Always-On
An inference data center is not a smaller training cluster — it is a different machine optimized for a different objective: many independent requests served against a latency SLO, always-on, close to users, with goodput-per-dollar and tokens-per-watt as the scoreboard rather than the speed of a single synchronous job.
Post-Training, Fine-Tuning & RL: The Hybrid Middle
Post-training is not a smaller training cluster — it is a fleet of inference engines feeding a comparatively tiny trainer, and the operator who scopes it as either pure training or pure inference strands capital on the half they got wrong.
Edge Inference & Distributed Micro-Datacenters
Edge inference inverts every default of a centralized AI build — you stop chasing the cheapest megawatt and start chasing the closest one to the user — and the single decision that governs whether the inversion pays for itself is the latency budget, not the GPU.
Procurement Archetypes: Build vs Buy vs Rent
How you acquire capacity is a bet on time, control, and the durability of your demand — and in a power-bound, fast-depreciating market the right answer is almost never the one that minimizes unit cost, because the expensive mistakes are made in the time and optionality dimensions, not the per-GPU-hour one.
The Requirements-and-Consequences Matrix
Once you have named the workload archetype, the rest of the facility is no longer a menu of options — it is a forced sequence of subsystem commitments, and this chapter is the lookup table that turns one requirement into a defensible, signed design basis.
Business Models, Economics & ROI
An AI data center is a depreciating capital asset whose return is decided by four numbers — capex per watt, the depreciation life you assume, the utilization you actually achieve, and the price you can still charge after the market deflates it — and getting any one of them wrong turns a 'factory' into a stranded balance-sheet liability.
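As a rough illustration of how the four numbers named in the economics summary interact, here is a minimal Python sketch. It is a simplification under stated assumptions, not the guide's own model: the function names, the straight-line depreciation, the per-watt-year pricing unit, and every input value are hypothetical, and operating costs, power, and financing are deliberately left out.

```python
def annual_margin_per_watt(capex_per_watt, depreciation_years,
                           utilization, price_per_watt_year):
    """Annual revenue minus straight-line depreciation, per watt of IT capacity.

    All parameters are hypothetical illustration inputs:
      capex_per_watt       build cost in USD per watt
      depreciation_years   assumed asset life in years
      utilization          fraction of capacity actually sold (0.0 to 1.0)
      price_per_watt_year  USD earned per fully utilized watt per year
    Opex, power, and financing costs are ignored in this sketch.
    """
    if depreciation_years <= 0:
        raise ValueError("depreciation_years must be positive")
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    depreciation = capex_per_watt / depreciation_years  # straight-line
    revenue = price_per_watt_year * utilization
    return revenue - depreciation


def breakeven_utilization(capex_per_watt, depreciation_years,
                          price_per_watt_year):
    """Utilization at which revenue just covers depreciation."""
    if depreciation_years <= 0 or price_per_watt_year <= 0:
        raise ValueError("life and price must be positive")
    return (capex_per_watt / depreciation_years) / price_per_watt_year


if __name__ == "__main__":
    # Placeholder inputs chosen only to make the arithmetic easy to follow.
    base = annual_margin_per_watt(30.0, 5, 0.5, 16.0)
    deflated = annual_margin_per_watt(30.0, 5, 0.5, 10.0)
    print("margin at original price:", base)
    print("margin after price deflation:", deflated)
    print("breakeven utilization:", breakeven_utilization(30.0, 5, 16.0))
```

With these placeholder inputs, holding capex, life, and utilization fixed while lowering only the price moves the margin from positive to negative, which is the sensitivity the summary describes. Shortening the assumed depreciation life or lowering utilization has the same kind of effect.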