Part 7
Compute, Silicon & System Integration
15 chapters
Accelerator Landscape & Taxonomy
The accelerator is not a product you buy — it is an architectural family you marry, and the four families (merchant GPU, systolic TPU, hyperscaler XPU, inference ASIC) impose different software stacks, different scale-up fabrics, and different lock-in horizons long after the silicon is racked.
NVIDIA Accelerators: Hopper → Blackwell → Vera Rubin → Rubin Ultra → Feynman
NVIDIA's accelerator roadmap is not a spec sheet you read but a power-and-density ramp you are forced to design against — each annual generation moves the unit of purchase from the chip to the rack to the multi-rack pod, and committing to the wrong rung sets your cooling plant, power architecture, and refresh economics for years.
AMD Instinct & the Open Challenger
AMD's hardware caught NVIDIA on memory and FLOPS and, with MI400/Helios, on rack-scale fabric — so the second-source decision is no longer about silicon, it is about whether your workload can pay the ROCm-maturity tax and whether you believe an open scale-up stack will be production-proven before your cluster depreciates.
Hyperscaler XPUs: TPU, Trainium/Inferentia, Maia, MTIA
When the company that owns the model also owns the silicon, the accelerator stops being a product you buy and becomes a cost structure you rent into — and the real decision is no longer FLOPS-per-dollar but whether you can tolerate a software stack and a supply chain you do not control.
Custom ASICs & the Merchant-Silicon Disruption
Custom silicon is not a technology decision, it is a volume-and-flexibility bet: above a sustained-demand threshold the per-token economics of a fixed-function ASIC crush a merchant GPU, but below it you have spent hundreds of millions in NRE and 18–36 months of lead time to ship a chip your roadmap has already made obsolete.
HBM: The Binding Constraint on AI Compute
An accelerator is a memory system with some math attached: in 2026 the math is cheap and the memory is sold out, so HBM, not the GPU die, is the line that decides how many chips ship and how much each one costs.
Advanced Packaging & the Integration Substrate
The accelerator you can buy is not set by how fast a fab can print logic — it is set by how large an interposer a packaging house can yield, because the package is the substrate on which compute, memory, and bandwidth are physically integrated, and in 2026 it is the most-cited binding constraint on AI compute through the end of the decade.
Host CPUs, GPU:CPU Ratios & System Composition
The host CPU and the GPU:CPU ratio are not bookkeeping details bolted onto an accelerator purchase — they are the system-composition fork that decides whether your GPUs stay fed, and in the agentic era that fork has swung hard back toward the CPU.
Software Ecosystems & Lock-In
The accelerator you buy is also a software contract: CUDA, ROCm, XLA, and Neuron are not interchangeable runtimes but distinct lock-in regimes, and the price of switching is paid not in the datasheet FLOPS you compared but in the realized-MFU gap you discover after the cluster is live.
Precision, Quantization & the Compute-Memory Tradeoff
Precision is the cheapest performance lever in the building and the most dangerous: every step down the ladder roughly doubles tensor-core throughput and halves the memory footprint, but it spends accuracy headroom you cannot always get back — so the engineering question is never 'how low can we go' but 'how low, with which scaling scheme, before this specific workload crosses its quality floor.'
Accelerator Selection, TCO & Procurement Strategy
Accelerator selection is not a spec-sheet beauty contest — it is a constrained optimization against whichever resource actually binds you (power or capital), scored in cost-per-useful-token, and executed through a procurement playbook that treats allocation, depreciation, and fleet heterogeneity as first-class engineering variables.
On-Package Power Delivery & Power Integrity
The last meter of the power chain — 48V on the board down to ~0.7V at 2,000+ amps inside the package — is where the AI accelerator either gets the clean, low-impedance, fast-responding current it needs to hit clock, or it droops, throttles, and turns purchased megawatts into wasted goodput; and the same di/dt event that defines this meter is the seed of the facility-scale transient three layers up.
The Rack as Integration Unit
The rack stopped being a sheet-metal cabinet that holds servers and became the unit you buy, ship, power, cool, cable, and certify as one object — so every other physical subsystem now lands on a rack standard, and choosing the wrong standard strands the floor, the busbar, the manifold, and the cabling all at once.
Server & System Integration
The rack does not arrive — it is integrated, and the level at which it is integrated (DGX appliance, HGX-OEM, ODM-direct, or OCP self-design) decides who owns the factory burn-in, who owns the acceptance gate, who owns the RMA, and ultimately how many days of stranded goodput sit between a powered shell and a producing cluster.
Deployment Velocity & Cabling at Scale
Time-to-goodput is set on the floor, not in the design package: the rate at which racks land and links light — and the discipline that keeps mis-cabling off the acceptance critical path — is the velocity metric that converts an energized shell into a productive cluster, and pre-terminated cabling plus off-line optics screening are the two levers that move it most.