Part 9

Storage & Data

9 chapters

Storage in the AI Lifecycle: Why It Determines GPU Efficiency

Storage is not the thing that holds your data; it is the thing that decides whether a $40,000 accelerator computes or idles. The only way to design it well is to stop treating it as one system and recognize the four mutually hostile I/O personalities competing for the same hardware.

Parallel & Distributed File Systems

The parallel file system is the throughput organ that keeps a $30,000 to $40,000 GPU saturated. The decision that determines whether it succeeds is not peak sequential bandwidth; it is how the metadata plane survives billions of small files and how the file namespace converges with object storage so the data path stops being a copy farm.

NVMe Tiers, GPUDirect Storage & the CPU-Bypass Data Path

The bottleneck that strands a $40,000 GPU is rarely the file system's aggregate bandwidth. It is the per-request tax of staging every byte through a host CPU and a bounce buffer, and the engineering answer in 2026 is to delete the CPU from the data path entirely.

Checkpointing for Large-Scale Training

Checkpointing is not a backup chore — it is the goodput control knob of a training cluster: the interval, the tier, and the bandwidth you size for it decide how many GPU-hours the next failure erases, and at frontier scale a failure is never far away.

Data Ingestion, Preprocessing & the Data-Loader Path

The data-loader path is the one storage subsystem that sits in the critical loop of every training step. Get its format, sharding, or CPU budget wrong and you do not have a slow pipeline; you have idle GPUs burning depreciation while they wait to be fed.

Object Storage, Data Lakes & the Capacity Tier

Object storage is the gravitational floor of the AI data center — the one tier that holds the whole corpus, every checkpoint lineage, and every shipped model — and the decision that determines its cost is not which vendor you pick but whether you treat it as a cheap archive or a first-class, flash-fronted serving layer that the GPUs actually read from.

Inference & KV-Cache Storage: The New Memory Hierarchy

Inference turned the KV-cache into a first-class storage problem: the bytes a request must keep resident now spill far past HBM, and where you let them land — DRAM, CXL-expanded DRAM, NVMe, or Ethernet-attached flash — sets your tokens-per-second, your cost-per-token, and how many users one GPU can serve.

Sizing, Data Gravity & Resilience

Storage is sized to a per-GPU bandwidth budget, not a capacity number; gravity decides where the compute goes; and resilience is bought as goodput — get the ratio, the geography, or the isolation wrong and the cost shows up as idle accelerators on a depreciation clock you cannot stop.

The Data-Prep Supercomputer: Offline Data Processing

Before a single GPU sees a token, a second supercomputer (CPU-bound, storage-heavy, and almost always undersized) must dedupe, filter, decontaminate, and tokenize trillions of tokens. Treat data prep as an afterthought and you either starve the training fleet of clean tokens or burn frontier-GPU hours on string processing that GPUs, the wrong silicon for the job, should never touch.