The AI factories Nvidia describes today did not appear overnight. They are the product of nearly a decade of iterative system design, starting with the first purpose-built deep learning machines and evolving into rack-scale supercomputers for agentic AI. Tracing the path from the original DGX-1 to the upcoming Rubin platform shows how fast AI infrastructure has transformed — and why Nvidia now talks about tens of millions of times more AI computing capability than a decade ago.
DGX-1: The first deep learning supercomputer
On April 6, 2016, Nvidia introduced the DGX‑1, the first computer designed specifically for deep learning. It packed eight Pascal-based GPUs connected through the first generation of NVLink and delivered around 170 teraflops of half-precision (FP16) compute. For its time, it was an audacious system: a single appliance positioned as the equivalent of hundreds of CPU servers for training neural networks.
DGX-1 was aimed primarily at researchers. It shipped with optimised deep learning frameworks and tools so that universities and labs could get to work without hand-assembling their own clusters. The core idea, though, was that deep learning needed systems built from the ground up, not general-purpose servers repurposed for GPU workloads. That idea has shaped everything Nvidia has done since.
Volta, NVLink switches and acting like one big GPU
As models grew larger and more complex, even eight GPUs in a box were not enough. With the Volta generation, Nvidia introduced the NVLink switch, allowing 16 GPUs to be connected with full all-to-all bandwidth and operate almost like one enormous GPU. This scale-up approach made it possible to train bigger models and run more demanding workloads within a single tightly coupled domain.
But the demand curve for AI did not flatten. Companies wanted to train across ever-larger datasets and parameter counts. That meant connecting not just 16 GPUs but dozens or hundreds of GPU nodes. The conclusion was clear: to keep up with model growth, the entire data centre had to behave as a single computer, not a loose cluster of unrelated servers.
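Why interconnect bandwidth dominates at this scale can be made concrete with a standard back-of-envelope model (this sketch and its bandwidth figures are illustrative, not drawn from Nvidia's specifications): in the classic ring all-reduce used for data-parallel training, each GPU moves roughly 2(N−1)/N times the gradient size through its links, so step time is gated by the slowest fabric the traffic has to cross.

```python
# Back-of-envelope: time to all-reduce M bytes of gradients over N GPUs
# using the classic ring algorithm. Each GPU sends and receives about
# 2 * (N - 1) / N * M bytes, so time ~ that volume / per-link bandwidth.
# The bandwidth figures below are illustrative, not vendor specs.

def ring_allreduce_seconds(n_gpus: int, grad_bytes: float, link_gbps: float) -> float:
    """Idealised ring all-reduce time in seconds, ignoring latency."""
    volume = 2 * (n_gpus - 1) / n_gpus * grad_bytes
    return volume / (link_gbps * 1e9 / 8)  # convert Gb/s to bytes/s

grads = 10e9  # 10 GB of gradients (hypothetical model size)

# Same collective over two illustrative fabrics:
fast = ring_allreduce_seconds(16, grads, link_gbps=2400)  # NVLink-class scale-up
slow = ring_allreduce_seconds(16, grads, link_gbps=200)   # NIC-class scale-out

print(f"scale-up: {fast:.3f}s  scale-out: {slow:.3f}s")
```

The order-of-magnitude gap between the two results is the whole argument: once the collective spills outside the high-bandwidth domain, the network, not the GPUs, sets the training step time.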
Mellanox, SuperPODs and scale-out architectures
That requirement led to Nvidia’s acquisition of Mellanox Technologies, whose InfiniBand and Ethernet products became the backbone of scale-out AI systems. In 2020, Nvidia unveiled the DGX A100 SuperPOD, one of the first GPU supercomputing architectures to combine scale-up and scale-out in a coherent way.
Inside each node, NVLink connected GPUs for high-bandwidth scale-up. Across nodes, Mellanox networking — including HDR InfiniBand — provided the fabric for scale-out. Together, they allowed large clusters of GPUs to operate as unified AI systems, with high throughput both within and between boxes. The SuperPOD era made clear that the basic unit of AI computing was no longer a server but an entire rack or row of tightly integrated machines.
Hopper and the FP8 Transformer Engine
The next big architectural step came with the Hopper generation and the H100 GPU. Hopper introduced the FP8 Transformer Engine, a set of tensor cores and software that could run transformer models at reduced precision while preserving accuracy. That change dramatically accelerated the language models that underpin today’s generative AI wave.
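The FP8 idea can be illustrated with a simplified software model (this is an illustration of the number format, not how the Transformer Engine is implemented in hardware): values are rescaled so the largest magnitude fits FP8's limited range, then rounded to its 3-bit mantissa, and the per-tensor scale is kept so the result can be mapped back.

```python
import math

# Simplified model of FP8 E4M3 rounding: 3 explicit mantissa bits and a
# maximum finite value of 448. Subnormals and NaN handling are ignored;
# this is purely illustrative, not the Transformer Engine's implementation.

E4M3_MAX = 448.0

def round_e4m3(x: float) -> float:
    """Round x to the nearest value with a 4-bit significand, clamped to E4M3 range."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)        # x = m * 2**e, with 0.5 <= |m| < 1
    sig = round(m * 16) / 16    # keep 1 implicit + 3 explicit mantissa bits
    y = math.ldexp(sig, e)
    return max(-E4M3_MAX, min(E4M3_MAX, y))

def quantize_with_scale(values):
    """Per-tensor scaling: map the largest magnitude to E4M3_MAX, round, unscale."""
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax
    return [round_e4m3(v * scale) / scale for v in values], scale

vals = [0.01, 0.02, -0.03]
deq, scale = quantize_with_scale(vals)
print(deq, scale)
```

With only three mantissa bits the worst-case relative rounding error is a few percent, which is why the scaling step, keeping tensors inside the format's narrow dynamic range, is what makes reduced precision usable for training and inference.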
Networking also advanced. Hopper systems used NVLink 4 inside nodes, ConnectX-7 NICs for high-speed networking and BlueField-3 DPUs to offload infrastructure tasks. Second-generation Quantum InfiniBand switches pushed more bandwidth across the cluster. Together, these pieces made Hopper platforms far more capable of running long-context, token-heavy transformer workloads at scale.
Blackwell and the NVLink 72 system
Nvidia’s Blackwell architecture redefined what an AI supercomputer could look like. In the NVLink 72 configuration, 72 Blackwell GPUs are connected through fifth-generation NVLink, delivering on the order of 130 terabytes per second of all-to-all bandwidth within a single performance domain. From the software’s point of view, that rack behaves much like one gigantic accelerator.
Blackwell systems do not just bundle GPUs; they integrate Grace CPUs, advanced NVLink switches, high-performance Ethernet platforms and orchestration software into end-to-end AI factories. The goal is straightforward: maximise token throughput per rack and per megawatt while keeping latency low enough for interactive and agentic workloads.
Rubin and systems built for agentic AI
The next architecture in this progression is the Nvidia Rubin platform, designed explicitly for every stage of agentic AI. Rubin-based systems are described as delivering around 3.6 exaflops of AI compute with roughly 260 terabytes per second of NVLink bandwidth across a 72-GPU performance domain. Where Blackwell focused on generative and reasoning workloads, Rubin is positioned as the infrastructure for long-horizon, tool-using AI agents.
The platform advances multiple pillars at once: new GPUs, the Vera CPU for orchestration and large-scale workflows, AI-optimised storage fronted by BlueField DPUs and high-performance Ethernet fabrics for scale-out. Additional accelerator systems and offload engines further increase token generation performance. When combined, these technologies can deliver over thirty times more throughput per megawatt compared with earlier generations, according to Nvidia’s framing.
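Taking the rack-level figures quoted above at face value, simple division shows what each GPU in the 72-GPU domain would see, and how large the generational bandwidth step is (arithmetic only, assuming an even split across the domain):

```python
# Per-GPU budgets implied by the rack-level figures quoted in the text,
# assuming the aggregate numbers split evenly across the 72-GPU domain.
GPUS = 72

blackwell_nvlink_tb_s = 130   # quoted NVLink 72 aggregate bandwidth, TB/s
rubin_nvlink_tb_s = 260       # quoted Rubin aggregate bandwidth, TB/s
rubin_exaflops = 3.6          # quoted Rubin rack AI compute

per_gpu_bw_blackwell = blackwell_nvlink_tb_s / GPUS   # ~1.8 TB/s per GPU
per_gpu_bw_rubin = rubin_nvlink_tb_s / GPUS           # ~3.6 TB/s per GPU
per_gpu_pflops_rubin = rubin_exaflops * 1000 / GPUS   # ~50 petaflops per GPU

print(f"Blackwell: ~{per_gpu_bw_blackwell:.1f} TB/s per GPU")
print(f"Rubin:     ~{per_gpu_bw_rubin:.1f} TB/s per GPU, "
      f"~{per_gpu_pflops_rubin:.0f} PF per GPU")
print(f"NVLink step Blackwell -> Rubin: {rubin_nvlink_tb_s / blackwell_nvlink_tb_s:.0f}x")
```

The point of the arithmetic is the doubling: each GPU in a Rubin rack would have roughly twice the coherent-domain bandwidth of its Blackwell predecessor, which is what long-horizon agent workloads with large shared context are expected to need.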
From boxes to factories
Looked at over ten years, the trajectory is clear. Nvidia moved from selling a single deep learning box (DGX‑1) to selling entire AI factories: DGX SuperPODs, Blackwell NVLink 72 racks and Vera Rubin platforms. Each generation increased not just raw FLOPS but the ability to treat a whole data centre as one programmable machine for training, fine-tuning and, above all, high-volume inference.
Along the way, the company also leaned heavily into hardware–software co-design. CUDA, cuDNN, TensorRT-LLM, scheduling systems and deployment stacks have all been tuned to take advantage of each new hardware capability. The result is that effective AI computing capacity — measured in tokens generated, models trained or tasks completed — has increased by tens of millions of times over roughly a decade when both hardware and software gains are multiplied together.
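The "tens of millions of times" figure is multiplicative, which a toy calculation makes concrete (every factor below is hypothetical, chosen only to show how independent gains compound; none is an actual Nvidia number):

```python
# Toy illustration of compounding gains. All factors are hypothetical,
# chosen only to show how per-layer improvements multiply into a very
# large end-to-end figure; none are Nvidia's actual numbers.
hardware_flops_gain = 10_000   # e.g. several GPU architecture generations
precision_gain = 8             # e.g. FP32 -> FP8/FP4-style formats
software_gain = 50             # kernels, compilers, serving stacks
scale_gain = 72 / 8            # 8-GPU box -> 72-GPU coherent rack

effective_gain = hardware_flops_gain * precision_gain * software_gain * scale_gain
print(f"effective capability gain: ~{effective_gain:,.0f}x")
```

Even with modest assumptions for each layer, the product lands in the tens of millions, which is why the headline number is plausible only when hardware, precision, software and scale gains are multiplied rather than considered separately.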
Why this history matters now
For enterprises deciding how to invest in AI infrastructure, this history is more than a technical curiosity. It explains why Nvidia talks about AI factories and token factories at GTC instead of just GPUs. The company is selling a story in which the fundamental unit of computing is an integrated, power-constrained factory that turns data and electricity into tokens, and in which each new architecture — from DGX‑1 to Rubin — is another step in industrialising that process.
As agentic AI systems spread into more workflows and industries, the demand for these factories will only grow. The organisations that benefit most are likely to be those that understand what each generation of architecture enables, design their software to exploit it and secure enough capacity to keep their own token factories running at full tilt.
Sources
- Nvidia GTC keynotes and technical blogs on DGX‑1, Volta NVLink switches, DGX A100 SuperPOD, Hopper, Blackwell and Rubin architectures
- Public Nvidia documentation on NVLink bandwidth, exaflops-scale systems and AI factory design
- Industry reporting on the evolution of GPU supercomputing and data centres into AI factories