
From DGX-1 to Rubin: How Nvidia Turned Data Centres into AI Factories

Disclaimer: Perspectives here reflect AI-POV and AI-assisted analysis, not any specific human author. Read full disclaimer — issues: report@theaipov.news

The AI factories Nvidia describes today did not appear overnight. They are the product of nearly a decade of iterative system design, starting with the first purpose-built deep learning machines and evolving into rack-scale supercomputers for agentic AI. Tracing the path from the original DGX-1 to the upcoming Rubin platform shows how fast AI infrastructure has transformed — and why Nvidia now talks about tens of millions of times more AI computing capability than a decade ago.

DGX-1: The first deep learning supercomputer

On April 6, 2016, Nvidia introduced the DGX‑1, the first computer designed specifically for deep learning. It packed eight Pascal-based GPUs connected through the first generation of NVLink and delivered around 170 teraflops of compute. For its time, it was an audacious system: a single appliance positioned as the equivalent of hundreds of CPU servers for training neural networks.

DGX-1 was aimed primarily at researchers. It shipped with optimised deep learning frameworks and tools so that universities and labs could get to work without hand-assembling their own clusters. The core idea, though, was that deep learning needed systems built from the ground up, not general-purpose servers repurposed for GPU workloads. That idea has shaped everything Nvidia has done since.

Volta, NVLink switches and acting like one big GPU

As models grew larger and more complex, even eight GPUs in a box were not enough. With the Volta generation, Nvidia introduced the NVLink switch, allowing 16 GPUs to be connected with full all-to-all bandwidth and operate almost like one enormous GPU. This scale-up approach made it possible to train bigger models and run more demanding workloads within a single tightly coupled domain.
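A quick back-of-envelope calculation shows why a switch fabric was the natural next step. Fully connecting GPUs with direct point-to-point links requires n(n−1)/2 links and n−1 ports per GPU, which scales poorly; a central switch needs only one uplink bundle per GPU. This is an illustrative sketch of the topology arithmetic, not a description of Nvidia's actual NVSwitch port counts.

```python
# Illustrative topology arithmetic: direct point-to-point all-to-all
# connectivity among n GPUs needs n*(n-1)/2 links and n-1 ports per GPU,
# while a central switch needs only one uplink bundle per GPU.
# Figures are a sketch, not Nvidia hardware specifications.

def direct_links(n: int) -> int:
    """Links needed for a fully connected point-to-point mesh of n GPUs."""
    return n * (n - 1) // 2

for n in (8, 16):
    print(f"{n} GPUs: {direct_links(n)} direct links, "
          f"{n - 1} ports per GPU vs 1 switch uplink each")
```

For the 16-GPU Volta-era systems, a direct mesh would have meant 120 links and 15 ports on every GPU, which is exactly the kind of scaling problem a switch sidesteps.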

But the demand curve for AI did not flatten. Companies wanted to train across ever-larger datasets and parameter counts. That meant connecting not just 16 GPUs but dozens or hundreds of GPU nodes. The conclusion was clear: to keep up with model growth, the entire data centre had to behave as a single computer, not a loose cluster of unrelated servers.

Mellanox, SuperPODs and scale-out architectures

That requirement led to Nvidia’s acquisition of Mellanox Technologies, whose InfiniBand and Ethernet products became the backbone of scale-out AI systems. In 2020, Nvidia unveiled the DGX A100 SuperPOD, one of the first GPU supercomputing architectures to combine scale-up and scale-out in a coherent way.

Inside each node, NVLink connected GPUs for high-bandwidth scale-up. Across nodes, Mellanox networking — including HDR InfiniBand — provided the fabric for scale-out. Together, they allowed large clusters of GPUs to operate as unified AI systems, with high throughput both within and between boxes. The SuperPOD era made clear that the basic unit of AI computing was no longer a server but an entire rack or row of tightly integrated machines.

Hopper and the FP8 Transformer Engine

The next big architectural step came with the Hopper generation and the H100 GPU. Hopper introduced the FP8 Transformer Engine, a set of tensor cores and software that could run transformer models at reduced precision while preserving accuracy. That change dramatically accelerated the language models that underpin today’s generative AI wave.
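The appeal of reduced precision is easy to see in the memory arithmetic alone: each halving of bytes per parameter lets the same memory capacity and bandwidth hold and move twice as many values. The sketch below uses a hypothetical 7-billion-parameter model purely for illustration; the sizes are not tied to any specific Nvidia product.

```python
# Why lower precision matters: bytes per parameter halve from FP32 to
# FP16 and again to FP8, so the same memory and bandwidth budget covers
# twice as many values at each step. The 7B parameter count below is a
# hypothetical example, not a specific model.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}
params = 7e9  # hypothetical 7B-parameter transformer

for fmt, nbytes in BYTES_PER_PARAM.items():
    gib = params * nbytes / 2**30
    print(f"{fmt}: {gib:.1f} GiB of weights")
```

The same factor-of-two applies to the arithmetic throughput of tensor cores at each precision step, which is why the FP8 Transformer Engine translated so directly into faster transformer training and inference.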

Networking also advanced. Hopper systems used NVLink 4 inside nodes, ConnectX-7 NICs for high-speed networking and BlueField-3 DPUs to offload infrastructure tasks. Second-generation Quantum InfiniBand switches pushed more bandwidth across the cluster. Together, these pieces made Hopper platforms far more capable of running long-context, token-heavy transformer workloads at scale.

Blackwell and the NVLink 72 system

Nvidia’s Blackwell architecture redefined what an AI supercomputer could look like. In the NVLink 72 configuration, 72 Blackwell GPUs are connected through fifth-generation NVLink, delivering on the order of 130 terabytes per second of all-to-all bandwidth within a single performance domain. From the software’s point of view, that rack behaves much like one gigantic accelerator.
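Dividing the headline aggregate figure by the GPU count gives a rough sense of what each accelerator sees. This is back-of-envelope arithmetic on the numbers quoted above, not an Nvidia per-GPU specification.

```python
# Back-of-envelope on the NVLink 72 figures quoted above: dividing the
# aggregate all-to-all bandwidth by the GPU count gives an approximate
# per-GPU share. These are the article's headline numbers, not specs.

aggregate_tb_s = 130  # quoted NVLink 72 aggregate bandwidth, TB/s
gpus = 72

per_gpu = aggregate_tb_s / gpus
print(f"~{per_gpu:.1f} TB/s per GPU")
```

Roughly 1.8 TB/s per GPU is orders of magnitude beyond what any scale-out network fabric offers per node, which is why the rack-level NVLink domain, not the individual server, is the unit the software targets.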

Blackwell systems do not just bundle GPUs; they integrate Grace CPUs, advanced NVLink switches, high-performance Ethernet platforms and orchestration software into end-to-end AI factories. The goal is straightforward: maximise token throughput per rack and per megawatt while keeping latency low enough for interactive and agentic workloads.

Rubin and systems built for agentic AI

The next architecture in this progression is the Nvidia Rubin platform, designed explicitly for every stage of agentic AI. Rubin-based systems are described as delivering around 3.6 exaflops of AI compute with roughly 260 terabytes per second of NVLink bandwidth across a 72-GPU performance domain. Where Blackwell focused on generative and reasoning workloads, Rubin is positioned as the infrastructure for long-horizon, tool-using AI agents.

The platform advances multiple pillars at once: new GPUs, the Vera CPU for orchestration and large-scale workflows, AI-optimised storage fronted by BlueField DPUs and high-performance Ethernet fabrics for scale-out. Additional accelerator systems and offload engines further increase token generation performance. When combined, these technologies can deliver over thirty times more throughput per megawatt compared with earlier generations, according to Nvidia’s framing.
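Putting the quoted Rubin and Blackwell figures side by side shows how the claims compound: NVLink bandwidth doubles over the same 72-GPU domain, on top of the stated tokens-per-megawatt uplift. This is illustrative arithmetic on the numbers cited above, taken from Nvidia's own framing rather than independent measurements.

```python
# Comparing the figures quoted in this article for Blackwell and Rubin:
# aggregate NVLink bandwidth doubles (130 -> 260 TB/s) over the same
# 72-GPU domain. Illustrative arithmetic on Nvidia's stated numbers.

blackwell_tb_s, rubin_tb_s, gpus = 130, 260, 72

print(f"Per-GPU share: {blackwell_tb_s / gpus:.1f} -> "
      f"{rubin_tb_s / gpus:.1f} TB/s")
print(f"Aggregate uplift: {rubin_tb_s / blackwell_tb_s:.0f}x")
```

That bandwidth doubling is separate from the claimed 30x-plus throughput-per-megawatt improvement, which folds in new GPUs, offload engines and software together.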

From boxes to factories

Looked at over ten years, the trajectory is clear. Nvidia moved from selling a single deep learning box (DGX‑1) to selling entire AI factories: DGX SuperPODs, Blackwell NVLink 72 racks and Vera Rubin platforms. Each generation increased not just raw FLOPS but the ability to treat a whole data centre as one programmable machine for training, fine-tuning and, above all, high-volume inference.


Along the way, the company also leaned heavily into hardware–software co-design. CUDA, cuDNN, TensorRT-LLM, scheduling systems and deployment stacks have all been tuned to take advantage of each new hardware capability. The result is that effective AI computing capacity — measured in tokens generated, models trained or tasks completed — has increased by tens of millions of times over roughly a decade when both hardware and software gains are multiplied together.

Why this history matters now

For enterprises deciding how to invest in AI infrastructure, this history is more than a technical curiosity. It explains why Nvidia talks about AI factories and token factories at GTC instead of just GPUs. The company is selling a story in which the fundamental unit of computing is an integrated, power-constrained factory that turns data and electricity into tokens, and in which each new architecture — from DGX‑1 to Rubin — is another step in industrialising that process.

As agentic AI systems spread into more workflows and industries, the demand for these factories will only grow. The organisations that benefit most are likely to be those that understand what each generation of architecture enables, design their software to exploit it and secure enough capacity to keep their own token factories running at full tilt.

Sources

  • Nvidia GTC keynotes and technical blogs on DGX‑1, Volta NVLink switches, DGX A100 SuperPOD, Hopper, Blackwell and Rubin architectures
  • Public Nvidia documentation on NVLink bandwidth, exaflops-scale systems and AI factory design
  • Industry reporting on the evolution of GPU supercomputing and data centres into AI factories
