From DGX-1 to Rubin: How Nvidia Turned Data Centres into AI Factories

Disclaimer: Perspectives here reflect AI-POV and AI-assisted analysis, not any specific human author.

The AI factories Nvidia describes today did not appear overnight. They are the product of nearly a decade of iterative system design, starting with the first purpose-built deep learning machines and evolving into rack-scale supercomputers for agentic AI. Tracing the path from the original DGX-1 to the upcoming Rubin platform shows how fast AI infrastructure has transformed — and why Nvidia now talks about tens of millions of times more AI computing capability than a decade ago.

DGX-1: The first deep learning supercomputer

On April 6, 2016, Nvidia introduced the DGX‑1, the first computer designed specifically for deep learning. It packed eight Pascal-based GPUs connected through the first generation of NVLink and delivered around 170 teraflops of half-precision (FP16) compute. For its time, it was an audacious system: a single appliance positioned as the equivalent of hundreds of CPU servers for training neural networks.
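The "hundreds of CPU servers" positioning can be sanity-checked with rough arithmetic. The CPU-side figure below is an assumption (very roughly 1 teraflop of dense throughput for a 2016 dual-socket server on deep learning kernels), not an official benchmark:

```python
# Back-of-envelope check on the "hundreds of CPU servers" claim.
# DGX1_TFLOPS is Nvidia's headline FP16 figure for DGX-1; the
# CPU-server figure is our illustrative assumption, not a benchmark.

DGX1_TFLOPS = 170         # DGX-1 headline compute (FP16)
CPU_SERVER_TFLOPS = 1.0   # rough assumption for a 2016 dual-socket node

equivalent_servers = DGX1_TFLOPS / CPU_SERVER_TFLOPS
print(f"~{equivalent_servers:.0f} CPU servers")
```

Under those assumptions a single DGX-1 lands in the low hundreds of CPU-server equivalents, consistent with how Nvidia marketed the box.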

DGX-1 was aimed primarily at researchers. It shipped with optimised deep learning frameworks and tools so that universities and labs could get to work without hand-assembling their own clusters. The core idea, though, was that deep learning needed systems built from the ground up, not general-purpose servers repurposed for GPU workloads. That idea has shaped everything Nvidia has done since.

Volta, NVLink switches and acting like one big GPU

As models grew larger and more complex, even eight GPUs in a box were not enough. With the Volta generation, Nvidia introduced the NVLink switch, allowing 16 GPUs to be connected with full all-to-all bandwidth and operate almost like one enormous GPU. This scale-up approach made it possible to train bigger models and run more demanding workloads within a single tightly coupled domain.

But the demand curve for AI did not flatten. Companies wanted to train across ever-larger datasets and parameter counts. That meant connecting not just 16 GPUs but dozens or hundreds of GPU nodes. The conclusion was clear: to keep up with model growth, the entire data centre had to behave as a single computer, not a loose cluster of unrelated servers.

Mellanox, SuperPODs and scale-out architectures

That requirement led to Nvidia’s acquisition of Mellanox Technologies, whose InfiniBand and Ethernet products became the backbone of scale-out AI systems. In 2020, Nvidia unveiled the DGX A100 SuperPOD, one of the first GPU supercomputing architectures to combine scale-up and scale-out in a coherent way.

Inside each node, NVLink connected GPUs for high-bandwidth scale-up. Across nodes, Mellanox networking — including HDR InfiniBand — provided the fabric for scale-out. Together, they allowed large clusters of GPUs to operate as unified AI systems, with high throughput both within and between boxes. The SuperPOD era made clear that the basic unit of AI computing was no longer a server but an entire rack or row of tightly integrated machines.

Hopper and the FP8 Transformer Engine

The next big architectural step came with the Hopper generation and the H100 GPU. Hopper introduced the FP8 Transformer Engine, a set of tensor cores and software that could run transformer models at reduced precision while preserving accuracy. That change dramatically accelerated the language models that underpin today’s generative AI wave.
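The core idea behind FP8 training can be illustrated with a toy "fake quantisation" pass: scale a tensor into the FP8 dynamic range, round away mantissa precision, and scale back. This is a hand-rolled sketch of the general technique, not the Transformer Engine API; the E4M3 maximum and the 3-mantissa-bit rounding are properties of one common FP8 format, and the implementation is a coarse approximation:

```python
import numpy as np

# E4M3 (one common FP8 format) has a maximum finite value of 448.
E4M3_MAX = 448.0

def fake_quantize_fp8(x: np.ndarray) -> np.ndarray:
    """Simulate per-tensor FP8 quantisation: scale into the FP8
    dynamic range, coarsely round the mantissa, and scale back.
    Illustrative only -- not Nvidia's Transformer Engine."""
    scale = E4M3_MAX / np.max(np.abs(x))
    scaled = x * scale
    # E4M3 keeps 3 mantissa bits, so round each value to steps of
    # 2**(exponent - 3) -- a coarse model of the real format.
    exp = np.floor(np.log2(np.abs(scaled) + 1e-30))
    step = 2.0 ** (exp - 3)
    quantised = np.round(scaled / step) * step
    return quantised / scale

np.random.seed(0)
x = np.random.randn(4, 4).astype(np.float32)
xq = fake_quantize_fp8(x)
print(np.max(np.abs(x - xq)))  # small relative quantisation error
```

The point the sketch makes is the one Hopper exploits: for transformer activations and weights, a few mantissa bits plus a per-tensor scale preserve values closely enough that training and inference still converge, while halving memory traffic relative to FP16.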

Networking also advanced. Hopper systems used NVLink 4 inside nodes, ConnectX-7 NICs for high-speed networking and BlueField-3 DPUs to offload infrastructure tasks. Second-generation Quantum InfiniBand switches pushed more bandwidth across the cluster. Together, these pieces made Hopper platforms far more capable of running long-context, token-heavy transformer workloads at scale.

Blackwell and the NVLink 72 system

Nvidia’s Blackwell architecture redefined what an AI supercomputer could look like. In the NVLink 72 configuration, 72 Blackwell GPUs are connected through fifth-generation NVLink, delivering on the order of 130 terabytes per second of all-to-all bandwidth within a single performance domain. From the software’s point of view, that rack behaves much like one gigantic accelerator.
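A quick division shows what the aggregate figure implies per GPU. The 130 TB/s and 72-GPU numbers are Nvidia's public claims; the per-GPU derivation is our own arithmetic, and it lines up with the roughly 1.8 TB/s per-GPU bandwidth Nvidia quotes for fifth-generation NVLink:

```python
# Sanity-check the NVLink 72 figures quoted above.
TOTAL_NVLINK_TB_S = 130   # aggregate all-to-all bandwidth per rack
GPUS_PER_DOMAIN = 72      # GPUs in one NVLink 72 performance domain

per_gpu_tb_s = TOTAL_NVLINK_TB_S / GPUS_PER_DOMAIN
print(f"~{per_gpu_tb_s:.1f} TB/s per GPU")  # ~1.8 TB/s
```

That per-GPU figure is why software can treat the rack as one device: every GPU can reach every other at bandwidths previously available only inside a single board.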

Blackwell systems do not just bundle GPUs; they integrate Grace CPUs, advanced NVLink switches, high-performance Ethernet platforms and orchestration software into end-to-end AI factories. The goal is straightforward: maximise token throughput per rack and per megawatt while keeping latency low enough for interactive and agentic workloads.

Rubin and systems built for agentic AI

The next architecture in this progression is the Nvidia Rubin platform, designed explicitly for every stage of agentic AI. Rubin-based systems are described as delivering around 3.6 exaflops of AI compute with roughly 260 terabytes per second of NVLink bandwidth across a 72-GPU performance domain. Where Blackwell focused on generative and reasoning workloads, Rubin is positioned as the infrastructure for long-horizon, tool-using AI agents.

The platform advances multiple pillars at once: new GPUs, the Vera CPU for orchestration and large-scale workflows, AI-optimised storage fronted by BlueField DPUs and high-performance Ethernet fabrics for scale-out. Additional accelerator systems and offload engines further increase token generation performance. When combined, these technologies can deliver over thirty times more throughput per megawatt compared with earlier generations, according to Nvidia’s framing.

From boxes to factories

Looked at over ten years, the trajectory is clear. Nvidia moved from selling a single deep learning box (DGX‑1) to selling entire AI factories: DGX SuperPODs, Blackwell NVLink 72 racks and Vera Rubin platforms. Each generation increased not just raw FLOPS but the ability to treat a whole data centre as one programmable machine for training, fine-tuning and, above all, high-volume inference.

Along the way, the company also leaned heavily into hardware–software co-design. CUDA, cuDNN, TensorRT-LLM, scheduling systems and deployment stacks have all been tuned to take advantage of each new hardware capability. The result is that effective AI computing capacity — measured in tokens generated, models trained or tasks completed — has increased by tens of millions of times over roughly a decade when both hardware and software gains are multiplied together.
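The "tens of millions of times" arithmetic is multiplicative, and a toy calculation shows how quickly compounding gets there. Every factor below is a hypothetical placeholder chosen for illustration, not an official Nvidia benchmark:

```python
# Illustrative compounding of per-generation gains. All factors are
# assumptions invented for this sketch -- not measured numbers.
generational_hw_gains = [10, 6, 5, 4, 5, 4]  # e.g. Pascal -> ... -> Rubin
software_stack_gain = 1000                    # cumulative CUDA / kernel /
                                              # serving-stack optimisations

total = software_stack_gain
for gain in generational_hw_gains:
    total *= gain
print(f"{total:,.0f}x effective capability")  # 24,000,000x
```

Even with modest single-digit hardware multipliers per generation, six generations multiplied by a large cumulative software factor lands in the tens of millions, which is the shape of the claim Nvidia makes.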

Why this history matters now

For enterprises deciding how to invest in AI infrastructure, this history is more than a technical curiosity. It explains why Nvidia talks about AI factories and token factories at GTC instead of just GPUs. The company is selling a story in which the fundamental unit of computing is an integrated, power-constrained factory that turns data and electricity into tokens, and in which each new architecture — from DGX‑1 to Rubin — is another step in industrialising that process.

As agentic AI systems spread into more workflows and industries, the demand for these factories will only grow. The organisations that benefit most are likely to be those that understand what each generation of architecture enables, design their software to exploit it and secure enough capacity to keep their own token factories running at full tilt.

Sources

  • Nvidia GTC keynotes and technical blogs on DGX‑1, Volta NVLink switches, DGX A100 SuperPOD, Hopper, Blackwell and Rubin architectures
  • Public Nvidia documentation on NVLink bandwidth, exaflops-scale systems and AI factory design
  • Industry reporting on the evolution of GPU supercomputing and data centres into AI factories
