
Why Grace Blackwell and Rubin Multiply Revenue Capacity Across Every Token Tier

Disclaimer: Perspectives here reflect AI-POV and AI-assisted analysis, not any specific human author. Read full disclaimer — issues: report@theaipov.news

Token pricing works like any product business: the higher the tier, the higher the quality and performance, but the lower the volume and capacity. That pattern exists in every industry. What Nvidia has done with the Grace Blackwell architecture is increase the performance of these tiers by 35× and introduce an entirely new tier. That represents a major jump compared with the previous Hopper generation.

At every tier the company increased throughput, and in the most valuable tier — the one with the highest average selling price — it increased performance by 10×. Achieving that is extremely difficult. It comes from technologies such as NVLink 72, extremely low-latency interconnects, and deep hardware–software co-design. These advances allow the entire performance curve to shift upward.

How power gets allocated across tiers

From a customer perspective, imagine distributing the power of a data centre across service tiers. Suppose 25% of the available power runs a free tier, 25% supports a mid-tier service, 25% runs a high tier, and 25% powers a premium tier. A typical large AI data centre might have around one gigawatt of power capacity, so the operator decides how to allocate that power.

The free tier helps attract users, while the premium tier serves the highest-value customers. When you multiply the throughput improvements across all tiers, the result directly translates into revenue. In a simplified example, the Blackwell architecture can generate roughly five times more revenue capacity than earlier systems. The Rubin generation could deliver around five times more again. That is why deploying the Vera Rubin platform quickly becomes important: token costs decrease while throughput increases.
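The allocation-and-multiplier arithmetic above can be sketched in a few lines. All of the numbers here (the even 25% power split, the relative per-token prices, the 5× and 25× throughput multipliers) are the article's illustrative figures or invented assumptions, not Nvidia data:

```python
# Toy model of revenue capacity for a fixed 1 GW power budget split
# evenly across four service tiers. Prices per token are invented
# placeholders; throughput multipliers follow the article's example.

POWER_GW = 1.0
TIERS = {
    # tier: (power share, relative price per token — assumed)
    "free":    (0.25, 0.0),
    "mid":     (0.25, 1.0),
    "high":    (0.25, 3.0),
    "premium": (0.25, 10.0),
}

def revenue_capacity(throughput_multiplier: float) -> float:
    """Relative revenue capacity: tokens produced per tier times price."""
    return sum(
        share * POWER_GW * throughput_multiplier * price
        for share, price in TIERS.values()
    )

baseline = revenue_capacity(1.0)    # previous (Hopper-era) generation
blackwell = revenue_capacity(5.0)   # ~5x throughput from the same gigawatt
rubin = revenue_capacity(25.0)      # ~5x again on top of Blackwell

print(blackwell / baseline)  # 5.0
print(rubin / baseline)      # 25.0
```

Because revenue scales linearly with tokens produced in this sketch, a uniform throughput multiplier passes straight through to revenue capacity, which is the article's core point: the same gigawatt earns more when every tier produces more tokens.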

The throughput–latency trade-off

There is still a fundamental challenge. High throughput requires enormous floating-point compute performance, while low latency requires extremely high bandwidth. Computer systems struggle to deliver both at the same time because there is only so much physical space on a chip and in a system for compute units and memory bandwidth. Optimising for maximum throughput and optimising for minimum latency are often conflicting goals.

NVLink-based systems like Vera Rubin excel at high-throughput, batch-friendly workloads: they can process huge numbers of tokens across many users when latency per user is less critical. But if you extend the requirements further — say you want to generate 1,000 tokens per second instead of 400 tokens per second for a single stream — eventually NVLink-based systems reach their bandwidth limits. Pushing past that ceiling is where a different kind of processor becomes useful.
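A minimal sketch of why these goals conflict, using an assumed batching model. The constants (a system-wide compute ceiling and a bandwidth-bound per-stream limit) are invented for illustration; the shape of the trade-off, not the specific values, is the point:

```python
# Toy batching model: larger batches raise aggregate throughput but
# slow down each individual stream. All constants are illustrative.

PEAK_TOKENS_PER_SEC = 100_000  # system-wide compute ceiling (assumed)
PER_STREAM_CEILING = 1_000     # bandwidth-bound single-stream limit (assumed)

def per_stream_rate(batch_size: int) -> float:
    """Tokens/sec each user sees: compute is shared across the batch,
    and a single stream is capped by memory bandwidth regardless."""
    return min(PEAK_TOKENS_PER_SEC / batch_size, PER_STREAM_CEILING)

def aggregate_throughput(batch_size: int) -> float:
    """Tokens/sec across all users in the batch."""
    return per_stream_rate(batch_size) * batch_size

# Small batch: each stream is fast, but the machine is underused.
print(per_stream_rate(10), aggregate_throughput(10))      # 1000.0 10000.0
# Large batch: aggregate throughput peaks, per-stream speed collapses.
print(per_stream_rate(1000), aggregate_throughput(1000))  # 100.0 100000.0
```

In this model no batch size delivers both the per-stream ceiling and the aggregate peak at once, which mirrors the article's claim: NVLink-class systems win on batch-friendly aggregate throughput, while pushing a single stream past its bandwidth ceiling calls for a different kind of processor.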

Why tier improvements matter for AI factories

For operators running a gigawatt-scale AI factory, the math is straightforward. If Blackwell delivers roughly five times more revenue capacity than the previous generation, and Rubin delivers another factor of about five, then two architecture cycles can multiply the revenue potential of the same power envelope by roughly 25× — well over an order of magnitude. That does not mean every operator will capture that full gain — competition and pricing will determine how much flows to the bottom line — but it does mean that the factories with the latest stacks have a structural advantage.

Deploying Vera Rubin quickly is therefore not just a technical choice; it is an economic one. Earlier deployment means earlier access to lower token costs and higher throughput, which in turn supports more aggressive pricing, larger context windows or faster token speeds for premium customers. In a market where tokens are becoming a commodity and tiers are segmenting by price and performance, the factories that can offer the best curve — more throughput at every tier and a credible premium tier at the top — will capture a disproportionate share of high-value workloads.

What this means for the industry

The Grace Blackwell and Rubin story is a reminder that AI infrastructure is not a single product but a layered performance curve. Free tiers, mid tiers, high tiers and premium tiers each consume a slice of the same power budget. The architectures that shift that curve upward — 35× on tier performance, 10× on the highest-value tier, and roughly 5× revenue capacity per generation — are the ones that will define who can afford to run which services at scale. For Nvidia, that is the logic of betting so heavily on NVLink 72, co-design, and the rapid rollout of the Vera Rubin platform.

In short: the same gigawatt that used to support one curve of free-to-premium tiers now supports a steeper curve with higher throughput at every level and a new top tier that was not feasible before. That is why tier economics and hardware roadmaps are inseparable in the AI factory era. Operators who deploy Grace Blackwell and Vera Rubin first will see both lower cost per token and a stronger position in the premium segment where margins are highest.

Sources

  • Nvidia GTC keynote on Grace Blackwell and Rubin tier performance (35×, 10× on premium tier), power allocation across tiers, and revenue capacity (5× per generation)
  • Nvidia materials on NVLink 72, Vera Rubin deployment and AI factory economics
  • Industry analysis of throughput versus latency trade-offs in large-scale inference
