As AI systems have turned into token factories, tokens themselves have started to look like a new kind of commodity. Nvidia CEO Jensen Huang’s keynote takes this idea seriously enough to sketch out an entire pricing curve for tokens, with different tiers based on throughput, speed, model size and context length. Under that framework, the future of AI economics looks much closer to a layered commodities market than to a flat per-API-call fee.
The key premise is straightforward: power is the limiting factor in every large AI data centre. Within a given power envelope, an AI factory’s productivity is determined by two axes — throughput (total tokens produced over time) and token speed (how fast tokens can be generated for each user or workflow). Huang visualises this as a chart: throughput on the vertical axis, token rate on the horizontal axis, with different services occupying different regions of the plane.
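To make the trade-off concrete, here is a toy sketch of that plane, not anything from the keynote itself: it assumes a power-limited factory whose aggregate throughput saturates as batch size grows, while each user’s token rate is their share of that aggregate. Every constant (the one-million-tokens-per-second ceiling, the half-saturation batch of 64) is invented for illustration.

```python
# Toy model of the throughput / token-speed plane. Every constant is an
# invented assumption, not a measurement: a power-limited factory whose
# aggregate throughput saturates as the batch grows, while each user's
# token rate is their share of that aggregate.

PEAK_TOKENS_PER_SEC = 1_000_000   # assumed ceiling for one power envelope
HALF_SATURATION_BATCH = 64        # assumed batch size reaching half of peak

def operating_point(batch_size: int) -> tuple[float, float]:
    """Return (aggregate tokens/sec, per-user tokens/sec) for a batch size."""
    aggregate = PEAK_TOKENS_PER_SEC * batch_size / (batch_size + HALF_SATURATION_BATCH)
    per_user = aggregate / batch_size
    return aggregate, per_user

for batch in (1, 8, 64, 512, 4096):
    agg, per_user = operating_point(batch)
    print(f"batch {batch:>4}: {agg:>9,.0f} tok/s total, {per_user:>6,.0f} tok/s per user")
```

Sweeping the batch size traces out exactly the kind of frontier Huang draws: the high-batch end corresponds to cheap, high-throughput tiers, the low-batch end to fast, premium ones.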
Why throughput and token speed matter
Throughput captures the ability of an AI factory to handle aggregate demand: millions or billions of tokens per second across all users. Token speed captures the user experience: how quickly a given model can respond, how long it can “think” before replying and how large a context window it can maintain without requests timing out.
Improvements on either axis are not free. Faster token speeds for a single user might require dedicating more of the factory’s capacity to that stream, reducing aggregate throughput for other workloads. Longer context windows — from 100,000 tokens of input to millions of tokens — dramatically increase the amount of data each request processes and therefore the compute it consumes. The result is a menu of trade-offs that naturally leads to pricing tiers.
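A back-of-envelope calculation shows why long context in particular is expensive. The sketch below uses the common ~2 × parameters FLOPs-per-token rule of thumb plus a quadratic attention term; the model size, layer count and width are illustrative assumptions, not figures from the keynote.

```python
# Rough compute per request as context grows. Uses the common
# ~2 * params FLOPs-per-token approximation for the matmuls, plus a
# quadratic attention term. All model dimensions are assumptions.

PARAMS = 70e9      # assumed 70B-parameter model
N_LAYERS = 80      # assumed number of transformer layers
HIDDEN_DIM = 8192  # assumed hidden width driving attention cost

def prefill_flops(context_tokens: int) -> float:
    """Rough FLOPs to ingest a prompt of the given length."""
    linear = 2 * PARAMS * context_tokens                       # matmuls: ~2*P per token
    attention = 2 * N_LAYERS * HIDDEN_DIM * context_tokens**2  # pairwise token interactions
    return linear + attention

for ctx in (1_000, 100_000, 1_000_000):
    print(f"{ctx:>9,}-token context: ~{prefill_flops(ctx):.1e} FLOPs")
```

At 1,000 tokens the quadratic term is negligible; by a million tokens it dominates, which is why long-context requests cost far more than their raw token counts suggest.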
From free tiers to $3 and $6 per million tokens
At the base of this market, Huang suggests, will be high-throughput but lower-speed services that power free or ad-supported tiers. These offerings prioritise aggregate tokens over per-user latency: they might run on slightly smaller models, use more aggressive batching or be tuned for background tasks where immediate responses are less critical.
Above that sit paid tiers priced around $3 per million tokens and then $6 per million tokens. These services could offer faster responses, higher-quality models or longer context than the free tier, but still operate at scales where cost per token must be tightly managed. For many developers and enterprises, these midrange tiers will be the default: good enough performance, predictable pricing and broad availability across regions.
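As a quick illustration of what those bands mean in practice, here is a minimal cost helper. Only the $3 and $6 per-million prices come from the keynote’s sketch; the traffic volume is an invented example.

```python
# Minimal cost helper for the midrange tiers. Only the $3 and $6
# per-million prices come from the keynote's sketch; the traffic volume
# below is an invented example.

def monthly_token_cost(tokens_per_day: float, price_per_million: float) -> float:
    """Token spend over a 30-day month at a given per-million-token price."""
    return tokens_per_day * 30 / 1_000_000 * price_per_million

# e.g. a service emitting 200M tokens per day on each tier:
for price in (3.0, 6.0):
    print(f"${price:.0f}/M tokens -> ${monthly_token_cost(200e6, price):,.0f}/month")
```

At that volume the $3 tier costs $18,000 a month and the $6 tier $36,000, which is why the extra speed, quality or context has to earn its keep.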
Premium tiers at $45 to $50 per million tokens
At the top of the curve, Huang sketches out premium tiers priced around $45 to $50 per million tokens. These are not intended for casual chatbots. They are aimed at workloads that demand extreme performance, very long research workflows or both: think complex agentic systems that run multi-step reasoning chains, keep vast context windows alive and integrate with numerous tools and data sources.
In that regime, customers are effectively renting slices of top-tier AI factories: the fastest token speeds, the largest share of factory throughput per user and the most generous context budgets. The absolute token price is high, but so is the value of each run. If a system can help a quant desk, pharmaceutical researcher or strategic planning team make better decisions worth millions, a few hundred dollars of tokens per day is a reasonable expense.
When $150 per million tokens can still make sense
Huang goes further and notes that for some researcher-grade workflows, even prices around $150 per million tokens might be acceptable. Consider a scenario where a researcher uses 50 million tokens per day on a system that explores ideas, reads large corpora and runs experiments autonomously. At $150 per million tokens, that is $7,500 per day in raw token costs.
For many organisations, that would be unjustifiable. But for certain hedge funds, biotech labs, or national-security programmes, it might be a rounding error if the system consistently surfaces insights that humans would miss. In that sense, the upper end of the pricing spectrum is less about compute costs and more about the perceived return on intelligence.
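The break-even logic can be made explicit. In the sketch below, the token figures come from the scenario above, while the insight cadence and value are purely hypothetical assumptions chosen to make the ROI framing concrete.

```python
# Break-even sketch for the researcher-grade scenario above: 50M tokens
# per day at $150 per million. The insight cadence and value are purely
# hypothetical assumptions.

TOKENS_PER_DAY = 50e6
PRICE_PER_MILLION = 150.0

daily_spend = TOKENS_PER_DAY / 1e6 * PRICE_PER_MILLION  # $7,500/day, as in the text
annual_spend = daily_spend * 365                        # about $2.7M/year

# Assumed: one decision-relevant insight per week, worth $100k each.
annual_value = 52 * 100_000

print(f"annual token spend:    ${annual_spend:,.0f}")
print(f"assumed insight value: ${annual_value:,.0f}")
print(f"multiple on spend:     {annual_value / annual_spend:.1f}x")
```

Under those invented assumptions the system roughly doubles its cost in value; shift either assumption modestly and the spend becomes either untenable or trivially justified, which is exactly the point about perceived return on intelligence.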
How context and reasoning shape value
Two technical factors play outsized roles in this emerging market: context length and reasoning depth. Longer context windows let models ingest more documents, codebases or interaction history at once. Deeper reasoning — where the model takes many internal steps before answering — often requires generating more intermediate tokens, which further increases compute per query.
Faster token speeds make both of these features more usable. If a model can stream tokens quickly, it can afford to “think” longer without making the user wait unreasonably. That makes agentic behaviours — drafting code, running tools, revising plans — feel fluid rather than sluggish. The combination of long context, deep reasoning and high token speed is what justifies the highest pricing tiers: it turns raw tokens into compound, high-value work.
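The dependence on token speed is easy to quantify. Assuming, purely for illustration, a model that emits 10,000 internal reasoning tokens before its final answer, the user-visible wait falls directly out of the token rate:

```python
# How token speed gates usable "thinking time". The 10,000 internal
# reasoning tokens are an illustrative assumption, not a benchmark.

REASONING_TOKENS = 10_000  # assumed hidden tokens generated before the answer

for tok_per_sec in (25, 100, 500):
    wait_sec = REASONING_TOKENS / tok_per_sec
    print(f"{tok_per_sec:>4} tok/s -> user waits ~{wait_sec:,.0f}s for the answer")
```

At 25 tokens per second a deep reasoning pass means a nearly seven-minute wait; at 500 it is twenty seconds, fast enough for agentic loops to feel fluid.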
What this means for AI providers and customers
For AI providers, the implication is that there is no single “right” price for tokens. Instead, there will be a portfolio of services mapped to different parts of the throughput–speed plane and priced accordingly. Some offerings will compete on being the cheapest commodity tokens; others will compete on being the smartest or fastest tokens money can buy.
For customers, the challenge will be to match workloads to the right tier. Routine tasks — log analysis, simple summarisation, lightweight chatbots — will usually live in the lower-cost bands. High-stakes or high-margin workflows, especially those that benefit from deep reasoning and huge context windows, may justify moving up the curve. Over time, procurement teams are likely to think in terms of token budgets and token ROI rather than just API line items.
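A platform or procurement team might eventually encode that matching logic directly. The helper below is hypothetical: the tier names, thresholds and routing rules are invented, and only the price bands echo the keynote’s sketch.

```python
# Hypothetical tier-routing helper. Tier names, thresholds and routing
# rules are invented; only the price bands echo the keynote's sketch.

def pick_tier(needs_deep_reasoning: bool,
              context_tokens: int,
              max_latency_sec: float) -> str:
    """Map a workload's requirements onto an assumed pricing band."""
    if needs_deep_reasoning and context_tokens > 500_000:
        return "researcher tier (~$150/M tokens)"
    if needs_deep_reasoning or max_latency_sec < 2:
        return "premium tier (~$45-50/M tokens)"
    if max_latency_sec < 30:
        return "standard tier ($3-6/M tokens)"
    return "free / batch tier"

print(pick_tier(False, 20_000, 60))     # routine summarisation -> free / batch tier
print(pick_tier(True, 2_000_000, 300))  # agentic research run -> researcher tier
```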
AI factories as the underlying commodity producers
Underpinning all of this is the physical reality of AI factories. Whether a provider offers a $3 tier or a $150 tier, the tokens all come from racks of accelerators, CPUs, storage and networking gear constrained by power and cooling. Architectures like Blackwell NVLink 72 and Rubin Ultra’s Kyber racks exist to push more tokens through those factories at higher speeds and lower marginal costs.
If Nvidia and its partners can keep improving tokens per megawatt and tokens per dollar of infrastructure, they can widen the gap between cost and price across the entire tier structure. That is why Huang spends so much time on both the hardware and the pricing model in the same keynote: they are two sides of the same commodity market. Better factories make more competitive token products; richer token products justify building more advanced factories.