
From Retrieval to Generation: How ChatGPT Marked the Start of Nvidia’s Generative AI Era

Disclaimer: Perspectives here reflect AI-POV and AI-assisted analysis, not any specific human author.

The release of ChatGPT marked what Nvidia CEO Jensen Huang calls the beginning of the generative AI era. In his GTC keynote, Huang described how earlier AI systems were focused mainly on perception and understanding tasks such as classification, search and pattern detection. With generative AI, software systems no longer just recognise or retrieve information — they produce new content, including text, images, code and video.

This shift, he argued, has fundamentally changed how computing works. Traditional systems were largely retrieval-based: they stored data and returned it when requested. Generative AI systems instead generate new outputs based on patterns learned from massive training datasets. That distinction has profound implications for how data centres are built, how hardware is designed, and how software platforms and cloud infrastructure are architected.

How generative AI differs from earlier AI waves

In the earlier perception-and-understanding era, AI was typically used to label images, rank search results, detect anomalies or recognise speech. These systems still relied heavily on retrieving and ranking pre-existing information or applying fixed decision rules derived from data.

Generative AI systems using large language models and related architectures take a different approach. By learning statistical patterns across enormous corpora, they can synthesise new sentences, images or code sequences that were never explicitly stored in any database. ChatGPT’s mainstream debut illustrated this shift: users could interact with a system that produced coherent, context-aware responses and content on demand, rather than just surfacing existing documents.
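The retrieval-versus-generation distinction can be made concrete with a toy sketch. The snippet below contrasts a lookup over stored items with sampling from bigram statistics learned from a tiny corpus; the corpus, the `retrieve`/`generate` functions and the bigram "model" are illustrative toys of my own, not a real LLM or any system the article describes.

```python
# Toy contrast between retrieval and generation, under the article's framing.
# The corpus and bigram "model" are illustrative assumptions, not a real LLM.
import random
from collections import defaultdict

# Retrieval: return only items that were explicitly stored.
store = {"q1": "the model learns patterns"}

def retrieve(key):
    return store.get(key)

# Generation: sample new sequences from statistics learned over a corpus.
corpus = "the model learns patterns and the model generates new text".split()
bigrams = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev].append(nxt)

def generate(start, length, seed=0):
    random.seed(seed)
    out = [start]
    for _ in range(length - 1):
        candidates = bigrams.get(out[-1])
        if not candidates:
            break
        out.append(random.choice(candidates))
    return " ".join(out)

print(retrieve("q1"))      # an exact stored item
print(generate("the", 5))  # a sequence never stored verbatim anywhere
```

The point of the sketch is structural: the retrieval path can only return what was put in, while the generation path produces sequences that exist nowhere in its "database", only in the learned statistics.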

Why this changes computing architectures

Huang emphasised that moving from retrieval to generation is not just a user-interface change; it requires a rethinking of computing infrastructure. Training and serving generative models demand sustained, large-scale compute, high-bandwidth memory and fast interconnects. Data centres are being redesigned to prioritise these workloads, and cloud platforms are adapting their services and pricing around them.

Hardware design is also evolving. Accelerators such as GPUs sit at the heart of generative AI workloads, and the surrounding systems — from networking to storage — are being tuned to keep those accelerators fed with data. Software platforms, including Nvidia’s own stacks, are being updated to manage long-running, token-heavy inference and fine-tuning jobs rather than just short classification calls.

Tokens and the rise of reasoning workloads

In this generative context, tokens have become a practical unit of work and cost. Huang highlighted that AI companies and AI-native startups consume enormous numbers of tokens when they train models or serve user requests. Some generate tokens on their own infrastructure; others build services on top of tokens produced by providers like OpenAI or Anthropic.

The computational load has grown further as models have moved beyond simple next-token prediction into explicit reasoning. Huang pointed to systems that decompose complex problems into smaller, tractable steps and then ground those steps in available research and evidence. Models such as OpenAI’s o1 series attempt to reason through problems step by step, which has improved the reliability of generative AI and increased confidence in tools like ChatGPT.

That style of reasoning, however, consumes far more compute: more input tokens are needed to provide context, and more output tokens are generated as the model “thinks” through intermediate steps. Even when model size increases only modestly, the shift from short answers to multi-step reasoning traces multiplies the amount of compute required per request.
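Because tokens are the billing unit, the compute multiplier from reasoning shows up directly in per-request cost. The sketch below compares a short answer with a reasoning-style request under assumed per-token prices; the prices and token counts are illustrative assumptions, not the rates of any actual provider.

```python
# Rough sketch of why reasoning traces multiply per-request cost.
# Prices and token counts are illustrative assumptions, not real rates.

IN_PRICE = 3.00    # assumed dollars per million input tokens
OUT_PRICE = 15.00  # assumed dollars per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request under the assumed per-token prices."""
    return (input_tokens * IN_PRICE + output_tokens * OUT_PRICE) / 1_000_000

# A short answer: brief prompt, brief completion.
short = request_cost(input_tokens=500, output_tokens=300)

# A reasoning request: long context plus a multi-step "thinking" trace.
reasoning = request_cost(input_tokens=4_000, output_tokens=6_000)

print(f"short answer:  ${short:.4f}")
print(f"reasoning run: ${reasoning:.4f} ({reasoning / short:.0f}x)")
```

Even with the same underlying model, the longer context and the generated intermediate steps push the reasoning request to roughly an order of magnitude more cost than the short answer, which is the per-request version of the demand growth Huang describes.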

From reasoning to agentic systems

Huang’s broader framing places reasoning models alongside a newer class of agentic AI systems. Agent-based coding tools such as Claude Code can read files, analyse source code, compile programs, run tests, evaluate results and iterate on solutions by calling external tools. In many organisations, software engineers now work with assistants that help design and execute multi-step workflows rather than just answer questions.
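The tool-calling pattern behind such agents can be sketched as a simple loop: the model chooses an action, the harness executes it, and the observation is fed back until the model declares the task finished. Everything below is a toy of my own construction — the tool names, the scripted "model" and the loop shape are illustrative assumptions, not how Claude Code or any specific product is implemented.

```python
# Minimal sketch of an agentic tool loop: the model picks a tool, the
# harness executes it, and the observation is fed back until done.
# Tool names and the scripted "model" are illustrative assumptions.

def run_agent(model, tools, task, max_steps=10):
    """Drive a model/tool loop until the model returns a final answer."""
    history = [("task", task)]
    for _ in range(max_steps):
        action, arg = model(history)           # model decides the next step
        if action == "finish":
            return arg                         # final answer
        observation = tools[action](arg)       # execute the chosen tool
        history.append((action, observation))  # feed the result back
    raise RuntimeError("agent did not finish within max_steps")

# Toy tools and a scripted "model", just to exercise the loop.
tools = {
    "read_file": lambda path: f"contents of {path}",
    "run_tests": lambda _: "2 passed",
}

def scripted_model(history):
    steps = [("read_file", "app.py"), ("run_tests", ""), ("finish", "tests pass")]
    return steps[len(history) - 1]

print(run_agent(scripted_model, tools, "fix the failing test"))
```

Each pass through the loop is itself a model call that consumes and produces tokens, which is why multi-step agent runs drive inference demand so much harder than single question-and-answer exchanges.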

This progression — from perception to generation, reasoning and now agents that act — has turned inference into one of the largest drivers of computing demand. Running these systems at scale requires sustained access to accelerated hardware and robust software stacks to orchestrate complex sequences of calls.

A platform shift on par with PCs, the internet and mobile

Huang placed the generative AI era within a broader history of computing shifts. The personal computer era produced companies like Microsoft. The internet era produced firms such as Google and Amazon. The mobile and cloud era brought platforms like Meta and other social and app ecosystems.

The current AI platform shift, catalysed by systems such as ChatGPT and reinforced by reasoning and agentic workloads, is expected to produce another generation of highly influential companies. Many of them will be AI-native, but incumbents that successfully adapt their infrastructure and products to generative AI may also emerge stronger. From Huang’s perspective, the scale of change is on par with previous revolutions — only this time, the centre of gravity is a computing model that learns, reasons and generates.

What this means for Nvidia’s platform

For Nvidia, the move from retrieval-based to generative and agentic computing reinforces the importance of its accelerated computing stack. GPUs and specialised systems are needed to train and run large models; domain-specific libraries and frameworks help developers build on top of those models; and cloud and on-premises infrastructure are being shaped around these workloads.

The company’s strategy is to position its hardware and software as foundational for organisations that want to participate in this new era, whether they are building models themselves or integrating generative and agentic capabilities into products. As data centres and platforms are redesigned around generation, reasoning and action rather than retrieval alone, Nvidia’s role as a provider of the underlying compute and software layers becomes more central.

Why inference is now a top driver of demand

AI inference — the process of running trained models to produce results — is becoming one of the largest drivers of computing demand globally. Training still requires huge clusters, but inference scales with every user request, every agentic run and every reasoning trace. Huang has noted that even as companies such as Nvidia ship large volumes of hardware, demand continues to rise because AI systems are now performing real productive work rather than only generating experimental outputs. That shift marks an important turning point for the industry and for how enterprises budget and plan their infrastructure.
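The "scales with every request" point lends itself to a back-of-envelope calculation. The figures below are purely illustrative assumptions of my own, not Nvidia or vendor data; the point is the shape of the arithmetic, not the numbers.

```python
# Back-of-envelope sketch of why inference scales with usage.
# All figures are illustrative assumptions, not Nvidia or vendor data.

def daily_tokens(requests_per_day: int, tokens_per_request: int) -> int:
    """Total tokens a service must generate per day."""
    return requests_per_day * tokens_per_request

# Chat-style load: many requests, short completions.
chat = daily_tokens(requests_per_day=10_000_000, tokens_per_request=800)

# Agentic load: far fewer requests, but long multi-step traces.
agentic = daily_tokens(requests_per_day=1_000_000, tokens_per_request=20_000)

print(f"chat-style load:    {chat:,} tokens/day")
print(f"agentic-style load: {agentic:,} tokens/day")
```

Under these assumptions the agentic service, despite serving ten times fewer requests, generates more than twice as many tokens per day — a small illustration of why reasoning and agentic workloads, not just user counts, are pushing inference to the top of infrastructure budgets.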

Sources

  • Keynote remarks by Nvidia CEO Jensen Huang on the generative AI era, ChatGPT, reasoning workloads and the shift from retrieval to generation
  • Nvidia GTC materials and public documentation on large language models, generative workloads and accelerated computing architectures
  • Industry analysis of how generative and agentic AI are changing data centre design, cloud services and startup business models
