From Retrieval to Generation: How ChatGPT Marked the Start of Nvidia’s Generative AI Era

Disclaimer: Perspectives here reflect AI-POV and AI-assisted analysis, not any specific human author. Report issues: report@theaipov.news

By Tech Desk | March 16, 2026 | 5 min read | AI-Assisted | Source: YouTube / Nvidia

The release of ChatGPT marked what Nvidia CEO Jensen Huang calls the beginning of the generative AI era. In his GTC keynote, Huang described how earlier AI systems were focused mainly on perception and understanding tasks such as classification, search and pattern detection. With generative AI, software systems no longer just recognise or retrieve information — they produce new content, including text, images, code and video.

This shift, he argued, has fundamentally changed how computing works. Traditional systems were largely retrieval-based: they stored data and returned it when requested. Generative AI systems instead produce new outputs based on patterns learned from massive training datasets. That distinction has profound implications for how data centres are built, how hardware is designed, and how software platforms and cloud infrastructure are architected.

How generative AI differs from earlier AI waves

In the earlier perception-and-understanding era, AI was typically used to label images, rank search results, detect anomalies or recognise speech. These systems relied heavily on retrieving and ranking pre-existing information, or on applying fixed decision rules derived from data.

Generative AI systems using large language models and related architectures take a different approach. By learning statistical patterns across enormous corpora, they can synthesise new sentences, images or code sequences that were never explicitly stored in any database. ChatGPT’s mainstream debut illustrated this shift: users could interact with a system that produced coherent, context-aware responses and content on demand, rather than just surfacing existing documents.
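
To make the retrieval-versus-generation distinction concrete, here is a deliberately tiny sketch. It is a toy bigram model, nothing like a real large language model: the `retrieve` function can only hand back a stored string, while `generate` samples word-to-word transitions learned from the same two documents and can therefore assemble sentences that were never stored. The documents and function names are illustrative assumptions.

```python
import random
from collections import defaultdict

# Retrieval: the system can only return what was stored.
documents = {
    "doc1": "the cat sat on the mat",
    "doc2": "the dog sat on the rug",
}

def retrieve(doc_id: str) -> str:
    """Return a stored document verbatim, or an empty string if it is absent."""
    return documents.get(doc_id, "")

# Generation: learn which word tends to follow which, then sample new sequences.
def train_bigrams(texts):
    model = defaultdict(list)
    for text in texts:
        words = text.split()
        for current, nxt in zip(words, words[1:]):
            model[current].append(nxt)  # record every observed transition
    return model

def generate(model, start: str, max_words: int, rng: random.Random) -> str:
    words = [start]
    # Stop at the length limit or when the last word has no known successor.
    while len(words) < max_words and model.get(words[-1]):
        words.append(rng.choice(model[words[-1]]))
    return " ".join(words)

bigrams = train_bigrams(documents.values())
rng = random.Random(0)
print(retrieve("doc1"))
print(generate(bigrams, "the", 6, rng))
```

Depending on the random draws, `generate` can emit a sentence such as "the dog sat on the mat", which appears in neither stored document. Large language models broadly apply the same idea, predicting the next token from learned statistics, at vastly greater scale and with far richer context.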

Why this changes computing architectures

Huang emphasised that moving from retrieval to generation is not just a user-interface change; it requires a rethinking of computing infrastructure. Training and serving generative models demand sustained, large-scale compute, high-bandwidth memory and fast interconnects. Data centres are being redesigned to prioritise these workloads, and cloud platforms are adapting their services and pricing around them.

Hardware design is also evolving. Accelerators such as GPUs sit at the heart of generative AI workloads, and the surrounding systems — from networking to storage — are being tuned to keep those accelerators fed with data. Software platforms, including Nvidia’s own stacks, are being updated to manage long-running, token-heavy inference and fine-tuning jobs rather than just short classification calls.
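
As a back-of-envelope illustration of why memory bandwidth matters, the sketch below assumes that single-stream decoding is memory-bandwidth bound: producing each new token requires reading every model weight from memory once. Under that simplifying assumption (it ignores KV-cache reads, batching and compute limits), the upper bound on tokens per second is bandwidth divided by the size of the weights. The model size, precision and bandwidth figures are hypothetical placeholders, not specifications of any Nvidia product.

```python
def max_decode_tokens_per_second(params: float, bytes_per_param: float,
                                 bandwidth_bytes_per_s: float) -> float:
    """Upper bound on single-stream decode speed if each token reads all weights once."""
    weight_bytes = params * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes

# Hypothetical example: 70e9 parameters stored at 2 bytes each (16-bit weights)
# on an accelerator with 4e12 bytes/s (4 TB/s) of memory bandwidth.
rate = max_decode_tokens_per_second(70e9, 2, 4e12)
print(f"~{rate:.1f} tokens/s per sequence (upper bound)")
```

Batching several requests lets one pass over the weights serve multiple sequences, which is one reason serving systems work hard to keep accelerators supplied with data.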

Tokens and the rise of reasoning workloads

In this generative context, tokens have become a practical unit of work and cost. Huang highlighted that AI companies and AI-native startups consume enormous numbers of tokens when they train models or serve user requests. Some generate tokens on their own infrastructure; others build services on top of tokens produced by providers like OpenAI or Anthropic.
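
Treating tokens as the unit of cost is simple arithmetic. Many API providers bill input and output tokens at different per-token rates; the sketch below assumes that convention and uses placeholder prices per million tokens, not any provider's actual rates. The `TokenPrice` class and `request_cost` function are hypothetical names for illustration.

```python
from dataclasses import dataclass

@dataclass
class TokenPrice:
    """Hypothetical per-million-token prices in US dollars."""
    input_per_million: float
    output_per_million: float

def request_cost(input_tokens: int, output_tokens: int, price: TokenPrice) -> float:
    """Cost of one request when input and output tokens are billed separately."""
    return (input_tokens * price.input_per_million
            + output_tokens * price.output_per_million) / 1_000_000

# Placeholder prices only.
price = TokenPrice(input_per_million=2.0, output_per_million=8.0)
print(f"${request_cost(1_500, 500, price):.4f}")
```

Multiplying a per-request figure like this by millions of daily requests is how token consumption turns into a significant line item for AI-native companies.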

The computational load has grown further as models have moved beyond simple next-token prediction into explicit reasoning. Huang pointed to systems that break complex problems into smaller, more tractable steps and then ground those steps in available research and evidence. Models such as OpenAI's o1 series attempt to reason through problems step by step, an approach aimed at making generative AI more reliable and at building confidence in tools like ChatGPT.

That style of reasoning, however, consumes far more compute: more input tokens are needed to provide context, and more output tokens are generated as the model “thinks” through intermediate steps. Even when model size increases only modestly, the shift from short answers to multi-step reasoning traces multiplies the amount of compute required per request.
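
A rough way to see the multiplier is the common approximation that a transformer's forward pass costs about 2 FLOPs per parameter per token processed. The sketch below applies that approximation to two hypothetical requests, a short answer and a long reasoning trace; the model size and token counts are illustrative assumptions, and the formula ignores attention's growing cost at long context lengths.

```python
def inference_flops(params: float, input_tokens: int, output_tokens: int) -> float:
    """Approximate forward-pass FLOPs: about 2 * params per processed token.

    Ignores attention's dependence on context length, so it understates
    the cost of very long contexts.
    """
    return 2 * params * (input_tokens + output_tokens)

PARAMS = 70e9  # hypothetical model size

short_answer = inference_flops(PARAMS, input_tokens=500, output_tokens=200)
reasoning = inference_flops(PARAMS, input_tokens=2_000, output_tokens=8_000)
# A modest 20% larger model on the same reasoning request.
reasoning_bigger = inference_flops(PARAMS * 1.2, input_tokens=2_000, output_tokens=8_000)

print(f"reasoning / short answer: {reasoning / short_answer:.1f}x")
print(f"larger model, reasoning / short answer: {reasoning_bigger / short_answer:.1f}x")
```

In this toy comparison the token count, not the parameter count, drives most of the increase, which matches the point that longer reasoning traces multiply compute even when model size grows only modestly.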

From reasoning to agentic systems

Huang’s broader framing places reasoning models alongside a newer class of agentic AI systems. Agent-based coding tools such as Claude Code can read files, analyse source code, compile programs, run tests, evaluate results and iterate on solutions by calling external tools. In many organisations, software engineers now work with assistants that help design and execute multi-step workflows rather than just answer questions.
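
The loop these tools run can be sketched in a few lines: the model chooses an action, the harness executes the matching tool, and the result is fed back into the history for the next decision. This is a hypothetical illustration, not how Claude Code or any other product is implemented; `fake_model` replaces a real model call with a fixed script, and `read_file` and `run_tests` are stand-in tools that return canned strings.

```python
from typing import Callable

# Hypothetical tools standing in for real file access and test execution.
def read_file(path: str) -> str:
    return f"contents of {path}"

def run_tests(_: str) -> str:
    return "2 passed, 0 failed"

TOOLS: dict[str, Callable[[str], str]] = {"read_file": read_file, "run_tests": run_tests}

def fake_model(history: list[str]) -> tuple[str, str]:
    """Stand-in for a model call that returns (action, argument).

    A real agent would send the history to a language model and parse its reply.
    """
    script = [("read_file", "main.py"), ("run_tests", "tests/"), ("finish", "tests pass")]
    steps_taken = sum(1 for entry in history if entry.startswith("observation:"))
    return script[min(steps_taken, len(script) - 1)]

def run_agent(task: str, max_steps: int = 10) -> list[str]:
    history = [f"task: {task}"]
    for _ in range(max_steps):  # cap iterations so the loop always terminates
        action, arg = fake_model(history)
        if action == "finish":
            history.append(f"final: {arg}")
            break
        observation = TOOLS[action](arg)               # execute the chosen tool
        history.append(f"observation: {observation}")  # feed the result back
    return history

for line in run_agent("fix the failing test"):
    print(line)
```

Each pass through the loop is another model call, which is why agentic runs consume far more inference than a single question and answer.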

This progression — from perception to generation, reasoning and now agents that act — has turned inference into one of the largest drivers of computing demand. Running these systems at scale requires sustained access to accelerated hardware and robust software stacks to orchestrate complex sequences of calls.

A platform shift on par with PCs, the internet and mobile

Huang placed the generative AI era within a broader history of computing shifts. The personal computer era produced companies like Microsoft. The internet era produced firms such as Google and Amazon. The mobile and cloud era brought companies such as Meta, along with broader social and app ecosystems.

The current AI platform shift, catalysed by systems such as ChatGPT and reinforced by reasoning and agentic workloads, is expected to produce another generation of highly influential companies. Many of them will be AI-native, but incumbents that successfully adapt their infrastructure and products to generative AI may also emerge stronger. From Huang’s perspective, the scale of change is on par with previous revolutions — only this time, the centre of gravity is a computing model that learns, reasons and generates.

What this means for Nvidia’s platform

For Nvidia, the move from retrieval-based to generative and agentic computing reinforces the importance of its accelerated computing stack. GPUs and specialised systems are needed to train and run large models; domain-specific libraries and frameworks help developers build on top of those models; and cloud and on-premises infrastructure are being shaped around these workloads.

The company’s strategy is to position its hardware and software as foundational for organisations that want to participate in this new era, whether they are building models themselves or integrating generative and agentic capabilities into products. As data centres and platforms are redesigned around generation, reasoning and action rather than retrieval alone, Nvidia’s role as a provider of the underlying compute and software layers becomes more central.

Why inference is now a top driver of demand

AI inference, the process of running trained models to produce results, is becoming one of the largest drivers of computing demand globally. Training still requires huge clusters, but inference scales with every user request, every agentic run and every reasoning trace. Huang has noted that even as Nvidia ships large volumes of hardware, demand continues to rise because AI systems are now performing real productive work rather than only generating experimental outputs. That shift marks an important turning point for the industry and for how enterprises budget and plan their infrastructure.
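
One way to see why inference can overtake training is to compare a one-time training run with compute that accumulates per token served. The sketch below uses two widely cited approximations, about 6 FLOPs per parameter per training token and about 2 FLOPs per parameter per inference token; the model size, training-set size and daily token volume are hypothetical assumptions, not figures from the keynote.

```python
def training_flops(params: float, training_tokens: float) -> float:
    """Common approximation: about 6 FLOPs per parameter per training token."""
    return 6 * params * training_tokens

def serving_flops(params: float, tokens_served: float) -> float:
    """Common approximation: about 2 FLOPs per parameter per inference token."""
    return 2 * params * tokens_served

PARAMS = 70e9          # hypothetical model size
TRAIN_TOKENS = 15e12   # hypothetical training set size
DAILY_TOKENS = 1e12    # hypothetical tokens served per day

days_to_match = training_flops(PARAMS, TRAIN_TOKENS) / serving_flops(PARAMS, DAILY_TOKENS)
print(f"cumulative inference compute matches training after ~{days_to_match:.0f} days")
```

Under these approximations the parameter count cancels out: serving roughly three times as many tokens as the model was trained on uses as much compute as training it, and every day beyond that point inference keeps adding demand.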

Sources

Keynote remarks by Nvidia CEO Jensen Huang on the generative AI era, ChatGPT, reasoning workloads and the shift from retrieval to generation
Nvidia GTC materials and public documentation on large language models, generative workloads and accelerated computing architectures
Industry analysis of how generative and agentic AI are changing data centre design, cloud services and startup business models
