The release of ChatGPT marked what Nvidia CEO Jensen Huang calls the beginning of the generative AI era. In his GTC keynote, Huang described how earlier AI systems were focused mainly on perception and understanding tasks such as classification, search and pattern detection. With generative AI, software systems no longer just recognise or retrieve information — they produce new content, including text, images, code and video.
This shift, he argued, has fundamentally changed how computing works. Traditional systems were largely retrieval-based: they stored data and returned it when requested. Generative AI systems instead generate new outputs based on patterns learned from massive training datasets. That distinction has profound implications for how data centres are built, how hardware is designed, and how software platforms and cloud infrastructure are architected.
How generative AI differs from earlier AI waves
In the earlier perception-and-understanding era, AI was typically used to label images, rank search results, detect anomalies or recognise speech. These systems still relied heavily on retrieving and ranking pre-existing information or applying fixed decision rules derived from data.
Generative AI systems using large language models and related architectures take a different approach. By learning statistical patterns across enormous corpora, they can synthesise new sentences, images or code sequences that were never explicitly stored in any database. ChatGPT’s mainstream debut illustrated this shift: users could interact with a system that produced coherent, context-aware responses and content on demand, rather than just surfacing existing documents.
Why this changes computing architectures
Huang emphasised that moving from retrieval to generation is not just a user-interface change; it requires a rethinking of computing infrastructure. Training and serving generative models demand sustained, large-scale compute, high-bandwidth memory and fast interconnects. Data centres are being redesigned to prioritise these workloads, and cloud platforms are adapting their services and pricing around them.
Hardware design is also evolving. Accelerators such as GPUs sit at the heart of generative AI workloads, and the surrounding systems — from networking to storage — are being tuned to keep those accelerators fed with data. Software platforms, including Nvidia’s own stacks, are being updated to manage long-running, token-heavy inference and fine-tuning jobs rather than just short classification calls.
Tokens and the rise of reasoning workloads
In this generative context, tokens have become a practical unit of work and cost. Huang highlighted that AI companies and AI-native startups consume enormous numbers of tokens when they train models or serve user requests. Some generate tokens on their own infrastructure; others build services on top of tokens produced by providers like OpenAI or Anthropic.
The computational load has grown further as models have moved from answering in a single pass to explicit, multi-step reasoning. Huang pointed to systems that break complex problems into smaller steps they can understand and then ground those steps in available research and evidence. Models such as OpenAI’s o1 series attempt to reason through problems step by step, which has improved the reliability of generative AI and increased confidence in tools like ChatGPT.
That style of reasoning, however, consumes far more compute: more input tokens are needed to provide context, and more output tokens are generated as the model “thinks” through intermediate steps. Even when model size increases only modestly, the shift from short answers to multi-step reasoning traces multiplies the amount of compute required per request.
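As a rough illustration of that multiplication, the sketch below estimates compute per request using a common rule of thumb: a dense transformer's forward pass costs roughly 2 floating-point operations per parameter per token. The model size and token counts are illustrative assumptions, not figures from the keynote, and the estimate ignores the extra cost of attention over long contexts, which would widen the gap further.

```python
def request_flops(params: float, input_tokens: int, output_tokens: int) -> float:
    """Approximate forward-pass FLOPs for one request (~2 * params per token)."""
    return 2.0 * params * (input_tokens + output_tokens)


# Hypothetical 70B-parameter dense model (assumed for illustration only).
PARAMS = 70e9

# Assumed token budgets: a short direct answer vs. a multi-step reasoning trace.
short_answer = request_flops(PARAMS, input_tokens=200, output_tokens=100)
reasoning = request_flops(PARAMS, input_tokens=2_000, output_tokens=4_000)

ratio = reasoning / short_answer
print(f"short answer: {short_answer:.2e} FLOPs")
print(f"reasoning:    {reasoning:.2e} FLOPs")
print(f"ratio:        {ratio:.0f}x")
```

With the model size held fixed, the per-request cost in this sketch grows in direct proportion to the number of tokens processed, which is why longer contexts and reasoning traces raise inference demand even without larger models.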
From reasoning to agentic systems
Huang’s broader framing places reasoning models alongside a newer class of agentic AI systems. Agent-based coding tools such as Claude Code can read files, analyse source code, compile programs, run tests, evaluate results and iterate on solutions by calling external tools. In many organisations, software engineers now work with assistants that help design and execute multi-step workflows rather than just answer questions.
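The act, evaluate and iterate pattern described above can be sketched as a simple loop. This is a toy illustration and not how Claude Code or any specific product is implemented: `run_tests`, `CANDIDATES` and `agent_loop` are hypothetical names, and the fixed list of candidates stands in for proposals that a real agent would generate from a model after reading the test failures.

```python
from typing import Callable, Optional


def run_tests(add: Callable[[int, int], int]) -> list[str]:
    """Stand-in for a test-runner tool: returns a list of failure messages."""
    failures = []
    if add(2, 3) != 5:
        failures.append("add(2, 3) should be 5")
    if add(-1, 1) != 0:
        failures.append("add(-1, 1) should be 0")
    return failures


# Stand-in for model proposals (hypothetical); a real agent would produce
# each new attempt from the failures reported by the previous run.
CANDIDATES: list[Callable[[int, int], int]] = [
    lambda a, b: a - b,
    lambda a, b: a * b,
    lambda a, b: a + b,
]


def agent_loop(max_steps: int = 5) -> Optional[int]:
    """Try candidates in order; return the step at which the tests pass."""
    for step, candidate in enumerate(CANDIDATES[:max_steps], start=1):
        failures = run_tests(candidate)   # call the external "tool"
        if not failures:                  # evaluate the result
            return step
        print(f"step {step}: {len(failures)} failing test(s), retrying")
    return None                           # gave up within the step budget


result = agent_loop()
print("tests passed at step:", result)
```

Each pass through the loop corresponds to at least one model call plus a tool call, so a single agentic task can consume many times the inference work of a one-shot question.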
This progression — from perception to generation, reasoning and now agents that act — has turned inference into one of the largest drivers of computing demand. Running these systems at scale requires sustained access to accelerated hardware and robust software stacks to orchestrate complex sequences of calls.
A platform shift on par with PCs, the internet and mobile
Huang placed the generative AI era within a broader history of computing shifts. The personal computer era produced companies like Microsoft. The internet era produced firms such as Google and Amazon. The mobile and cloud era brought platforms such as Meta, along with the wider social and app ecosystems built around them.
The current AI platform shift, catalysed by systems such as ChatGPT and reinforced by reasoning and agentic workloads, is expected to produce another generation of highly influential companies. Many of them will be AI-native, but incumbents that successfully adapt their infrastructure and products to generative AI may also emerge stronger. From Huang’s perspective, the scale of change is on par with previous revolutions — only this time, the centre of gravity is a computing model that learns, reasons and generates.
What this means for Nvidia’s platform
For Nvidia, the move from retrieval-based to generative and agentic computing reinforces the importance of its accelerated computing stack. GPUs and specialised systems are needed to train and run large models; domain-specific libraries and frameworks help developers build on top of those models; and cloud and on-premises infrastructure are being shaped around these workloads.
The company’s strategy is to position its hardware and software as foundational for organisations that want to participate in this new era, whether they are building models themselves or integrating generative and agentic capabilities into products. As data centres and platforms are redesigned around generation, reasoning and action rather than retrieval alone, Nvidia’s role as a provider of the underlying compute and software layers becomes more central.
Why inference is now a top driver of demand
AI inference — the process of running trained models to produce results — is becoming one of the largest drivers of computing demand globally. Training still requires huge clusters, but inference scales with every user request, every agentic run and every reasoning trace. Huang has noted that even as companies such as Nvidia ship large volumes of hardware, demand continues to rise because AI systems are now performing real productive work rather than only generating experimental outputs. That shift marks an important turning point for the industry and for how enterprises budget and plan their infrastructure.
Sources
- Keynote remarks by Nvidia CEO Jensen Huang on the generative AI era, ChatGPT, reasoning workloads and the shift from retrieval to generation
- Nvidia GTC materials and public documentation on large language models, generative workloads and accelerated computing architectures
- Industry analysis of how generative and agentic AI are changing data centre design, cloud services and startup business models