When an AI system breaks a complex problem into smaller steps, it can ground each of those steps in available research and evidence. Models such as OpenAI o1 introduced this reasoning capability. Instead of simply generating text, the model attempts to reason through a problem step by step. This made generative AI more reliable, because the system tries to base its answers on structured reasoning rather than on pattern prediction alone. That shift significantly increased the credibility of generative AI systems and accelerated the adoption of ChatGPT.
However, this reasoning process also requires much more computation. The number of input tokens used for context increases, and the number of output tokens generated during the reasoning process also increases. Even if the model size is only slightly larger, the reasoning process itself dramatically increases the computational workload.
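A rough back-of-envelope sketch can make this concrete. It assumes the common rule of thumb that a dense transformer spends about 2 FLOPs per parameter for every token it processes, and it ignores attention and caching effects. The model size and token counts are hypothetical illustration values, not measurements of any real system.

```python
# Back-of-envelope estimate of inference compute for a single request.
# Assumption: ~2 FLOPs per parameter per processed token (rule of thumb for
# dense transformers). All numbers below are hypothetical.

def inference_flops(params: float, input_tokens: int, output_tokens: int) -> float:
    """Approximate forward-pass FLOPs for one request."""
    return 2.0 * params * (input_tokens + output_tokens)

PARAMS = 70e9  # hypothetical model size, held constant for both cases

# A short one-shot answer versus a long reasoning trace on the same model.
one_shot = inference_flops(PARAMS, input_tokens=500, output_tokens=300)
reasoning = inference_flops(PARAMS, input_tokens=4_000, output_tokens=8_000)

ratio = reasoning / one_shot
print(f"one-shot:  {one_shot:.2e} FLOPs")
print(f"reasoning: {reasoning:.2e} FLOPs")
print(f"ratio:     {ratio:.1f}x")
```

With the model size held fixed, the estimated cost grows in direct proportion to the tokens processed, which is why longer context and longer reasoning traces raise the workload even when the model itself barely changes.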
What are agent-based coding systems?
Another major step came with the introduction of agent-based coding systems such as Claude Code. Unlike traditional chat-based models, agentic systems can interact with real tools. They can read files, analyse source code, compile programs, run tests, evaluate results, and iterate on the solution. This capability has started to change how software development is performed.
Many engineering teams now use a combination of AI coding tools such as Claude Code, OpenAI Codex, and Cursor. In many organisations, almost every software engineer now works with one or more AI assistants during development.
How the way we use AI has changed
This shift also changes how people interact with AI systems. Earlier systems were mostly used for information queries — questions such as what, where, or when. Agent-based AI systems are instead given instructions such as create, build, or execute. They can access context, read project files, use external tools, and break down problems into steps. The system can reason through a task, reflect on intermediate results, and continue iterating until the task is completed.
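The iterate-until-done pattern described above can be sketched as a small loop. This is a toy illustration, not how any particular product works: propose_fix is a hypothetical stand-in for a model call, and run_tests stands in for a real test suite.

```python
# Minimal sketch of an agentic "act, check, iterate" loop.
# propose_fix is a hypothetical placeholder for a model call; a real agent
# would send source code and test feedback to a model and get a patch back.
from typing import Callable, List, Optional, Tuple

Candidate = Callable[[int, int], int]

def run_tests(fn: Candidate) -> Optional[str]:
    """Return None if every test passes, otherwise a short failure message."""
    cases = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
    for args, expected in cases:
        got = fn(*args)
        if got != expected:
            return f"add{args} returned {got}, expected {expected}"
    return None

def propose_fix(attempt: int, feedback: Optional[str]) -> Candidate:
    """Hypothetical model: the first draft is buggy, later drafts are corrected."""
    if attempt == 0 or feedback is None:
        return lambda a, b: a - b  # buggy first draft
    return lambda a, b: a + b      # revised after seeing the test failure

def agent_loop(max_iterations: int = 5) -> Tuple[Candidate, int, List[str]]:
    """Propose, test, and retry until the tests pass or the budget runs out."""
    feedback: Optional[str] = None
    history: List[str] = []  # intermediate results the agent reflects on
    for attempt in range(max_iterations):
        candidate = propose_fix(attempt, feedback)
        feedback = run_tests(candidate)
        if feedback is None:
            return candidate, attempt + 1, history
        history.append(feedback)
    raise RuntimeError("no passing solution within the iteration budget")

solution, iterations, history = agent_loop()
print(f"solved in {iterations} iterations; failures seen: {history}")
```

Each pass through the loop is another round of model inference plus tool execution, so a single "build" or "fix" instruction can consume many more tokens than a one-off question.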
The four stages of AI: perception, generation, reasoning, agentic
Because of this evolution, AI has moved through several stages. Initially, AI systems focused mainly on perception, meaning they could recognise patterns or understand data. Then they developed generative capabilities, producing new text, images, or code. The next stage introduced reasoning, allowing models to think through problems. The current stage is agentic AI, where systems can perform real tasks and produce useful output.
Each stage has built on the last. Perception gave machines the ability to classify and search. Generation let them create new content. Reasoning made that content more reliable by grounding it in step-by-step logic. Agentic AI turns that capability into action — not just answering questions but reading files, running tools and finishing multi-step jobs.
Why computing demand has exploded
This progression has caused a large increase in computing demand. The amount of computation required for training and especially for AI inference has grown rapidly. Demand for GPUs used in AI workloads has increased significantly, and in many markets GPU capacity has been scarce. Even though companies such as Nvidia are shipping large volumes of hardware, demand continues to rise because AI systems are now performing real productive work rather than only generating experimental outputs.
This shift marks an important turning point. AI inference — the process of running trained models to produce results — is becoming one of the largest drivers of computing demand. Every time a user asks a reasoning model to think through a problem, or an agentic system runs a test suite or edits a file, the system consumes more tokens and more compute than a simple one-shot answer would have required. Multiply that by millions of users and thousands of applications, and the scale of the infrastructure build-out becomes clear.
What this means for developers and enterprises
For developers, the move from chat to agents means that AI is no longer a tool you query occasionally but a partner that can own entire workflows. Prompts shift from “what is X?” to “build Y” or “fix Z.” The system has access to the same context a human would — files, logs, tests — and can iterate until the task is done. For enterprises, that same shift means that AI spend is increasingly tied to real production workloads: code generation, document processing, customer support, and internal tools that run around the clock.
Nvidia’s GTC keynote describes this progression directly: the industry has moved from systems that retrieved or classified to systems that generate, then reason, then act. Each step has made AI more useful, and each step has required more compute. The inference inflection point is the result.
In practice, that means engineering and product teams are no longer asking whether to use AI but how to secure enough capacity to run it at scale. Tools like Cursor and Claude Code are already part of the daily workflow for many developers; the constraint is no longer adoption but the availability of the underlying inference infrastructure to support it.
Sources
- Nvidia GTC keynote transcript on reasoning models (OpenAI o1), agent-based coding (Claude Code, Codex, Cursor), and the four stages of AI evolution
- OpenAI and Anthropic product documentation on reasoning and agentic capabilities
- Industry reporting on GPU demand and AI inference workloads