In Nvidia’s latest GTC keynote, the Vera Rubin platform moved from slideware to hardware that you can roll onto a stage. The system is almost unrecognisable compared with the cabled racks of earlier GPU clusters. It is 100% liquid cooled, the cable jungle is gone, and what once took two days to install can now be done in roughly two hours. That change is not cosmetic. Shorter manufacturing and deployment cycle times translate directly into faster AI factory build-outs and quicker access to revenue-generating compute.
The Vera Rubin system is also designed to be cooled by hot water at around 45°C. Instead of data centres burning energy to chill air and then blowing it past racks, much of the cooling burden is pushed into the rack itself. Hot-water cooling reduces the load on facility HVAC systems, cuts operating costs and frees more of the site’s power budget for the AI factory rather than the building infrastructure around it.
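To put a rough, hypothetical number on that effect, consider power usage effectiveness (PUE), the ratio of total facility power to the power that actually reaches the compute. The sketch below uses assumed figures, not Nvidia’s, purely to show how shrinking cooling overhead frees budget for the AI factory itself.

```python
# Hypothetical illustration of facility overhead: PUE = total site power / IT power,
# so the compute that fits in a fixed site budget is site_power / PUE.
site_power_mw = 100.0  # assumed total site budget, illustrative only

for label, pue in [("chilled-air cooling", 1.5), ("warm-water liquid cooling", 1.1)]:
    it_power_mw = site_power_mw / pue
    print(f"{label}: PUE {pue} -> {it_power_mw:.0f} MW left for the AI factory")
```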
NVLink as a sixth-generation scale-up fabric
At the heart of Rubin Ultra is something Jensen Huang calls the “secret sauce”: the sixth-generation NVLink scale-up switching system. This is neither Ethernet nor InfiniBand; it is Nvidia’s own high-bandwidth, low-latency interconnect designed specifically for GPU-to-GPU communication. Huang is blunt about its difficulty: building such a fabric at this scale is “insanely hard to do well” and “insanely hard to do at all”.
The latest NVLink generation is itself fully liquid cooled. Switching elements are integrated into the rack, with cooling loops designed alongside the compute nodes. The goal is to turn an entire rack — and, at Rubin Ultra scale, an entire row of racks — into a single coherent performance domain. For long-context, agentic AI workloads that need thousands of GPUs to act like one machine, that coherence is the difference between theoretical FLOPS and real token throughput.
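To make that coherence concrete, here is a minimal PyTorch sketch that times a direct GPU-to-GPU copy, the kind of transfer that rides a scale-up fabric when peer access is available. It assumes a machine with at least two CUDA GPUs and the torch package installed; it illustrates the programming model in general, not NVLink 6 or Rubin hardware specifically.

```python
import time
import torch

# Minimal sketch: a direct device-to-device copy between two GPUs.
# On NVLink-connected parts this traverses the scale-up fabric instead of
# bouncing through host memory over PCIe.
assert torch.cuda.device_count() >= 2, "needs at least two CUDA GPUs"

src = torch.randn(256 * 1024 * 1024, device="cuda:0")  # ~1 GiB of fp32
torch.cuda.synchronize("cuda:0")

t0 = time.perf_counter()
dst = src.to("cuda:1", non_blocking=True)  # peer-to-peer copy when the driver allows it
torch.cuda.synchronize("cuda:1")
elapsed = time.perf_counter() - t0

gib = src.numel() * src.element_size() / 2**30
print(f"copied {gib:.2f} GiB in {elapsed * 1e3:.1f} ms (~{gib / elapsed:.1f} GiB/s)")
```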
Groq LP300, Spectrum-X and co-packaged optics
Alongside Rubin, Nvidia is also showing off new building blocks that sit around the GPU core. One is a Groq system based on the brand-new LP300, an accelerator described as something “the world has never seen before” and already in volume production. Another is the world’s first CPO Spectrum-X switch, which uses co-packaged optics (CPO) so that optics sit directly on the chip and interface with silicon without long electrical runs.
In a CPO design, the electrical-to-optical conversion happens right at the switch package, with fibre connecting directly into the switch silicon. Nvidia co-developed this process technology with TSMC and, as Huang emphasises, is currently the only company in production with it. The idea is to push bandwidth and efficiency higher than is possible when optical modules live on separate pluggable transceivers. For AI factories, that means more traffic per rack, less power lost in electrical links and simpler high-density cabling at the top of the rack.
Vera CPUs and BlueField-4 STX storage
The Vera Rubin platform is not only about GPUs. It also introduces the Vera CPU, a processor Huang claims delivers twice the performance per watt of any other CPU on the market today. Nvidia initially expected to sell CPUs mainly as part of GPU systems, but demand has turned them into a standalone multi-billion-dollar business line. The message is clear: orchestration, preprocessing and control-plane work are now critical enough that CPU efficiency matters almost as much as GPU throughput.
Rubin Ultra racks also integrate BlueField-4 STX, Nvidia’s new storage platform. By putting DPUs directly in the storage path, BlueField can handle data movement, security and offload tasks without burdening GPUs or general-purpose CPUs. In AI factories where input and output tokens are constantly streaming, that kind of fast, programmable storage fabric is essential to keep accelerators fed.
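The “keep accelerators fed” principle can be sketched from the host side without any BlueField-specific API (none is assumed here). The snippet below shows the generic overlap pattern that a fast, programmable storage path is meant to preserve at far larger scale: a background thread stages the next batch while the current one is being processed.

```python
import queue
import threading
import time

# Generic prefetch pattern: overlap "storage" reads with "compute" so the
# accelerator never idles waiting for data. Purely illustrative; in an AI
# factory this staging is pushed down into DPUs and the storage fabric.

def read_batch(i):
    time.sleep(0.05)          # stand-in for a storage read
    return f"batch-{i}"

def compute(batch):
    time.sleep(0.05)          # stand-in for accelerator work on the batch
    return f"tokens from {batch}"

def prefetcher(n_batches, q):
    for i in range(n_batches):
        q.put(read_batch(i))  # stage data ahead of the consumer
    q.put(None)               # sentinel: no more data

q = queue.Queue(maxsize=4)    # small buffer of staged batches
threading.Thread(target=prefetcher, args=(8, q), daemon=True).start()

while (batch := q.get()) is not None:
    print(compute(batch))
```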
Kyber racks and the Rubin Ultra domain
The most visually striking part of Rubin Ultra is the new rack design, code-named Kyber. Traditional racks are front-loaded with servers and backed by bundles of copper and fibre cables. Kyber is different. Compute nodes slide vertically into the front of the rack; at the centre is a midplane with four high-density NVLink connectors per node. When a node is inserted, those connectors mate with the midplane, creating a rigid, structured interconnect with no manual cabling between nodes.
On the back of the midplane sit the NVLink switches, mounted vertically. Compute nodes in the front, NVLink fabric in the back: together they connect 144 GPUs into one NVLink domain. Huang calls this configuration Rubin Ultra. From the software’s point of view, each Kyber rack becomes a single giant computer. Multiple racks then link together into larger compute clusters, but the basic abstraction is already an AI factory at the rack scale.
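From a framework’s point of view, “a single giant computer” usually surfaces as one process group spanning every GPU in the domain. The sketch below is a generic torch.distributed pattern, not Nvidia-published code; launched with 144 ranks it would mirror the Rubin Ultra figure above, but it runs the same way on any multi-GPU setup.

```python
# Launch with e.g.: torchrun --nnodes=<nodes> --nproc_per_node=<gpus per node> rack_domain.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")   # NCCL uses NVLink between GPUs where it can
rank = dist.get_rank()
world = dist.get_world_size()
torch.cuda.set_device(rank % torch.cuda.device_count())

# One collective across the whole domain: every rank contributes a value and
# every rank receives the sum, as if the racked GPUs were lanes of one machine.
x = torch.ones(1, device="cuda") * rank
dist.all_reduce(x, op=dist.ReduceOp.SUM)
if rank == 0:
    print(f"{world} ranks in one domain, all_reduce sum = {x.item():.0f}")

dist.destroy_process_group()
```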
Because connections are made through the midplane rather than loose cables, Kyber also simplifies installation and service. The heaviest part of the rack is the NVLink section itself, which Huang jokes seems to get heavier every year as more capability is packed inside. But for operators, the trade-off is worthwhile: structured cabling and vertical insertion make it faster to deploy and replace nodes, and easier to reason about airflow and coolant paths.
Applying the same ideas to Ethernet racks
Nvidia is also taking the design lessons from Kyber and applying them to Ethernet-based systems. One keynote demo packs 256 liquid-cooled nodes into a single rack, connected with the same kind of high-density connectors and structured cabling used in the NVLink systems. The idea is that whether a customer chooses NVLink-based scale-up or Ethernet-based scale-out, they get the same factory-friendly installation, serviceability and power-density story.
In practice, that means less time pulling cables, fewer opportunities for human error and a clearer path to scaling AI factories from a handful of racks to hundreds. It also aligns with Nvidia’s broader Spectrum-X strategy: specialised Ethernet fabrics tuned for AI traffic patterns, dropped into racks that have already been optimised for liquid cooling and high-density node layouts.
Why Vera Rubin Ultra matters for AI factories
Stepping back, Vera Rubin Ultra is Nvidia’s answer to a simple but brutal constraint: power. Every large AI data centre is power-limited. Within that fixed budget, the job of an AI factory is to maximise throughput (total tokens produced) and token speed (how fast those tokens can be generated) at a given power level. Liquid-cooled Kyber racks, NVLink 6, CPO Spectrum-X switches, Vera CPUs and BlueField-4 STX are all pieces of a single optimisation problem.
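One way to see the shape of that optimisation problem is to put illustrative numbers on it. Everything below is hypothetical, not an Nvidia figure; it only shows how rack throughput and rack power compose into the tokens-per-second-per-megawatt metric the keynote keeps returning to.

```python
# Hypothetical AI-factory arithmetic: tokens/s per megawatt under a fixed power budget.
site_power_mw = 50.0             # assumed power-limited site budget
power_per_rack_kw = 150.0        # assumed rack draw, cooling included
tokens_per_sec_per_rack = 2.0e6  # assumed sustained token throughput per rack

racks = site_power_mw * 1000 / power_per_rack_kw
site_tokens_per_sec = racks * tokens_per_sec_per_rack

print(f"{racks:.0f} racks fit in {site_power_mw:.0f} MW")
print(f"site throughput: {site_tokens_per_sec:.2e} tokens/s")
print(f"efficiency: {site_tokens_per_sec / site_power_mw:.2e} tokens/s per MW")
```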
If Rubin Ultra can deliver more tokens per second per megawatt than previous architectures — while also being faster to manufacture, install and service — it gives operators a way to stretch scarce power budgets further. That, in turn, determines which companies can afford to offer faster models, longer context windows and richer agentic workflows. In Huang’s telling, every CEO running AI infrastructure will need to understand these racks, because they are the machines that turn data and electricity into tomorrow’s intelligence.
Sources
- Nvidia GTC keynote demonstrations of Vera Rubin, Rubin Ultra, Kyber racks and liquid-cooled NVLink domains
- Nvidia materials on CPO Spectrum-X switches, co-packaged optics and BlueField-4 STX storage platforms
- Industry analysis of hot-water cooling, rack-scale design and power-constrained AI factories