TL;DR

NVIDIA unveiled the Rubin platform, a rack-scale AI architecture built from six newly announced chips that the company says will cut inference token costs and training GPU counts versus its prior generation. The platform includes a new CPU, GPU, interconnect, SuperNIC, DPU and Ethernet switch and has early ecosystem backing from cloud and AI providers.

What happened

At CES, NVIDIA announced the Rubin platform, a codesigned hardware and software architecture composed of six new components: the Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9 SuperNIC, BlueField-4 DPU and Spectrum-6 Ethernet switch. The company described two rack-scale systems, the Vera Rubin NVL72 and the HGX Rubin NVL8, and outlined five technology advances intended for large-scale reasoning, agentic AI and mixture-of-experts (MoE) models. NVIDIA claims Rubin can reduce inference token cost by up to 10x and train MoE models with four times fewer GPUs compared with its Blackwell generation, and it highlighted bandwidth and compute figures for NVLink and the Rubin GPU’s Transformer Engine. NVIDIA also announced new storage and security features — including a BlueField-4–based inference context memory storage platform and rack-scale confidential computing — and named a broad list of technology partners and early deployers that are expected to adopt the platform.

Why it matters

  • If NVIDIA’s performance and efficiency claims hold in practice, Rubin could change infrastructure economics for large-scale inference and reasoning workloads.
  • The platform’s codesigned stack — spanning CPU, GPU, interconnect, NIC, DPU and switch — aims to reduce bottlenecks that arise when scaling multistep or agentic AI models.
  • Built-in confidential computing and storage acceleration target enterprise requirements for secure, large-context AI deployments.
  • Broad industry support from cloud providers, AI labs and system makers could speed availability of Rubin-based capacity across public clouds and specialized providers.

Key facts

  • Rubin consists of six newly announced components: NVIDIA Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9 SuperNIC, BlueField-4 DPU and Spectrum-6 Ethernet switch.
  • NVIDIA presented two rack-scale systems: the Vera Rubin NVL72 and the HGX Rubin NVL8.
  • NVIDIA claims up to a 10x reduction in inference token cost and a 4x reduction in GPUs needed to train MoE models versus the Blackwell platform.
  • NVLink 6 is described as providing 3.6 TB/s per GPU and 260 TB/s aggregate bandwidth in the NVL72 rack, with in-network compute for collective operations.
  • Rubin GPU features a third-generation Transformer Engine with hardware-accelerated adaptive compression and 50 petaflops of NVFP4 compute for inference (as stated by NVIDIA).
  • NVIDIA Vera CPU is reported to use 88 custom Olympus cores, support Armv9.2 compatibility and include NVLink-C2C connectivity.
  • A new inference context memory storage platform using the BlueField-4 storage processor is part of the Rubin announcements.
  • NVIDIA said Rubin will support rack-scale confidential computing spanning CPU, GPU and interconnect.
  • CoreWeave is named among the first providers to offer Rubin, operating it through CoreWeave Mission Control; Microsoft said its Fairwater AI superfactories will include Vera Rubin NVL72 systems at very large scale.
  • NVIDIA and Red Hat announced an expanded collaboration to optimize Red Hat Enterprise Linux, OpenShift and Red Hat AI for the Rubin platform.

What to watch next

  • Independent benchmarks and third-party performance comparisons of Rubin against Blackwell and other platforms — not confirmed in the source.
  • Public cloud availability timelines and pricing for Rubin-based instances from major providers — not confirmed in the source.
  • Real-world adoption and deployment schedules from partners NVIDIA named as expected adopters, including AWS, Google Cloud, Microsoft, OpenAI and Anthropic.
  • How the promised confidential computing and BlueField-4 storage capabilities perform in production AI pipelines — not confirmed in the source.

Quick glossary

  • GPU: Graphics processing unit; a parallel processor commonly used to accelerate machine learning training and inference.
  • NVLink: A high-speed interconnect technology for linking multiple GPUs and other components to enable fast data transfer and collective operations.
  • DPU: Data processing unit; a programmable processor that offloads networking, storage and security tasks from the CPU.
  • SuperNIC: An enhanced network interface card that integrates advanced networking, security and in-network compute functions for data center workloads.
  • Mixture-of-Experts (MoE): A neural network architecture that routes inputs to specialized sub-models (experts) to scale capacity while reducing compute cost for certain tasks.
  • Transformer Engine: Hardware and software optimizations tailored for transformer-based models, intended to accelerate training and inference.

Reader FAQ

What is the Rubin platform?
Rubin is NVIDIA’s codesigned AI platform announced at CES that combines six new chips and rack-scale systems intended to accelerate training and inference for large AI models.

How much faster or cheaper is Rubin compared with NVIDIA’s prior generation?
NVIDIA states Rubin can lower inference token cost up to 10x and require 4x fewer GPUs to train MoE models versus the Blackwell platform.

Which companies will offer or use Rubin?
NVIDIA named a broad set of expected adopters and partners, including major cloud providers, AI labs and system makers; CoreWeave is cited as among the first operators and Microsoft said its Fairwater superfactories will include Vera Rubin NVL72 systems.

When will Rubin systems be generally available and what will they cost?
Not confirmed in the source.

Does Rubin include security features?
Yes; NVIDIA described third-generation confidential computing implemented at rack scale across CPU, GPU and interconnect components.

Sources

  • NVIDIA press release: “NVIDIA Kicks Off the Next Generation of AI With Rubin — Six New Chips, One Incredible AI Supercomputer”
