TL;DR

At CES Nvidia provided an early look at its Vera Rubin CPU and GPU platform, including NVL72 rack refinements, performance claims versus Blackwell, and new system features for serviceability and confidential computing. Shipments are still expected in the second half of the year; many implementation and pricing details remain unconfirmed.

What happened

Nvidia used CES to further unpack its next-generation Vera Rubin platform, outlining the NVL72 rack, Rubin superchips, and complementary accelerators. The company says Rubin will deliver up to 5x higher floating-point inference performance and 3.5x higher training performance than Blackwell, along with 2.8x more HBM4 memory bandwidth and twice the NVLink interconnect bandwidth. The Rubin superchip (likely code-named VR200) pairs two dual-die Rubin GPUs, each with claimed peaks of 50 petaFLOPS of NVFP4 inference and 35 petaFLOPS of training compute, with a Vera Arm CPU that has 88 Olympus cores and 1.5 TB of LPDDR5X. Nvidia also highlighted serviceability improvements for NVL72 racks, system-level confidential computing across NVLink, a CPX prefill accelerator optimized for LLM inference, and infrastructure pieces including ConnectX-9 NICs and BlueField-4 DPUs. Despite the CES reveal, Rubin hardware is still slated to arrive in the second half of the year.

Why it matters

  • If realized, Rubin’s performance and bandwidth gains could accelerate large-model training and high-throughput inference workloads.
  • System-level confidential computing across NVLink could change deployment options for sensitive models and datasets.
  • New serviceability and telemetry features may reduce downtime and operational cost for large GPU clusters.
  • KV cache offloading and BlueField-4 DPUs aim to address inference bottlenecks, potentially improving real-world latency and efficiency (a minimal sketch of the offloading idea follows this list).
  • AMD’s competing rack roadmap and Helios claims frame Rubin’s release as part of intensified datacenter GPU competition.
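
The offloading idea mentioned above can be illustrated with a short, purely hypothetical sketch: when fast GPU memory fills up, least-recently-used KV blocks are spilled to larger, slower host memory and pulled back on demand. The `KVOffloader` class and its methods are invented for illustration; Nvidia has not published an API for Rubin's KV cache offloading, so this only shows the general mechanism.

```python
# Hypothetical sketch of KV cache offloading: spill least-recently-used
# key/value blocks from GPU-resident memory to larger host memory.
# Class and method names are illustrative, not an Nvidia API.
from collections import OrderedDict
import numpy as np

class KVOffloader:
    def __init__(self, gpu_budget_blocks: int):
        self.gpu_budget = gpu_budget_blocks
        self.gpu = OrderedDict()   # stands in for HBM-resident KV blocks
        self.host = {}             # stands in for LPDDR5X / DPU-attached memory

    def put(self, seq_id: str, kv_block: np.ndarray) -> None:
        self.gpu[seq_id] = kv_block
        self.gpu.move_to_end(seq_id)
        while len(self.gpu) > self.gpu_budget:            # over budget: spill
            victim, block = self.gpu.popitem(last=False)  # least recently used
            self.host[victim] = block

    def get(self, seq_id: str) -> np.ndarray:
        if seq_id in self.gpu:                            # hit in fast memory
            self.gpu.move_to_end(seq_id)
            return self.gpu[seq_id]
        block = self.host.pop(seq_id)                     # fetch back from host
        self.put(seq_id, block)
        return block

cache = KVOffloader(gpu_budget_blocks=2)
for s in ("req-a", "req-b", "req-c"):                     # third insert spills req-a
    cache.put(s, np.zeros((2, 128, 8, 64), dtype=np.float16))
print(sorted(cache.host))                                 # ['req-a']
```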

Key facts

  • Nvidia claims up to 5x floating-point inference performance and 3.5x training performance versus Blackwell.
  • Rubin GPUs use a new adaptive compression technique for generative AI and MoE inference rather than structured sparsity.
  • Each Rubin superchip pairs two dual-die GPUs, each rated at peak NVFP4 figures of 50 petaFLOPS for inference and 35 petaFLOPS for training.
  • HBM4 capacity per Rubin GPU is 288 GB (576 GB per superchip) with 22 TB/s bandwidth per socket (44 TB/s per superchip).
  • The Vera CPU has 88 custom Olympus Arm cores, is paired with 1.5 TB of LPDDR5X memory, and connects to the GPUs via a 1.8 TB/s NVLink-C2C link.
  • NVL72 racks include 72 Rubin GPUs, 36 Vera CPUs, 20.7 TB of HBM4, and 54 TB of LPDDR5X across 18 compute blades, plus nine NVSwitch 6 blades; a quick arithmetic check of these totals follows this list.
  • Nvidia has abandoned the NVL144 naming change and will continue counting SXM modules as GPUs for NVL72.
  • Rubin CPX accelerators are intended for LLM prefill, offering 30 petaFLOPS of NVFP4 compute and 128 GB of GDDR7 memory.
  • Eight NVL72 racks form a SuperPOD; eight-way HGX Rubin systems (NVL8) remain available but require liquid cooling.
  • Nvidia previewed ConnectX-9 (1.6 Tbps NIC) and BlueField-4 DPUs (800 Gbps ConnectX-9 plus a 64-core Grace CPU) for networking and offload.
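
As a quick arithmetic check, the per-GPU figures above multiply out to the rack-level totals Nvidia quotes. The aggregate bandwidth and FLOPS lines are simple products derived here for context, not numbers stated in the source.

```python
# Sanity-check Nvidia's NVL72 totals against the per-GPU/per-CPU claims.
GPUS_PER_RACK = 72
CPUS_PER_RACK = 36

hbm4_per_gpu_gb = 288          # GB of HBM4 per Rubin GPU
hbm4_bw_per_gpu_tbs = 22       # TB/s of HBM4 bandwidth per GPU socket
lpddr5x_per_cpu_tb = 1.5       # TB of LPDDR5X per Vera CPU
nvfp4_inference_pflops = 50    # claimed peak NVFP4 petaFLOPS per GPU (inference)

print(f"HBM4 per rack:      {GPUS_PER_RACK * hbm4_per_gpu_gb / 1000:.1f} TB")   # 20.7 TB
print(f"LPDDR5X per rack:   {CPUS_PER_RACK * lpddr5x_per_cpu_tb:.0f} TB")       # 54 TB
print(f"HBM4 bandwidth:     {GPUS_PER_RACK * hbm4_bw_per_gpu_tbs} TB/s (derived, not in source)")
print(f"NVFP4 inference:    {GPUS_PER_RACK * nvfp4_inference_pflops / 1000:.1f} EFLOPS (derived, not in source)")
```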

What to watch next

  • Exact launch and shipment schedules beyond the broad 'second half of the year' timeline.
  • Actual system power consumption and efficiency figures as Nvidia finalizes Rubin hardware — not confirmed in the source.
  • Support and performance details for higher-precision data types such as FP8 and BF16 in Rubin workloads — not confirmed in the source.
  • Whether Nvidia or partners deliver rack configurations with 144 GPUs or other NVL144-class systems — not confirmed in the source.

Quick glossary

  • NVL72: Nvidia rack-scale system configuration that hosts 72 SXM GPU modules in a multi-blade chassis for hyperscale AI workloads.
  • HBM4: High Bandwidth Memory generation 4, a type of stacked memory used on GPUs to provide large capacity and very high bandwidth.
  • NVLink: Nvidia’s proprietary high-speed interconnect technology for linking GPUs, CPUs, and other accelerators to share memory and data.
  • DPU (Data Processing Unit): A programmable accelerator designed to offload networking, storage, and security tasks from the host CPU, often used for infrastructure functions.
  • KV cache: a key–value cache that stores the attention keys and values computed for earlier tokens so they are not recomputed on every decoding step (illustrated in the sketch below).
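
As a rough illustration of the glossary entry above, the following sketch shows single-head attention decoding with a growing key/value cache. The shapes, random weights, and single-head layout are arbitrary stand-ins for illustration, not any particular model or Nvidia API.

```python
# Minimal single-head decoding loop with a KV cache: keys and values for past
# tokens are computed once and reused, so each new token only needs one extra
# K/V projection instead of reprojecting the whole prefix.
import numpy as np

d = 64
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

k_cache, v_cache = [], []          # grows by one entry per generated token

def decode_step(x_new: np.ndarray) -> np.ndarray:
    """Attend from the newest token over all cached positions."""
    q = x_new @ Wq
    k_cache.append(x_new @ Wk)     # compute K/V for the new token only
    v_cache.append(x_new @ Wv)
    K = np.stack(k_cache)          # (seq_len, d)
    V = np.stack(v_cache)
    scores = (K @ q) / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V             # attention output for the new token

for _ in range(4):                 # generate four tokens
    out = decode_step(rng.standard_normal(d))
print(out.shape, len(k_cache))     # (64,) 4
```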

Reader FAQ

When will Vera Rubin systems ship?
Nvidia expects Rubin hardware to arrive in the second half of the year.

How much faster is Rubin compared with Blackwell?
Nvidia claims up to 5x higher floating-point inference performance, 3.5x for training, 2.8x more memory bandwidth, and a twice-as-fast NVLink interconnect.

Will Rubin support FP8 and BF16 precision modes?
Nvidia was asked about higher-precision data types like FP8 and BF16, but support details were not confirmed in the source.

Has Nvidia changed how it counts GPUs in rack names?
Nvidia decided to retain the established NVL72 naming convention and will count SXM modules as GPUs rather than counting individual dies.

Sources

  • The Register: "Every conference is an AI conference as Nvidia unpacks its Vera Rubin CPUs and GPUs at CES" ("Teasing the next generation earlier than usual"), by Tobias Mann, Mon 5 Jan 2026, 22:39 UTC.
