TL;DR
FuriosaAI introduced the NXT RNGD Server, a turnkey inference system built around its RNGD accelerators and designed to run on standard PCIe infrastructure in air-cooled racks. The 3 kW system ships with the Furiosa SDK and LLM runtime preinstalled, and Furiosa cites validation from LG AI Research running EXAONE models.
What happened
FuriosaAI announced the NXT RNGD Server, a branded, ready-to-deploy system aimed at data-center-scale AI inference. The server supports up to eight RNGD accelerator cards alongside dual AMD EPYC CPUs and operates over standard PCIe interconnects, avoiding proprietary fabrics. Furiosa ships the system with its SDK and the Furiosa LLM runtime preinstalled, and reports that it runs on 3 kW of system power with air cooling, allowing deployment in facilities that lack liquid-cooling infrastructure. Hardware highlights include 384 GB of HBM3 with 12 TB/s of bandwidth, 1 TB of DDR5 system memory, four NVMe drives, and dual 25G data NICs. Furiosa cites a customer validation from LG AI Research: a single server with four RNGD cards delivered 60 tokens/sec on EXAONE 3.5 32B at a 4K context and 50 tokens/sec at a 32K context, at batch size 1. The company is taking inquiries and orders for January 2026 delivery.
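To put the cited throughput in user-facing terms, here is a quick back-of-envelope conversion; the derived latencies are our own arithmetic, not figures from the source:

```python
# Convert the LG AI Research throughput figures (batch size 1) into
# per-token latency and end-to-end generation time for a typical reply.

def generation_time(tokens_per_sec: float, num_tokens: int) -> float:
    """Seconds to generate num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_sec

for context, tps in [("4K", 60.0), ("32K", 50.0)]:
    latency_ms = 1000.0 / tps
    print(f"{context} context: {latency_ms:.1f} ms/token, "
          f"{generation_time(tps, 500):.1f} s for a 500-token reply")
# 4K context: 16.7 ms/token, 8.3 s for a 500-token reply
# 32K context: 20.0 ms/token, 10.0 s for a 500-token reply
```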
Why it matters
- Lower power draw and air-cooled design target the majority of existing data centers that operate at 8 kW per rack or less, potentially avoiding costly retrofits.
- Preinstalled software stack (Furiosa SDK and LLM runtime) aims to shorten time from installation to serving production models.
- On-prem deployment option lets organizations keep model weights and inference inside their own infrastructure, relevant for compliance and privacy-sensitive workloads.
- Claims of improved efficiency and reduced total cost of ownership could reshape procurement choices for inference infrastructure if independently verified.
Key facts
- System power: 3 kW per server; air-cooled, with redundant 2,000 W Titanium-rated power supplies.
- Compute: up to 8 RNGD accelerators per server and dual AMD EPYC processors; Furiosa reports 4 petaFLOPS FP8 per server.
- Memory: 384 GB HBM3 with 12 TB/s bandwidth plus 1 TB DDR5 system memory.
- Storage: 2 × 960 GB NVMe M.2 (OS) and 2 × 3.84 TB NVMe U.2 (internal).
- Networking: 1G management NIC and 2 × 25G data NICs.
- Precision formats supported: BF16, FP8, INT8, and INT4 (see the sizing sketch after this list).
- Software: ships with the Furiosa SDK and Furiosa LLM runtime; offers native Kubernetes and Helm integration, plus vLLM compatibility with built-in OpenAI API support.
- Customer validation: LG AI Research reported 60 tokens/sec (4K context) and 50 tokens/sec (32K context) on EXAONE 3.5 32B using a single server with four RNGD cards at batch size 1.
- Availability: Furiosa is accepting inquiries and orders for January 2026 delivery.
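The supported precision formats bear directly on memory footprint. Below is a minimal sizing sketch for a 32B-parameter model against the server's 384 GB of HBM3; this is our own back-of-envelope estimate, not a Furiosa figure, and it covers weights only (KV cache and activations consume additional memory):

```python
# Rough weight-memory sizing for a 32B-parameter model (e.g., EXAONE 3.5 32B)
# against the NXT RNGD Server's 384 GB of HBM3. Weights only; the KV cache
# grows with context length and batch size on top of this.

PARAMS = 32e9
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "INT8": 1.0, "INT4": 0.5}

for fmt, nbytes in BYTES_PER_PARAM.items():
    weights_gb = PARAMS * nbytes / 1e9
    print(f"{fmt}: ~{weights_gb:.0f} GB of weights "
          f"({weights_gb / 384 * 100:.0f}% of 384 GB HBM3)")
# BF16: ~64 GB of weights (17% of 384 GB HBM3)
# FP8: ~32 GB of weights (8% of 384 GB HBM3)
# INT8: ~32 GB of weights (8% of 384 GB HBM3)
# INT4: ~16 GB of weights (4% of 384 GB HBM3)
```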
What to watch next
- Independent, third-party benchmarks comparing RNGD performance and efficiency against incumbent GPU platforms — not confirmed in the source.
- Broader enterprise adoption beyond the initial LG AI Research validation, including deployments in the sectors Furiosa highlights (electronics, finance, telecommunications, biotechnology).
- Whether major cloud providers integrate or offer RNGD-based instances — not confirmed in the source.
Quick glossary
- RNGD accelerator: A purpose-built AI inference accelerator developed by FuriosaAI to run neural network workloads more efficiently than general-purpose processors.
- HBM3: High-bandwidth memory (third generation) used to provide very high throughput between memory and accelerators for large models and data.
- FP8: An 8-bit floating-point numerical format that reduces memory and compute needs compared with larger floating-point formats while aiming to preserve model accuracy.
- PCIe (Peripheral Component Interconnect Express): A common high-speed interface standard for connecting components like accelerators to host systems without proprietary fabric requirements.
- vLLM: An open-source serving framework for large language models designed to optimize inference throughput and latency; compatibility means RNGD can slot into existing vLLM-based serving stacks.
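Because the server advertises vLLM compatibility with built-in OpenAI API support, a served model should be reachable from any OpenAI-compatible client. A minimal sketch follows; the endpoint URL, port, and model name are hypothetical placeholders, not values from the source:

```python
# Minimal sketch of querying an OpenAI-compatible endpoint such as the one
# vLLM exposes. The base_url and model name below are hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="http://rngd-server.example.com:8000/v1",  # hypothetical endpoint
    api_key="not-needed-for-local-serving",             # placeholder value
)

response = client.chat.completions.create(
    model="exaone-3.5-32b",  # hypothetical served-model name
    messages=[{"role": "user", "content": "Summarize HBM3 in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```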
Reader FAQ
When will the NXT RNGD Server be available?
Furiosa is taking inquiries and orders for January 2026 delivery.
Does the server require liquid cooling or rack retrofits?
No. The server is designed to run at 3 kW with air cooling, so it fits most existing data centers without liquid-cooling retrofits.
What software is included?
The system ships with the Furiosa SDK and Furiosa LLM runtime preinstalled, and offers native Kubernetes and Helm integration.
Are head-to-head efficiency claims versus specific GPUs provided?
Not confirmed in the source.
Sources
- FuriosaAI, "Introducing Furiosa NXT RNGD Server: Efficient AI inference at data center scale," September 25, 2025.