TL;DR

BusterMQ is a NATS-compatible server written in Zig that uses a thread-per-core model and Linux io_uring. In the tested fan-out scenario, local benchmarks on a 16-core Ryzen system show its best configuration reaching roughly 2.3x to 2.4x the message throughput of a Go NATS build, with lower latency.

What happened

The BusterMQ project published detailed local benchmarks for a fan-out workload run on an AMD Ryzen 9 9950X (16 cores). The test used 10 publishers and 100 subscribers spread across 10 topics (10 subscribers per topic). It sent 50 million messages with 128-byte payloads over localhost.

Four configurations were exercised: the default io_uring mode (STANDARD), a spin-loop polling mode (+BUSYPOLL), shard-aware routing (+ROUTE), and both combined (+ROUTE+BUSYPOLL). The combined configuration produced the highest publish rate (6.30M msgs/sec), the highest delivery rate (58.74M msgs/sec), and the greatest bandwidth (8.20 GB/s). Median latency was lowest in the +ROUTE run, while tail latencies (p99/p99.9) were lowest in the +BUSYPOLL run. A Go NATS build included for comparison was markedly slower, at 2.62M publishes/sec and 25.53M deliveries/sec. The project says more benchmarks are forthcoming.
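The workload numbers can be cross-checked with simple fan-out arithmetic. The sketch below assumes that "50 million messages" counts published messages and that each one is delivered to all 10 subscribers on its topic. The source does not state either point explicitly. The function names are illustrative only.

```python
# Back-of-envelope check of the fan-out workload.
# Assumption: 50M counts *published* messages, and each is delivered to
# every one of the 10 subscribers on its topic.
PUBLISHED = 50_000_000
SUBSCRIBERS_PER_TOPIC = 10

def expected_deliveries(published: int, subs_per_topic: int) -> int:
    """Each published message fans out to every subscriber of its topic."""
    return published * subs_per_topic

def implied_duration_s(deliveries: int, delivery_rate: float) -> float:
    """Seconds needed to deliver `deliveries` messages at `delivery_rate` msgs/sec."""
    return deliveries / delivery_rate

deliveries = expected_deliveries(PUBLISHED, SUBSCRIBERS_PER_TOPIC)  # 500,000,000

# Delivery rates reported in the article (msgs/sec).
reported_delivery_rates = {
    "BusterMQ +ROUTE+BUSYPOLL": 58.74e6,
    "Go NATS": 25.53e6,
}

for name, rate in reported_delivery_rates.items():
    print(f"{name}: {implied_duration_s(deliveries, rate):.2f} s")
# prints "BusterMQ +ROUTE+BUSYPOLL: 8.51 s" then "Go NATS: 19.58 s"
```

Under that assumption, the Go NATS figure works out to about 19.58 s, matching the reported Go NATS test duration. The best BusterMQ figure works out to about 8.51 s, inside the reported 8.45–9.51 s range. This consistency suggests the delivery rate is total deliveries divided by wall-clock time, though the source does not say so.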

Why it matters

  • The results suggest a Zig + io_uring implementation can achieve higher throughput and lower latency than the Go-based comparison in this local fan-out workload.
  • Shard-aware routing and busy-polling, both layered on the thread-per-core design, appear to affect median and tail latency differently, which gives operators tuning knobs.
  • Among the BusterMQ configurations, busy-polling produced the lowest tail latency (p99/p99.9), which matters for latency-sensitive messaging systems.
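The source does not explain how the +ROUTE mode works. The following is a purely hypothetical illustration of what shard-aware routing can mean in a thread-per-core broker. Each subject is assigned to an owning shard by a stable hash (CRC32 here), and subscriptions are stored on that shard. A publish from a connection on a different shard counts as a cross-shard hop. `Router`, `shard_for_subject`, and `NUM_SHARDS` are illustrative names, not BusterMQ APIs.

```python
import zlib
from collections import defaultdict
from typing import Dict, List

NUM_SHARDS = 16  # hypothetical: one shard (event loop) per core

def shard_for_subject(subject: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a subject to its owning shard using a stable CRC32 hash."""
    return zlib.crc32(subject.encode("utf-8")) % num_shards

class Router:
    """Toy model: every shard owns its own subject -> subscribers table."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.num_shards = num_shards
        self.tables: List[Dict[str, List[str]]] = [
            defaultdict(list) for _ in range(num_shards)
        ]
        self.cross_shard_hops = 0

    def subscribe(self, subject: str, subscriber: str) -> None:
        # Register the subscriber on the shard that owns the subject.
        owner = shard_for_subject(subject, self.num_shards)
        self.tables[owner][subject].append(subscriber)

    def publish(self, subject: str, origin_shard: int) -> List[str]:
        # A publisher whose connection lives on a different shard must hand
        # the message to the owning shard: one cross-core hop.
        owner = shard_for_subject(subject, self.num_shards)
        if owner != origin_shard:
            self.cross_shard_hops += 1
        return list(self.tables[owner][subject])

# Mirror the benchmark's shape: 10 topics with 10 subscribers each.
router = Router()
for t in range(10):
    for s in range(10):
        router.subscribe(f"topic.{t}", f"sub-{t}-{s}")

# Shard-aware placement: each publisher runs on its subject's owning shard.
for t in range(10):
    subject = f"topic.{t}"
    delivered = router.publish(subject, origin_shard=shard_for_subject(subject))
    print(subject, len(delivered))  # each topic delivers to 10 subscribers
print("cross-shard hops:", router.cross_shard_hops)  # cross-shard hops: 0
```

The idea is that placing publishers on the shard that owns their subject keeps delivery local to one core. Without that placement, each publish pays for a handoff between cores.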

Key facts

  • Software: BusterMQ — described as a thread-per-core NATS server written in Zig using io_uring.
  • Benchmark: Fan-out test with 10 publishers, 100 subscribers (10 per topic), 10 topics, 50 million messages, 128-byte payload.
  • Hardware: AMD Ryzen 9 9950X (16 cores), tests run on localhost.
  • Best throughput: +ROUTE+BUSYPOLL achieved 6.30M publishes/sec and 58.74M deliveries/sec.
  • Best bandwidth: +ROUTE+BUSYPOLL reported 8.20 GB/s.
  • Go NATS comparison: publish rate 2.62M msgs/sec and delivery rate 25.53M msgs/sec in the same test.
  • Median and tail latencies varied by configuration: +ROUTE had the lowest p50 (6.16 ms), +BUSYPOLL had the lowest p99/p99.9 (13.07 ms / 14.33 ms).
  • Test duration: roughly 8.45–9.51 seconds for BusterMQ variants; Go NATS test ran longer at 19.58 seconds.
  • Project note: the page states 'More benchmarks incoming.'
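The comparison with Go NATS can be expressed as plain ratios of the figures listed above. The sketch pairs the best BusterMQ configuration with the single Go NATS run. The duration ratio spans the whole 8.45–9.51 s range of BusterMQ variants. These ratios describe this one localhost workload only.

```python
# Figures from the Key facts list (rates in msgs/sec, durations in seconds).
BUSTERMQ_BEST = {"publish": 6.30e6, "delivery": 58.74e6}  # +ROUTE+BUSYPOLL
GO_NATS = {"publish": 2.62e6, "delivery": 25.53e6}
BUSTERMQ_DURATION_RANGE_S = (8.45, 9.51)  # across all BusterMQ variants
GO_NATS_DURATION_S = 19.58

def ratio(numerator: float, denominator: float) -> float:
    """How many times larger `numerator` is than `denominator`."""
    return numerator / denominator

for metric in ("publish", "delivery"):
    print(f"{metric} rate: {ratio(BUSTERMQ_BEST[metric], GO_NATS[metric]):.2f}x")
# prints "publish rate: 2.40x" then "delivery rate: 2.30x"

# Go NATS wall-clock time relative to the slowest and fastest BusterMQ runs.
low = ratio(GO_NATS_DURATION_S, BUSTERMQ_DURATION_RANGE_S[1])
high = ratio(GO_NATS_DURATION_S, BUSTERMQ_DURATION_RANGE_S[0])
print(f"Go NATS took {low:.2f}x to {high:.2f}x as long")
# prints "Go NATS took 2.06x to 2.32x as long"
```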

What to watch next

  • More benchmark results from the project, which the source says are coming.
  • How multi-node or networked (non-localhost) performance compares with these local results; the source does not cover this.
  • Production readiness, stability, and operational behavior under sustained load; the source does not cover these.

Quick glossary

  • NATS: A lightweight, high-performance messaging system used for cloud-native applications and microservices communication.
  • io_uring: A Linux kernel interface for asynchronous I/O that can reduce system call overhead and improve I/O throughput and latency.
  • Zig: A general-purpose programming language focused on performance, safety, and predictable binaries.
  • Fan-out: A messaging pattern where a single published message is delivered to multiple subscribers.
  • Busy-poll (spin-loop): A polling mode where a thread repeatedly checks for events instead of yielding, trading CPU for lower wake latency.
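The busy-poll entry can be made concrete with a small sketch that contrasts a spin loop with a blocking wait on an in-process queue. This is a generic Python illustration. It does not use io_uring and says nothing about how BusterMQ's +BUSYPOLL mode is implemented. The function names are hypothetical.

```python
import queue
import threading
import time
from typing import Callable, Optional

def busy_poll_receive(q: queue.Queue, timeout_s: float) -> Optional[int]:
    """Spin on a non-blocking get until an item arrives or the timeout expires.

    The thread never sleeps, so it notices a new item almost immediately,
    at the cost of keeping a CPU core busy the whole time.
    """
    end = time.monotonic() + timeout_s
    while time.monotonic() < end:
        try:
            return q.get_nowait()
        except queue.Empty:
            continue  # spin again: no sleep, no blocking wait
    return None

def blocking_receive(q: queue.Queue, timeout_s: float) -> Optional[int]:
    """Block until an item arrives or the timeout expires.

    The thread sleeps and uses no CPU while idle, but it has to be woken
    when an item arrives, which adds wake-up latency.
    """
    try:
        return q.get(timeout=timeout_s)
    except queue.Empty:
        return None

def demo(receiver: Callable[[queue.Queue, float], Optional[int]]) -> Optional[int]:
    q: queue.Queue = queue.Queue()
    # A producer thread delivers one message after 10 ms.
    threading.Timer(0.01, q.put, args=(42,)).start()
    return receiver(q, 1.0)

print(demo(busy_poll_receive))  # 42
print(demo(blocking_receive))   # 42
```

In CPython the spinning thread still competes for the GIL. The sketch therefore shows the control flow of the two strategies rather than realistic latencies.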

Reader FAQ

What is BusterMQ?
According to the project page, it is a NATS server written in Zig that uses a thread-per-core design and Linux io_uring.

How did BusterMQ perform in the benchmarks?
In the published fan-out test on a 16-core Ryzen, the best BusterMQ configuration reached 6.30M publishes/sec and 58.74M deliveries/sec, outperforming the included Go NATS build in the same test.

What workload was used for testing?
A local fan-out workload: 10 publishers, 100 subscribers (10 per topic), 10 topics, 50 million messages with 128-byte payloads.

Is BusterMQ production-ready?
Not confirmed in the source; the project page does not address it.

