TL;DR
BusterMQ is a NATS-compatible server implemented in Zig that uses a thread-per-core model and Linux io_uring. Local benchmarks on a 16-core Ryzen system show roughly 2.3 to 2.4 times the throughput of a Go NATS build (6.30M vs. 2.62M publishes/sec in the fastest configuration), along with lower latency, in the tested fan-out scenario.
What happened
The BusterMQ project published detailed local benchmarks for a fan-out workload run on an AMD Ryzen 9 9950X (16 cores). The test used 10 publishers and 100 subscribers, spread 10 per topic across 10 topics, and sent 50 million messages with 128-byte payloads over localhost. Four configurations were exercised: a default io_uring mode (STANDARD), a spin-loop polling mode (+BUSYPOLL), shard-aware routing (+ROUTE), and a combination of the two (+ROUTE+BUSYPOLL). The combined +ROUTE+BUSYPOLL configuration produced the highest publish rate (6.30M msgs/sec), the highest delivery rate (58.74M msgs/sec) and the greatest bandwidth (8.20 GB/s). Median latency was lowest in the +ROUTE run, while tail latencies (p99/p99.9) were lowest in the +BUSYPOLL run. A Go NATS build included as a comparison reached 2.62M msgs/sec published and 25.53M msgs/sec delivered, less than half the throughput of the combined configuration. The project notes that more benchmarks are forthcoming.
Why it matters
- The results suggest a Zig + io_uring implementation can achieve higher throughput and lower latency than the Go-based comparison in this local fan-out workload.
- Shard-aware routing appears to influence median latency: the +ROUTE run had the lowest p50 (6.16 ms), offering one knob for performance tuning.
- Busy-polling shows a measurable difference in tail latency, which matters for latency-sensitive message systems: the +BUSYPOLL run had the lowest p99 and p99.9 (13.07 ms and 14.33 ms).
Key facts
- Software: BusterMQ — described as a thread-per-core NATS server written in Zig using io_uring.
- Benchmark: Fan-out test with 10 publishers, 100 subscribers (10 per topic), 10 topics, 50 million messages, 128-byte payload.
- Hardware: AMD Ryzen 9 9950X (16 cores), tests run on localhost.
- Best throughput: +ROUTE+BUSYPOLL achieved 6.30M publishes/sec and 58.74M deliveries/sec.
- Best bandwidth: +ROUTE+BUSYPOLL reported 8.20 GB/s.
- Go NATS comparison: publish rate 2.62M msgs/sec and delivery rate 25.53M msgs/sec in the same test.
- Median and tail latencies varied by configuration: +ROUTE had the lowest p50 (6.16 ms), +BUSYPOLL had the lowest p99/p99.9 (13.07 ms / 14.33 ms).
- Test duration: roughly 8.45–9.51 seconds for BusterMQ variants; Go NATS test ran longer at 19.58 seconds.
- Project note: the page states 'More benchmarks incoming.'
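As a sanity check on the key facts above, the arithmetic behind the comparison can be sketched in a few lines of Python. The rates are copied from the list; the derived durations rest on two assumptions that the source does not state: that every message reaches all 10 subscribers on its topic, and that the reported rates are averages over the whole run.

```python
# Figures copied from the key facts above; everything else is derived arithmetic.
MESSAGES = 50_000_000      # messages published in the test
SUBS_PER_TOPIC = 10        # each message fans out to 10 subscribers

buster_pub, buster_del = 6.30e6, 58.74e6   # +ROUTE+BUSYPOLL, msgs/sec
go_pub, go_del = 2.62e6, 25.53e6           # Go NATS build, msgs/sec

# Throughput ratios between the fastest BusterMQ run and the Go NATS run.
pub_speedup = buster_pub / go_pub
del_speedup = buster_del / go_del

# Assumption: total deliveries = messages x subscribers per topic.
expected_deliveries = MESSAGES * SUBS_PER_TOPIC

# Assumption: the delivery rate is an average over the whole run.
buster_secs = expected_deliveries / buster_del
go_secs = expected_deliveries / go_del

print(f"publish speedup:  {pub_speedup:.2f}x")
print(f"delivery speedup: {del_speedup:.2f}x")
print(f"implied duration: BusterMQ {buster_secs:.2f} s, Go NATS {go_secs:.2f} s")
```

Under those assumptions the ratios come out at about 2.4x for publishing and 2.3x for delivery, and the implied durations (about 8.5 s and 19.6 s) are close to the reported test durations.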
What to watch next
- More benchmark results from the project — confirmed in the source.
- How multi-node or networked (non-localhost) performance compares to these local results — not confirmed in the source.
- Production readiness, stability, and operational characteristics under sustained load — not confirmed in the source.
Quick glossary
- NATS: A lightweight, high-performance messaging system used for cloud-native applications and microservices communication.
- io_uring: A Linux kernel interface for asynchronous I/O that can reduce system call overhead and improve I/O throughput and latency.
- Zig: A general-purpose programming language focused on performance, safety, and predictable binaries.
- Fan-out: A messaging pattern where a single published message is delivered to multiple subscribers.
- Busy-poll (spin-loop): A polling mode where a thread repeatedly checks for events instead of yielding, trading CPU for lower wake latency.
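To make the busy-poll entry concrete, here is a minimal Python sketch contrasting a spin-loop wait with a blocking wait on a queue. It is an illustration of the control flow only, not BusterMQ's implementation: the function names are hypothetical, and Python's interpreter lock means the sketch will not reproduce the latency benefit that a native spin loop can offer.

```python
import queue
import threading
import time

def busy_poll_get(q):
    """Spin on a non-blocking check until an item arrives (burns CPU)."""
    spins = 0
    while True:
        try:
            return q.get_nowait(), spins
        except queue.Empty:
            spins += 1  # no sleep: the thread never parks itself

def blocking_get(q):
    """Park the thread until an item arrives; it must be woken up to proceed."""
    return q.get(), 0

def demo(getter, delay=0.01):
    """Run one consumer against a producer that publishes after `delay` seconds."""
    q = queue.Queue()

    def produce():
        time.sleep(delay)
        q.put("msg")

    producer = threading.Thread(target=produce)
    producer.start()
    item, spins = getter(q)
    producer.join()
    return item, spins
```

Calling `demo(busy_poll_get)` and `demo(blocking_get)` returns the same message either way; the difference is that the spinning consumer checks the queue repeatedly while it waits, which is the CPU-for-latency trade described in the glossary.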
Reader FAQ
What is BusterMQ?
A NATS server implementation that uses a thread-per-core design, written in Zig and using io_uring, according to the project page.
How did BusterMQ perform in the benchmarks?
In the published fan-out test on a 16-core Ryzen, the best BusterMQ configuration reached 6.30M publishes/sec and 58.74M deliveries/sec, outperforming the included Go NATS build in the same test.
What workload was used for testing?
A local fan-out workload: 10 publishers, 100 subscribers (10 per topic), 10 topics, 50 million messages with 128-byte payloads.
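The shape of that workload can be sketched with a toy in-memory broker in Python. Everything here is hypothetical scaffolding for illustration (`TinyBroker`, `run_fanout`, the `topic.N` subject names); in particular, the source does not say how publishers map to topics, so the one-publisher-per-topic mapping is an assumption, and the message count is scaled down from 50 million.

```python
from collections import defaultdict

class TinyBroker:
    """Hypothetical in-memory broker: subject -> list of subscriber callbacks."""

    def __init__(self):
        self.subs = defaultdict(list)
        self.published = 0
        self.delivered = 0

    def subscribe(self, subject, callback):
        self.subs[subject].append(callback)

    def publish(self, subject, payload):
        self.published += 1
        # Fan-out: one published message is delivered to every subscriber.
        for callback in self.subs[subject]:
            callback(payload)
            self.delivered += 1

def run_fanout(num_publishers=10, num_topics=10, subs_per_topic=10,
               total_messages=1_000, payload_size=128):
    broker = TinyBroker()
    received = defaultdict(int)

    def make_subscriber(key):
        def on_message(_payload):
            received[key] += 1
        return on_message

    for topic in range(num_topics):
        for sub in range(subs_per_topic):
            broker.subscribe(f"topic.{topic}", make_subscriber((topic, sub)))

    payload = b"x" * payload_size
    for i in range(total_messages):
        publisher = i % num_publishers  # round-robin over publishers
        # Assumption: publisher k always publishes to topic k.
        broker.publish(f"topic.{publisher % num_topics}", payload)
    return broker, received

broker, received = run_fanout()
print(broker.published, broker.delivered, len(received))
```

With 10 subscribers per topic, deliveries are ten times publishes, which is the same ratio the benchmark's publish and delivery rates roughly follow.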
Is BusterMQ production-ready?
Not confirmed in the source.
Sources
- Show HN: BusterMQ, Thread-per-core NATS server in Zig with io_uring
- Hacker News
- freshnews – fresh tech news from around the web
- hckr news – Hacker News sorted by time