TL;DR
Fast Containers is a header-only C++23 library that implements a cache-friendly B+tree plus several hugepage-aware allocators optimized for x86-64 with AVX2. The author reports multi-fold speedups over Abseil's btree and std::map for large trees, with caveats about workload, hardware and benchmark scope.
What happened
A new open-source project, Fast Containers, provides a header-only B+tree implementation and multiple hugepage-oriented allocators for C++23 targeted at x86-64 CPUs with AVX2. The B+tree supports SIMD-accelerated node search, tunable node sizes, and integrates with pooling allocators designed to reduce TLB misses and allocation overhead. The repository includes a fixed-size Dense Map used inside nodes, a single-size HugePageAllocator, a MultiSizeHugePageAllocator for variable-sized allocations, and a PolicyBasedHugePageAllocator for shared pools. The author reports benchmarks (focused on 10 million-element trees with specific key/value sizes) showing 2–5× faster insert/find/erase than Abseil's btree and std::map, plus further gains from hugepages and modest node-search speedups from AVX2. The project is described as a work in progress, tested on Linux x86-64 with AVX2; Windows builds and broader platform testing have not been reported.
Why it matters
- Reduces TLB misses and allocation overhead for large in-memory trees using hugepage-aware pooling allocators.
- SIMD-accelerated node searches can shave per-node lookup latency on AVX2-capable x86-64 hardware.
- Tunable node sizes let users optimize cache utilization for different key/value footprints.
- If benchmark results hold in real workloads, it could offer sizable throughput improvements for large keyed containers compared with common implementations.
Key facts
- Components included: btree (kressler::fast_containers::btree), Dense Map, HugePageAllocator, MultiSizeHugePageAllocator, PolicyBasedHugePageAllocator.
- Reported performance: 2–5× faster insert/find/erase versus Abseil btree and std::map for large trees in provided benchmarks.
- SIMD (AVX2) search yields ~3–10% faster node searches according to the author.
- Hugepage allocator integration is credited with 3–5× speedups by reducing TLB misses and pooling allocations.
- Benchmarks focus on 10 million element trees and specific tested configurations (8-byte keys; 32-byte and 256-byte values).
- Default heuristics auto-compute leaf/internal node sizes to target roughly 2KB nodes; manual tuning is supported (leaf sizes clamped to [8,64]).
- Project prerequisites: C++23 compiler (GCC 14+ or Clang 19+), CMake 3.30+, AVX2-capable CPU (Intel Haswell or AMD Excavator or newer).
- API is modeled after std::map and provides insert, emplace, operator[], find, erase, iteration, size and other standard operations.
- Repository and quick-start integration: add as git submodule and link via CMake; headers exposed for direct inclusion.
What to watch next
- Further benchmark coverage for smaller tree sizes and wider key/value configurations (not confirmed in the source).
- Cross-platform builds and explicit Windows support; the source states that Windows has not been tested.
- Ongoing cleanup and releases: the author characterizes the codebase as work in progress and is actively tidying the implementation.
- Broader independent comparisons against other container implementations and real-world workloads (not confirmed in the source).
Quick glossary
- B+tree: A balanced tree data structure that stores keys in sorted order and places records in leaf nodes, commonly used for efficient range queries and ordered maps.
- Hugepage: A large memory page size provided by the OS that can reduce TLB pressure and improve performance for large memory allocations.
- SIMD (Single Instruction, Multiple Data): A class of CPU instructions that operate on multiple data elements in parallel, used to accelerate data-parallel operations.
- TLB (Translation Lookaside Buffer): A cache that stores recent virtual-to-physical address translations; misses can be costly for memory-heavy workloads.
- Allocator: A component that manages memory allocation and deallocation for containers and objects, affecting performance and memory layout.
Reader FAQ
How much faster is this B+tree?
The author reports 2–5× faster insert/find/erase than Abseil's btree and std::map for large trees in their benchmarks; exact gains depend on workload and configuration.
Which platforms are supported?
The library is built and tested on Linux x86-64 with AVX2; Windows builds have not been tested according to the source.
Is this production-ready?
The project is described as a work in progress; the author is cleaning up the implementation and has no major changes planned for the B+tree, but production readiness is not explicitly claimed.
How do I integrate it into my project?
Add the repository as a git submodule, add_subdirectory in CMake, link fast_containers::fast_containers, and include the provided headers.
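The steps above correspond roughly to this CMake fragment. The `fast_containers::fast_containers` target name comes from the source; the submodule path and executable name are placeholders.

```cmake
# Assumes the repo was added as a submodule at external/fast_containers.
add_subdirectory(external/fast_containers)

add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE fast_containers::fast_containers)
```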
Sources
- High-performance header-only container library for C++23 on x86-64
- A curated list of awesome header-only C++ libraries
- A list of open-source C++ libraries