TL;DR

Fast Containers is a header-only C++23 library that implements a cache-friendly B+tree plus several hugepage-aware allocators optimized for x86-64 with AVX2. The author reports multi-fold speedups over Abseil's btree and std::map for large trees, with caveats about workload, hardware and benchmark scope.

What happened

A new open-source project, Fast Containers, provides a header-only B+tree implementation and multiple hugepage-oriented allocators for C++23, targeted at x86-64 CPUs with AVX2. The B+tree supports SIMD-accelerated node search and tunable node sizes, and integrates with pooling allocators designed to reduce TLB misses and allocation overhead. The repository includes a fixed-size Dense Map used inside nodes, a single-size HugePageAllocator, a MultiSizeHugePageAllocator for variable-sized allocations, and a PolicyBasedHugePageAllocator for shared pools. The author reports benchmarks (focused on 10-million-element trees with specific key/value sizes) showing 2–5× faster insert/find/erase than Abseil's btree and std::map, plus further gains from hugepages and modest node-search speedups from AVX2. The project is described as a work in progress, tested on Linux x86-64 with AVX2; Windows builds and broader platform testing have not been reported.

Why it matters

  • Reduces TLB misses and allocation overhead for large in-memory trees using hugepage-aware pooling allocators.
  • SIMD-accelerated node searches can shave per-node lookup latency on AVX2-capable x86-64 hardware.
  • Tunable node sizes let users optimize cache utilization for different key/value footprints.
  • If benchmark results hold in real workloads, it could offer sizable throughput improvements for large keyed containers compared with common implementations.

Key facts

  • Components included: btree (kressler::fast_containers::btree), Dense Map, HugePageAllocator, MultiSizeHugePageAllocator, PolicyBasedHugePageAllocator.
  • Reported performance: 2–5× faster insert/find/erase versus Abseil btree and std::map for large trees in provided benchmarks.
  • SIMD (AVX2) search yields ~3–10% faster node searches according to the author.
  • Hugepage allocator integration is credited with 3–5× speedups by reducing TLB misses and pooling allocations.
  • Benchmarks focus on 10-million-element trees and specific tested configurations (8-byte keys; 32-byte and 256-byte values).
  • Default heuristics auto-compute leaf/internal node sizes to target roughly 2KB nodes; manual tuning is supported (leaf sizes clamped to [8,64]).
  • Project prerequisites: C++23 compiler (GCC 14+ or Clang 19+), CMake 3.30+, AVX2-capable CPU (Intel Haswell or AMD Excavator or newer).
  • API is modeled after std::map and provides insert, emplace, operator[], find, erase, iteration, size and other standard operations.
  • Repository and quick-start integration: add as git submodule and link via CMake; headers exposed for direct inclusion.

What to watch next

  • Further benchmark coverage for smaller tree sizes and wider key/value configurations (not confirmed in the source).
  • Cross-platform builds and explicit Windows support; the source says Windows has not been tested, and plans for support are not confirmed.
  • Ongoing cleanup and releases: the author characterizes the codebase as work in progress and is actively tidying the implementation.
  • Broader independent comparisons against other container implementations and real-world workloads (not confirmed in the source).

Quick glossary

  • B+tree: A balanced tree data structure that stores keys in sorted order and places records in leaf nodes, commonly used for efficient range queries and ordered maps.
  • Hugepage: A large memory page size provided by the OS that can reduce TLB pressure and improve performance for large memory allocations.
  • SIMD (Single Instruction, Multiple Data): A class of CPU instructions that operate on multiple data elements in parallel, used to accelerate data-parallel operations.
  • TLB (Translation Lookaside Buffer): A cache that stores recent virtual-to-physical address translations; misses can be costly for memory-heavy workloads.
  • Allocator: A component that manages memory allocation and deallocation for containers and objects, affecting performance and memory layout.

Reader FAQ

How much faster is this B+tree?
The author reports 2–5× faster insert/find/erase than Abseil's btree and std::map for large trees in their benchmarks; exact gains depend on workload and configuration.

Which platforms are supported?
The library is built and tested on Linux x86-64 with AVX2; Windows builds have not been tested according to the source.

Is this production-ready?
The project is described as a work in progress; the author is cleaning up the implementation and has no planned major changes to the B+tree, but production readiness is not explicitly asserted.

How do I integrate it into my project?
Add the repository as a git submodule, add_subdirectory in CMake, link fast_containers::fast_containers, and include the provided headers.
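A minimal CMake wiring matching that description; the target name comes from the source's quick start, while the submodule path and project name are placeholders:

```cmake
# Assumes the repo was added as a submodule, e.g.:
#   git submodule add <repo-url> third_party/fast_containers
cmake_minimum_required(VERSION 3.30)
project(my_app CXX)
set(CMAKE_CXX_STANDARD 23)

add_subdirectory(third_party/fast_containers)

add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE fast_containers::fast_containers)
```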
