TL;DR

Fast Containers is a header-only C++23 library that implements a cache-friendly B+tree plus several hugepage-aware allocators optimized for x86-64 with AVX2. The author reports multi-fold speedups over Abseil's btree and std::map for large trees, with caveats about workload, hardware and benchmark scope.

What happened

A new open-source project, Fast Containers, provides a header-only B+tree implementation and multiple hugepage-oriented allocators for C++23, targeted at x86-64 CPUs with AVX2. The B+tree supports SIMD-accelerated node search and tunable node sizes, and integrates with pooling allocators designed to reduce TLB misses and allocation overhead. The repository includes a fixed-size Dense Map used inside nodes, a single-size HugePageAllocator, a MultiSizeHugePageAllocator for variable-sized allocations, and a PolicyBasedHugePageAllocator for shared pools. The author reports benchmarks (focused on 10-million-element trees with specific key/value sizes) showing 2–5× faster insert/find/erase than Abseil's btree and std::map, plus further gains from hugepages and modest node-search speedups from AVX2. The project is described as a work in progress, tested on Linux x86-64 with AVX2; Windows builds and broader platform testing have not been reported.

Why it matters

  • Reduces TLB misses and allocation overhead for large in-memory trees using hugepage-aware pooling allocators.
  • SIMD-accelerated node searches can shave per-node lookup latency on AVX2-capable x86-64 hardware.
  • Tunable node sizes let users optimize cache utilization for different key/value footprints.
  • If benchmark results hold in real workloads, it could offer sizable throughput improvements for large keyed containers compared with common implementations.

Key facts

  • Components included: btree (kressler::fast_containers::btree), Dense Map, HugePageAllocator, MultiSizeHugePageAllocator, PolicyBasedHugePageAllocator.
  • Reported performance: 2–5× faster insert/find/erase versus Abseil btree and std::map for large trees in provided benchmarks.
  • SIMD (AVX2) search yields ~3–10% faster node searches according to the author.
  • Hugepage allocator integration is credited with 3–5× speedups by reducing TLB misses and pooling allocations.
  • Benchmarks focus on 10-million-element trees and specific tested configurations (8-byte keys; 32-byte and 256-byte values).
  • Default heuristics auto-compute leaf/internal node sizes to target roughly 2KB nodes; manual tuning is supported (leaf sizes clamped to [8,64]).
  • Project prerequisites: C++23 compiler (GCC 14+ or Clang 19+), CMake 3.30+, AVX2-capable CPU (Intel Haswell or AMD Excavator or newer).
  • API is modeled after std::map and provides insert, emplace, operator[], find, erase, iteration, size and other standard operations.
  • Repository and quick-start integration: add as git submodule and link via CMake; headers exposed for direct inclusion.

What to watch next

  • Further benchmark coverage for smaller tree sizes and wider key/value configurations (not confirmed in the source).
  • Cross-platform builds and explicit Windows support; the source says Windows has not been tested, and plans for support are not confirmed.
  • Ongoing cleanup and releases: the author characterizes the codebase as work in progress and is actively tidying the implementation.
  • Broader independent comparisons against other container implementations and real-world workloads (not confirmed in the source).

Quick glossary

  • B+tree: A balanced tree data structure that stores keys in sorted order and places records in leaf nodes, commonly used for efficient range queries and ordered maps.
  • Hugepage: A large memory page size provided by the OS that can reduce TLB pressure and improve performance for large memory allocations.
  • SIMD (Single Instruction, Multiple Data): A class of CPU instructions that operate on multiple data elements in parallel, used to accelerate data-parallel operations.
  • TLB (Translation Lookaside Buffer): A cache that stores recent virtual-to-physical address translations; misses can be costly for memory-heavy workloads.
  • Allocator: A component that manages memory allocation and deallocation for containers and objects, affecting performance and memory layout.

Reader FAQ

How much faster is this B+tree?
The author reports 2–5× faster insert/find/erase than Abseil's btree and std::map for large trees in their benchmarks; exact gains depend on workload and configuration.

Which platforms are supported?
The library is built and tested on Linux x86-64 with AVX2; Windows builds have not been tested according to the source.

Is this production-ready?
The project is described as a work in progress; the author is cleaning up the implementation and has no planned major changes to the B+tree, but production readiness is not explicitly asserted.

How do I integrate it into my project?
Add the repository as a git submodule, add_subdirectory in CMake, link fast_containers::fast_containers, and include the provided headers.
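A minimal CMake wiring matching that description; the target name comes from the source's quick start, while the submodule path and project name are placeholders:

```cmake
# Assumes the repo was added as a submodule, e.g.:
#   git submodule add <repo-url> third_party/fast_containers
cmake_minimum_required(VERSION 3.30)
project(my_app CXX)
set(CMAKE_CXX_STANDARD 23)

add_subdirectory(third_party/fast_containers)

add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE fast_containers::fast_containers)
```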
