Redd-Archiver: Self-host Reddit archives — 2.38B posts, offline-ready

TL;DR

Redd-Archiver is an open-source tool that converts compressed platform dumps into browsable, offline HTML archives or searchable Docker-backed sites with PostgreSQL full-text search. The v1.0 release supports Reddit Pushshift .zst dumps (2.38B posts through Dec 31, 2024), Voat and Ruqqus archives, an MCP server for AI integrations, and multiple deployment modes including Tor hidden services.

What happened

A new release of Redd-Archiver (v1.0) packages tooling to turn large compressed link-aggregator dumps into navigable archives that can be run locally, hosted publicly, or exposed on Tor. The project accepts multiple input formats (Pushshift .zst JSON Lines for Reddit, Voat SQL, Ruqqus .7z) and emits static HTML for fully offline browsing or a Docker + PostgreSQL deployment for sub-second full-text search. Version 1.0 adds a REST API with 30+ endpoints, an MCP server auto-generating OpenAPI-backed tools for AI assistants, and a PostgreSQL-backed ingestion pipeline designed to keep memory usage constant during large imports. The repository includes quick-start paths for local, Tor, and HTTPS deployments, guidance on resource requirements, and a public registry/leaderboard for discovered instances and coordinated archiving.
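The constant-memory import idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual pipeline: the function name `iter_batches` and the batch size are hypothetical, and the input is assumed to be an already-decompressed binary stream of JSON Lines (for a .zst dump, a streaming Zstandard decompressor would wrap the file first).

```python
import io
import json
from typing import Any, BinaryIO, Dict, Iterator, List


def iter_batches(stream: BinaryIO, batch_size: int = 1000) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of parsed JSON Lines records, at most batch_size at a time.

    Only one batch is held in memory, so peak usage is bounded by the
    batch size rather than by the size of the input file.
    """
    batch: List[Dict[str, Any]] = []
    for raw_line in stream:  # binary file objects iterate line by line
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        try:
            batch.append(json.loads(line))
        except ValueError:
            continue  # skip malformed records instead of aborting the import
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, possibly short, batch


# Hypothetical sample data standing in for a decompressed dump.
sample = io.BytesIO(
    b'{"id": "a1", "title": "first"}\n'
    b"\n"
    b'{"id": "a2", "title": "second"}\n'
    b"not json\n"
    b'{"id": "a3", "title": "third"}\n'
)
batches = list(iter_batches(sample, batch_size=2))
```

In a real import, each batch would be written to the database before the next one is read, which is what keeps memory flat regardless of dump size.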

Why it matters

Preserves large swaths of public discussions that might otherwise disappear when communities or platforms shut down.
Provides offline, JavaScript-free access to archived posts and full comment threads for low-bandwidth or air-gapped environments.
Enables researchers and tools to run full-text search and programmatic queries over aggregated datasets via a REST API.
Supports private or anonymous sharing through Tor hidden services, lowering hosting and networking requirements for operators.

Key facts

Supports Reddit Pushshift .zst JSON Lines (listed as full support for 2.38B posts, 40,029 subreddits, through Dec 31, 2024).
Also supports Voat SQL dumps (3.81M posts, 24.1M comments) and Ruqqus .7z JSON Lines (500K posts).
Redd-Archiver reports tracking 2.384 billion posts across 68,883 communities when combining supported archives.
Version 1.0 includes a PostgreSQL-backed architecture, GIN-indexed full-text search, a REST API with 30+ endpoints, and an MCP server for AI integration.
Generates static HTML for browse-only offline archives; Docker + PostgreSQL unlocks server-side search and sub-second results.
The design is JavaScript-free for core functionality, mobile-first, and WCAG-compliant.
Deployment options include local/homelab, static hosting, Docker with HTTPS, and Docker with Tor hidden services; setup takes from a few minutes to about 15 minutes depending on the mode.
Installation prerequisites listed: Python 3.7+, PostgreSQL 12+, and disk space roughly 1.5–2x the input .zst file for the database.
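The disk-space guidance above amounts to simple arithmetic. A minimal sketch, assuming the 1.5–2x multiplier applies to the size of the compressed .zst input; the function name and the 10 GB example figure are hypothetical:

```python
from typing import Tuple


def estimate_db_size_gb(input_zst_gb: float) -> Tuple[float, float]:
    """Return a (low, high) estimate of database disk usage in GB.

    Uses the 1.5x to 2x range quoted for the PostgreSQL database
    relative to the compressed input file.
    """
    if input_zst_gb < 0:
        raise ValueError("input size must be non-negative")
    return (input_zst_gb * 1.5, input_zst_gb * 2.0)


# Hypothetical example: a 10 GB compressed dump.
low_gb, high_gb = estimate_db_size_gb(10.0)
print(f"Plan for roughly {low_gb:.0f} to {high_gb:.0f} GB of database storage")
```

This covers only the database; the input file itself and any generated HTML need space on top of that.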

What to watch next

Expansion of the public instance registry and leaderboard as more archives are deployed (not confirmed in the source).
Support for additional platforms: the source invites submissions of new data sources and lists Lemmy, Hacker News, and alternative Reddit archives as examples.
Operational and legal considerations for large public mirrors and hosted archives (not confirmed in the source).

Quick glossary

Pushshift: A commonly used archival dataset format and project that provides historical Reddit data exports, often distributed as compressed JSON Lines files.
.zst: The file extension for data compressed with Zstandard, a format that offers high compression ratios and fast decompression and is commonly used for large dataset dumps.
PostgreSQL full-text search (FTS): A database feature that tokenizes and indexes textual content to allow keyword search, ranking, and filtering within PostgreSQL.
Tor hidden service: A .onion-accessible server reachable over the Tor network that can host services without exposing a public IP or requiring port forwarding.
MCP server: A server implementing the Model Context Protocol, which exposes tools and integrations that AI assistants can call; here it auto-generates OpenAPI-backed tools for AI assistants.

Reader FAQ

Can I browse archives offline without a server?
Yes. Redd-Archiver can generate static HTML files that are fully browsable without any server or JavaScript.

How do I enable full-text search?
Search requires running the Docker deployment with PostgreSQL (v12+) and GIN-indexed full-text search; it is not available in static offline mode.
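For readers unfamiliar with GIN-indexed full-text search, the general PostgreSQL pattern looks roughly like the sketch below. The SQL functions (`to_tsvector`, `plainto_tsquery`, `ts_rank`) are standard PostgreSQL, but the table name `posts`, the columns `id`, `title`, and `body`, and the helper `build_search_query` are hypothetical and not taken from Redd-Archiver's schema. The `%s` placeholders assume a psycopg-style driver.

```python
from typing import Tuple

# Expression indexed and searched; the two must match for the index to be used.
DOCUMENT_EXPR = "to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))"

# One-time setup: a GIN index over the text-search vector (hypothetical table).
CREATE_INDEX_SQL = f"CREATE INDEX IF NOT EXISTS posts_fts_idx ON posts USING GIN ({DOCUMENT_EXPR})"

SEARCH_SQL = f"""
SELECT id, title,
       ts_rank({DOCUMENT_EXPR}, plainto_tsquery('english', %s)) AS rank
FROM posts
WHERE {DOCUMENT_EXPR} @@ plainto_tsquery('english', %s)
ORDER BY rank DESC
LIMIT %s
"""


def build_search_query(terms: str, limit: int = 20) -> Tuple[str, Tuple[str, str, int]]:
    """Return the parameterized search SQL and its parameter tuple.

    The user's terms are passed as bound parameters, never interpolated
    into the SQL string, to avoid SQL injection.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return SEARCH_SQL, (terms, terms, limit)


sql, params = build_search_query("open source archive", limit=10)
```

The returned pair would be handed to a database cursor's `execute` method; the static HTML mode has no database, which is why search is unavailable there.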

Does the project support Tor hosting?
Yes. The project documents a Docker + Tor deployment option to expose an archive as a Tor hidden service.

Is the archive updated automatically from live platforms?
Not confirmed in the source.

Are legal or privacy risks addressed?
Not confirmed in the source.

