TL;DR

A recent Ask HN thread asked how people implement retrieval-augmented generation (RAG) locally with minimal dependencies. Commenters recommended a mix of lightweight vector stores, search approaches and bundled tools, and raised practical questions about dataset size and trade-offs.

What happened

A Hacker News Ask HN thread solicited community approaches to running RAG locally, with minimal external dependencies, over internal codebases and complex documents. Respondents suggested a range of lightweight options. Some recommended standalone local vector stores such as Qdrant; others pointed to FAISS (the faiss-cpu package) paired with Python pickles for smaller datasets. Several participants mentioned Chroma and a project referred to as Opus, and one poster noted LibreChat as a packaged local tool that bundles a vector database for documents. A few answers favored simpler text-search strategies, such as BM25 or SQLite with FTS5, over a full vector stack. Other mentions included LightRAG and Archestra as UI options, along with a brief endorsement of the AnythingLLM project. One commenter asked a follow-up about what counts as “too large” for the simpler approaches, reflecting ongoing practical concerns about scale and tooling choices.

Why it matters

  • Local RAG setups can reduce reliance on external services and data egress, affecting cost and privacy considerations.
  • Tooling choices vary from specialized vector databases to lightweight search and storage options; each implies different engineering trade-offs.
  • Knowing when to move from simple approaches (pickles, FTS) to dedicated vector stores matters for performance and maintenance.
  • Bundled or turnkey local projects may lower integration effort for teams working with documents or internal code.
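The "simple approach" of pickled embeddings mentioned above can be sketched with only the standard library. This is a hypothetical illustration of the pattern, not code from the thread: it stands in for faiss-cpu with a brute-force cosine-similarity scan, and the toy vectors would in practice come from an embedding model.

```python
import math
import pickle

# Toy "embeddings" for three documents; a real setup would generate
# these with an embedding model. Pickling the whole store to disk is
# the small-dataset pattern commenters described.
store = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.2],
    "doc_c": [0.8, 0.2, 0.1],
}

with open("embeddings.pkl", "wb") as f:
    pickle.dump(store, f)

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def search(query_vec, path="embeddings.pkl", k=2):
    # Load the pickled store and rank every document against the query.
    with open(path, "rb") as f:
        vectors = pickle.load(f)
    ranked = sorted(vectors, key=lambda doc: cosine(query_vec, vectors[doc]), reverse=True)
    return ranked[:k]

print(search([1.0, 0.0, 0.0]))
```

A linear scan like this is O(n) per query, which is exactly why the thread's follow-up question about "too large" matters: dedicated vector stores exist to replace this scan with approximate nearest-neighbor indexes.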

Key facts

  • The original post asked how people run RAG locally with minimal dependencies for code or complex documents.
  • Qdrant was recommended by at least one commenter as a simple local option.
  • FAISS (faiss-cpu) plus Python pickling was suggested for datasets described as not too large.
  • Chroma and a project referred to as Opus were mentioned as options to try.
  • LibreChat was noted for bundling a vector database for document workflows.
  • Some contributors recommended lighter alternatives such as BM25-style search instead of vector embeddings.
  • LightRAG and Archestra were referenced in the context of UIs or tooling.
  • AnythingLLM was described by a commenter as a promising option.
  • SQLite with FTS5 was specifically suggested as a local search approach.
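The BM25-style search that some contributors preferred over embeddings can be implemented in a few dozen lines of plain Python. The following is a minimal sketch of the Okapi BM25 ranking function, not code from the thread; the sample documents and parameter values (k1=1.5, b=0.75, common defaults) are illustrative.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency of each query term across the corpus.
    df = {t: sum(1 for d in docs if t in d) for t in query}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            denom = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / denom
        scores.append(s)
    return scores

docs = [
    "local rag with sqlite fts".split(),
    "vector database for embeddings".split(),
    "bm25 search over local documents".split(),
]
scores = bm25_scores("local search".split(), docs)
best = scores.index(max(scores))
print(best)
```

Lexical scoring like this needs no embedding model at all, which is the appeal for minimal-dependency setups; the trade-off is that it only matches literal terms, not semantically related ones.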

What to watch next

  • Whether FAISS-based, pickle-backed approaches remain viable as dataset sizes grow — not confirmed in the source.
  • Adoption and maturity of bundled local tools like LibreChat for enterprise document workflows — not confirmed in the source.
  • How projects like Chroma, Opus or AnythingLLM evolve to balance minimal dependencies with scalability and features — not confirmed in the source.

Quick glossary

  • RAG: Retrieval-augmented generation, in which a retrieval system is combined with a generative model so the model can draw on external documents or data at inference time.
  • Vector database: A storage system optimized for similarity search over vector embeddings, commonly used to retrieve semantically related documents.
  • FAISS: A library for efficient similarity search and clustering of dense vectors; faiss-cpu is a CPU-only Python package variant.
  • BM25: A classical probabilistic information retrieval ranking function used in text search engines for relevance scoring.
  • FTS5: Full-Text Search extension for SQLite that provides text indexing and query features for local search.
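The SQLite-with-FTS5 approach from the glossary can be tried directly with Python's built-in sqlite3 module, assuming (as is true of most CPython builds) that the bundled SQLite was compiled with FTS5. The table and sample rows below are illustrative, not from the thread.

```python
import sqlite3

# In-memory database; swap ":memory:" for a file path to persist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("rag-notes", "retrieval augmented generation over internal documents"),
        ("build-log", "compiling faiss on a laptop"),
        ("search-tips", "full text retrieval with bm25 ranking"),
    ],
)
# MATCH runs a full-text query; FTS5's built-in bm25() function ranks
# results (lower scores rank better in FTS5's convention).
rows = conn.execute(
    "SELECT title FROM docs WHERE docs MATCH ? ORDER BY bm25(docs)",
    ("retrieval",),
).fetchall()
titles = [r[0] for r in rows]
print(sorted(titles))
```

Because FTS5 ships inside SQLite, this gives indexed full-text search with zero extra dependencies, which matches the thread's minimal-dependency framing.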

Reader FAQ

What local options did commenters recommend for RAG?
Recommendations in the thread included Qdrant, faiss-cpu (with pickling for small datasets), Chroma, Opus, LibreChat, LightRAG, Archestra, BM25-style search, AnythingLLM, and SQLite with FTS5.

Is there a consensus on the best approach?
Not confirmed in the source.

What counts as 'too large' for faiss-cpu and pickle?
Not confirmed in the source.

Are there turnkey local projects that include a vector DB?
LibreChat was mentioned as a bundled solution that includes a vector database for documents.

