TL;DR

Many package managers initially used Git repositories to store package metadata and versions because it seemed convenient, but scaling and operational problems have forced several to adopt HTTP-based registries, proxies or CDNs. Persistent issues include slow clones and updates, CI inefficiencies, hosting limits, and architecture constraints that make Git a poor fit as a database.

What happened

Over the past several years, multiple package ecosystems have discovered that using Git as the primary store for registries and metadata creates scaling and operational pain. Cargo moved from full-index clones to a sparse HTTP protocol (RFC 2789) so that clients fetch only the files they need; by April 2025, 99% of crates.io requests came from Cargo versions where sparse mode is the default. Homebrew 4.0.0 (Feb 2023) switched tap updates to JSON to avoid expensive git fetches, and also reduced how often it auto-updates. CocoaPods moved most users off git in version 1.8, switching to a CDN to serve podspecs and cutting download time and disk usage. Go chose a proxy and a checksum database (GOPROXY and sumdb) to avoid cloning repositories just to read module manifests. Some projects cannot easily escape Git: Nixpkgs is an 83 GB repository on GitHub that has strained hosting and risked becoming read-only, while vcpkg relies on git tree hashes for reproducible ports and still requires full history.
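
To make "fetch only the files they need" concrete, here is a minimal Python sketch of how a client could map a crate name to the single index file it has to request. The directory layout follows Cargo's documented index format; the function names are hypothetical, and the base URL is assumed to be the public crates.io sparse endpoint.

```python
def index_path(crate_name: str) -> str:
    """Relative path of a crate's metadata file inside a Cargo registry index."""
    name = crate_name.lower()  # index file names are lowercased
    if not name:
        raise ValueError("crate name must not be empty")
    if len(name) == 1:
        return f"1/{name}"
    if len(name) == 2:
        return f"2/{name}"
    if len(name) == 3:
        return f"3/{name[0]}/{name}"
    # Four or more characters: first two, next two, then the full name.
    return f"{name[:2]}/{name[2:4]}/{name}"


def sparse_url(crate_name: str, base: str = "https://index.crates.io") -> str:
    """URL a sparse client would GET for one crate (base URL is an assumption)."""
    return f"{base.rstrip('/')}/{index_path(crate_name)}"
```

One HTTP request per dependency replaces a clone of the whole index, which is why the cost scales with the size of a project's dependency graph rather than with the size of the registry.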

Why it matters

  • Using full Git histories slows local development and continuous integration, causing long fetch and clone operations that waste CI minutes.
  • Hosting large git repositories can trigger provider limits or special handling, creating operational costs and potential availability risks.
  • Architectures that depend on git history (like vcpkg’s tree-hash versioning) are hard to reconcile with shallow clones and fast checkouts.
  • HTTP-based registries, proxies or CDNs can dramatically reduce latency and bandwidth, improving reproducibility and performance.
  • Relying on Git exposes projects to filesystem and VCS constraints (case-sensitivity, path limits, directory scaling) that databases avoid.
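
The case-sensitivity constraint in the last point can be illustrated with a short Python sketch. It assumes a registry that stores one file or directory per package name, and the helper name is hypothetical: it flags names that are distinct in Git but would land on the same path on a case-insensitive filesystem.

```python
from collections import defaultdict


def case_collisions(names):
    """Group distinct names that differ only by letter case."""
    groups = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    # Keep only the groups where two or more spellings collide.
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}
```

A database would typically enforce this with a unique constraint on a normalized column; in a Git-backed registry the check has to live in CI or review tooling.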

Key facts

  • Cargo’s crates.io index started as a git repo; RFC 2789 introduced a sparse HTTP protocol to fetch only needed metadata.
  • By April 2025, 99% of crates.io requests came from Cargo versions where sparse mode was the default.
  • Homebrew 4.0.0 (Feb 2023) switched tap updates to JSON; some users previously downloaded ~331 MB to unshallow homebrew-core and .git folders neared 1 GB.
  • CocoaPods 1.8 defaulted to serving podspecs via a CDN for most users, saving roughly 1 GB of disk space for new setups and making installs near-instant.
  • Nixpkgs totaled about 83 GB on GitHub with ~500,000 tree objects and roughly 20,000 forks; a local clone is around 2.5 GB, and GitHub warned in Nov 2025 about maintenance failures that could risk a read-only state.
  • vcpkg uses git tree hashes to version ports; its curated registry hosts over 2,000 libraries and requires full history for reproducible lookups, which breaks under shallow clones.
  • Go made GOPROXY the default in Go 1.13 and introduced a checksum database: the proxy serves module archives, and the checksum database protects against silent history changes.
  • Git-based CMS, wikis and GitOps tools also encounter scale problems: Gollum-based wikis become slow, Decap projects can hit GitHub API limits, and ArgoCD repo servers can exhaust disk when cloning large repos.
  • Common underlying limitations include directory scaling issues, case-sensitivity mismatches, OS path length restrictions, and lacking database features such as indexes, constraints, and transactional migrations.
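
As background for the vcpkg item, the sketch below shows in Python how Git derives a tree hash from file contents. It is a simplification under stated assumptions: SHA-1 object format, a flat directory of regular non-executable files, and hypothetical function names. It is not vcpkg's own code.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way Git does: '<kind> <size>\\0' header plus body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    return git_object_id("blob", content)


def flat_tree_id(files: dict[str, bytes]) -> str:
    """Tree hash for a directory with no subdirectories (assumed mode 100644)."""
    body = b""
    for name in sorted(files, key=lambda n: n.encode()):
        raw_blob_id = bytes.fromhex(blob_id(files[name]))  # 20 raw bytes
        body += b"100644 " + name.encode() + b"\x00" + raw_blob_id
    return git_object_id("tree", body)
```

The hash identifies a directory's contents but says nothing about where to find them: resolving it requires the object to be present in the local repository, which is the part a shallow clone may be missing.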

What to watch next

  • Whether vcpkg will propose a non-git registry or an HTTP-backed solution for reproducible port versioning (vcpkg has not announced such a migration).
  • The outcome of Nixpkgs’ ongoing hosting challenges and any mitigation steps after GitHub’s Nov 2025 warning about maintenance jobs and replica consensus.
  • GitLab’s plans to move away from Gollum and how other Git-hosted CMS/wikis evolve to avoid API and performance limits.

Quick glossary

  • Git: A distributed version control system primarily designed to track changes to files and coordinate work across developers.
  • Registry: A centralized (or distributed) service that publishes package metadata and versions so package managers can discover and download dependencies.
  • CDN (Content Delivery Network): A geographically distributed network of servers that delivers content like files or metadata over HTTP to reduce latency and load on origin servers.
  • Sparse protocol: An approach where clients request only the subset of repository files they need over HTTP instead of cloning the full Git history.
  • Checksum database (sumdb): A service that records cryptographic hashes of package contents to ensure integrity and detect tampering or history rewrites.
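
The idea behind a checksum database can be sketched in a few lines of Python. This is a toy, in-memory illustration with a hypothetical class name; Go's actual sumdb is a signed transparency log with its own hash format, which this does not reproduce.

```python
import hashlib


class ChecksumLog:
    """Remember the first hash seen for each (name, version) and reject changes."""

    def __init__(self):
        self._seen = {}

    def verify(self, name: str, version: str, content: bytes) -> bool:
        digest = hashlib.sha256(content).hexdigest()
        # setdefault records the digest on first sight, else returns the old one.
        recorded = self._seen.setdefault((name, version), digest)
        return recorded == digest
```

If a published version's bytes ever change, for example after a force-push to the upstream repository, `verify` returns False instead of silently accepting the new content.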

Reader FAQ

Why did many package managers use Git in the first place?
Git offered built-in history, a familiar pull-request workflow, distributed copies and free hosting on platforms like GitHub, making it an attractive initial choice.

Can switching to HTTP or CDNs solve the problems?
In many cases, yes. Cargo, Homebrew, CocoaPods and Go improved performance by serving metadata or archives over HTTP, but not every project can move: Nixpkgs and vcpkg have structural constraints.
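
As one illustration of the HTTP approach, this Python sketch builds the URL a client could use to fetch a module's go.mod file from a Go module proxy instead of cloning the repository. The `/@v/<version>.mod` endpoint and the `!`-escaping of uppercase letters follow the documented GOPROXY protocol; the function names and the example module and version in the usage note are illustrative assumptions.

```python
def escape_path(path: str) -> str:
    """Replace each uppercase ASCII letter with '!' plus its lowercase form."""
    return "".join("!" + ch.lower() if "A" <= ch <= "Z" else ch for ch in path)


def mod_file_url(module: str, version: str,
                 proxy: str = "https://proxy.golang.org") -> str:
    """URL of the go.mod file for one module version on a GOPROXY server."""
    return f"{proxy.rstrip('/')}/{escape_path(module)}/@v/{escape_path(version)}.mod"
```

For example, `mod_file_url("github.com/BurntSushi/toml", "v1.3.2")` yields a path containing `github.com/!burnt!sushi/toml`. The escaping keeps module paths unambiguous on case-insensitive storage, one of the filesystem constraints a Git-backed layout leaves unsolved.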

Are there simple workarounds for shallow-clone CI environments?
Workarounds include performing full clones (fetch-depth: 0 in the GitHub Actions checkout step) or using time-based shallow fetches (e.g., --shallow-since), but these are imperfect and can be brittle.

Is Git being abandoned as a tool?
Not confirmed in the source.

"Package managers keep using git as a database, it never works out" (Dec 24, 2025): "Using git as a database is a seductive idea. You get version history for free…"
