By admin, 29 April 2026

LFS is not the only answer

Git LFS solves one problem, keeping large binaries out of Git's pack files, but introduces others: extra infrastructure, hosting fees, and a dependency on an LFS server that must be reachable whenever someone clones or checks out tracked files. Several alternatives suit different scenarios.

Object storage with manifests

Store the binaries in S3, GCS, or any object store; commit only a manifest with hashes and URLs.

# assets.json
{
  "logo.psd": {
    "sha256": "8a1b...",
    "url": "s3://my-bucket/assets/8a1b.psd"
  }
}

# fetch script
aws s3 cp s3://my-bucket/assets/8a1b.psd ./logo.psd
sha256sum ./logo.psd   # compare with the sha256 recorded in assets.json

Pros: no LFS server, cheap storage, full control. Cons: you write and maintain the tooling yourself, including upload, download, and hash verification.
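The verification half of that tooling can start small. The sketch below is one possible shape for it in Python, not a finished tool: it reads a manifest laid out like assets.json above and reports which files are missing locally or no longer match their recorded hash. The names sha256_of and stale_assets are illustrative. It assumes the manifest file is pure JSON (no leading "# assets.json" line) and stores full 64-character digests rather than the abbreviated "8a1b..." shown, and it leaves the download itself to aws s3 cp or whichever client the bucket needs.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stale_assets(manifest_path: Path, root: Path) -> List[str]:
    """Return manifest entries that are missing under root or whose hash differs."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    stale = []
    for name, entry in manifest.items():
        local = root / name
        # Assumption: entry["sha256"] holds the full lowercase hex digest.
        if not local.is_file() or sha256_of(local) != entry["sha256"]:
            stale.append(name)
    return sorted(stale)
```

A fetch script would call stale_assets(Path("assets.json"), Path(".")) and download only the names it returns, using each entry's url field, so unchanged files are never transferred again.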

git-annex

git-annex manages large files by committing symlinks that point into content-addressed storage under .git/annex. It predates LFS, supports many kinds of storage remote (S3, rsync, plain directories, and others), and works without a central server.

git annex init
git annex add big-file.psd
git commit -m "Add big-file via annex"
git annex copy big-file.psd --to=cloud-remote   # cloud-remote: a remote you have already configured

DVC for ML datasets

Data Version Control (DVC) layers on top of Git to track large datasets and ML models. The data itself goes to an object-store remote, Git holds only small .dvc pointer files, and the tool integrates with reproducible pipelines.

dvc init
dvc add data/raw
git add data/raw.dvc data/.gitignore
git commit -m "Track raw dataset with DVC"
dvc push   # needs a default remote, set up beforehand with: dvc remote add -d <name> <url>

Submodules pointing to a binary repo

Keep binaries in a separate Git repository and add it as a submodule. Crude but works for small teams:

git submodule add https://example.com/assets-repo assets
git commit -m "Add assets submodule"

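The asset repo can use LFS independently, or simply stay small. Collaborators must clone with git clone --recurse-submodules, or run git submodule update --init afterwards, to get the assets.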

Build-time generation

Many "large files" are derived artefacts: PNGs rendered from SVGs, dataset shards split from a source CSV, WASM compiled from Rust. Generate them at build time instead of committing them.

npm run build:assets    # produces dist/ from src/
# .gitignore: dist/
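Regenerating every artefact on each build can be slow, so build steps usually skip outputs that are already current. A minimal sketch of that check in Python, assuming file modification times are a good enough staleness signal (comparing content hashes is the stricter option). needs_rebuild is a hypothetical helper, not part of npm or any build tool:

```python
from pathlib import Path


def needs_rebuild(source: Path, output: Path) -> bool:
    """Return True when the output is missing or older than its source."""
    if not output.exists():
        return True
    # A source modified after the output was written means the output is stale.
    return source.stat().st_mtime_ns > output.stat().st_mtime_ns
```

Because the outputs can always be recreated from the sources, nothing is lost by keeping them out of Git; the check only saves time.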

Sparse checkout for partial clones

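If only some directories contain binaries, a partial clone combined with sparse-checkout lets a clone skip them. The --filter=blob:none option defers downloading file contents until they are checked out, and sparse-checkout limits which directories are checked out at all. The host must support partial clone for the filter to take effect: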

git clone --filter=blob:none --sparse <url>
cd repo
git sparse-checkout set src docs   # check out only src/ and docs/ plus top-level files; assets/ is skipped

Choosing

  • Few binaries, occasional updates → object storage manifest.
  • Many binaries, deep version history → git-annex or LFS.
  • Datasets and ML models → DVC.
  • Generated content → build pipeline, not Git.
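The list above can be read as a small decision procedure. The Python sketch below encodes it; the AssetProfile fields, the order of the checks, and the fallback to an object storage manifest when no other rule matches are assumptions made for illustration, not rules stated by the list itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetProfile:
    """Hypothetical description of the files a team wants to keep out of plain Git."""
    generated: bool = False      # can be rebuilt from sources in the repo
    ml_data: bool = False        # datasets or trained models
    many_files: bool = False     # more than a handful of binaries
    deep_history: bool = False   # old versions must stay retrievable


def suggest_storage(profile: AssetProfile) -> str:
    """Map a profile to one of the options from the list, checking the cheapest fix first."""
    if profile.generated:
        return "build pipeline, not Git"
    if profile.ml_data:
        return "DVC"
    if profile.many_files and profile.deep_history:
        return "git-annex or LFS"
    # Assumed default: few binaries with occasional updates.
    return "object storage manifest"
```

For example, suggest_storage(AssetProfile(many_files=True, deep_history=True)) returns "git-annex or LFS", while an empty AssetProfile() falls through to "object storage manifest".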

The right answer depends on team size, host quotas, and whether files are genuinely "source" or just artefacts. The wrong answer is always "commit the binaries to plain Git".