LFS is not the only answer
Git LFS solves one problem, keeping large binaries out of pack files, but introduces others: extra infrastructure, possible hosting fees, and a dependency on an LFS server that must be reachable whenever a clone or checkout needs the file contents. Several alternatives suit different scenarios.
Object storage with manifests
Store the binaries in S3, GCS, or any object store; commit only a manifest with hashes and URLs.
# assets.json
{
"logo.psd": {
"sha256": "8a1b...",
"url": "s3://my-bucket/assets/8a1b.psd"
}
}
# fetch script
aws s3 cp s3://my-bucket/assets/8a1b.psd ./logo.psd
sha256sum ./logo.psd # compare the output with the sha256 recorded in the manifest
Pros: no LFS server, cheap storage, full control. Cons: you write the tooling.
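The tooling you end up writing could start with a check of local files against the manifest. The following is a minimal sketch, assuming the manifest stores full hex SHA-256 digests (the example above abbreviates them) and that assets are placed next to each other under one directory. The names sha256_of and find_stale_assets are hypothetical helpers, not part of any existing tool.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_stale_assets(manifest_path: Path, asset_dir: Path) -> list[str]:
    """Return names of assets that are missing locally or whose hash differs.

    Assumes each manifest entry looks like {"sha256": "<hex>", "url": "<location>"}.
    """
    manifest = json.loads(manifest_path.read_text())
    stale = []
    for name, entry in manifest.items():
        local = asset_dir / name
        if not local.is_file() or sha256_of(local) != entry["sha256"]:
            stale.append(name)
    return stale
```

A fetch script would then download only the entries returned by find_stale_assets, using each entry's url with whatever client the object store provides, and re-run the check afterwards.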
git-annex
git-annex manages large files via symlinks pointing to content-addressed storage. It is older than LFS, supports multiple backends, and works without a central server.
git annex init
git annex add big-file.psd
git commit -m "Add big-file via annex"
git annex copy big-file.psd --to=cloud-remote # cloud-remote must already be configured, e.g. with git annex initremote
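To make "symlinks pointing to content-addressed storage" concrete, here is a rough sketch of the idea. It is illustrative only: the function annex_like_add and the flat store layout are assumptions made for this example and do not reproduce git-annex's actual on-disk format or key scheme.

```python
import hashlib
import os
from pathlib import Path


def annex_like_add(path: Path, store: Path) -> Path:
    """Move a file into a content-addressed store and leave a symlink behind.

    Hypothetical layout: one file per SHA-256 digest directly under `store`.
    """
    path = Path(os.path.abspath(path))
    store = Path(os.path.abspath(store))
    # Reads the whole file at once; fine for a sketch, not for huge files.
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    target = store / digest
    if target.exists():
        path.unlink()         # identical content is already stored once
    else:
        path.replace(target)  # move the content (assumes the same filesystem)
    # A relative link keeps working if the whole tree is moved elsewhere.
    path.symlink_to(os.path.relpath(target, start=path.parent))
    return target
```

Because the stored name is derived from the content, two files with identical bytes share one stored copy, and only the small symlink needs to be tracked alongside the rest of the tree.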
DVC for ML datasets
Data Version Control (DVC) layers on top of Git to track large datasets and ML models. It uses object stores as remotes and integrates with pipelines.
dvc init
dvc add data/raw
git add data/raw.dvc data/.gitignore
git commit -m "Track raw dataset with DVC"
dvc push # requires a default remote, e.g. one set with dvc remote add -d
Submodules pointing to a binary repo
Keep binaries in a separate Git repository and add it as a submodule. Crude, but it works for small teams:
git submodule add https://example.com/assets-repo assets
git commit -m "Add assets submodule"
The asset repo can use LFS independently, or just be small.
Build-time generation
Many "large files" are derived: PNGs from SVGs, dataset shards from a source CSV, compiled WASM from Rust. Generate them at build time instead of committing them.
npm run build:assets # produces dist/ from src/
# .gitignore: dist/
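A build step that regenerates derived files usually wants to skip outputs that are already up to date. The sketch below shows one common approach, comparing modification times in the style of make. The helper name needs_rebuild is hypothetical, and real build tools often use content hashes instead of timestamps.

```python
from pathlib import Path


def needs_rebuild(source: Path, output: Path) -> bool:
    """Return True when the derived file is missing or older than its source."""
    if not output.exists():
        return True
    return output.stat().st_mtime < source.stat().st_mtime
```

A build script could call this for each source and output pair, for example an SVG and its rendered PNG, and run the converter only when it returns True.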
Sparse checkout for partial clones
If only some directories contain binaries, a blobless partial clone combined with sparse-checkout lets a clone skip downloading and checking out their contents:
git clone --filter=blob:none --sparse <url>
cd repo
git sparse-checkout set src docs # check out only src/ and docs/ plus top-level files; assets/ is skipped
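Which paths end up in the working tree can be surprising at first. The sketch below approximates the selection rule, assuming cone mode (the default in recent Git versions): files in the repository root are always kept, everything under a selected directory is kept, and so are files sitting directly in an ancestor of a selected directory. The function in_cone is a hypothetical illustration, not Git's implementation.

```python
from pathlib import PurePosixPath


def in_cone(path: str, cone_dirs: list[str]) -> bool:
    """Approximate cone-mode matching for a repository-relative file path."""
    parent = PurePosixPath(path).parent
    if parent == PurePosixPath("."):
        return True  # files directly in the repository root are always kept
    for d in cone_dirs:
        cone = PurePosixPath(d)
        if cone == parent or cone in parent.parents:
            return True  # anything at or below a selected directory
        if parent in cone.parents:
            return True  # files directly inside an ancestor of a selected directory
    return False
```

With the directories from the example, in_cone("src/lib/main.rs", ["src", "docs"]) is true while in_cone("assets/logo.psd", ["src", "docs"]) is false, so the blobs under assets/ are never needed for checkout.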
Choosing
- Few binaries, occasional updates → object storage manifest.
- Many binaries, deep version history → git-annex or LFS.
- Datasets and ML models → DVC.
- Generated content → build pipeline, not Git.
The right answer depends on team size, host quotas, and whether files are genuinely "source" or just artefacts. The wrong answer is almost always "commit the binaries to plain Git".