Why binaries are hard
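Git's storage model is optimized for text. Every commit records a full snapshot of each file as a blob (deduplicated by content hash), and pack files later delta-compress similar blobs. Text compresses and deltas well; already-compressed binaries such as images, archives, and video usually do not. Edit a 50 MB image once and the repository grows by roughly 50 MB. Edit it ten times and that one file accounts for about half a gigabyte of history, which every full clone has to download.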
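The growth is easy to reproduce at a smaller scale. The following is a minimal sketch, assuming random bytes are a fair stand-in for an incompressible image; the file name asset.bin and the 256 KiB size are illustrative only.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit three different revisions of a 256 KiB incompressible file.
for i in 1 2 3; do
  head -c 262144 /dev/urandom > asset.bin
  git add asset.bin
  git commit -q -m "Edit asset, revision $i"
done

# Sum loose and packed object storage, both reported in KiB.
total_kib=$(git count-objects -v | awk '/^size:|^size-pack:/ {s += $2} END {print s}')
echo "object storage after 3 revisions: ${total_kib} KiB"
```

The working tree holds one 256 KiB file, but the object store should hold all three revisions at close to full size.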
Diff and merge
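Git cannot produce a meaningful line-based diff for binaries, and it detects them only heuristically. Marking a file as binary in .gitattributes makes that explicit (binary is a built-in macro for -diff -merge -text):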
# .gitattributes
*.pdf binary
*.png binary
*.zip binary
git diff will then report only that the files differ, for example "Binary files a/logo.png and b/logo.png differ".
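To see what the binary attribute actually sets, git check-attr can be asked about any path, whether or not the file exists. A small sketch in a throwaway repository; logo.png is a hypothetical file name.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

printf '*.png binary\n' > .gitattributes

# Ask Git which of the three underlying attributes apply to a PNG path.
attrs=$(git check-attr diff merge text -- logo.png)
echo "$attrs"
```

All three attributes should be reported as unset, which is what turns off textual diffing, textual merging, and line-ending conversion for the file.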
Custom diff drivers
For some binaries, custom diff drivers can produce useful output:
# .gitattributes
*.docx diff=docx
*.pdf diff=pdf
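# ~/.gitconfig
[diff "docx"]
textconv = pandoc --to=plain
[diff "pdf"]
# pdftotext writes to a .txt file unless the output name is "-",
# so wrap it to send the text to stdout, where Git expects it
textconv = sh -c 'pdftotext -layout \"$0\" -'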
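Now git diff on a Word document shows the actual prose changes, provided the converter (here pandoc) is installed. The converted text is for reading only; it cannot be applied as a patch.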
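The mechanism is the same for any converter that writes text to standard output. The sketch below uses gzip -dc as a stand-in for pandoc or pdftotext, only because it is more likely to be installed; the driver name gzip and the file notes.gz are illustrative assumptions.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Route *.gz files through a diff driver that decompresses before diffing.
printf '*.gz diff=gzip\n' > .gitattributes
git config diff.gzip.textconv "gzip -dc"

printf 'first draft\n' | gzip > notes.gz
git add .gitattributes notes.gz
git commit -q -m "Add compressed notes"

# Change the compressed file, then diff the working tree against the index.
printf 'second draft\n' | gzip > notes.gz
changes=$(git diff -- notes.gz)
echo "$changes"
```

Instead of a "Binary files differ" notice, the diff should show the decompressed lines that were removed and added.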
Merge drivers
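Some structured formats (SQLite, JSON with sorted keys, sometimes EPUB) can be merged with custom drivers. For most images, audio, and video, "merge" is meaningless. Mark them merge=binary (already implied by the binary macro) so that Git keeps your version in the working tree and reports a conflict instead of writing conflict markers into the file. merge=ours is not built in: it works only after you define a driver named ours, for example with git config merge.ours.driver true, and it then silently discards the other side's change.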
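A driver named ours has to be defined before the attribute has any effect. The following sketch defines one whose command is simply true, which leaves the current branch's version untouched and reports success; the branch name redesign and the file contents are placeholders.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Define the "ours" driver: do nothing and exit 0, keeping our version.
git config merge.ours.driver true
printf '*.png merge=ours\n' > .gitattributes

printf 'base' > logo.png
git add .gitattributes logo.png
git commit -q -m "Add logo"
main=$(git symbolic-ref --short HEAD)

# Change the file on a side branch and on the original branch.
git checkout -q -b redesign
printf 'theirs' > logo.png
git commit -q -am "Redesign logo"
git checkout -q "$main"
printf 'ours' > logo.png
git commit -q -am "Tweak logo"

# Both sides changed logo.png, so the merge driver decides the outcome.
git merge -q -m "Merge redesign" redesign
result=$(cat logo.png)
echo "$result"
```

The driver runs only when both sides changed the file. If only the other branch touched it, Git takes that version as usual.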
Reducing repo bloat
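If binaries must live alongside the code, keep the bulk of them out of Git's pack files:
- Use Git LFS, which commits small pointer files and stores the content on a separate server.
- Or generate them at build time, committing the source and ignoring the generated output.
- Or store them in object storage with a manifest in Git.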
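The manifest approach can be sketched as a checksum file that is committed while the assets themselves are not. Everything here is an assumption for illustration: the assets.manifest name, the "hash, two spaces, path" layout, and the helper functions. The upload step is left as a comment because it depends on the storage provider.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
mkdir assets

# Portable SHA-256: GNU coreutils ships sha256sum, macOS ships shasum.
sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# Stand-in for a real asset; in practice this file already exists.
head -c 4096 /dev/urandom > assets/model.bin

# Record "<hash>  <path>" in a manifest that is committed to Git.
printf '%s  %s\n' "$(sha256 assets/model.bin)" assets/model.bin > assets.manifest

# (Upload assets/model.bin to the object store here, e.g. keyed by its hash.)

# After downloading, compare every file against the manifest.
verify_manifest() {
  while read -r expected path; do
    [ "$(sha256 "$path")" = "$expected" ] || { echo "MISMATCH $path"; return 1; }
  done < assets.manifest
  echo "manifest OK"
}

status=$(verify_manifest)
echo "$status"
```

Git then versions only the small manifest, and a changed asset shows up in review as a one-line hash change.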
Tracking compiled artefacts
The general rule: do not commit derived artefacts. JARs, dist bundles, images rendered from SVG sources - these belong in CI output, not Git. Add them to .gitignore.
# .gitignore
dist/
build/
*.jar
*.class
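Ignore rules are worth checking before relying on them. A minimal sketch with git check-ignore, which prints only the paths that some pattern matches; the file names are made up for the demonstration.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'dist/\nbuild/\n*.jar\n*.class\n' > .gitignore

# Create one build output, one archive, and one source file.
mkdir -p dist src
: > dist/bundle.js
: > app.jar
: > src/Main.java

# check-ignore exits non-zero when nothing matches, hence the "|| true".
ignored=$(git check-ignore dist/bundle.js app.jar src/Main.java || true)
echo "$ignored"
```

Only the build output and the JAR should be listed; the source file stays tracked. Adding -v also prints which .gitignore line matched each path.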
When binaries are essential
Some projects genuinely need binaries: design files, audio, ML models, signed certificates, compiled vendor libraries. For these, route the large files through Git LFS:
git lfs install
git lfs track "*.psd"
git lfs track "models/*.bin"
git add .gitattributes
git commit -m "Track design and model files via LFS"
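git lfs track works by appending attribute lines to .gitattributes, which is why that file is committed. The sketch below writes the same lines by hand, so it runs without Git LFS installed, and asks Git which paths would be routed through the lfs filter. The example paths are hypothetical.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

# These lines mirror what the two "git lfs track" commands append.
cat > .gitattributes <<'EOF'
*.psd filter=lfs diff=lfs merge=lfs -text
models/*.bin filter=lfs diff=lfs merge=lfs -text
EOF

# Check which paths the patterns actually cover.
routing=$(git check-attr filter -- \
  design/logo.psd models/weights.bin models/v2/weights.bin README.md)
echo "$routing"
```

Note that models/*.bin covers only files directly inside models/. A nested path such as models/v2/weights.bin is reported as unspecified and would be committed as an ordinary blob unless a broader pattern is tracked.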
Detecting binaries already in history
git rev-list --objects --all | \
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
# keep blobs only; print size and the full path (paths may contain spaces)
grep '^blob' | cut -d' ' -f3- | \
sort -n | tail -n 20
This lists the 20 largest blobs reachable from any ref, by uncompressed size, with the largest last. Often-edited binaries tend to dominate, appearing once per revision.
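Because a frequently edited file appears once per revision, summing sizes per path can show which files cost the most in total. A sketch of that variation in a throwaway repository; logo.psd and the 1000-byte size are placeholders.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Three revisions of one binary, plus a small text file.
for i in 1 2 3; do
  head -c 1000 /dev/urandom > logo.psd
  git add logo.psd
  git commit -q -m "Edit logo, revision $i"
done
printf 'hello\n' > README.md
git add README.md
git commit -q -m "Add README"

# Sum the uncompressed size of every blob, grouped by path.
totals=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob" { size = $2; sub(/^[^ ]+ [^ ]+ /, ""); total[$0] += size }
       END { for (path in total) print total[path], path }' |
  sort -rn | head -n 20)
echo "$totals"
```

Each path is reported with the combined size of all its revisions, largest first, so a modest file that changes often can outrank a large file committed once.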
Cleaning history
If you accidentally committed large binaries, git filter-repo (a separate tool that must be installed) can remove them from every commit:
git filter-repo --invert-paths --path bigfile.psd
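This rewrites every commit from the first one that contained the file onward, so their hashes change. Run it in a fresh clone (filter-repo refuses to run elsewhere unless forced, and it removes the origin remote), then re-add the remote, force-push, and coordinate with the team: anyone who pushes from an old clone will reintroduce the blob.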
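Before and after a rewrite, it helps to check whether a path still appears anywhere in history. The helper below is a hypothetical convenience function, not part of Git or filter-repo. The demonstration shows that deleting a file in a later commit does not remove it from history.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Succeeds if any commit reachable from any ref touched the given path.
path_in_history() {
  [ -n "$(git log --all --format=%H -- "$1")" ]
}

printf 'pixels' > bigfile.psd
git add bigfile.psd
git commit -q -m "Add design file"
git rm -q bigfile.psd
git commit -q -m "Remove design file"

# The file is gone from the working tree but not from the repository.
if path_in_history bigfile.psd; then
  echo "bigfile.psd is still in history"
fi
```

After a successful rewrite, the same check on the removed path should fail.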
Locking binary files
Most binaries cannot be merged, so when two people edit the same file, one person's work is lost. Git LFS supports file locking, provided the server implements the locking API:
git lfs lock design/logo.psd
# while the lock is held, pushes from other users that change this path are rejected
git lfs unlock design/logo.psd
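This is the closest Git comes to Perforce-style exclusive checkouts. For teams with heavy binary editing, locks prevent painful overwrite conflicts. Tracking a pattern with git lfs track "*.psd" --lockable additionally makes matching files read-only in the working tree until they are locked, which reminds people to take the lock first.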