By admin, 29 April 2026

Why binaries are hard

Git's storage model is designed for text. Each commit records a complete snapshot of the tree, with unchanged files deduplicated by content hash. Text files compress and delta well; binaries, especially formats that are already compressed, usually do not. Edit a 50 MB image once and you have added roughly another 50 MB to the repo. Edit it ten times and history holds about 550 MB: the original plus ten full copies.
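The growth is easy to reproduce on a small scale. The sketch below is illustrative only: it commits three versions of a 1 MiB file of random bytes to a throwaway repository and reads the packed size back from git count-objects. The file name, sizes, and identity settings are made up for the example, and random bytes stand in for any binary that neither compresses nor deltas.

```shell
set -e

# Throwaway repository; nothing here touches an existing checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Commit three unrelated versions of a 1 MiB "binary".
for i in 1 2 3; do
  head -c 1048576 /dev/urandom > asset.bin
  git add asset.bin
  git commit -q -m "Edit $i of asset.bin"
done

# Repack, then read the pack size (size-pack is reported in KiB).
git gc -q
pack_kib=$(git count-objects -v | awk '/^size-pack:/ {print $2}')
echo "Working copy: 1024 KiB, packed history: ${pack_kib} KiB"
```

The working copy holds one 1 MiB file, but the pack should come out at a little over 3 MiB, because every version is stored in full.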

Diff and merge

Git cannot produce a meaningful diff of binary content by default. It normally detects binaries heuristically; marking a file as binary in .gitattributes makes that explicit:

# .gitattributes
*.pdf binary
*.png binary
*.zip binary

Tools like git diff will then report only that the files differ, for example "Binary files a/logo.png and b/logo.png differ".
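To see what the binary attribute does, git check-attr can be asked which attributes apply to a path. The following is a minimal sketch in a throwaway repository; logo.png and its contents are placeholders, not a real image.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q

printf '*.png binary\n' > .gitattributes

# check-attr works on path names; "binary" is a macro that expands to
# -diff -merge -text, so those three show up as "unset".
attrs=$(git check-attr -a -- logo.png)
echo "$attrs"

# Stage one version, then change the file in the working tree.
printf 'v1\0' > logo.png
git add .gitattributes logo.png
printf 'v2\0\0' > logo.png

# With diff unset, Git reports only that the contents differ.
diff_out=$(git diff -- logo.png)
echo "$diff_out"
```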

Custom diff drivers

For some binaries, custom diff drivers can produce useful output:

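# .gitattributes
# Keep these below any "*.pdf binary" line: later matches win, per attribute.
*.docx diff=docx
*.pdf diff=pdf

# ~/.gitconfig
# A textconv command receives the file path as its last argument
# and must write plain text to stdout.
[diff "docx"]
    textconv = pandoc --from=docx --to=plain
[diff "pdf"]
    # pdftotext writes to a .txt file unless "-" is given as the output name
    textconv = sh -c 'pdftotext -layout \"$0\" -'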

Now git diff on a Word document shows actual prose changes.
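The mechanism can be tried without pandoc or pdftotext installed. The sketch below swaps in gzip as the converter, purely because it is commonly available: the driver name "gz" and the file notes.txt.gz are assumptions made for this example, not part of the setup above.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q

# Route *.gz files through a diff driver named "gz" (a name chosen here).
printf '*.gz diff=gz\n' > .gitattributes

# Git appends the file path to this command and diffs whatever it prints.
git config diff.gz.textconv "gzip -dc"

# Stage one version of a compressed file.
printf 'first line\n' | gzip -c > notes.txt.gz
git add .gitattributes notes.txt.gz

# Change the compressed file in the working tree.
printf 'first line\nsecond line\n' | gzip -c > notes.txt.gz

# The diff now compares the decompressed text instead of raw bytes.
diff_out=$(git diff -- notes.txt.gz)
echo "$diff_out"
```

The converted text is used only for display. Such a diff is meant for reading, not for applying as a patch.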

Merge drivers

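Some structured formats - SQLite databases, JSON with sorted keys, sometimes EPUB - can be merged with custom drivers. For most images, audio, and video, "merge" is meaningless. Mark them merge=binary (which the binary attribute already implies) so that Git keeps your version in the working tree and reports a conflict instead of writing conflict markers into the file. Note that merge=ours is not a built-in driver: it only works if you define a merge driver with that name.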
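Because merge=ours is not built in, a driver with that name has to be defined before the attribute has any effect. The sketch below shows one common way to do that, using the true command as a driver that always succeeds and leaves the current branch's version in place. The branch name, file name, and identity settings are hypothetical.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Define a driver called "ours": "true" exits 0 without touching the file,
# so the current branch's version is kept as the merge result.
git config merge.ours.driver true
printf '*.bin merge=ours\n' > .gitattributes

printf 'base\n' > asset.bin
git add .gitattributes asset.bin
git commit -q -m "Base version"
main_branch=$(git symbolic-ref --short HEAD)

# Two branches change the same file in different ways.
git checkout -q -b theirs
printf 'their edit\n' > asset.bin
git commit -q -am "Their edit"

git checkout -q "$main_branch"
printf 'our edit\n' > asset.bin
git commit -q -am "Our edit"

# The merge completes without a conflict and keeps our copy.
git merge -q --no-edit theirs
merged=$(cat asset.bin)
echo "asset.bin after merge: $merged"
```

This silently discards the other side's change, so it suits files where one branch is always authoritative. Where a human should decide, merge=binary and its explicit conflict are the safer choice.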

Reducing repo bloat

If binaries must be versioned:

  • Use Git LFS to store them outside pack files.
  • Or generate them at build time and ignore the generated output.
  • Or store them in object storage with a manifest in Git.
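The last option can be sketched with nothing more than a checksum file. This is one possible shape, not a prescribed layout: the assets/ directory, model.bin, and assets.sha256 are hypothetical names, the upload step is left as a comment because it depends on the storage provider, and sha256sum is assumed to be available (as on most Linux systems).

```shell
set -e

work=$(mktemp -d)
cd "$work"

# A stand-in binary; in practice this would be the real asset.
mkdir -p assets
head -c 4096 /dev/urandom > assets/model.bin

# 1. Record the content hash. Only this small text file goes into Git;
#    the binary itself would be listed in .gitignore.
sha256sum assets/model.bin > assets.sha256

# 2. Upload assets/model.bin to object storage, keyed by its hash
#    (omitted: the command depends on the provider).

# 3. After downloading, verify the asset against the manifest.
if sha256sum -c assets.sha256 > /dev/null; then
  verified=yes
else
  verified=no
fi
echo "verified: $verified"
```

Git then tracks which version of each asset a commit expects, while the bytes themselves stay out of history.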

Tracking compiled artefacts

The general rule: do not commit derived artefacts. JARs, dist bundles, compiled images from SVG sources - these belong in CI output, not Git. Add them to .gitignore.

# .gitignore
dist/
build/
*.jar
*.class
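Ignore rules are easy to get subtly wrong, so it can help to test them with git check-ignore. The sketch below does this in a throwaway repository; the paths are placeholders invented for the example.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'dist/\nbuild/\n*.jar\n*.class\n' > .gitignore

# Placeholder files: two build outputs and one real source file.
mkdir -p dist lib src
touch dist/app.js lib/util.jar src/Main.java

# -v prints the .gitignore line that matched each ignored path.
git check-ignore -v dist/app.js lib/util.jar

# Exit status 0 means "ignored", 1 means "not ignored".
if git check-ignore -q lib/util.jar; then jar_ignored=yes; else jar_ignored=no; fi
if git check-ignore -q src/Main.java; then src_ignored=yes; else src_ignored=no; fi
echo "lib/util.jar ignored: $jar_ignored, src/Main.java ignored: $src_ignored"
```

Ignore rules only affect untracked files. An artefact that is already committed stays tracked until it is removed from the index, for example with git rm -r --cached dist/.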

When binaries are essential

Some projects genuinely need binaries: design tools, audio, ML models, signed certificates, compiled vendor libraries. For these:

git lfs install
git lfs track "*.psd"
git lfs track "models/*.bin"
git add .gitattributes
git commit -m "Track design and model files via LFS"

Detecting binaries already in history

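# List every object reachable from any ref, keep blobs, print "size path".
git rev-list --objects --all | \
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
  awk '$1 == "blob" { sub(/^blob [0-9a-f]+ /, ""); print }' | \
  sort -n | tail -20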

This lists the 20 largest blobs reachable from any ref, with uncompressed sizes in bytes and the largest last. Often-edited binaries dominate.

Cleaning history

If you accidentally committed large binaries, git filter-repo (a separate tool that must be installed) removes them from history. By default it expects to run in a fresh clone:

git filter-repo --invert-paths --path bigfile.psd

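This rewrites history from the first commit that touched the file onward, so every later commit gets a new hash. Force-push and coordinate with the team: existing clones must be re-cloned or reset to the rewritten history.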

Locking binary files

Binaries cannot be merged, so concurrent edits are dangerous. Git LFS supports file locking, provided the LFS server implements its locking API:

git lfs lock design/logo.psd
# someone else cannot push changes to this path
git lfs unlock design/logo.psd

This is the closest Git comes to Perforce-style exclusive checkouts. For teams with heavy binary editing, locks prevent painful overwrite conflicts.