Finding big files in history

By admin, 29 April 2026

Why size matters

Repository size affects clone speed, fetch time, CI cost, and IDE responsiveness. A 200 MB binary that was committed once and deleted in the next commit is still stored in history, and every full clone downloads it, until you rewrite that history. Finding it is the first step to slimming down.

The classic one-liner

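git rev-list --objects --all | \
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
  sed -n 's/^blob //p' | \
  sort -k2,2 -n | tail -20

This lists the 20 largest blobs ever stored, one per line as SHA, size in bytes, and path, with the largest last. The sed step keeps the SHA (needed to trace the blob later) and the whole path, even when it contains spaces. If identical content was stored under several names, only one of those paths is shown. The output reveals candidates for cleanup.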
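If you run the one-liner often, it can be wrapped in a shell function. The sketch below uses a hypothetical helper name, largest_blobs, prints the largest blobs first, and converts bytes to MiB with awk so it does not depend on GNU numfmt. It assumes git is on the PATH and that you call it from inside a repository.

```shell
# largest_blobs [N]: hypothetical helper (not a git command). Prints the N
# largest blobs in all history (default 20), largest first, as
# "<size in MiB>  <blob SHA>  <path>", using the path git rev-list reports.
largest_blobs() {
  local n=${1:-20}
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |   # keep blobs only: "<sha> <size> <path>"
    sort -k2,2 -rn |         # numeric sort on the size column, largest first
    head -n "$n" |
    awk '{
      sha = $1; size = $2
      path = substr($0, length(sha) + length(size) + 3)   # rest of line, spaces intact
      printf "%8.1f MiB  %s  %s\n", size / 1048576, sha, path
    }'
}
```

For example, largest_blobs 5 shows the top five.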

Size by directory

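git ls-tree -r -l HEAD | \
  awk -F '\t' '{ split($1, meta, " "); dir = $2
                 if (!sub(/\/[^\/]*$/, "", dir)) dir = "."
                 size[dir] += meta[4] }
               END { for (d in size) print size[d], d }' | \
  sort -rn | head -20

This sums blob sizes per directory in the current HEAD tree, counting only the files directly inside each directory (files at the root are grouped under "."). It measures live size, not historical bloat: a file deleted in an earlier commit does not appear.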
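The tree view only covers what HEAD contains today. For a rough historical counterpart, the sketch below (hypothetical helper history_size_by_dir) sums every blob ever stored under each top-level directory. It attributes each blob to the single path git rev-list reports for it and adds uncompressed sizes, so treat the numbers as a ranking rather than as bytes on disk.

```shell
# history_size_by_dir [N]: hypothetical helper. Sums the uncompressed size of
# every blob in history by the top-level directory of its reported path
# (root-level files count under "."), largest first, N rows (default 20).
history_size_by_dir() {
  local n=${1:-20}
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob" {
           path = substr($0, length($1) + length($2) + 3)   # skip "<type> <size> "
           slash = index(path, "/")
           dir = slash ? substr(path, 1, slash - 1) : "."
           size[dir] += $2
         }
         END { for (d in size) printf "%.0f %s\n", size[d], d }' |
    sort -rn | head -n "$n"
}
```

Each blob is counted once, because git rev-list --objects lists every object only once.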

Pack file inspection

git verify-pack -v .git/objects/pack/pack-*.idx | \
  sort -k 3 -n | tail -20

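This shows the largest objects in the pack files, sorted by column 3 (object size); column 4 is the size inside the pack. Paths are not included, so look the SHAs up in the output of git rev-list --objects --all. Only packed objects appear, so run git gc first if the repository has many loose objects.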
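Cross-referencing by hand gets tedious. Here is a sketch (hypothetical helper pack_blobs_with_paths) that does the lookup with awk: it saves the SHA-to-path listing from git rev-list --objects --all in a temporary file, then prints the largest packed blobs with their paths. It assumes at least one pack file exists, as it will after git gc.

```shell
# pack_blobs_with_paths [N]: hypothetical helper. Prints the N largest packed
# blobs (default 20), largest first, as "<size> <size-in-pack> <sha> <path>".
pack_blobs_with_paths() {
  local n=${1:-20} gitdir map
  gitdir=$(git rev-parse --git-dir) || return
  map=$(mktemp) || return
  git rev-list --objects --all > "$map"        # "<sha> <path>" for every object
  git verify-pack -v "$gitdir"/objects/pack/pack-*.idx |
    awk '$2 == "blob"' |                       # drop commits, trees and summary lines
    sort -k3,3 -rn | head -n "$n" |
    awk 'NR == FNR { paths[$1] = substr($0, length($1) + 2); next }
         { print $3, $4, $1, paths[$1] }' "$map" -
  rm -f "$map"
}
```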

Using `git-sizer`

The dedicated tool gives a complete report:

brew install git-sizer    # macOS; prebuilt binaries for other platforms are on the project's GitHub releases page
git-sizer --verbose

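The report covers overall object counts and sizes, the largest individual objects, history depth, and the largest checkout (number of files, total size, maximum path depth), and rates each figure with a "level of concern" column. Without --verbose, only figures with a nonzero concern level are printed. Run it once on any large repo; the report is illuminating.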

Tracing a specific blob

Found a huge blob? The first command maps its SHA to a path; the second lists the commits that touched it.

git rev-list --all --objects | grep <sha>
git log --all --find-object=<sha>

--find-object shows the commits that change the number of occurrences of the blob, which in practice means the commits that introduced or removed it.
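The two commands can be combined into one report. In this sketch the helper name trace_blob is hypothetical; --find-object, --format and --date=short are standard git log options. Passing the full SHA is the safest input, since the grep step matches a prefix.

```shell
# trace_blob <sha>: hypothetical helper. Prints the path git rev-list reports
# for the blob, then every commit on any ref that added or removed it.
trace_blob() {
  [ -n "$1" ] || { echo "usage: trace_blob <blob-sha>" >&2; return 2; }
  echo "paths:"
  git rev-list --objects --all | grep "^$1" | cut -d' ' -f2-   # keep paths with spaces
  echo "commits:"
  git log --all --find-object="$1" --format='%h %ad %s' --date=short
}
```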

Removing big files from history

Once identified, use git filter-repo (the modern replacement for filter-branch):

pip install git-filter-repo
git filter-repo --strip-blobs-bigger-than 50M
# or by path:
git filter-repo --invert-paths --path huge.psd

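This rewrites history: every commit that contained a removed blob, and every descendant of such a commit, gets a new SHA. filter-repo refuses to run on anything but a fresh clone unless you pass --force, and it removes the origin remote, so re-add the remote before force-pushing all branches and tags. Have collaborators reclone; an old clone can push the big blobs straight back.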
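Before rewriting anything, it helps to preview which blobs a size threshold would catch. This sketch (hypothetical helper blobs_bigger_than) is a plain-git approximation of the filter, not filter-repo itself, and takes a byte count rather than a suffix like 50M; check filter-repo's documentation for exactly how it interprets size suffixes.

```shell
# blobs_bigger_than <bytes>: hypothetical dry-run helper. Lists every blob in
# history larger than <bytes>, largest first, as "<sha> <size> <path>".
blobs_bigger_than() {
  case $1 in
    ''|*[!0-9]*) echo "usage: blobs_bigger_than <bytes>" >&2; return 2 ;;
  esac
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |                        # "<sha> <size> <path>"
    awk -v limit="$1" '$2 + 0 > limit + 0' |      # numeric comparison on the size column
    sort -k2,2 -rn
}
```

Assuming filter-repo reads M as MiB, blobs_bigger_than $((50 * 1024 * 1024)) approximates the 50M example.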

Verifying the cleanup

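git reflog expire --expire=now --all
git gc --aggressive --prune=now
du -sh .git
git-sizer

Expire the reflog first: its entries still point at the pre-rewrite commits, and gc keeps anything they reference. After that, the .git directory should shrink by roughly the packed size of the blobs you removed.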
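du -sh .git also counts hooks, logs and other files. To compare object storage alone before and after the cleanup, this sketch (hypothetical helper repo_kib) adds the "size" (loose objects) and "size-pack" fields that git count-objects -v reports, both in KiB.

```shell
# repo_kib: hypothetical helper. Prints the KiB used by loose objects ("size")
# plus pack files ("size-pack"), as reported by git count-objects -v.
repo_kib() {
  git count-objects -v |
    awk -F': ' '$1 == "size" || $1 == "size-pack" { kib += $2 } END { print kib + 0 }'
}
```

For example, record before=$(repo_kib) ahead of the cleanup and compare it with repo_kib afterwards.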

Preventing recurrence

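Add a hook that rejects large additions. The script below is a client-side pre-commit hook (save it as .git/hooks/pre-commit and make it executable); because git commit --no-verify skips it, back it up with a server-side pre-receive hook where you control the server: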

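#!/usr/bin/env bash
# Reject staged files larger than MAX bytes (10 MB).
MAX=10000000
status=0
# Added, copied, modified or renamed paths in the index, NUL-separated so any filename works.
while IFS= read -r -d '' file; do
  size=$(git cat-file -s ":$file" 2>/dev/null || echo 0)   # size of the staged blob
  if [ "$size" -gt "$MAX" ]; then
    echo "Refusing to commit $file: $size bytes" >&2
    status=1
  fi
done < <(git diff --cached --name-only -z --diff-filter=ACMR)
exit "$status"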
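For the server side, here is a sketch of a pre-receive check (hypothetical helper check_pushed_blob_sizes). Git feeds a pre-receive hook one "<old-sha> <new-sha> <ref>" line per updated ref on stdin, and the refs still hold their old values while the hook runs. For each pushed tip, the function lists objects not reachable from any existing ref and rejects blobs above the limit.

```shell
# check_pushed_blob_sizes [MAX]: hypothetical helper for hooks/pre-receive.
# Reads "<old-sha> <new-sha> <ref>" lines on stdin and returns 1 if any blob
# new to this repository is larger than MAX bytes (default 10000000).
check_pushed_blob_sizes() {
  local max=${1:-10000000} status=0 old new ref big
  while read -r old new ref; do
    case $new in *[!0]*) ;; *) continue ;; esac   # all zeros: ref deletion, nothing to scan
    # Objects reachable from the pushed tip but not from any existing ref.
    big=$(git rev-list --objects "$new" --not --all |
      git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
      awk -v max="$max" '$1 == "blob" && $2 + 0 > max + 0 {
        print $2, substr($0, length($1) + length($2) + 3)
      }')
    if [ -n "$big" ]; then
      printf 'Rejecting %s; blobs over %s bytes:\n%s\n' "$ref" "$max" "$big" >&2
      status=1
    fi
  done
  return "$status"
}
```

A hooks/pre-receive file would define this function and end with check_pushed_blob_sizes 10000000, so the function's status becomes the hook's exit code.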

The takeaway

Run git-sizer on every long-lived repo at least once a year. Catch large additions before they become history; clean them up if they slip through. A 50 MB repo and a 5 GB repo behave very differently in clone, CI, and editor performance.