By admin, 29 April 2026

Knowing your weight

Before optimizing, measure. Git ships git count-objects for basic statistics, and the standalone git-sizer tool from GitHub gives a deeper analysis that flags values likely to cause trouble. Together they tell you whether your repository needs Git LFS, partial clone, or a history rewrite.

git count-objects

git count-objects -v
git count-objects -vH               # human-readable
git count-objects --human-readable

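With -v, the output lists the number of loose objects (count) and the disk space they use (size), the number of objects in packs (in-pack), the number of packs (packs) and their total size (size-pack), loose objects that are also present in a pack and can be pruned (prune-packable), and files in the object directory that are neither valid loose objects nor valid packs (garbage, size-garbage). Git runs git gc --auto after commands such as commit, merge, and fetch. It repacks once the loose-object count exceeds gc.auto (6700 by default) or the number of packs exceeds gc.autoPackLimit (50 by default).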
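Those two limits can be checked by hand. The helper below is a minimal sketch, and gc_pressure is a hypothetical name. It assumes Git's documented defaults (6700 loose objects, 50 packs) apply when the settings are unset. The result is only an approximation, because git gc --auto estimates the loose count by sampling a single object subdirectory rather than counting every object.

```shell
# Sketch: report how close a repository is to the two `git gc --auto` limits.
# gc_pressure is a hypothetical helper. When unset, the limits fall back to
# Git's documented defaults: gc.auto = 6700, gc.autoPackLimit = 50.
gc_pressure() {
  repo=${1:-.}
  stats=$(git -C "$repo" count-objects -v) || return 1
  loose=$(printf '%s\n' "$stats" | awk '$1 == "count:" {print $2}')
  packs=$(printf '%s\n' "$stats" | awk '$1 == "packs:" {print $2}')
  # --type=int normalises suffixed values such as "10k" to plain integers.
  auto=$(git -C "$repo" config --type=int --get gc.auto || echo 6700)
  pack_limit=$(git -C "$repo" config --type=int --get gc.autoPackLimit || echo 50)
  echo "loose objects: $loose (gc.auto=$auto)"
  echo "packs: $packs (gc.autoPackLimit=$pack_limit)"
  # A limit of 0 disables the corresponding check.
  if [ "$auto" -gt 0 ] && [ "$loose" -gt "$auto" ]; then
    echo "loose-object limit exceeded: the next auto-gc will likely repack"
  fi
  if [ "$pack_limit" -gt 0 ] && [ "$packs" -gt "$pack_limit" ]; then
    echo "pack limit exceeded: the next auto-gc will likely consolidate packs"
  fi
}
```

Source the file, then run gc_pressure in a repository, or gc_pressure path/to/repo from elsewhere.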

git-sizer

Install: brew install git-sizer, or download a prebuilt binary from the project's GitHub releases. Run it from inside a full (non-shallow) clone:

git-sizer
git-sizer --verbose
git-sizer --no-progress --threshold 1

It analyzes commits, trees, blobs, tags, and references, and reports each size together with a level-of-concern rating. A single 800 MB file, for example, stands out immediately and is a prime candidate for Git LFS.
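Because git-sizer needs the full history to give meaningful numbers, a small wrapper can refuse to run on a shallow clone. This is a sketch: run_sizer is a hypothetical name, and it assumes git-sizer analyzes the repository in its current working directory, which is how the commands above use it. Any extra arguments are passed through to git-sizer.

```shell
# Sketch: run git-sizer only where its results are meaningful.
# run_sizer is a hypothetical helper; arguments after the repo path go to git-sizer.
run_sizer() {
  repo=${1:-.}
  [ "$#" -gt 0 ] && shift
  shallow=$(git -C "$repo" rev-parse --is-shallow-repository) || return 1
  # A shallow clone hides most of history, so its sizes would be misleading.
  if [ "$shallow" = "true" ]; then
    echo "$repo is a shallow clone; run 'git fetch --unshallow' first" >&2
    return 1
  fi
  if ! command -v git-sizer >/dev/null 2>&1; then
    echo "git-sizer not found on PATH" >&2
    return 127
  fi
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$repo" && git-sizer --no-progress "$@")
}
```

For example, run_sizer ~/src/myrepo --verbose analyzes that clone with the verbose report.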

Sample insights

  • Maximum number of files in a tree: warns above 100k.
  • Maximum blob size: warns above 50MB.
  • Total size of all commits: tracks growth.
  • Maximum tag depth: detects pathological tag chains.
  • Total reachable objects: indicates clone time.

Finding big files

# rank by uncompressed size; %(objectsize:disk) would push deltified blobs down the list
git rev-list --objects --all | \
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
  sed -n 's/^blob //p' | sort -k2,2nr | head

This pipeline finds the largest blobs in history. To see when one of them was introduced, pass its object ID to git log --all --oneline --find-object=<sha>, which lists the commits that added or removed that blob.
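The two steps can be combined into one helper that prints each of the N largest blobs followed by the commits that touched it. This is a sketch: biggest_blobs and blob_history are hypothetical names, sizes are uncompressed bytes from %(objectsize), and each path is the one rev-list reports for the first place it meets the blob.

```shell
# Sketch: list the N largest blobs and the commits that added or removed each.
# biggest_blobs and blob_history are hypothetical helpers.
biggest_blobs() {
  repo=${1:-.}
  n=${2:-10}
  # Keep only blobs, print "sha size path", largest first.
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort -k2,2nr |
    head -n "$n"
}

blob_history() {
  repo=${1:-.}
  n=${2:-10}
  biggest_blobs "$repo" "$n" | while read -r sha size path; do
    printf '== %s (%s bytes, %s)\n' "$path" "$size" "$sha"
    # --no-pager stops git from opening a pager once per blob in a terminal.
    git --no-pager -C "$repo" log --all --oneline --find-object="$sha"
  done
}
```

For example, blob_history . 5 shows the five largest blobs in the current repository, each followed by its introducing (and removing) commits.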

Action items by finding

  • Big blobs: migrate to Git LFS or rewrite history with filter-repo.
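  • Many small files: enable feature.manyFiles and use sparse checkout.
  • Many refs: use protocol v2 (the default in recent Git releases); consider the reftable ref backend.
  • Slow walks: write a commit-graph with changed-path Bloom filters.
  • Slow lookups: write a multi-pack index (MIDX) with a reachability bitmap.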
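Most of these remedies are single commands. The sketch below applies the four configuration-level ones to one repository. tune_repo is a hypothetical helper; it assumes Git 2.34 or newer (needed for multi-pack bitmaps) and leaves out LFS migration, sparse checkout, and reftable, since those need per-repository decisions.

```shell
# Sketch: apply the configuration-level fixes from the list to one repository.
# tune_repo is a hypothetical helper; assumes Git 2.34+ for MIDX bitmaps.
tune_repo() {
  repo=${1:-.}
  # Many small files: index v4, untracked cache and related index settings.
  git -C "$repo" config feature.manyFiles true
  # Many refs: request wire protocol v2 explicitly.
  git -C "$repo" config protocol.version 2
  # Slow walks: commit-graph covering all refs, plus changed-path Bloom filters.
  git -C "$repo" commit-graph write --reachable --changed-paths
  # Slow lookups: one index over all packs, with a reachability bitmap.
  git -C "$repo" multi-pack-index write --bitmap
}
```

The commit-graph and MIDX files describe the repository at the moment they are written, so they need refreshing as history grows.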

Common mistakes

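  • Judging a repository by the size of .git on disk. Packed objects share bytes through delta compression, so on-disk size says little about how much data history really holds; use git-sizer's logical (uncompressed) sizes instead.
  • Confusing reachable size with total size. Unreachable objects that have not been pruned yet inflate disk usage, but they are never sent to clients, so they do not affect clone bandwidth.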
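To see the first mistake in numbers, compare the logical size of every reachable object with the bytes those same objects occupy on disk. This is a sketch; compare_sizes is a hypothetical helper. Because it walks only reachable objects, unreachable leftovers (the second mistake) are excluded by construction.

```shell
# Sketch: logical (uncompressed) vs. on-disk size of all reachable objects.
# compare_sizes is a hypothetical helper.
compare_sizes() {
  repo=${1:-.}
  # %(objectsize) is the uncompressed size; %(objectsize:disk) is what the
  # object costs in storage after zlib compression and delta encoding.
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objectsize) %(objectsize:disk)' |
    awk '{ logical += $1; disk += $2 }
         END { printf "logical=%.0f disk=%.0f\n", logical, disk }'
}
```

In a history made of many similar file versions, disk is typically a small fraction of logical, which is exactly why .git size alone misleads.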

Tracking growth

git-sizer --json > sizer-$(date +%Y%m%d).json
diff <(jq . sizer-old.json) <(jq . sizer-new.json)
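Once snapshots accumulate, a small helper can diff the two newest ones. This is a sketch, assuming the sizer-YYYYMMDD.json naming above (so lexical order is chronological) and jq on PATH; diff_latest_snapshots is a hypothetical name. It writes normalized copies to temporary files instead of using <( ) process substitution, so it also runs under a plain POSIX sh.

```shell
# Sketch: diff the two newest sizer-*.json snapshots in a directory.
# diff_latest_snapshots is a hypothetical helper. Exit status mirrors diff:
# 0 = no change, 1 = changes shown, 2 = trouble.
diff_latest_snapshots() {
  dir=${1:-.}
  # YYYYMMDD names sort chronologically, so the last two are the newest.
  old=$(ls "$dir"/sizer-*.json 2>/dev/null | sort | tail -n 2 | head -n 1)
  new=$(ls "$dir"/sizer-*.json 2>/dev/null | sort | tail -n 1)
  if [ -z "$old" ] || [ "$old" = "$new" ]; then
    echo "need at least two sizer-*.json snapshots in $dir" >&2
    return 2
  fi
  tmp_old=$(mktemp) || return 2
  tmp_new=$(mktemp) || { rm -f "$tmp_old"; return 2; }
  # Normalize both files (sorted keys, fixed indentation) before diffing.
  if jq -S . "$old" > "$tmp_old" && jq -S . "$new" > "$tmp_new"; then
    diff "$tmp_old" "$tmp_new"
    status=$?
  else
    status=2
  fi
  rm -f "$tmp_old" "$tmp_new"
  return "$status"
}
```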

Related

See "Git garbage collection: gc, prune, and pack-refs", "filter-repo: rewriting history safely", and "Recovery and repair of corrupt repositories".