Why size matters
Repository size directly affects clone speed, fetch time, and CI cost.
The classic one-liner
git rev-list --objects --all | \
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
awk '/^blob/ { sub(/^blob [^ ]+ /, ""); print }' | \
sort -n | tail -20
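Size per directory
# Sum blob sizes by top-level directory (files in the root are grouped under ".")
git ls-tree -r -l HEAD | \
awk -F'\t' '{ split($1, meta, " "); n = split($2, parts, "/"); dir = (n > 1) ? parts[1] : "."; size[dir] += meta[4] } END { for (d in size) print size[d], d }' | \
sort -rn | head -20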
Inspecting pack files
git verify-pack -v .git/objects/pack/pack-*.idx | \
sort -k 3 -n | tail -20
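The verify-pack listing above shows object IDs and sizes but no file names. A minimal sketch of one way to attach paths, assuming a hypothetical helper named largest_packed_blobs and a repository that already has at least one pack file (for example after git gc):

```shell
#!/usr/bin/env bash
# Hypothetical helper (the name is an assumption, not part of git):
# print the N largest packed blobs as "size<TAB>sha<TAB>path".
largest_packed_blobs() {
  local n="${1:-20}"
  local git_dir
  git_dir=$(git rev-parse --git-dir) || return 1
  # verify-pack -v columns: sha, type, size, size-in-pack, offset
  git verify-pack -v "$git_dir"/objects/pack/pack-*.idx |
    awk '$2 == "blob" { print $3, $1 }' |
    sort -n | tail -n "$n" |
    while read -r size sha; do
      # rev-list --objects prints "<sha> <path>"; keep the first matching path
      name=$(git rev-list --objects --all |
        awk -v s="$sha" '$1 == s { sub(/^[^ ]+ /, ""); print; exit }')
      printf '%s\t%s\t%s\n' "$size" "$sha" "${name:-<unreachable>}"
    done
}
```

Calling largest_packed_blobs 10 from inside the repository prints the ten largest blobs, smallest first. The helper walks the history once per blob, so it is meant for short lists only.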
Using git-sizer
brew install git-sizer
git-sizer --verbose
Tracing a specific blob
git rev-list --all --objects | grep <sha>
git log --all --find-object=<sha>
Removing large files from history
pip install git-filter-repo
git filter-repo --strip-blobs-bigger-than 50M
git filter-repo --invert-paths --path huge.psd
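git filter-repo rewrites history in place and, unless --force is given, refuses to run on a repository that does not look like a fresh clone. A minimal sketch of preparing a scratch copy first, assuming a hypothetical helper named scratch_clone:

```shell
#!/usr/bin/env bash
# Hypothetical helper (the name is an assumption): create a scratch clone to
# rewrite, leaving the original repository untouched.
scratch_clone() {
  local src="$1" dst="$2"
  # --no-local uses the regular transport, so objects are copied, not hard-linked
  git clone --quiet --no-local "$src" "$dst"
}
```

For example, scratch_clone ~/work/project /tmp/project-rewrite (both paths are placeholders), then run the filter-repo commands inside the new clone.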
Verifying the cleanup
git gc --aggressive --prune=now
du -sh .git
git-sizer
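du -sh .git also counts hooks, logs, and other non-object files. To compare object storage alone before and after a rewrite, one option is the size-pack field of git count-objects -v, which is reported in KiB. A minimal sketch, assuming a hypothetical helper named pack_size_kib:

```shell
#!/usr/bin/env bash
# Hypothetical helper (the name is an assumption): print the on-disk size of
# all pack files, in KiB, for the repository at the given path.
pack_size_kib() {
  local repo="${1:-.}"
  # count-objects -v prints "size-pack: <KiB>" among other statistics
  git -C "$repo" count-objects -v | awk '/^size-pack:/ { print $2 }'
}
```

Recording pack_size_kib before the rewrite and again after git gc gives two numbers that can be compared directly. Loose objects are reported under size rather than size-pack, so the figure is most meaningful right after a gc.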
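Preventing regressions
#!/usr/bin/env bash
# pre-commit hook: reject staged files larger than MAX bytes
MAX=10000000
status=0
# Read added/copied/modified paths NUL-delimited so unusual file names survive
while IFS= read -r -d '' file; do
  # Size of the staged (index) version of the file
  size=$(git cat-file -s ":$file" 2>/dev/null || echo 0)
  if [ "$size" -gt "$MAX" ]; then
    echo "Refusing to commit $file: $size bytes" >&2
    status=1
  fi
done < <(git diff --cached --name-only -z --diff-filter=ACM)
exit "$status"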
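The script above only takes effect once git can find it as an executable pre-commit hook. A minimal sketch of installing it, assuming a hypothetical helper named install_pre_commit_hook that is run from the repository root with the script saved to a file:

```shell
#!/usr/bin/env bash
# Hypothetical helper (the name is an assumption): install a script as the
# pre-commit hook of the repository in the current directory.
install_pre_commit_hook() {
  local script="$1"
  local hooks_dir
  # --git-path resolves the hooks directory for this repository
  hooks_dir=$(git rev-parse --git-path hooks) || return 1
  mkdir -p "$hooks_dir"
  cp "$script" "$hooks_dir/pre-commit"
  chmod +x "$hooks_dir/pre-commit"
}
```

For example, install_pre_commit_hook check-size.sh, where check-size.sh is a placeholder file name. Hooks are not transferred by clone or fetch, so each contributor has to install the hook locally.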