Introduction
Git's content-addressed model deduplicates identical objects, but two similar blobs (a file and its later edit) are still distinct objects, and as loose objects each is stored in full. Pack files address this with delta compression: one object is stored as a base, and a similar object is stored as a sequence of "copy" and "insert" instructions that rebuild it from that base.
How it works
When packing, Git sorts candidate objects so that likely-similar ones (same type, similar path name and size) sit near each other, then tries nearby objects as the base for each one. A delta records:
- The base object (by offset within the pack or by object ID).
- Sizes of base and target.
- A stream of copy (use bytes from base) and insert (literal bytes) instructions.
The delta stream is then zlib-compressed.
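To make the copy/insert idea concrete, here is a minimal Python sketch of applying a delta to a base. The tuple-based instruction format and the name apply_delta are assumptions made for illustration; Git's real encoding is a compact binary stream, and this is not its implementation.

```python
from typing import Sequence, Tuple

# Hypothetical in-memory instruction forms (not Git's binary layout):
#   ("copy", offset_in_base, length)
#   ("insert", literal_bytes)
Instruction = Tuple


def apply_delta(base: bytes, instructions: Sequence[Instruction]) -> bytes:
    """Rebuild a target object from a base and copy/insert instructions."""
    out = bytearray()
    for ins in instructions:
        if ins[0] == "copy":
            _, offset, length = ins
            if offset < 0 or length < 0 or offset + length > len(base):
                raise ValueError("copy range falls outside the base object")
            out += base[offset:offset + length]
        elif ins[0] == "insert":
            out += ins[1]
        else:
            raise ValueError(f"unknown instruction: {ins[0]!r}")
    return bytes(out)


base = b"hello world\n"
delta = [("copy", 0, 6), ("insert", b"git"), ("copy", 11, 1)]
target = apply_delta(base, delta)
print(target)  # b'hello git\n'
```

Only the three inserted bytes are stored literally; everything else is a reference into the base, which is why small edits to large files produce small deltas.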
Inspecting deltas
git verify-pack -v .git/objects/pack/pack-<hash>.idx | head -20
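For each object, the output lists the object ID, type, size, size in the pack, and offset in the pack. Delta entries add two more columns: the chain depth and the base object's ID. Non-delta objects omit both.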
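If you want to summarize that output programmatically, a small parser is enough. The sketch below assumes the verbose column layout of five fields per object plus depth and base ID for deltas; the PackEntry type, the parse_entry name, and the sample lines are all made up for illustration and are not real output.

```python
import re
from typing import NamedTuple, Optional

# Object IDs are 40 hex digits (SHA-1) or 64 (SHA-256).
_OID = re.compile(r"^(?:[0-9a-f]{40}|[0-9a-f]{64})$")


class PackEntry(NamedTuple):
    oid: str
    kind: str
    size: int
    packed_size: int
    offset: int
    depth: Optional[int]  # None for non-delta objects
    base: Optional[str]   # None for non-delta objects


def parse_entry(line: str) -> Optional[PackEntry]:
    """Parse one object line; return None for summary or status lines."""
    fields = line.split()
    if len(fields) not in (5, 7) or not _OID.match(fields[0]):
        return None
    oid, kind, size, packed, offset = fields[:5]
    depth = int(fields[5]) if len(fields) == 7 else None
    base = fields[6] if len(fields) == 7 else None
    return PackEntry(oid, kind, int(size), int(packed), int(offset), depth, base)


# Made-up sample lines in the assumed layout (not real output).
sample = [
    "a" * 40 + " blob 1200 410 12",
    "b" * 40 + " blob 35 46 422 1 " + "a" * 40,
    "non delta: 1 object",
]
entries = [e for e in map(parse_entry, sample) if e is not None]
deltas = [e for e in entries if e.base is not None]
print(len(entries), "objects,", len(deltas), "stored as deltas")
```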
Tunables
git config pack.window 250 # candidates examined per object
git config pack.depth 50 # max chain length
git config pack.threads 4
git config pack.windowMemory 100m
git config pack.deltaCacheSize 256m
pack.window controls how many candidate bases Git examines for each object (default 10); larger values search harder at the cost of CPU time and memory. pack.depth limits how long delta chains can grow (default 50).
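The interaction between the two settings can be sketched as a sliding-window search. This is a rough Python illustration, not Git's algorithm: difflib's similarity ratio stands in for Git's delta-size estimate, and the choose_bases name and the 0.5 similarity cutoff are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Sequence


def choose_bases(objects: Sequence[bytes], window: int = 10,
                 max_depth: int = 50) -> List[Optional[int]]:
    """For each object, pick the index of a base among the previous
    `window` objects, or None to store it whole.

    Assumes `objects` is already sorted so similar items are adjacent.
    """
    bases: List[Optional[int]] = [None] * len(objects)
    depth = [0] * len(objects)
    for i, target in enumerate(objects):
        best, best_ratio = None, 0.5  # hypothetical minimum similarity
        for j in range(max(0, i - window), i):
            if depth[j] >= max_depth:
                continue  # using j as a base would exceed the chain limit
            ratio = SequenceMatcher(None, objects[j], target).ratio()
            if ratio > best_ratio:
                best, best_ratio = j, ratio
        if best is not None:
            bases[i] = best
            depth[i] = depth[best] + 1
    return bases


objs = [b"aaaaaaaaaa", b"aaaaaaaaab", b"zzzzzzzzzz", b"aaaaaaaabb"]
print(choose_bases(objs, window=1))
print(choose_bases(objs, window=3))
print(choose_bases(objs, window=3, max_depth=1))
```

With a window of 1 the last object only sees the unrelated neighbor before it and is stored whole; widening the window lets it find a base, and capping the depth forces it onto a shallower, slightly worse one.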
Aggressive vs normal repack
git repack -a -d # standard
git repack -a -d -f --depth=250 --window=250 # aggressive; -f discards existing deltas
git gc --aggressive
Aggressive repacks throw away existing deltas and recompute from scratch, costing CPU but possibly improving the result.
Reachability and bitmaps
Servers often combine delta packing with reachability bitmaps for fast clone-set computation:
git repack -adb
Why some objects are not deltified
- Cryptographically random data (already incompressible).
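- Objects that fall outside Git's size limits for delta attempts, for example blobs larger than core.bigFileThreshold (512 MiB by default), which are stored without trying a delta.
- Objects with no sufficiently similar candidate inside the pack window.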
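The first case is easy to observe with zlib, the same compression family Git applies to objects. This Python sketch compares random bytes with repetitive text; the deflate_ratio helper and the sample inputs are assumptions chosen for illustration.

```python
import os
import zlib


def deflate_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


random_blob = os.urandom(64 * 1024)
text_blob = b"int main(void) { return 0; }\n" * 2000

print(f"random: {deflate_ratio(random_blob):.2f}")
print(f"text:   {deflate_ratio(text_blob):.2f}")
```

Random data has no repeated byte runs to exploit, so neither zlib nor a copy/insert delta has anything to work with.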
Cross-pack deltas
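On disk, a delta and its base always live in the same pack; only thin packs sent over the wire may refer to bases the receiver already has. What multi-pack indexes (git multi-pack-index) and the --geometric repack mode (Git 2.33+) provide is a way to keep many packs efficiently, without rewriting every object into a single pack on each repack: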
git repack --geometric=2 -d
This incremental scheme is a good fit for large repositories, where repacking everything into one pack on every run is expensive.
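The shape that --geometric aims for can be sketched in a few lines of Python. The sketch assumes the criterion is that each pack holds at least factor times as many objects as the next-smaller pack; the is_geometric name is hypothetical, and the code only checks the progression rather than reproducing how git repack chooses which packs to combine.

```python
from typing import Sequence


def is_geometric(object_counts: Sequence[int], factor: int = 2) -> bool:
    """True if, sorted by size, each pack has at least `factor` times
    as many objects as the next-smaller pack."""
    counts = sorted(object_counts)
    return all(big >= factor * small
               for small, big in zip(counts, counts[1:]))


print(is_geometric([100, 250, 600]))  # True: 250 >= 200 and 600 >= 500
print(is_geometric([100, 150, 600]))  # False: 150 < 200
```

When the progression is violated, the smaller packs are the ones that need rolling up, which keeps the largest packs untouched on most runs.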
Delta islands
For server operators hosting many forks of the same project, delta islands partition objects so deltas only happen within a fork's set of refs, allowing fast clone of any single fork:
git config pack.island "refs/remotes/(.*)/heads"
git config pack.islandCore "main"
git repack -adb --delta-islands
Without islands, an object reachable only from fork A might be stored as a delta against an object reachable only from fork B. Serving a clone of fork A would then require either sending the base from B as well or discarding the stored delta and computing a new one on the fly. Most users never need this, but it is part of how large Git hosting services keep clones of individual forks fast.
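The rule islands enforce can be stated as a subset test: a delta is allowed only when the base is present in every island that contains the target. Here is a minimal Python sketch of that check; the delta_allowed name and the object and island labels are made up for illustration.

```python
from typing import Dict, Set


def delta_allowed(target: str, base: str,
                  islands: Dict[str, Set[str]]) -> bool:
    """True if `base` is present in every island that contains `target`."""
    return islands[target] <= islands[base]


# Hypothetical mapping from object to the islands that can reach it.
islands = {
    "shared_blob": {"forkA", "forkB"},
    "a_only_blob": {"forkA"},
    "b_only_blob": {"forkB"},
}

print(delta_allowed("a_only_blob", "shared_blob", islands))  # True
print(delta_allowed("a_only_blob", "b_only_blob", islands))  # False
```

An object shared by both forks is a safe base for either, while an object private to one fork is never used as a base for the other.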
Common mistakes
- Assuming deltas are forward (newer = base) or backward (older = base). Git is agnostic and chooses whichever produces the smaller delta.
- Renaming pack files when copying them between repositories. The data is content-addressed, so copying is fine, but each .pack and its .idx must keep the same pack-<hash> name; renaming files based on partial hashes can break that pairing.
- Setting an absurdly high pack.depth in pursuit of small packs. Deeper chains mean slower object access, because every delta in the chain must be applied.
- Expecting delta compression to save space on already-compressed binaries (videos, JPEGs). It almost never does; use Git LFS for those.
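The cost of deep chains is easy to see in a toy model. The Python sketch below walks a chain back to its full base and then applies each delta in turn; the store layout, the tuple instruction format, and the read_object name are all hypothetical and much simpler than Git's pack format.

```python
from typing import Dict, Tuple


def read_object(store: Dict[str, tuple], name: str) -> Tuple[bytes, int]:
    """Return (content, number_of_deltas_applied) for `name`."""
    chain = []
    entry = store[name]
    while entry[0] == "delta":
        _, base_name, instructions = entry
        chain.append(instructions)
        entry = store[base_name]
    content = entry[1]  # the full object at the root of the chain
    for instructions in reversed(chain):
        out = bytearray()
        for ins in instructions:
            if ins[0] == "copy":
                out += content[ins[1]:ins[1] + ins[2]]
            else:  # ("insert", literal_bytes)
                out += ins[1]
        content = bytes(out)
    return content, len(chain)


# Hypothetical store: name -> ("full", data) or ("delta", base, instructions)
store = {
    "v1": ("full", b"line one\n"),
    "v2": ("delta", "v1", [("copy", 0, 9), ("insert", b"line two\n")]),
    "v3": ("delta", "v2", [("copy", 0, 18), ("insert", b"line three\n")]),
}
content, applied = read_object(store, "v3")
print(applied, "deltas applied")
```

Reading v3 requires reconstructing v2 first, so access time grows with chain depth even though the pack gets smaller.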