Introduction
Git stores objects in two physical formats. Loose objects are individual files, one per object; packed objects are stored together in a pack file, where many of them are delta-compressed against similar objects. Both forms describe the same logical objects, addressed by the same hashes.
Loose layout
Each loose object is at .git/objects/xx/yyy..., where xx is the first two hex characters of the object's SHA-1 and yyy... is the remaining 38. The file holds the zlib-deflated bytes of <type> <size>\0<content>:
find .git/objects -type d \( -name pack -o -name info \) -prune -o -type f -print | head
git cat-file -p <sha>
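The path and the object ID both follow from the header format above. The following is a minimal sketch, assuming a SHA-1 repository and that sha1sum and cut are available (on macOS, shasum would be the likely substitute). The sample content is arbitrary and the variable names are illustrative only.

```shell
# Sample content (hypothetical); any byte string works the same way.
content='hello world'

# Size in bytes of the content plus its trailing newline.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Object ID = SHA-1 over the uncompressed "<type> <size>\0<content>" bytes.
oid=$( { printf 'blob %s\000' "$size"; printf '%s\n' "$content"; } | sha1sum | cut -d' ' -f1 )

# Split the ID into the two-character directory and the 38-character file name.
dir=$(printf '%s' "$oid" | cut -c1-2)
file=$(printf '%s' "$oid" | cut -c3-)
loose_path=".git/objects/$dir/$file"
echo "$loose_path"
```

If Git is installed, printf 'hello world\n' | git hash-object --stdin should print the same ID, which is a quick way to check the header construction.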
Pack layout
.git/objects/pack/pack-<hash>.pack
.git/objects/pack/pack-<hash>.idx
The .idx file maps SHA to byte offset within the .pack; without it, the pack would require a linear scan.
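To see the SHA-to-offset mapping directly, git show-index can dump an .idx file. The sketch below builds a throwaway repository so that it is safe to run anywhere; the temporary paths, the file name, and the demo identity are assumptions made for illustration.

```shell
# Throwaway repository (hypothetical paths and identity) with one commit.
repo=$(mktemp -d)
git init -q "$repo"
printf 'hello world\n' > "$repo/a.txt"
git -C "$repo" add a.txt
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Pack everything so that a single .pack/.idx pair exists.
git -C "$repo" repack -a -d -q
idx=$(ls "$repo"/.git/objects/pack/pack-*.idx)

# Each output line is: <offset in .pack> <object ID> (<crc32>)
index_listing=$(git -C "$repo" show-index < "$idx")
echo "$index_listing"
```

The listing should contain one line per packed object: here a blob, a tree, and a commit.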
How objects move between forms
New objects are written loose. They become packed when:
- git gc runs (manual or automatic).
- git repack is invoked.
- A fetch or push transfers them as a pack.
- git fast-import writes them directly into a pack.
git gc
git repack -a -d
git count-objects -v
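count-objects -v reports the number and disk usage of loose objects (count, size), the number of packed objects and packs (in-pack, packs), and the total pack size (size-pack). Sizes are in KiB.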
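The transition from loose to packed can be observed by comparing count-objects output before and after a gc. This is a small sketch in a throwaway repository; the paths, file name, and demo identity are hypothetical.

```shell
# Throwaway repository (hypothetical paths and identity) with one commit.
repo=$(mktemp -d)
git init -q "$repo"
printf 'hello world\n' > "$repo/a.txt"
git -C "$repo" add a.txt
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Freshly written objects (blob, tree, commit) start out loose.
loose_before=$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')

# gc packs the reachable objects and removes their loose copies.
git -C "$repo" gc -q

loose_after=$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git -C "$repo" count-objects -v | awk '/^in-pack:/ {print $2}')
echo "loose: $loose_before -> $loose_after, packed: $packed_after"
```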
Trade-offs
- Loose: simple, easy to inspect, fast to write a single object. Slow when there are millions.
- Packed: compact, delta-compressed, fast to bulk-read. Requires repacking to incorporate new objects.
Auto-gc
Git triggers an auto-gc when too many loose objects accumulate. Tunables:
git config gc.auto 6700 # loose-object threshold (6700 is the default)
git config gc.autoPackLimit 50 # pack-count threshold (50 is the default)
git config gc.auto 0 # disable auto-gc (not recommended)
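A rough way to see how close a repository is to the loose-object threshold is to compare the exact loose count with gc.auto. The helper below, loose_vs_threshold, is a hypothetical name introduced for this sketch. Git's documentation describes the threshold as approximate, so this is an indicator rather than a reproduction of Git's own check.

```shell
# Hypothetical helper: compare a repository's exact loose count with gc.auto.
loose_vs_threshold() {
  repo=$1
  # Fall back to the documented default when gc.auto is unset.
  threshold=$(git -C "$repo" config --get gc.auto || echo 6700)
  loose=$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')
  if [ "$threshold" -eq 0 ]; then
    echo "auto-gc disabled"
  elif [ "$loose" -gt "$threshold" ]; then
    echo "over threshold ($loose > $threshold)"
  else
    echo "under threshold ($loose <= $threshold)"
  fi
}

# Demo in a throwaway repository with two loose blobs and a tiny threshold.
demo=$(mktemp -d)
git init -q "$demo"
printf 'a' | git -C "$demo" hash-object -w --stdin > /dev/null
printf 'b' | git -C "$demo" hash-object -w --stdin > /dev/null
git -C "$demo" config gc.auto 1
loose_vs_threshold "$demo"
```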
Inspecting a pack
git verify-pack -v .git/objects/pack/pack-<hash>.idx | head
git verify-pack -s .git/objects/pack/pack-<hash>.idx
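With -v you will see each object's type, size, packed size, and offset, plus the delta depth and base object for deltified entries. With -s only the histogram of delta chain lengths is printed.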
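One common use of the verbose listing is finding the largest objects in a pack. The sketch below assumes the usual column order of verify-pack -v (ID, type, size, packed size, offset) and uses a throwaway repository; the file names and demo identity are made up for the example.

```shell
# Throwaway repository (hypothetical paths and identity) with a small and a large file.
repo=$(mktemp -d)
git init -q "$repo"
printf 'hi\n' > "$repo/small.txt"
head -c 5000 /dev/zero | tr '\0' 'x' > "$repo/big.txt"
git -C "$repo" add small.txt big.txt
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m init
git -C "$repo" repack -a -d -q
idx=$(ls "$repo"/.git/objects/pack/pack-*.idx)

# Keep only the per-object lines (they start with a hex ID), then sort by
# the third column (uncompressed size), largest first.
largest=$(git -C "$repo" verify-pack -v "$idx" \
  | grep -E '^[0-9a-f]{40,64} ' \
  | sort -k3,3nr \
  | head -n 3)
echo "$largest"
```

The first line should be the 5000-byte blob; its fourth column shows how much smaller it is once compressed inside the pack.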
Recovering from corruption
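cp -r broken-repo backup # from the parent directory: back up before touching anything
git fsck --full # inside the repo: identify the damaged objects
mv .git/objects/pack/pack-<hash>.pack .git/objects/pack/pack-<hash>.idx /tmp/
git unpack-objects -r < /tmp/pack-<hash>.pack
unpack-objects explodes a pack back to loose form, which is useful when a single bad object inside a pack is taking down many operations. It skips any object the repository already has, so the pack must be moved out of .git/objects/pack first; otherwise nothing is unpacked. The -r flag makes it continue past corrupt entries and recover as many objects as it can.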
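The pack-to-loose step can be rehearsed safely in a throwaway repository. This is a minimal sketch with hypothetical paths and a demo identity. It moves the pack out of the object store before unpacking, because unpack-objects does not write objects that the repository already has.

```shell
# Throwaway repository (hypothetical paths and identity), fully packed by gc.
repo=$(mktemp -d)
git init -q "$repo"
printf 'hello world\n' > "$repo/a.txt"
git -C "$repo" add a.txt
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m init
git -C "$repo" gc -q

# Move the pack and its companion files out of the object store.
stash=$(mktemp -d)
mv "$repo"/.git/objects/pack/pack-* "$stash"/
packfile=$(ls "$stash"/pack-*.pack)

# Explode the pack into loose objects; -r keeps going past bad entries.
git -C "$repo" unpack-objects -q -r < "$packfile"

loose_after_unpack=$(git -C "$repo" count-objects -v | awk '/^count:/ {print $2}')
echo "loose objects after unpack: $loose_after_unpack"

# The repository should be consistent again, now using loose objects only.
git -C "$repo" fsck --full
```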
Looking up an object
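When Git resolves a SHA, it has to consult both storage forms, in the local object store and in any alternates (objects/info/alternates). Current Git checks the packs first, through each pack's idx, and falls back to the loose object file. Within one pack the lookup is cheap: the idx's 256-entry fanout table narrows the search to the objects sharing the SHA's first byte, and a binary search finishes it. That cost is paid once per pack, so many small packs are measurably slower than one big pack, which is why gc consolidates: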
git count-objects -v
cat .git/objects/info/alternates 2>/dev/null
git verify-pack -s .git/objects/pack/pack-<hash>.idx
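Alternates let multiple repos share a pool of objects on disk. git clone --shared creates such a setup and saves space at the cost of some safety: the clone breaks if the source repository later prunes or deletes objects that the clone still needs.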
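The effect of a shared clone is easy to check: the clone records the source's object directory in its alternates file and stores no objects of its own. The sketch below uses throwaway directories; the paths and the demo identity are assumptions for illustration.

```shell
# Source repository (hypothetical paths and identity) with one commit.
src=$(mktemp -d)
git init -q "$src"
printf 'hello world\n' > "$src/a.txt"
git -C "$src" add a.txt
git -C "$src" -c user.name=demo -c user.email=demo@example.com commit -q -m init

# --shared records the source's object directory instead of copying objects.
dst="$(mktemp -d)/clone"
git clone -q --shared "$src" "$dst"
cat "$dst/.git/objects/info/alternates"

# The clone owns no objects itself, yet it can still read the commit.
own_loose=$(git -C "$dst" count-objects -v | awk '/^count:/ {print $2}')
own_packed=$(git -C "$dst" count-objects -v | awk '/^in-pack:/ {print $2}')
echo "own loose: $own_loose, own packed: $own_packed"
git -C "$dst" cat-file -t HEAD
```

Deleting or pruning the source at this point would leave the clone unable to read its own HEAD, which is the safety cost mentioned above.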
Common mistakes
- Manually deleting pack-* files to reclaim space. That destroys reachable history; use git gc instead.
- Disabling auto-gc on long-lived repos and ending up with millions of loose objects, which slows every operation.
- Running git gc --aggressive regularly when a normal gc would do. Aggressive packing is expensive.
- Copying a repo with cp -r while a pack is being written, which can leave the copy with a truncated pack or a corrupt idx. Use git clone --local for safe copies.