How Git Actually Works: A Deep Dive Under the Hood
Demystify Git's internals — from commits and branches to merges and reflogs. Learn how Git really stores your code and why understanding it makes you unstoppable.
Most developers use Git every day. Few truly understand it.
We memorize commands, copy-paste fixes from Stack Overflow, and hope nothing breaks. When Git does break, it feels hostile — cryptic errors, missing commits, “detached HEAD”, diverged branches.
Here’s the truth:
Git only feels dangerous when you don’t understand its mental model.
Once you know how Git stores data internally, it becomes predictable, debuggable, and surprisingly safe.
Let’s open the hood.
Part 1: The Commit — Git’s Fundamental Unit
Everything in Git revolves around commits. Branches and tags are just references to commits, and a merge is just a commit with more than one parent.
Commits Are Immutable
Once a commit is created, it never changes:
- File contents are frozen
- The snapshot (the tree it points to) is frozen
- Parent relationships are frozen
- Author, timestamp, and message are frozen
This immutability is not a limitation — it’s Git’s strongest guarantee.
Commit IDs (SHA-1 Hashes)
Every commit is identified by a SHA-1 hash, derived from its contents:
a2b3c4d5e6f7890123456789abcdef1234567890
Change anything (a file, the message, even whitespace) and the hash changes completely.
That’s why:
- Git can detect corruption
- Git can trust history
- Git never silently overwrites data
Short hashes (e.g. a2b3c4d) are just abbreviations; the full hash is what matters.
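A quick way to see this for yourself, sketched in a throwaway repository. The file name, identity, and commit messages below are placeholders chosen for illustration; only the commit message is changed, yet the commit gets a completely new ID.

```shell
# Throwaway repository in a temp directory (all names here are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # call the first branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "first"
before=$(git rev-parse HEAD)

# Reword the message only; the file contents stay identical.
git commit -q --amend -m "first (reworded)"
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"

# A short hash is only a prefix that Git expands back to the full ID.
short=$(git rev-parse --short HEAD)
echo "short:  $short"
```

The old commit was not edited in place. Amending created a new commit and moved the branch pointer to it.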
What’s Inside a Commit?
A commit is not a diff. It’s a snapshot reference plus metadata:
git cat-file -p <commit>
The output contains:
- tree → snapshot of the directory structure
- parent(s) → previous commit(s)
- author / committer
- commit message
The tree points to directories, which point to files (blobs). Git builds everything from this structure.
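To make the commit → tree → blob chain concrete, here is a small sketch that walks it with git cat-file. The repository and the file greeting.txt are made up for the example.

```shell
# Throwaway repository (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

git cat-file -p HEAD                 # tree, author, committer, message (no parent: first commit)
git cat-file -p 'HEAD^{tree}'        # one line per entry: mode, type, hash, name
git cat-file -p HEAD:greeting.txt    # the blob: raw file content

# Each level is a different object type.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t HEAD:greeting.txt)
blob_content=$(git cat-file -p HEAD:greeting.txt)
```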
Is Git Duplicating Files on Every Commit?
No.
Git uses content-addressable storage:
- Files are stored by hash
- Identical content → stored once
- Changed content → new object
This is why Git is fast and space-efficient.
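A minimal sketch of the "identical content is stored once" rule. The three file names are hypothetical; two of them hold the same bytes, so they should resolve to the same blob.

```shell
# Throwaway repository (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

printf 'same content\n' > a.txt
cp a.txt b.txt                          # identical bytes under a different name
printf 'different content\n' > c.txt
git add .
git commit -q -m "Three files, two distinct contents"

hash_a=$(git rev-parse HEAD:a.txt)
hash_b=$(git rev-parse HEAD:b.txt)
hash_c=$(git rev-parse HEAD:c.txt)
echo "a.txt $hash_a"
echo "b.txt $hash_b"
echo "c.txt $hash_c"

# Count the blob objects actually stored in the repository.
blob_count=$(git cat-file --batch-check --batch-all-objects | grep -c ' blob ')
echo "blobs stored: $blob_count"
```

Three files are tracked, but only two blobs exist, because the tree simply lists the same hash under two names.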
Part 2: Branches — Just Pointers
A branch is far simpler than most people expect.
A branch is:
- A name
- A pointer to a commit
- A reflog tracking pointer movement
That’s it.
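cat .git/refs/heads/main
This outputs a commit hash. Nothing more. (If the file is missing, Git has packed the ref into .git/packed-refs; git rev-parse main works either way.)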
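The sketch below shows the pointer behaviour directly. It assumes the default file-based ref storage and freshly created (not yet packed) refs; the branch name feature and the file are invented for the example.

```shell
# Throwaway repository (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "first"

# Creating a branch writes one small file containing a hash.
git branch feature
main_before=$(cat .git/refs/heads/main)
feature_before=$(cat .git/refs/heads/feature)
head_commit=$(git rev-parse HEAD)

# Committing on main moves only the main pointer.
echo "two" >> file.txt
git commit -q -am "second"
main_after=$(cat .git/refs/heads/main)
feature_after=$(cat .git/refs/heads/feature)

echo "main:    $main_before -> $main_after"
echo "feature: $feature_before -> $feature_after"
```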
Three Useful Ways to Think About Branches
1. Divergence view
Shows where work split:
A---B---C  main
     \
      D---E  feature
2. History view
What git log branch shows: all ancestors.
3. Pointer view (the truth)
A branch is just a label pointing at the latest commit.
Git itself only cares about #3.
Git does not record that one branch “came from” another. The only relationships it stores are the parent links between commits, and merge and rebase are how you create or rewrite those links.
main Is Not Special
Git does not treat main differently from any other branch.
Rules like “never commit to main” are human conventions, not Git rules.
Part 3: HEAD and Detached HEAD
What Is HEAD?
HEAD answers one question:
“Where am I right now?”
Normally:
HEAD → main → commit
Detached HEAD Explained
If you check out a commit directly:
git checkout <commit>
Then:
HEAD → commit
(main still points elsewhere)
This is detached HEAD state.
It’s not an error. It’s temporary time travel.
The danger comes from committing without anchoring that work to a branch.
How to Recover
# Create a branch to save your work
git checkout -b new-branch
Or, if you made no commits worth keeping, simply return to an existing branch.
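Here is the whole round trip as a sketch: attached HEAD, detached HEAD, a commit made while detached, and the rescue. Everything except the branch name new-branch is a placeholder for illustration.

```shell
# Throwaway repository (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "first"

head_attached=$(cat .git/HEAD)      # a reference to a branch
echo "attached: $head_attached"

git checkout -q --detach HEAD       # same effect as checking out a commit hash
head_detached=$(cat .git/HEAD)      # now a raw commit hash
echo "detached: $head_detached"

# Work committed here belongs to no branch yet.
echo "experiment" > experiment.txt
git add experiment.txt
git commit -q -m "Work done while detached"
unanchored=$(git rev-parse HEAD)

# Anchor it: the new branch points at the commit we just made.
git checkout -q -b new-branch
anchored=$(git rev-parse new-branch)
head_rescued=$(cat .git/HEAD)
echo "rescued:  $head_rescued"
```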
Part 4: Inside the .git Directory
Everything Git knows lives here:
.git/
├── HEAD
├── config
├── objects/
├── refs/
├── logs/
└── index
Key ideas:
- objects/ stores everything (commits, trees, blobs)
- refs/ stores branch and tag pointers
- logs/ stores reflogs
- index is the staging area
Git is essentially a content-addressable filesystem with references on top.
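You can poke at this layout directly. The sketch below assumes a fresh repository where objects are still stored loose (not yet packed); the file name is made up.

```shell
# Throwaway repository (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > greeting.txt
git add greeting.txt                    # staging already writes the blob into objects/

# A loose object lives at objects/<first 2 hash chars>/<remaining chars>.
blob=$(git rev-parse :greeting.txt)     # hash of the staged blob
dir=$(printf '%s' "$blob" | cut -c1-2)
rest=$(printf '%s' "$blob" | cut -c3-)
object_path=".git/objects/$dir/$rest"
ls -l "$object_path"

git commit -q -m "Add greeting"
cat .git/HEAD                           # which branch is checked out
cat .git/refs/heads/main                # the commit that branch points to
cat .git/logs/HEAD                      # the reflog, one line per HEAD movement
```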
Part 5: The Staging Area (Index)
Git commits happen in two steps by design.
Working Directory → Staging Area → Commit
Why? Because staging lets you:
- Craft clean commits
- Commit partial changes
- Separate experimentation from intent
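As a small illustration of separating intent from experimentation, the sketch below stages only one of two new files. Both file names are hypothetical.

```shell
# Throwaway repository (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "feature code" > feature.txt       # the change we intend to commit
echo "debug notes" > scratch.txt        # experimentation we want to keep out

git add feature.txt                     # stage only the intended file
git commit -q -m "Add feature"

committed=$(git ls-tree --name-only HEAD)
leftover=$(git status --porcelain)
echo "in the commit: $committed"
echo "left behind:   $leftover"
```

For partial changes inside a single file, git add -p lets you stage individual hunks interactively.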
Terminology Chaos (Yes, It’s Real)
These all mean the same thing:
- staged
- cached
- index
It’s confusing — but knowing this removes a lot of friction.
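One place the synonyms show up is git diff, where --cached and --staged are the same option. The sketch below sets up one staged and one unstaged change (file names invented) and prints which file each diff variant reports.

```shell
# Throwaway repository (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > staged.txt
echo "one" > unstaged.txt
git add .
git commit -q -m "baseline"

echo "two, staged" >> staged.txt
git add staged.txt                       # this change is in the index
echo "two, not staged" >> unstaged.txt   # this one is only in the working directory

only_unstaged=$(git diff --name-only)            # working directory vs. index
only_staged=$(git diff --cached --name-only)     # index vs. HEAD
same_as_cached=$(git diff --staged --name-only)  # --staged is a synonym for --cached
both=$(git diff --name-only HEAD)                # working directory vs. HEAD

echo "git diff:          $only_unstaged"
echo "git diff --cached: $only_staged"
echo "git diff HEAD:     $both"
```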
Diff Gotchas
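git diff            # unstaged changes only (working directory vs. index)
git diff --cached   # staged changes only (index vs. HEAD)
git diff HEAD       # staged + unstaged (working directory vs. HEAD)
Part 6: Merging — Combining History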
Merging is not about files — it’s about reconciling timelines.
Git:
- Finds the common ancestor
- Computes both sides’ changes
- Combines them
- Creates a merge commit (if needed)
Fast-Forward vs Merge Commit
- Fast-forward → pointer moves forward
- Merge commit → histories diverged
Both produce the same code. The difference is history shape, not output.
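Both shapes can be reproduced in a few commands. This is a sketch with made-up file and branch names: the first merge happens while main has not moved, the second after both sides have new commits.

```shell
# Throwaway repository (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "A"

# Case 1: only feature has new work, so main can simply catch up.
git checkout -q -b feature
echo "f1" > feature1.txt
git add feature1.txt
git commit -q -m "B"
git checkout -q main
git merge -q feature                     # fast-forward: no new commit is created
ff_main=$(git rev-parse main)
ff_feature=$(git rev-parse feature)

# Case 2: both sides add commits, so the histories diverge.
echo "m1" > main1.txt
git add main1.txt
git commit -q -m "C"
git checkout -q feature
echo "f2" > feature2.txt
git add feature2.txt
git commit -q -m "D"
git checkout -q main

base=$(git merge-base main feature)      # the common ancestor Git starts from
git merge -q --no-edit feature           # creates a merge commit
parent_count=$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')

git log --oneline --graph --all
echo "parents of merge commit: $parent_count"
```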
Merge Conflicts
Conflicts occur when:
- Both branches change the same lines of a file, or one deletes a file the other modified
Resolution is always manual — tools just make it easier.
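A conflict is easy to provoke on purpose, which makes it far less scary. In this sketch (file and branch names are invented) both branches rewrite the same line, the merge stops, and the resolution is done by hand.

```shell
# Throwaway repository (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "base"

git checkout -q -b feature
echo "hello from feature" > greeting.txt
git commit -q -am "feature edit"

git checkout -q main
echo "hello from main" > greeting.txt
git commit -q -am "main edit"

# The merge exits non-zero and leaves conflict markers in the file.
if git merge --no-edit feature >/dev/null 2>&1; then outcome=clean; else outcome=conflict; fi
state=$(git status --porcelain)          # "UU" means both sides modified the file
markers=$(grep -c '^<<<<<<< ' greeting.txt)
cat greeting.txt

# Resolve by writing the content you actually want, then conclude the merge.
echo "hello from main and feature" > greeting.txt
git add greeting.txt
git commit -q -m "Merge feature, keeping both greetings"
parent_count=$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')
```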
Part 7: Remotes and Tracking Branches
A remote is just another Git repository with a URL.
For every remote branch, Git keeps a local cache called a remote-tracking branch:
origin/main
This cache updates only when you run:
- git fetch
- git pull
- git push
“Up to date” means up to date with the cache, not the server.
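The staleness of the cache can be observed with a local bare repository standing in for the server. This is a sketch under that assumption; seed, server.git, alice, and bob are hypothetical names, not anything Git requires.

```shell
# A local bare repo plays the "server"; alice and bob are two clones of it.
work=$(mktemp -d)
cd "$work"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
git -C seed config user.name "Seed"
git -C seed config user.email "seed@example.com"
echo "v1" > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "initial"

git clone -q --bare seed server.git
git clone -q server.git alice
git clone -q server.git bob

# Bob pushes a new commit to the server.
git -C bob config user.name "Bob"
git -C bob config user.email "bob@example.com"
echo "v2" >> bob/file.txt
git -C bob commit -q -am "Bob's change"
git -C bob push -q origin main

# Alice's origin/main is still the old commit until she fetches.
cached_before=$(git -C alice rev-parse origin/main)
server_now=$(git -C server.git rev-parse main)
git -C alice fetch -q
cached_after=$(git -C alice rev-parse origin/main)

echo "server:             $server_now"
echo "cache before fetch: $cached_before"
echo "cache after fetch:  $cached_after"
```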
Fetch vs Pull
git fetch   # updates cache only
git pull    # fetch + merge/rebase
Diverged Branches
When both local and remote have unique commits, you must choose:
- rebase
- merge
- discard local
- force push (dangerous)
Git forces you to be explicit — this is a feature.
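Before choosing, it helps to measure the divergence. The sketch below reuses a local bare repository as a stand-in server (all repository and file names are hypothetical), counts commits on each side, and then takes the rebase option as one example.

```shell
# A local bare repo plays the "server"; alice and bob are two clones of it.
work=$(mktemp -d)
cd "$work"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
git -C seed config user.name "Seed"
git -C seed config user.email "seed@example.com"
echo "v1" > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "initial"
git clone -q --bare seed server.git
git clone -q server.git alice
git clone -q server.git bob
git -C alice config user.name "Alice"
git -C alice config user.email "alice@example.com"
git -C bob config user.name "Bob"
git -C bob config user.email "bob@example.com"

# Bob pushes one commit; Alice commits locally without having seen it.
echo "v2" >> bob/file.txt
git -C bob commit -q -am "Bob's change"
git -C bob push -q origin main
echo "notes" > alice/notes.txt
git -C alice add notes.txt
git -C alice commit -q -m "Alice's change"

# Refresh the cache, then count unique commits on each side.
git -C alice fetch -q
set -- $(git -C alice rev-list --left-right --count main...origin/main)
ahead=$1; behind=$2
echo "ahead by $ahead, behind by $behind"

# One explicit choice: replay local commits on top of the remote's.
# (git merge origin/main would be the merge option instead.)
git -C alice rebase -q origin/main
set -- $(git -C alice rev-list --left-right --count main...origin/main)
ahead_after=$1; behind_after=$2
echo "after rebase: ahead by $ahead_after, behind by $behind_after"
```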
Part 8: git reset — Powerful and Dangerous
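git reset moves the current branch pointer to a commit you name.
Modes:
- --soft → move the pointer only; the index and working directory are untouched
- --mixed (the default) → also reset the index, so changes become unstaged
- --hard → also reset the working directory, deleting uncommitted changes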
For changes that were never committed, there is no undo button. There is responsibility.
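The three modes are easiest to compare side by side. In this sketch (file name and contents are placeholders) the same one-commit step back is taken with each mode, and the state of the index and working directory is inspected afterwards.

```shell
# Throwaway repository (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "version one" > file.txt
git add file.txt
git commit -q -m "v1"
echo "version two, with more text" > file.txt
git commit -q -am "v2"

# --soft: pointer moves back, the v2 change stays staged.
git reset -q --soft HEAD~1
soft_staged=$(git diff --cached --name-only)
git commit -q -m "v2 again"

# --mixed: pointer moves back, the v2 change is kept but unstaged.
git reset -q --mixed HEAD~1
mixed_staged=$(git diff --cached --name-only)
mixed_unstaged=$(git diff --name-only)

# --hard: index and working directory are forced to match HEAD (v1).
git reset -q --hard
hard_content=$(cat file.txt)
hard_status=$(git status --porcelain)

echo "soft  -> staged: $soft_staged"
echo "mixed -> staged: '$mixed_staged', unstaged: $mixed_unstaged"
echo "hard  -> file now contains: $hard_content"
```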
Part 9: Reflog — Your Last Line of Defense
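The reflog records every position HEAD and your branches have recently pointed to. Entries expire (after 90 days by default, 30 for commits no longer reachable from any ref), and the reflog is local to your clone, so treat it as a safety net rather than an archive.
git reflog
If something feels “lost”, it usually isn’t. It’s just unreferenced.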
Recover by:
- finding the commit
- resetting or branching to it
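Those two steps look like this in practice. The sketch deliberately "loses" a commit with a hard reset and then brings it back; the branch name rescued and the file are invented for the example.

```shell
# Throwaway repository (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "first"
echo "two" >> file.txt
git commit -q -am "second"
lost=$(git rev-parse HEAD)

# "Lose" the second commit: no branch points at it any more.
git reset -q --hard HEAD~1
main_now=$(git rev-parse main)

# Step 1: find it. Newest entry first; HEAD@{1} is where HEAD was before the reset.
git reflog

# Step 2: branch to it (git reset --hard 'HEAD@{1}' would move main back instead).
git branch rescued 'HEAD@{1}'
rescued=$(git rev-parse rescued)
echo "lost:    $lost"
echo "rescued: $rescued"
```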
Safety Levels in Git
| Data | Safety |
|---|---|
| Commits on branches | Very safe |
| Orphaned commits | Temporarily safe |
| Branch pointers | Mutable |
| Staging area | Fragile |
| Stash | Fragile |
Key Takeaways
- Commits are immutable snapshots
- Branches are pointers, not containers
- HEAD is your current position
- Merges combine history, not files
- Remotes are cached locally
- Reset moves pointers
- Reflog saves you from disaster