How Git Actually Works: A Deep Dive Under the Hood

Demystify Git's internals — from commits and branches to merges and reflogs. Learn how Git really stores your code and why understanding it makes you unstoppable.

22 Dec 2025 • 5 min read

Most developers use Git every day. Few truly understand it.

We memorize commands, copy-paste fixes from Stack Overflow, and hope nothing breaks. When Git does break, it feels hostile — cryptic errors, missing commits, “detached HEAD”, diverged branches.

Here’s the truth:

Git only feels dangerous when you don’t understand its mental model.

Once you know how Git stores data internally, it becomes predictable, debuggable, and surprisingly safe.

Let’s open the hood.

Part 1: The Commit — Git’s Fundamental Unit

Everything in Git revolves around commits. Branches, tags, merges — all of them are just references to commits.

Commits Are Immutable

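Once a commit is created, it never changes:

File contents are frozen
The directory snapshot (tree) is frozen
Parent relationships are frozen
Author, committer, timestamps, and message are frozen

Commands that appear to edit a commit, such as git commit --amend or git rebase, actually create a new commit with a new ID and leave the original untouched.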

This immutability is not a limitation — it’s Git’s strongest guarantee.

Commit IDs (SHA-1 Hashes)

Every commit is identified by a hash derived from its contents. By default this is a 40-character SHA-1 hash (newer Git versions can also create SHA-256 repositories):

a2b3c4d5e6f7890123456789abcdef1234567890

Change anything — a file, the message, even whitespace — and the hash changes completely.

That’s why:

Git can detect corruption
Git can trust history
Git never silently overwrites data

Short hashes (e.g. a2b3c4d) are just abbreviations: any prefix that is unambiguous within the repository resolves to the full hash.
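A small experiment makes the hashing behavior concrete. This is a minimal sketch, assuming a POSIX shell with git installed; the throwaway repository, the file name note.txt, and the commit messages are made up for illustration.

```shell
set -eu
repo=$(mktemp -d)                 # throwaway repository (hypothetical location)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "hello" > note.txt
git add note.txt
git commit -q -m "Add note"
before=$(git rev-parse HEAD)

# Rewording the message does not edit the commit; it creates a new one.
git commit -q --amend -m "Add note (reworded)"
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"

# The original commit is still in the object store, unchanged.
old_type=$(git cat-file -t "$before")
echo "old object is still a: $old_type"
```

The two IDs share nothing recognizable even though only the message changed, and the first commit remains readable by its hash.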

What’s Inside a Commit?

A commit is not a diff. It’s a snapshot reference plus metadata:

git cat-file -p <commit>

Contains:

tree → snapshot of the directory structure
parent(s) → previous commit(s)
author / committer
commit message

The tree lists the entries of the top-level directory: files (blobs) and subdirectories (further trees). Git builds every snapshot from this structure.
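To see the chain for yourself, you can walk from a commit to its tree and on to a blob. The sketch below assumes a POSIX shell with git installed; the repository and the files README.md and src/app.py are hypothetical.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir src
echo "print('hi')" > src/app.py
echo "# Demo" > README.md
git add .
git commit -q -m "Initial commit"

# 1. The commit object: a tree hash, zero or more parents, and metadata.
git cat-file -p HEAD

# 2. The root tree: one entry per file (blob) or subdirectory (tree).
git cat-file -p 'HEAD^{tree}'

# 3. A blob: just the file's bytes, with no name or path attached.
git cat-file -p HEAD:src/app.py

# Capture a few facts for inspection.
commit_first_field=$(git cat-file -p HEAD | head -n 1 | cut -d' ' -f1)
src_type=$(git cat-file -t HEAD:src)
file_type=$(git cat-file -t HEAD:src/app.py)
blob_text=$(git cat-file -p HEAD:src/app.py)
```

Note that the first commit in a repository has no parent line at all, which is why the list above says parent(s).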

Is Git Duplicating Files on Every Commit?

No.

Git uses content-addressable storage:

Files are stored by hash
Identical content → stored once
Changed content → new object

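This is a large part of why Git is space-efficient: an unchanged file costs nothing extra in a new commit, and packfiles later compress similar objects against each other.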
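Deduplication is easy to observe. In this sketch (assuming a POSIX shell with git installed; the file names are invented), two files with identical content end up sharing one blob.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Two files with identical content, plus one that differs by one character.
echo "same content" > a.txt
echo "same content" > b.txt
echo "same content!" > c.txt
git add .
git commit -q -m "Three files"

hash_a=$(git rev-parse HEAD:a.txt)
hash_b=$(git rev-parse HEAD:b.txt)
hash_c=$(git rev-parse HEAD:c.txt)
git ls-tree HEAD                  # a.txt and b.txt list the same blob hash

# Objects so far: 1 commit + 1 tree + 2 distinct blobs.
object_count=$(git count-objects | cut -d' ' -f1)
echo "$object_count loose objects"
```

Three files were committed, but only two blobs were written, because the object name depends on content and not on the file name.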

Part 2: Branches — Just Pointers

A branch is far simpler than most people expect.

A branch is:

A name
A pointer to a commit
A reflog tracking pointer movement

That’s it.

cat .git/refs/heads/main

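Outputs a commit hash. Nothing more. (If the file is missing, the ref has been packed into .git/packed-refs; git rev-parse main works either way.)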
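The following sketch shows how little a branch really is. It assumes a POSIX shell with git installed, Git's default file-based ref storage, and refs that have not been packed yet; the branch name feature and the file are hypothetical.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "First"

ref_file=$(cat .git/refs/heads/main)    # the raw file content
resolved=$(git rev-parse main)          # what Git resolves the name to

# Creating a branch writes one more small file; no commits are copied.
git branch feature
feature_ref=$(cat .git/refs/heads/feature)

# A new commit moves only the branch that HEAD is attached to.
echo "v2" > file.txt
git commit -q -am "Second"
main_after=$(git rev-parse main)
feature_after=$(git rev-parse feature)

echo "main:    $main_after"
echo "feature: $feature_after"
```

After the second commit, main has moved while feature still points at the first commit, which is all that "being on a branch" means.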

Three Useful Ways to Think About Branches

1. Divergence view: shows where work split.

A---B---C main
     \
      D---E feature

2. History view: what git log <branch> shows, the tip commit plus all of its ancestors.

3. Pointer view (the truth): a branch is just a label pointing at its latest commit.

Git itself only cares about #3.

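Git does not record that one branch “came from” another. Any relationship between branches is derived from the commits they share, which is why merge and rebase work from commit ancestry rather than branch names.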

`main` Is Not Special

Git does not treat main differently from any other branch.

Rules like “never commit to main” are human conventions, not Git rules.

Part 3: HEAD and Detached HEAD

What Is HEAD?

HEAD answers one question:

“Where am I right now?”

Normally:

HEAD → main → commit

Detached HEAD Explained

If you check out a commit directly:

git checkout <commit>

Then:

HEAD → commit
(main still points elsewhere)

This is detached HEAD state.

It’s not an error. It’s temporary time travel.

The danger comes from committing without anchoring that work to a branch.
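You can watch HEAD switch between the two states by reading the .git/HEAD file. This is a sketch assuming a POSIX shell with git installed; the repository and file are hypothetical.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "First"
echo "v2" > file.txt
git commit -q -am "Second"

attached=$(cat .git/HEAD)          # a symbolic reference to the branch
git checkout -q --detach HEAD~1    # check out the first commit directly
detached=$(cat .git/HEAD)          # now a raw commit hash
first=$(git rev-parse HEAD)
main_tip=$(git rev-parse main)     # main did not move

echo "attached: $attached"
echo "detached: $detached"
```

In the attached state the file holds a branch name; in the detached state it holds a hash, so new commits would advance HEAD alone and no branch would follow.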

How to Recover

# Create a branch to save your work
git checkout -b new-branch

Or, if you made no commits you want to keep, simply return to an existing branch.
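If you already left the detached state without creating a branch, the commit can still be anchored by its hash. The sketch below assumes a POSIX shell with git installed; the branch name rescued and the file lab.txt are made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "First"

git checkout -q --detach           # detach HEAD at the current commit
echo "experiment" > lab.txt
git add lab.txt
git commit -q -m "Experiment made while detached"
experiment=$(git rev-parse HEAD)

git checkout -q main               # leave: no branch points at the experiment now

if git merge-base --is-ancestor "$experiment" main; then
  on_main=yes
else
  on_main=no
fi

# A branch can be created from the hash as long as the commit has not been pruned.
git branch rescued "$experiment"
rescued=$(git rev-parse rescued)
echo "rescued branch points at $rescued"
```

If you did not note the hash, git reflog will list it, since every position of HEAD is logged locally.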

Part 4: Inside the `.git` Directory

Everything Git knows lives here:

.git/
├── HEAD
├── config
├── objects/
├── refs/
├── logs/
└── index

Key ideas:

objects/ stores every object (commits, trees, blobs, annotated tags)
refs/ stores branch and tag pointers
logs/ stores reflogs
index is the staging area

Git is essentially a content-addressable filesystem with references on top.
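A quick way to connect this layout to real data is to commit one file and then look at what appeared on disk. This sketch assumes a POSIX shell with git installed and loose (not yet packed) objects; the repository and file are hypothetical.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "First"

# One commit of one file produces three objects: commit, tree, blob.
for obj in $(git rev-list --all --objects | cut -d' ' -f1); do
  echo "$(git cat-file -t "$obj") $obj"
done
object_count=$(git count-objects | cut -d' ' -f1)

# Loose objects live at .git/objects/<first 2 hex chars>/<remaining chars>.
commit=$(git rev-parse HEAD)
object_path=".git/objects/$(echo "$commit" | cut -c1-2)/$(echo "$commit" | cut -c3-)"
if [ -f "$object_path" ]; then loose=yes; else loose=no; fi
echo "commit object file: $object_path (present: $loose)"

# The index records which blob is staged for each path.
git ls-files --stage
```

After garbage collection the same objects may move into packfiles under objects/pack/, so the per-object files are not guaranteed to stay where this sketch finds them.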

Part 5: The Staging Area (Index)

Git commits happen in two steps by design.

Working Directory → Staging Area → Commit

Why? Because staging lets you:

Craft clean commits
Commit partial changes
Separate experimentation from intent

Terminology Chaos (Yes, It’s Real)

These all mean the same thing:

staged
cached
index

It’s confusing — but knowing this removes a lot of friction.

Diff Gotchas

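git diff           # working directory vs. index: unstaged changes only
git diff --cached  # index vs. HEAD: staged changes only (--staged is a synonym)
git diff HEAD      # working directory vs. HEAD: staged and unstaged together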
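The three comparisons are easiest to tell apart with one staged and one unstaged change in play. A minimal sketch, assuming a POSIX shell with git installed; a.txt and b.txt are hypothetical files.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "a1" > a.txt
echo "b1" > b.txt
git add .
git commit -q -m "Base"

echo "a2" > a.txt
git add a.txt                      # staged change
echo "b2" > b.txt                  # unstaged change

unstaged=$(git diff --name-only)
staged=$(git diff --cached --name-only)
both=$(git diff --name-only HEAD | xargs)   # join the lines with spaces

echo "unstaged: $unstaged"
echo "staged:   $staged"
echo "vs HEAD:  $both"

# Committing now records only what was staged; b.txt stays modified.
git commit -q -m "Update a only"
remaining=$(git diff --name-only)
still_staged=$(git diff --cached --name-only)
```

This is the partial commit workflow in miniature: the commit contains the a.txt change, while the b.txt edit is still waiting in the working directory.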

Part 6: Merging — Combining History

Merging is not about files — it’s about reconciling timelines.

Git:

Finds the common ancestor (the merge base)
Computes both sides’ changes
Combines them
Creates a merge commit (if needed)

Fast-Forward vs Merge Commit

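Fast-forward → the target branch has no new commits of its own, so its pointer simply moves forward
Merge commit → histories diverged, so Git records a new commit with two parents

When a fast-forward is possible, forcing a merge commit instead (git merge --no-ff) yields exactly the same files. The difference is history shape, not output.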
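Both outcomes can be reproduced in a scratch repository. This sketch assumes a POSIX shell with git installed; the branch names feature and topic and all file names are invented for the demonstration.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

# Case 1: main has not moved, so merging feature is a fast-forward.
git checkout -q -b feature
echo "f" > feature.txt
git add feature.txt
git commit -q -m "Feature work"
git checkout -q main
git merge -q feature
ff_main=$(git rev-parse main)
ff_feature=$(git rev-parse feature)    # same commit: only the pointer moved

# Case 2: both sides gain a commit, so Git has to create a merge commit.
git checkout -q -b topic
echo "t" > topic.txt
git add topic.txt
git commit -q -m "Topic work"
git checkout -q main
echo "m" > main.txt
git add main.txt
git commit -q -m "Main work"
base=$(git merge-base main topic)      # the common ancestor Git starts from
git merge -q --no-edit topic

words=$(git rev-list --parents -n 1 HEAD | wc -w)
parent_count=$((words - 1))            # first word is the merge commit itself
echo "merge commit has $parent_count parents"
git log --oneline --graph
```

In the first case no new commit exists after the merge; in the second, the graph shows the two lines of history joined by a commit with two parents.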

Merge Conflicts

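Conflicts occur when:

Both branches modify the same lines, or lines directly next to each other
One branch edits a file that the other deletes

Git never guesses which side is right. You decide the final content, and tools just make that easier.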
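Here is what a conflict and its resolution look like end to end. A minimal sketch, assuming a POSIX shell with git installed; settings.txt, its contents, and the branch name feature are hypothetical.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo "color = red" > settings.txt
git add settings.txt
git commit -q -m "Base"

git checkout -q -b feature
echo "color = blue" > settings.txt
git commit -q -am "Feature: blue"
git checkout -q main
echo "color = green" > settings.txt
git commit -q -am "Main: green"

# The merge stops with a non-zero exit status and leaves conflict markers.
if git merge -q --no-edit feature >/dev/null 2>&1; then
  merged=clean
else
  merged=conflict
fi
conflicted=$(git diff --name-only --diff-filter=U)
cat settings.txt                   # shows the <<<<<<<, =======, >>>>>>> markers

# Resolve by hand: write the final content, stage it, and conclude the merge.
echo "color = teal" > settings.txt
git add settings.txt
git commit -q --no-edit
words=$(git rev-list --parents -n 1 HEAD | wc -w)
parent_count=$((words - 1))
final=$(cat settings.txt)
```

Staging the file is what tells Git the conflict is resolved; the resulting commit is an ordinary merge commit with two parents.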

Part 7: Remotes and Tracking Branches

A remote is just another Git repository with a URL.

For every remote branch, Git keeps a local cache called a remote-tracking branch:

origin/main

This cache updates only when you run:

git fetch
git pull
git push

When git status reports “up to date”, it means up to date with this cache, not with the server.

Fetch vs Pull

git fetch   # updates cache only
git pull    # fetch + merge/rebase
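The cache behavior can be simulated on one machine, with a bare repository standing in for the server. This is a sketch assuming a POSIX shell with git installed; the directory names server.git, alice, and bob are made up.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the server.
git init -q --bare server.git
git -C server.git symbolic-ref HEAD refs/heads/main

# Alice creates the history and publishes it.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "alice@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "First"
git remote add origin "$work/server.git"
git push -q origin main

# Bob clones, and then Alice pushes a second commit.
cd "$work"
git clone -q server.git bob
cd "$work/alice"
echo "v2" > file.txt
git commit -q -am "Second"
git push -q origin main
alice_head=$(git rev-parse HEAD)

# Bob's origin/main is stale until he fetches.
cd "$work/bob"
stale=$(git rev-parse origin/main)
git fetch -q                       # refreshes origin/main; local main is untouched
fresh=$(git rev-parse origin/main)
bob_main=$(git rev-parse main)

echo "before fetch: $stale"
echo "after fetch:  $fresh"
echo "bob's main:   $bob_main"
```

After the fetch, Bob's origin/main matches Alice's push while his own main has not moved. A git pull would have done the same fetch and then merged or rebased main onto it.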

Diverged Branches

When both local and remote have unique commits, you must choose:

rebase
merge
discard local
force push (dangerous: it overwrites the remote branch; git push --force-with-lease is the safer variant)

Git forces you to be explicit — this is a feature.

Part 8: `git reset` — Powerful and Dangerous

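git reset moves the current branch pointer to a commit you choose. The mode decides what else is reset along with it.

Modes:

--soft → move the pointer only; index and working directory are untouched
--mixed (default) → also reset the index, which unstages changes but keeps them in your files
--hard → also reset the working directory, deleting uncommitted changes to tracked files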

There is no undo button for uncommitted changes: once --hard discards them, they are gone. There is responsibility.
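Running the three modes against the same starting point shows exactly what each one touches. A minimal sketch, assuming a POSIX shell with git installed; the repository and file.txt are hypothetical.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "First"
first=$(git rev-parse HEAD)
echo "v2" > file.txt
git commit -q -am "Second"
second=$(git rev-parse HEAD)

# --soft: the branch moves back; the v2 change stays staged.
git reset -q --soft "$first"
soft_staged=$(git diff --cached --name-only)

# --mixed: the index is reset too; v2 now exists only in the working directory.
git reset -q --mixed "$first"
mixed_staged=$(git diff --cached --name-only)
mixed_unstaged=$(git diff --name-only)

# --hard: index and working directory are both reset; the v2 edit is discarded.
git reset -q --hard "$first"
hard_content=$(cat file.txt)

# The commit "Second" itself was never deleted, only left without a branch.
second_type=$(git cat-file -t "$second")
echo "file.txt now contains: $hard_content"
```

The committed version of v2 is still reachable by its hash, but an edit that was never committed would have no such fallback after --hard.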

Part 9: Reflog — Your Last Line of Defense

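The reflog records each position that HEAD and your branches have recently pointed to. It is local to your repository, and entries expire: by default after 90 days, or after 30 days for entries no longer reachable from the branch's current tip.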

git reflog

If something feels “lost”, it usually isn’t — it’s just unreferenced.

Recover by:

finding the commit
resetting or branching to it
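Those two steps look like this in practice. The sketch assumes a POSIX shell with git installed and reflogs enabled (the default for normal repositories); the branch name recovered is made up.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "First"
echo "v2" > file.txt
git commit -q -am "Second"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1         # "lose" the second commit
git reflog -n 3                    # the old position is still listed

# HEAD@{1} is where HEAD pointed before its most recent move.
previous=$(git rev-parse 'HEAD@{1}')
git branch recovered "$previous"   # or: git reset --hard "$previous"
recovered=$(git rev-parse recovered)
echo "recovered branch points at $recovered"
```

Creating a branch is the gentler choice because it leaves the current branch alone, whereas git reset --hard moves the current branch back and discards uncommitted changes.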

Safety Levels in Git

Data	Safety
Commits on branches	Very safe
Orphaned commits	Temporarily safe
Branch pointers	Mutable
Staging area	Fragile
Stash	Fragile

Key Takeaways

Commits are immutable snapshots
Branches are pointers, not containers
HEAD is your current position
Merges combine history, not files
Remote branches are cached locally as remote-tracking branches
Reset moves pointers
Reflog saves you from disaster
