
How Git Actually Works: A Deep Dive Under the Hood

Demystify Git's internals — from commits and branches to merges and reflogs. Learn how Git really stores your code and why understanding it makes you unstoppable.

22 Dec 2025 · 5 min read
[Figure: Illustration of Git internals showing commits, branches, and the .git folder]


Most developers use Git every day. Few truly understand it.

We memorize commands, copy-paste fixes from Stack Overflow, and hope nothing breaks. When Git does break, it feels hostile — cryptic errors, missing commits, “detached HEAD”, diverged branches.

Here’s the truth:

Git only feels dangerous when you don’t understand its mental model.

Once you know how Git stores data internally, it becomes predictable, debuggable, and surprisingly safe.

Let’s open the hood.


Part 1: The Commit — Git’s Fundamental Unit

Everything in Git revolves around commits. Branches and tags are just references to commits, and a merge is just a commit with more than one parent.

Commits Are Immutable

Once a commit is created, it never changes:

  • File contents are frozen
  • The snapshot is frozen (and so is any diff computed against its parents)
  • Parent relationships are frozen
  • Author, timestamp, and message are frozen

This immutability is not a limitation — it’s Git’s strongest guarantee.


Commit IDs (SHA-1 Hashes)

Every commit is identified by a SHA-1 hash, derived from its contents:

a2b3c4d5e6f7890123456789abcdef1234567890

Change anything — a file, the message, even whitespace — and the hash changes completely.

That’s why:

  • Git can detect corruption
  • Git can trust history
  • Git never silently overwrites a stored object

Short hashes (e.g. a2b3c4d) are just abbreviations — the full hash is what matters.
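To see that the ID really is derived from the contents, here is a minimal sketch that recomputes a commit hash by hand. It assumes a repository using the default SHA-1 object format and a system with sha1sum available (on macOS, shasum would be the rough equivalent). The file name, identity, and commit message are placeholders.

```shell
# Throwaway repository with a single commit (identity and names are placeholders).
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "hello" > file.txt
git add file.txt && git commit -q -m "first commit"

# The ID Git reports for the commit.
commit_id=$(git rev-parse HEAD)

# Recompute it by hand: SHA-1 over the header "commit <size>\0"
# followed by the raw commit body (tree, author, committer, message).
size=$(git cat-file -s HEAD)
recomputed=$( { printf 'commit %s\0' "$size"; git cat-file commit HEAD; } | sha1sum | cut -d' ' -f1 )

echo "git says:   $commit_id"
echo "recomputed: $recomputed"
```

Both lines should print the same 40-character hash. Since the author, timestamp, and message are part of the hashed bytes, changing any of them yields a different ID.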


What’s Inside a Commit?

A commit is not a diff. It’s a snapshot reference plus metadata:

git cat-file -p <commit>

Contains:

  1. tree → snapshot of the directory structure
  2. parent(s) → previous commit(s)
  3. author / committer
  4. commit message

The tree points to directories, which point to files (blobs). Git builds everything from this structure.
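The chain from commit to tree to blob can be walked by hand. This is a small sketch in a throwaway repository. The directory and file names are made up for illustration.

```shell
# Throwaway repository with one file inside a subdirectory.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
mkdir src
echo "hello" > src/greeting.txt
git add . && git commit -q -m "first commit"

# 1. The commit: a tree ID, author, committer, and message (no parent yet).
git cat-file -p HEAD

# 2. The root tree: a single entry pointing at the "src" subtree.
tree_id=$(git rev-parse 'HEAD^{tree}')
git cat-file -p "$tree_id"

# 3. The blob: the file contents, with no name or path attached.
blob_id=$(git rev-parse HEAD:src/greeting.txt)
blob_type=$(git cat-file -t "$blob_id")
content=$(git cat-file -p "$blob_id")
echo "$blob_type: $content"
```

Note that the file name lives in the tree, not in the blob. The blob is only the bytes.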


Is Git Duplicating Files on Every Commit?

No.

Git uses content-addressable storage:

  • Files are stored by hash
  • Identical content → stored once
  • Changed content → new object

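This is a large part of why Git is fast and space-efficient: an unchanged file costs nothing extra per commit, and Git compresses objects further when it packs them.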
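A quick way to check the "identical content is stored once" claim: stage two files with the same bytes and compare their object IDs. The file names here are arbitrary.

```shell
# Throwaway repository; nothing needs to be committed for this check.
repo=$(mktemp -d) && cd "$repo"
git init -q

echo "same content" > a.txt
cp a.txt b.txt                      # identical bytes, different name
echo "different content" > c.txt
git add .

# ":path" asks for the blob ID recorded in the index for that path.
id_a=$(git rev-parse :a.txt)
id_b=$(git rev-parse :b.txt)
id_c=$(git rev-parse :c.txt)

# Three staged paths, but only two distinct blobs behind them.
git ls-files -s
unique_blobs=$(git ls-files -s | awk '{print $2}' | sort -u | wc -l)
echo "distinct blobs: $unique_blobs"
```

a.txt and b.txt share one object because the ID depends only on content, never on the file name.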


Part 2: Branches — Just Pointers

A branch is far simpler than most people expect.

A branch is:

  1. A name
  2. A pointer to a commit
  3. A reflog tracking pointer movement

That’s it.

cat .git/refs/heads/main

Outputs a commit hash. Nothing more.
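Here is a small sketch of the same idea end to end, assuming a fresh repository where refs are still stored as loose files. In older repositories Git may have moved them into .git/packed-refs, in which case git rev-parse <branch> is the reliable way to read the pointer. The branch name feature is just an example.

```shell
# Throwaway repository whose initial branch is named main.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt && git commit -q -m "first"
first=$(git rev-parse HEAD)

# Creating a branch writes one small file containing a commit hash.
git branch feature
feature_ref=$(cat .git/refs/heads/feature)

# Committing on main moves main's pointer; feature stays where it was.
echo "two" >> file.txt
git commit -q -am "second"
main_ref=$(cat .git/refs/heads/main)
feature_after=$(cat .git/refs/heads/feature)

echo "main:    $main_ref"
echo "feature: $feature_after"
```

Creating the branch copied no files and no history. It only recorded 40 characters.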


Three Useful Ways to Think About Branches

1. Divergence view. Shows where work split:

A---B---C main
     \
      D---E feature

2. History view. What git log <branch> shows: the branch tip and all of its ancestors.

3. Pointer view (the truth). A branch is just a label pointing at the latest commit.

Git itself only cares about #3.

Git does not record that one branch “came from” another. The relationship exists only in the commit graph, as shared ancestors, and you change it with merge or rebase.


main Is Not Special

Git does not treat main differently from any other branch.

Rules like “never commit to main” are human conventions, not Git rules.


Part 3: HEAD and Detached HEAD

What Is HEAD?

HEAD answers one question:

“Where am I right now?”

Normally:

HEAD → main → commit

Detached HEAD Explained

If you check out a commit directly:

git checkout <commit>

Then:

HEAD → commit
(main still points elsewhere)

This is detached HEAD state.

It’s not an error. It’s temporary time travel.

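The danger comes from committing without anchoring that work to a branch: once you switch away, no branch points at those commits, and Git will eventually garbage-collect them.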
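You can watch HEAD change shape by reading the .git/HEAD file directly. This is a minimal sketch in a throwaway repository whose branch is named main.

```shell
# Throwaway repository whose initial branch is named main.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt && git commit -q -m "first"
commit_id=$(git rev-parse HEAD)

# Attached: HEAD holds the name of a branch, not a hash.
attached=$(cat .git/HEAD)
echo "attached: $attached"

# Check out the commit itself (-q hides the detached HEAD advice).
git checkout -q "$commit_id"

# Detached: HEAD now holds the raw commit hash.
detached=$(cat .git/HEAD)
echo "detached: $detached"
```

In the attached case the file reads "ref: refs/heads/main". In the detached case it is the bare hash, which is all that "detached" means.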


How to Recover

# Create a branch to save your work
git checkout -b new-branch

Or simply return to an existing branch.
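As a worked example of the first option, this sketch makes a commit while detached and then anchors it. The branch name new-branch matches the command above. Everything else is a placeholder.

```shell
# Throwaway repository whose initial branch is named main.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt && git commit -q -m "first"

# Detach HEAD and commit: no branch points at this new commit.
git checkout -q --detach
echo "experiment" >> file.txt
git commit -q -am "experimental work"
experiment=$(git rev-parse HEAD)

# Anchor the work: create a branch at the current commit and switch to it.
git checkout -q -b new-branch

current=$(git symbolic-ref --short HEAD)
echo "now on $current at $(git rev-parse --short HEAD)"
```

If you already switched away before creating the branch, git branch <name> <hash> does the same job, as long as you can still find the hash.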


Part 4: Inside the .git Directory

Everything Git knows lives here:

.git/
├── HEAD
├── config
├── objects/
├── refs/
├── logs/
└── index

Key ideas:

  • objects/ stores every object (commits, trees, blobs, annotated tags)
  • refs/ stores branch and tag pointers
  • logs/ stores reflogs
  • index is the staging area

Git is essentially a content-addressable filesystem with references on top.
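To poke at this layout yourself, here is a rough sketch in a fresh repository with a single one-file commit. It assumes the default loose-object storage, before Git has packed anything.

```shell
# Throwaway repository with a single commit containing one file.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "hello" > file.txt
git add file.txt && git commit -q -m "first"

# Top-level entries, including HEAD, config, index, logs, objects, refs.
ls .git

# One commit of one file needs three objects: a blob, a tree, and a commit.
object_count=$(find .git/objects -type f | wc -l)
echo "objects on disk: $object_count"

# Ask Git for the type of every object it finds.
find .git/objects -type f | sed 's|.*objects/||; s|/||' | while read -r id; do
  echo "$id $(git cat-file -t "$id")"
done
```

Each object lives at objects/<first two hex characters>/<remaining 38>, which is why the loop joins the directory and file name back into a full ID.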


Part 5: The Staging Area (Index)

Git commits happen in two steps by design.

Working Directory → Staging Area → Commit

Why? Because staging lets you:

  • Craft clean commits
  • Commit partial changes
  • Separate experimentation from intent

Terminology Chaos (Yes, It’s Real)

These all mean the same thing:

  • staged
  • cached
  • index

It’s confusing — but knowing this removes a lot of friction.


Diff Gotchas

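git diff           # working directory vs index: unstaged changes only
git diff --cached  # index vs HEAD: staged changes only (--staged is a synonym)
git diff HEAD      # working directory vs HEAD: staged + unstaged changes to tracked files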
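A small experiment makes the difference concrete: one file with a staged change, another with an unstaged change. The file names are chosen for illustration only.

```shell
# Throwaway repository with two committed files.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "one" > staged.txt
echo "one" > unstaged.txt
git add . && git commit -q -m "first"

# Change both files, but stage only one of them.
echo "two" >> staged.txt
git add staged.txt
echo "two" >> unstaged.txt

only_unstaged=$(git diff --name-only)            # working directory vs index
only_staged=$(git diff --cached --name-only)     # index vs HEAD
both_count=$(git diff HEAD --name-only | wc -l)  # working directory vs HEAD

echo "git diff:          $only_unstaged"
echo "git diff --cached: $only_staged"
echo "git diff HEAD:     $both_count files"
```

Plain git diff going quiet after git add does not mean the change is gone. It has moved to the other side of the index.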

Part 6: Merging — Combining History

Merging is not about files — it’s about reconciling timelines.

Git:

  1. Finds the common ancestor
  2. Computes both sides’ changes
  3. Combines them
  4. Creates a merge commit (if needed)
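The steps above can be traced by hand in a throwaway repository. This sketch uses two branches that each add a different file, so the merge is conflict-free. Branch and file names are placeholders.

```shell
# Throwaway repository: main and feature diverge from a shared first commit.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
echo "base" > base.txt
git add . && git commit -q -m "base"
first=$(git rev-parse HEAD)

git checkout -q -b feature
echo "f" > feature.txt
git add . && git commit -q -m "feature work"

git checkout -q main
echo "m" > main.txt
git add . && git commit -q -m "main work"

# Step 1: the common ancestor.
base=$(git merge-base main feature)

# Step 2: what each side changed relative to that ancestor.
main_changes=$(git diff --name-only "$base" main)
feature_changes=$(git diff --name-only "$base" feature)
echo "main changed: $main_changes, feature changed: $feature_changes"

# Steps 3 and 4: combine both sides and record a commit with two parents.
git merge -q -m "merge feature" feature
second_parent=$(git rev-parse 'HEAD^2')
```

The merge commit's second parent is the tip of feature. That parent link is how Git records that the two timelines were joined.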

Fast-Forward vs Merge Commit

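  • Fast-forward → your branch has no commits of its own, so its pointer simply moves forward
  • Merge commit → histories diverged, so Git records a new commit with two parents

When a fast-forward is possible, fast-forwarding and forcing a merge commit (git merge --no-ff) produce the same code. The difference is history shape, not output.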
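To compare the two outcomes side by side, this sketch merges the same feature branch twice from the same starting point: once as a fast-forward and once with a forced merge commit. The branch names ff-demo and noff-demo are invented for the demonstration.

```shell
# Throwaway repository: feature is one commit ahead of main.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt && git commit -q -m "A"
git checkout -q -b feature
echo "two" >> file.txt
git commit -q -am "B"

# Fast-forward: the pointer just moves to feature's commit.
git checkout -q -b ff-demo main
git merge -q --ff-only feature

# Forced merge commit from the same starting point.
git checkout -q -b noff-demo main
git merge -q --no-ff -m "merge feature" feature

ff_tip=$(git rev-parse ff-demo)
noff_tip=$(git rev-parse noff-demo)
ff_tree=$(git rev-parse 'ff-demo^{tree}')
noff_tree=$(git rev-parse 'noff-demo^{tree}')

echo "tips differ:  $ff_tip vs $noff_tip"
echo "trees match:  $ff_tree vs $noff_tree"
```

The commit IDs differ because the history differs, but both tips point at the same tree, meaning the same files.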


Merge Conflicts

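Conflicts occur when Git cannot combine both sides automatically, for example when:

  • Both branches modify the same lines
  • One branch edits a file that the other deletes

Git stops and asks you to resolve these by hand. Tools just make it easier.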
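Here is a deliberately provoked conflict, sketched in a throwaway repository: both branches rewrite the same line of a hypothetical config.txt. The resolution shown simply keeps one side. In real work you would edit the file to the content you actually want.

```shell
# Throwaway repository: main and feature both rewrite the same line.
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Demo" && git config user.email "demo@example.com"
echo "color = red" > config.txt
git add . && git commit -q -m "base"

git checkout -q -b feature
echo "color = blue" > config.txt
git commit -q -am "feature: blue"

git checkout -q main
echo "color = green" > config.txt
git commit -q -am "main: green"

# The merge stops with a non-zero exit status.
git merge feature > /dev/null 2>&1 || echo "merge stopped: conflict"

# Unmerged paths, and the conflict markers Git wrote into the file.
conflicted=$(git diff --name-only --diff-filter=U)
marker_count=$(grep -c '^<<<<<<<' config.txt)
echo "conflicted: $conflicted ($marker_count conflict block)"

# Resolve by writing the final content, staging it, and committing.
echo "color = green" > config.txt
git add config.txt
git commit -q -m "merge feature, keep green"
```

Until that final commit, the repository sits in a half-merged state. git merge --abort is the way back out if you change your mind.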


Part 7: Remotes and Tracking Branches

A remote is just another Git repository with a URL.

For every remote branch, Git keeps a local cache called a remote-tracking branch:

origin/main

This cache updates only when you run:

  • git fetch
  • git pull
  • git push

“Up to date” means up to date with the cache, not the server.


Fetch vs Pull

git fetch   # updates cache only
git pull    # fetch + merge/rebase
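The stale-cache behaviour is easy to reproduce locally. In this sketch a second directory plays the role of the server, so the "remote" is just a path on disk. The directory names upstream and local are arbitrary.

```shell
# "upstream" plays the server; "local" is a clone of it.
work=$(mktemp -d) && cd "$work"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/main
git -C upstream config user.name "Demo"
git -C upstream config user.email "demo@example.com"
echo "one" > upstream/file.txt
git -C upstream add file.txt
git -C upstream commit -q -m "first"
git clone -q upstream local

# Someone adds a commit on the server side.
echo "two" >> upstream/file.txt
git -C upstream commit -q -am "second"

# The clone's cache has not moved, so status sees nothing new.
before=$(git -C local rev-parse origin/main)
git -C local status -sb

# fetch refreshes origin/main but leaves the local main branch alone.
git -C local fetch -q
after=$(git -C local rev-parse origin/main)
local_main=$(git -C local rev-parse main)
git -C local status -sb
```

After the fetch, status reports main as one commit behind origin/main. A git pull would have gone one step further and moved main as well.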

Diverged Branches

When both local and remote have unique commits, you must choose:

  • rebase
  • merge
  • discard local
  • force push (dangerous)

Git forces you to be explicit — this is a feature.


Part 8: git reset — Powerful and Dangerous

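git reset moves the current branch pointer (the branch HEAD points to) to another commit.

Modes:

  • --soft → move the pointer only; index and working directory are untouched
  • --mixed (the default) → move the pointer and reset the index, so changes become unstaged
  • --hard → move the pointer and reset both index and working directory, discarding uncommitted changes to tracked files

Commits you reset away from can still be found through the reflog for a while. Uncommitted changes wiped by --hard cannot: for those there is no undo button, only responsibility.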
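The three modes are easiest to tell apart by running them in sequence on the same change. This sketch uses a throwaway repository with two commits. The file name is a placeholder.

```shell
# Throwaway repository with two commits touching the same file.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt && git commit -q -m "first"
echo "two" >> file.txt
git commit -q -am "second"

# --soft: pointer moves back one commit; the change stays staged.
git reset -q --soft HEAD~1
staged_after_soft=$(git diff --cached --name-only)

# --mixed (to HEAD): index is reset; the change is now unstaged.
git reset -q --mixed
staged_after_mixed=$(git diff --cached --name-only)
unstaged_after_mixed=$(git diff --name-only)

# --hard (to HEAD): working directory is reset too; the change is gone.
git reset -q --hard
status_after_hard=$(git status --porcelain)
content_after_hard=$(cat file.txt)

echo "after --soft, staged:    $staged_after_soft"
echo "after --mixed, unstaged: $unstaged_after_mixed"
echo "after --hard, file has:  $content_after_hard"
```

Only the last step destroys anything in the working directory, which is why --hard deserves the extra second of thought.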


Part 9: Reflog — Your Last Line of Defense

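The reflog records every position HEAD and each local branch has pointed to in this clone. Entries are local only and expire eventually (by default after 90 days, or 30 days for commits no longer reachable from any ref).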

git reflog

If something feels “lost”, it usually isn’t — it’s just unreferenced.

Recover by:

  • finding the commit
  • resetting or branching to it
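As a worked example, this sketch "loses" a commit with a hard reset and then brings it back from the reflog. The branch name rescued is made up for the demonstration.

```shell
# Throwaway repository with two commits.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt && git commit -q -m "first"
echo "two" >> file.txt
git commit -q -am "second"
lost=$(git rev-parse HEAD)

# "Lose" the second commit: no branch points at it any more.
git reset -q --hard HEAD~1

# The reflog still remembers where HEAD was before the reset.
git reflog

# HEAD@{1} is the previous position of HEAD; put a branch on it.
git branch rescued 'HEAD@{1}'
rescued=$(git rev-parse rescued)
echo "recovered: $rescued"
```

Running git reset --hard 'HEAD@{1}' instead would move the current branch back rather than creating a new one.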

Safety Levels in Git

Data                  Safety
Commits on branches   Very safe
Orphaned commits      Temporarily safe
Branch pointers       Mutable
Staging area          Fragile
Stash                 Fragile

Key Takeaways

  • Commits are immutable snapshots
  • Branches are pointers, not containers
  • HEAD is your current position
  • Merges combine history, not files
  • Remotes are cached locally
  • Reset moves pointers
  • Reflog saves you from disaster