Sometimes you have to maintain a long-lived fork of an external Git repository. For me, this happens, for example, with various TextMate bundles: I want to keep my local modifications, but I also want to get all the features and bug fixes that are being implemented upstream.

So you frequently merge upstream into your fork, and over time it gets harder and harder to keep track of the exact changes you made to the fork. Fortunately, there are some Git tricks to make things easier.

Listing Commits That Exist in the Fork Only

First of all, you can use git log upstream/master..master to limit the log output to commits that exist in the fork only. (Technically, Git interprets a revision range in the form of A..B as “all the commits that are reachable from B, but not from A”):

$ git log upstream/master..master
dc7741c (HEAD -> master) Merge remote-tracking branch 'upstream/master' <Stefan Daschek> (5 months ago)
da81927 Merge remote-tracking branch 'upstream/master' <Stefan Daschek> (9 months ago)
097c858 Merge remote-tracking branch 'upstream/master' <Stefan Daschek> (10 months ago)
ff2ba96 Merge remote-tracking branch 'upstream/master' <Stefan Daschek> (11 months ago)
1181de3 Merge remote-tracking branch 'upstream/master' <Stefan Daschek> (1 year, 2 months ago)
20f285d Merge remote-tracking branch 'upstream/master' <Stefan Daschek> (1 year, 3 months ago)
6040ebf Merge remote-tracking branch 'upstream/master' <Stefan Daschek> (1 year, 9 months ago)
de1ac2b Merge remote-tracking branch 'upstream/master' <Stefan Daschek> (1 year, 9 months ago)
983c9f1 Reset “encoding” snippet to upstream. <Stefan Daschek> (1 year, 11 months ago)
fbefd7f Merge remote-tracking branch 'upstream/master' <Stefan Daschek> (1 year, 11 months ago)
45799ef Merge remote-tracking branch 'upstream/master' <Stefan Daschek> (2 years, 1 month ago)
94f0ec8 Merge remote-tracking branch 'upstream/master' <Stefan Daschek> (2 years, 2 months ago)
9a05d0a Merge remote-tracking branch 'upstream/master' <Stefan Daschek> (2 years, 4 months ago)
85f15be Merge remote-tracking branch 'upstream/master' <Stefan Daschek> (2 years, 5 months ago)
85f831d Use ^⌘E for “Execute Line / Selection as Ruby”. <Stefan Daschek> (2 years, 5 months ago)
651b49c Make “Documentation for Word (APIDock)” visible again. <Stefan Daschek> (2 years, 5 months ago)
63b14af Use ⇧^H for “Documentation for Word”. <Stefan Daschek> (2 years, 5 months ago)
fc6ae6d Simplify 'encoding: utf-8' snippet, use 'enc' as trigger. <Stefan Daschek> (3 years, 8 months ago)
b246537 Simplify do...end snippet. <Stefan Daschek> (3 years, 8 months ago)
79f0b06 Fix invalid key binding that disables dead keys. <Stefan Daschek> (3 years, 8 months ago)
82fb660 Make Ctrl-" toggle only between single and double quotes. <Stefan Daschek> (4 years ago)
8352d50 Remove Ctrl-Shift-w keybinding (should call 'Wrap Selection' even in Ruby mode) <Stefan Daschek> (4 years, 1 month ago)
6ddab40 Make help command use APIDock. <Stefan Daschek> (4 years, 2 months ago)
1de19d1 Add rules for literal symbol syntax (%i and %I). <Stefan Daschek> (2 years, 8 months ago)

This is already much better, but still sprinkled with a lot of merge commits. Let’s try again with --no-merges:

$ git log upstream/master..master --no-merges
983c9f1 Reset “encoding” snippet to upstream. <Stefan Daschek> (1 year, 11 months ago)
85f831d Use ^⌘E for “Execute Line / Selection as Ruby”. <Stefan Daschek> (2 years, 5 months ago)
651b49c Make “Documentation for Word (APIDock)” visible again. <Stefan Daschek> (2 years, 5 months ago)
63b14af Use ⇧^H for “Documentation for Word”. <Stefan Daschek> (2 years, 5 months ago)
fc6ae6d Simplify 'encoding: utf-8' snippet, use 'enc' as trigger. <Stefan Daschek> (3 years, 8 months ago)
b246537 Simplify do...end snippet. <Stefan Daschek> (3 years, 8 months ago)
79f0b06 Fix invalid key binding that disables dead keys. <Stefan Daschek> (3 years, 8 months ago)
82fb660 Make Ctrl-" toggle only between single and double quotes. <Stefan Daschek> (4 years ago)
8352d50 Remove Ctrl-Shift-w keybinding (should call 'Wrap Selection' even in Ruby mode) <Stefan Daschek> (4 years, 1 month ago)
6ddab40 Make help command use APIDock. <Stefan Daschek> (4 years, 2 months ago)
1de19d1 Add rules for literal symbol syntax (%i and %I). <Stefan Daschek> (2 years, 8 months ago)

Excellent, now we have a succinct list of all the changes that have been made in the fork. Time for the next step.
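To see how much of the unfiltered output is merge noise, you can let Git count the commits in the range instead of listing them. This is a minimal sketch, assuming a POSIX shell, a remote named upstream and a fork branch named master. The function name fork_summary is hypothetical and not part of Git. It uses git rev-list --count together with --merges, which selects only commits with more than one parent (the counterpart of --no-merges):

```shell
# Hypothetical helper (not part of Git): count the commits that exist only in
# the fork and split them into merge commits and actual changes.
# Assumes the remote is called "upstream" and the fork branch "master".
fork_summary() {
  range=${1:-upstream/master..master}
  # All commits reachable from master but not from upstream/master
  all=$(git rev-list --count "$range")
  # Only the commits with more than one parent, i.e. the merges
  merges=$(git rev-list --count --merges "$range")
  echo "fork-only commits: $all (merges: $merges, changes: $((all - merges)))"
}
```

Run fork_summary inside the fork. The "changes" figure equals the number of commits that git log upstream/master..master --no-merges lists.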

Getting Rid of Obsolete Changes

If your fork has existed for some time, it is quite possible that some of its changes have become obsolete: maybe similar changes have since been made upstream, or early changes in the fork were superseded by later ones. To keep track of the differences between fork and upstream, it would be nice to clean this up and retain only the changes that are still relevant. This is similar to an interactive rebase, but without rewriting the existing history.

Turns out this is possible, too! The process consists of two parts:

  1. First, you do a special merge to make the fork identical to upstream. At this point it seems like all of your changes have been lost, but they are still part of the fork’s history.
  2. Then you cherry-pick only those changes that are still relevant, reapplying them to the fork.

First, about this “special merge”: Git supports different merge strategies, one of them being ours. Here’s how the manpage describes this strategy:

[…] the resulting tree of the merge is always that of the current branch head, effectively ignoring all changes from all other branches. It is meant to be used to supersede old development history of side branches.

One caveat: because this merge strategy ignores “all changes from all other branches”, we can’t use it on our master branch directly (unfortunately, there is no strategy named theirs). Instead, we will create a branch tracking upstream/master, merge our master branch into it, and then merge this branch back into master, which will be a fast-forward:

# Create a branch from upstream/master
git checkout -b reset-to-upstream upstream/master

# Merge master into this branch, effectively ignoring all changes from master
git merge --strategy=ours master

# Switch back to master and merge the temporary branch (will be a fast-forward merge)
git checkout master
git merge reset-to-upstream

At this point, master and upstream/master are identical (git diff upstream/master should be empty). Now let’s reapply the commits that are still relevant:
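If you want to double-check both halves of that claim, you can test that master now has exactly upstream’s content and that the old fork commits are still part of its history. A minimal sketch, assuming the remote and branch names used above; check_reset is a hypothetical name, and you have to note the old master commit yourself before the reset (for example with old=$(git rev-parse master)):

```shell
# Hypothetical helper (not part of Git): verify the result of the reset.
# $1 is the commit master pointed to before the reset, e.g. saved with
# old=$(git rev-parse master) before running the steps above.
check_reset() {
  old_master=$1
  # Same content: exits non-zero if master differs from upstream/master.
  git diff --quiet upstream/master master &&
  # Same history: the pre-reset master is still an ancestor of master.
  git merge-base --is-ancestor "$old_master" master
}
```

Usage: check_reset "$old" && echo "master matches upstream, old history kept".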

# Show the list of commits that exist only in the fork. Pipe the output to
# cat to make sure it is still available for copying and pasting after the
# command has exited.
git log upstream/master..master --no-merges | cat

# Now just copy and paste the hashes of the commits you want to keep:
git cherry-pick <hash> <hash> <hash> …
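One detail worth knowing when copying the hashes: git log shows the newest commit first, while git cherry-pick applies individually named commits in the order you give them. If a later change builds on an earlier one, picking them newest first can cause needless conflicts. A minimal sketch of a listing that prints the candidates oldest first (list_candidates is a hypothetical name; any extra arguments are passed through to git log, and --no-pager keeps the output on screen just like piping to cat):

```shell
# Hypothetical helper (not part of Git): list the fork-only changes oldest
# first, one "<short hash> <subject>" line per commit, so the hashes can be
# copied to git cherry-pick in their original order.
# Extra arguments (e.g. ^reset-to-upstream) are passed on to git log.
list_candidates() {
  git --no-pager log --no-merges --reverse --format='%h %s' "$@" \
    upstream/master..master
}
```

To reapply every candidate in its original order you could run git cherry-pick $(list_candidates | cut -d' ' -f1), but usually you will copy only the hashes you want to keep.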

In my case I chose to reapply a single commit (36a1b92), so the history now looks like this:

$ git log --graph --date-order
* 36a1b92 (HEAD -> master, origin/master, origin/HEAD) Remove ⇧^H from other commands. <Stefan Daschek> (2 years, 5 months ago)
*   e4d7137 (reset-to-upstream) Merge branch 'master' into reset-to-upstream <Stefan Daschek> (2 days ago)
|\  
* | d5ed27f (upstream/master) Add snippet to insert `${}` in template strings <Michael Sheets> (5 days ago)
* | 91bb821 Change documentation tags to keyword.other.documentation <Michael Sheets> (6 days ago)
* | ce865b6 Do not allow documentation comments to start with `/***` <Michael Sheets> (6 days ago)
* | f3426a8 Move interpolation and escapes into local repository <Michael Sheets> (6 days ago)

Technically, you could now delete the reset-to-upstream branch. But it may be a good idea to keep it. Read on to see why.

Listing Commits Since the Last Cleanup

If you want to see only the commits made in your fork since the last cleanup, you need to add one more revision argument to git log:

$ git log upstream/master..master ^reset-to-upstream --no-merges
36a1b92 (HEAD -> master, origin/master, origin/HEAD) Remove ⇧^H from other commands. <Stefan Daschek> (2 years, 5 months ago)

Notice the ^reset-to-upstream argument: it tells Git to also exclude all commits reachable from the reset-to-upstream branch, leaving us with only the commits we reapplied after the merge.
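Because ^reset-to-upstream simply adds one more set of excluded commits, comparing the counts with and without it shows how much of the fork’s history is hidden by the last cleanup. A minimal sketch, assuming the branch and remote names used in this post; cleanup_stats is a hypothetical name:

```shell
# Hypothetical helper (not part of Git): compare all fork-only changes with
# those made (or reapplied) since the last cleanup.
cleanup_stats() {
  total=$(git rev-list --count --no-merges upstream/master..master)
  # ^reset-to-upstream additionally excludes everything reachable from the
  # reset branch, i.e. all history up to and including the last cleanup.
  recent=$(git rev-list --count --no-merges upstream/master..master ^reset-to-upstream)
  echo "$recent of $total fork-only changes are newer than the last cleanup"
}
```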

Repeating the Process in the Future

Chances are you will want to repeat the cleanup process at some point in the future. This turns out to be quite simple:

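# Check out the reset branch and bring it up to date with upstream
git checkout reset-to-upstream
git merge upstream/master

# List the commits made in the fork since the last cleanup. Do this before
# the next merge: afterwards, master is contained in reset-to-upstream and
# this command would not print anything.
git log upstream/master..master ^reset-to-upstream --no-merges | cat

# Now just repeat the same steps as described above
git merge --strategy=ours master
git checkout master
git merge reset-to-upstream
git cherry-pick <hash> <hash> <hash> …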
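If you do this regularly, the repeated steps can be wrapped in a small shell function. The sketch below is hypothetical (the function name is made up, and --ff-only is an added safety check) and assumes a clean working tree, an already fetched upstream remote and the reset-to-upstream branch from the previous cleanup. It prints the candidates before merging master into reset-to-upstream, because once master is part of that branch, ^reset-to-upstream excludes every commit on master and the listing would be empty. It stops short of the cherry-pick, so you can still choose which commits to keep:

```shell
# Hypothetical helper (not part of Git): repeat the cleanup up to, but not
# including, the cherry-pick. Assumes a clean working tree, a fetched
# "upstream" remote and the "reset-to-upstream" branch from the last cleanup.
repeat_cleanup() {
  # Bring the reset branch up to date with upstream
  git checkout -q reset-to-upstream &&
  git merge -q --no-edit upstream/master &&
  # List the candidates now: after the next merge, master is contained in
  # reset-to-upstream and this range would be empty.
  echo "Commits since the last cleanup (oldest first):" &&
  git --no-pager log --no-merges --reverse --format='%h %s' \
    upstream/master..master ^reset-to-upstream &&
  # Make master identical to upstream while keeping its history
  git merge -q --no-edit --strategy=ours master &&
  git checkout -q master &&
  # --ff-only refuses to create a merge commit if this is not a fast-forward
  git merge -q --ff-only reset-to-upstream
}
```

Afterwards, run git cherry-pick with the hashes you want to keep, oldest first.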

Further Reading

To read about all the ways of specifying revisions or revision ranges for Git commands, see gitrevisions. Also, the “Pro Git” book is freely available online and is a great resource.

