
I would like to explain why the git pull command is not to be used lightly and to question whether it is ever needed. The git pull command may look harmless, but it is used in ways that often leave a fair amount of mess. I will discuss safer alternatives. This article is for beginner to intermediate Git users looking to extend their skills in using pull requests and merge requests when collaborating on a project.

Alternatives to git pull

This section provides a condensed version of an approach for contributing to a software project without using git pull. I will go into more detail later.

Configure two remotes on your local repository so that you have origin pointing to your fork and upstream pointing to the repository you’re contributing to as follows:

→ git clone forked-repo-url
→ cd repo-name
→ git remote add upstream upstream-repo-url
→ git remote -v
origin	forked-repo-url (fetch)
origin	forked-repo-url (push)
upstream	upstream-repo-url (fetch)
upstream	upstream-repo-url (push)

Forget about your fork’s main branch. There is no reason to keep your main branch in sync with the upstream repository’s main branch. It’s a maintenance burden that serves no purpose. You should only maintain branches that are part of ongoing PR work. The rest can be deleted.

Starting pull request work

Create your PR branch directly from the upstream repository’s branch it should be merged to (typically main):

→ git fetch upstream
→ git checkout -b my-pr-branch upstream/main
→ git add new-file some-other-file
→ git commit
→ git push origin HEAD

Syncing with ongoing work

Use rebase when you need to have your PR branch synchronized with changes on the target branch (address conflicts as needed):

→ git fetch upstream
→ git checkout my-pr-branch
→ git rebase upstream/main

Don’t add unnecessary commits

Don’t create new commits throughout the PR progression. Try limiting your PR to a single commit and add later changes by amending the original commit. Use force push to update the remote branch:

→ git add changed-file another-file
→ git commit --amend
→ git push --force origin HEAD

In other words, use git commit (without --amend) only for the first time you create the PR’s commit(s). Later on, only use git commit --amend.

Typical change workflow

Let’s assume this is a scenario in which developers contribute code to a repository. We’ll call it the upstream repository, to which they don’t necessarily have write access.

To contribute code, developers will do the following:

  1. Fork the upstream repository.
  2. Clone their forked repository.
  3. Create a feature branch out of the main branch of their forked repository (assuming its name is main, but it can be any other name).
  4. Introduce the code changes locally, commit, and push them to the newly-created feature branch on the forked repository.
  5. Create a pull request on GitHub (or a merge request on GitLab).

At this point, developers will ask their peers to review their code, address peers' comments, and push the changes to their forked repository in order to have their PR approved and merged.

But what happens when the upstream repository progresses? There are times when we need our feature branch to include the latest changes from the target branch (the upstream repository’s main branch). This might be because of conflicts between our work and the upstream repository, or maybe we have automated tests that have to run on an up-to-date feature branch to verify that we didn't introduce regressions.

A simpler scenario in which we need to get in sync with the upstream repository is when we want to create another PR to the same repository, but our local copy lacks the latest progress made upstream.

Using git pull is risky

How do I get all that upstream progress to my PR branch? At this point, one might say git pull must be the opposite of git push, so let’s use it to update my stuff with the upstream changes.

But git pull is the opposite of git push only in very specific cases: when the locally checked-out branch can be fast-forwarded to the state of the branch being pulled.

When we want to push commits to an existing remote branch, git push will only go through if the remote branch did not diverge from our local branch, that is, if the remote branch only contains commits that also exist on the local branch. If the remote branch contains commits not on the local branch, git push will fail.

→ git push origin HEAD 
To /tmp/tmp.B2Ljc86u9L
 ! [rejected]        HEAD -> foo (non-fast-forward)
error: failed to push some refs to '/tmp/tmp.B2Ljc86u9L'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
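
The rejection above can be reproduced safely in a throwaway sandbox. The following is a minimal sketch, not part of the original workflow: the directory names (seed, remote.git, alice, bob) and the demo identity are made up for illustration, and everything lives under a temporary directory.

```shell
#!/bin/sh
# Sandbox demo: every path, name, and identity here is hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)

# A seed repository gives the bare "remote" an initial commit on main.
git init -q "$sandbox/seed"
git -C "$sandbox/seed" symbolic-ref HEAD refs/heads/main
git -C "$sandbox/seed" commit -q --allow-empty -m "initial commit"
git clone -q --bare "$sandbox/seed" "$sandbox/remote.git"

# Two clones stand in for two contributors sharing one remote branch.
git clone -q "$sandbox/remote.git" "$sandbox/alice"
git clone -q "$sandbox/remote.git" "$sandbox/bob"

# Alice pushes first, so the remote branch gains a commit Bob lacks.
git -C "$sandbox/alice" commit -q --allow-empty -m "alice's work"
git -C "$sandbox/alice" push -q origin HEAD

# Bob commits on the old tip; his push is not a fast-forward.
git -C "$sandbox/bob" commit -q --allow-empty -m "bob's work"
if git -C "$sandbox/bob" push -q origin HEAD 2>/dev/null; then
    push_result=accepted
else
    push_result=rejected
fi
echo "bob's push was $push_result"
```

Because the remote branch contains a commit that Bob's branch does not, Git refuses the push rather than silently discarding Alice's work.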

This is not the same for git pull. The git pull command performs git fetch and then git merge (this is configurable, but those are the typical defaults).

The git fetch command will update the remote-tracking branches (local references mirroring the remote's branches), which is harmless.

The git merge command will merge the changes on the remote-tracking branch to the local branch.

This has some drawbacks:

  • If those changes cannot be fast-forwarded, it means a merge-commit will be created on the local branch.
    • If that local branch is our main branch, this is probably not what we want.
    • If that local branch is a PR branch, it means that our PR will now include a merge-commit, which is confusing for reviewers and makes our history look ugly.
  • Which remote branch is actually going to be merged into our local branch? We can control that, but we cannot assume git will necessarily be smart about picking the right one.

All in all, using git pull puts us at risk of turning our PR branch (and the upstream branch, if the changes are merged) into merge-commit spaghetti, or even of merging changes from unexpected remote branches into our PR branch.
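
To see the merge commit appear, the drawbacks above can be sketched in a sandbox. This is an illustrative sketch under stated assumptions: the repository names are hypothetical, and the pull is run with --no-rebase to request the merge behavior explicitly, because depending on the Git version and configuration a bare git pull on diverged branches may instead stop and ask how to reconcile them. The --ff-only attempt shows a guard that refuses anything other than a fast-forward.

```shell
#!/bin/sh
# Sandbox demo: every path, name, and identity here is hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
git init -q "$sandbox/seed"
git -C "$sandbox/seed" symbolic-ref HEAD refs/heads/main
git -C "$sandbox/seed" commit -q --allow-empty -m "initial commit"
git clone -q --bare "$sandbox/seed" "$sandbox/remote.git"
git clone -q "$sandbox/remote.git" "$sandbox/work"

# The remote moves on while we commit locally: the histories diverge.
git -C "$sandbox/seed" commit -q --allow-empty -m "upstream work"
git -C "$sandbox/seed" push -q "$sandbox/remote.git" main
git -C "$sandbox/work" commit -q --allow-empty -m "local work"

# --ff-only refuses to integrate anything that is not a fast-forward.
if git -C "$sandbox/work" pull -q --ff-only origin main 2>/dev/null; then
    ff_only=succeeded
else
    ff_only=refused
fi

# A merge-style pull records a merge commit on the local branch instead.
git -C "$sandbox/work" pull -q --no-rebase --no-edit origin main
words=$(git -C "$sandbox/work" rev-list --parents -n 1 HEAD | wc -w)
parents=$((words - 1))

echo "ff-only pull: $ff_only"
echo "HEAD now has $parents parents"
```

A commit with two parents is a merge commit; on a PR branch, that is the extra commit reviewers would have to wade through.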

Take control

Git is powerful, which means harmful actions can easily happen. We should be even more cautious with Git shortcuts embedded into our IDEs, graphical Git utilities, and nice-looking buttons on GitHub that are supposed to solve our issues with a click.

At least some of those shortcuts can't read our minds yet. They will not do what we expect them to do, especially if we don’t know what we want them to do. They surely won’t clean up the mess they made.

Do we need so many branches?

When we navigate to our fork in GitHub, it usually warns us that our main branch is many commits behind the upstream branch (Figure 1). This makes us think that our main branch should be in sync with the upstream branch.

Figure 1: GitHub warns that our branch is many commits behind the target branch.

But does this branch really serve any purpose?

Let’s assume our fork exists only as a means to contribute code to an upstream repository rather than to develop a spin-off, which is usually the case. I would argue that our fork’s main branch should not be a part of our contribution workflow. This is because a PR is a proposal for merging a feature branch to an upstream branch. Our fork’s main branch has nothing to do with that.

In other words, we start our PR branches out of the target upstream branch (e.g., the upstream main branch, not our fork’s main branch). In case we need to get our PR branch up-to-speed with current changes, it’s the target upstream branch we need to sync with, not our fork’s main branch.

Synchronizing our fork’s main branch with its upstream counterpart serves no purpose. We don’t need it, and performing pointless tasks is just another opportunity to introduce mistakes and mess up our environments.

Starting at the right point

So, how do we start a new PR branch from the tip of our upstream branch? Simple: we fetch the remote-tracking branches for the upstream repository. To do that, we first need to have a remote defined for the upstream repository.

Let’s assume we created our local repository by cloning our fork using the default options. Something like this:

→ git clone forked-repo-url

By default, we should now have a remote called origin pointing to the forked repo defined under the local repository:

→ cd repo-name
→ git remote -v
origin	forked-repo-url (fetch)
origin	forked-repo-url (push)

This will allow us to push changes to our fork.

In order to get the latest changes from the upstream repository, we need to add a remote for that repository:

→ git remote add upstream upstream-repo-url
→ git remote -v
origin	forked-repo-url (fetch)
origin	forked-repo-url (push)
upstream	upstream-repo-url (fetch)
upstream	upstream-repo-url (push)

We now have the remote defined. To synchronize the remote-tracking branches, we need to fetch:

→ git fetch upstream

With that, we create a remote-tracking branch called upstream/main (assuming that’s the branch for which we want to create a PR). This is a local reference recording the state of the main branch on the upstream repository at the time we last fetched upstream.

We can list our remote-tracking branches:

→ git branch --remote
  origin/HEAD -> origin/main
  upstream/main

To create our feature branch at the current state of upstream/main, we use:

→ git checkout -b my-pr-branch upstream/main

This command creates a new branch called my-pr-branch at the commit upstream/main points to and switches to the newly-created branch.
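
The following sandbox sketch illustrates why the fork's stale main branch does not matter when starting this way. The bare repositories upstream.git and fork.git are hypothetical local stand-ins for the real upstream repository and fork, and the demo identity is made up.

```shell
#!/bin/sh
# Sandbox demo: every path, name, and identity here is hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
git init -q "$sandbox/seed"
git -C "$sandbox/seed" symbolic-ref HEAD refs/heads/main
git -C "$sandbox/seed" commit -q --allow-empty -m "initial commit"

# The fork is created first, then upstream progresses without it.
git clone -q --bare "$sandbox/seed" "$sandbox/upstream.git"
git clone -q --bare "$sandbox/seed" "$sandbox/fork.git"
git -C "$sandbox/seed" commit -q --allow-empty -m "new upstream work"
git -C "$sandbox/seed" push -q "$sandbox/upstream.git" main

# Clone the fork (origin), add the upstream remote, and fetch it.
git clone -q "$sandbox/fork.git" "$sandbox/local"
git -C "$sandbox/local" remote add upstream "$sandbox/upstream.git"
git -C "$sandbox/local" fetch -q upstream

# Start the PR branch from upstream/main, ignoring the fork's main.
git -C "$sandbox/local" checkout -q -b my-pr-branch upstream/main

branch_tip=$(git -C "$sandbox/local" rev-parse my-pr-branch)
upstream_tip=$(git -C "$sandbox/local" rev-parse upstream/main)
fork_tip=$(git -C "$sandbox/local" rev-parse origin/main)
echo "my-pr-branch:  $branch_tip"
echo "upstream/main: $upstream_tip"
echo "origin/main:   $fork_tip (stale, and never used)"
```

The new branch starts at the latest upstream commit even though the fork's main branch was never synchronized.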

About PR progression

We should now make our changes, git add them, commit, and push them to origin:

→ git add changed-file another-file
→ git commit
→ git push origin HEAD

This will push our changes to origin (the remote pointing to our fork) into a branch with the same name as our local branch (the HEAD keyword points to the latest commit on the currently checked-out branch).

We will now use the GitHub UI to create a PR. We will make sure that the source branch of the PR is the newly-created feature branch on our fork, while the target branch is the branch to which we want to propose changes (in this case, the main branch of the upstream repository).

Our diligent peers will thoroughly review our work and point us to some issues requiring our attention.

We will then fix those issues locally and git add them, but we will not include them in a new commit. We will instead use them to amend the original commit and then force-push the result to the same branch:

→ git add changed-file another-file some-other-file
→ git commit --amend
→ git push --force origin HEAD

The reason for amending the commit rather than creating a new one is so that our PR’s commits will ultimately represent the progression of the code we propose to introduce to the upstream branch rather than representing the progression of the PR work.

In other words, if our PR intends to fix the fairy dust dispenser, then we want it to include one commit with the title “fix the fairy dust dispenser” which will contain all changes required for fixing the fairy dust dispenser, rather than three commits titled “fix the fairy dust dispenser”, “removing prints”, and “addressing comments”.

By amending the commit, we’re diverging from the remote branch. We created a new commit instead of the one we already pushed. So Git will not allow us to push to it. The remote branch contains our original commit, which our local branch doesn't have anymore.

At this point, it will even suggest that we use git pull to fix it (see the previous rejection message). Don’t use git pull for that. It will not fix our issue and will create others instead. Don’t use git pull at all.

We now know a few things about Git, and we have some confidence in what we’re doing. So we force-push instead, telling Git that we want to replace the content of the remote branch with the content of the local branch. That’s the reason for using push --force.
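
The amend-then-force-push cycle can be tried out in a sandbox. This sketch uses git push --force-with-lease, a more cautious variant that is an addition here rather than the command used in this article: it only overwrites the remote branch if it still points where our remote-tracking reference says it does. The repository fork.git and the file change.txt are hypothetical.

```shell
#!/bin/sh
# Sandbox demo: every path, name, and identity here is hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
git init -q "$sandbox/seed"
git -C "$sandbox/seed" symbolic-ref HEAD refs/heads/main
git -C "$sandbox/seed" commit -q --allow-empty -m "initial commit"
git clone -q --bare "$sandbox/seed" "$sandbox/fork.git"
git clone -q "$sandbox/fork.git" "$sandbox/local"

# First version of the PR commit, pushed to the fork.
git -C "$sandbox/local" checkout -q -b my-pr-branch
echo "first draft" > "$sandbox/local/change.txt"
git -C "$sandbox/local" add change.txt
git -C "$sandbox/local" commit -q -m "fix the fairy dust dispenser"
git -C "$sandbox/local" push -q origin HEAD

# Review feedback is folded into the same commit instead of a new one.
echo "addressed review comments" > "$sandbox/local/change.txt"
git -C "$sandbox/local" add change.txt
git -C "$sandbox/local" commit -q --amend --no-edit

# The amended commit replaced the pushed one, so a plain push is refused.
if git -C "$sandbox/local" push -q origin HEAD 2>/dev/null; then
    plain_push=accepted
else
    plain_push=rejected
fi

# Replace the remote branch, but only if nobody else updated it meanwhile.
git -C "$sandbox/local" push -q --force-with-lease origin HEAD

pr_commits=$(git -C "$sandbox/local" rev-list --count main..origin/my-pr-branch)
echo "plain push: $plain_push, commits in PR: $pr_commits"
```

After the forced update, the PR branch on the fork still carries a single commit containing the final version of the change.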

Fetch and rebase to the rescue

It might also be that while we were busy waiting for reviews, some other changes merged to the target branch on the upstream repository. In this case, we might need to get in sync with that target branch.

Will it be git pull to the rescue? No. By default, git pull will create a merge commit on our feature branch, merging the work done on the target branch since we started working on it. This will make reviewers’ lives harder and, if merged, will not look nice on the target branch history (e.g., how would it look if we try to revert this PR for some reason at a later stage?).

Or will it be git fetch and git rebase to the rescue? Indeed!

To overcome this, we must replay our changes on top of the latest changes on the target branch. To do that, we need to synchronize our remote-tracking branch with the upstream repository and rebase our PR branch on top of it:

→ git fetch upstream
→ git checkout my-pr-branch
→ git rebase upstream/main
Successfully rebased and updated refs/heads/my-pr-branch.

The git rebase command will find the commit that is the common ancestor of our PR branch and the target branch. That should be the commit from which we started our work. It will take all the commits on our PR branch introduced after that point and replay them on top of the target branch.

Suppose that we started our work when the target branch's Git history (git log) looked something like this (newest commits first):

commit happened just before we started working on our PR
slightly older commit
even older commit

On our PR branch, we created a commit for the content we want to deliver, so its history looks something like this:

our pr commit
commit happened just before we started working on our PR
slightly older commit
even older commit

Our busy colleagues did some work in the meantime, and the target branch now looks like this:

yet more work done while our pr was in review
some work done while our pr was in review
commit happened just before we started working on our PR
slightly older commit
even older commit

If we fetch the upstream repo, we now have the remote-tracking branch upstream/main containing this history.

If we checkout our PR branch and rebase on top of upstream/main, Git will:

  1. Find the newest commit existing on both branches, which is the one named "commit happened just before…".
  2. Take all commits on our branch that happened after that point. In this case, it’s the one named "our pr commit".
  3. Reset our PR branch to the current state of the remote-tracking branch.
  4. Replay "our pr commit" on top of that.

The result:

our pr commit
yet more work done while our pr was in review
some work done while our pr was in review
commit happened just before we started working on our PR
slightly older commit
even older commit

Replaying our changes on top of the latest changes means that Git will create new commit(s) with the same changes as our original commit(s). Namely, they will look the same in terms of the changes they make, but they will have different commit hashes because the starting point for the changes is different.
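
The claim that a rebased commit carries the same changes under a new hash can be checked in a sandbox. In this sketch the repository names and the files pr.txt and other.txt are hypothetical, and the clone uses -o upstream so that its only remote is named upstream.

```shell
#!/bin/sh
# Sandbox demo: every path, name, and identity here is hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
git init -q "$sandbox/seed"
git -C "$sandbox/seed" symbolic-ref HEAD refs/heads/main
git -C "$sandbox/seed" commit -q --allow-empty \
    -m "commit happened just before we started working on our PR"
git clone -q --bare "$sandbox/seed" "$sandbox/upstream.git"
git clone -q -o upstream "$sandbox/upstream.git" "$sandbox/local"

# Our PR branch with a single commit.
git -C "$sandbox/local" checkout -q -b my-pr-branch upstream/main
echo "our change" > "$sandbox/local/pr.txt"
git -C "$sandbox/local" add pr.txt
git -C "$sandbox/local" commit -q -m "our pr commit"
old_hash=$(git -C "$sandbox/local" rev-parse HEAD)
old_patch=$(git -C "$sandbox/local" show --format= HEAD)

# Meanwhile, the target branch gains a commit.
echo "their change" > "$sandbox/seed/other.txt"
git -C "$sandbox/seed" add other.txt
git -C "$sandbox/seed" commit -q -m "some work done while our pr was in review"
git -C "$sandbox/seed" push -q "$sandbox/upstream.git" main

# Fetch and replay our commit on top of the updated target branch.
git -C "$sandbox/local" fetch -q upstream
git -C "$sandbox/local" rebase -q upstream/main
new_hash=$(git -C "$sandbox/local" rev-parse HEAD)
new_patch=$(git -C "$sandbox/local" show --format= HEAD)

git -C "$sandbox/local" log --format=%s
echo "hash before: $old_hash"
echo "hash after:  $new_hash"
```

The log lists our commit first with the upstream work right beneath it, and comparing old_patch with new_patch confirms that the diff itself did not change.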

Rebase looks simple enough

Rebasing replays the original changes we made on top of the current state of the branch we rebase onto. What happens if the starting point of the content we changed is not the same anymore? In other words, what happens if the lines that we changed in the PR's commit(s) were also changed by the commits that were added to the target branch in the meantime?

The answer is: conflicts.

In the following example, we’re trying to rebase the branch foobar, which holds a commit titled "add bar", on top of the upstream/main branch, whose tip is a commit titled "add baz" that does not exist on the PR branch. The two commits change the same line in a file called foo.

In this case, instead of telling us that our branch was successfully rebased, Git will let us know which files it failed to process:

Auto-merging foo
CONFLICT (content): Merge conflict in foo
error: could not apply 7a384ae... add bar
Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply 7a384ae... add bar

It will also add markers within the failed files denoting the conflict(s):

→ cat foo
<<<<<<< HEAD
baz
=======
bar
>>>>>>> 7a384ae (add bar)

It could be a bit confusing, but everything between the <<< markers and the === markers is the content coming from the branch on top of which we’re trying to rebase (the target branch), while the stuff between the === markers and the >>> markers is the content coming from the branch we try to place at the top (the changes coming from our PR branch).

What we now need to do is decide what the correct content should be for each conflict and then delete all markers. In our case, we’re going to delete the markers and keep both conflicting statements on the same line:

→ cat foo
baz bar

If we try to rebase multiple commits, there may be conflicts on each and every commit, and we’d need to decide what should have been the content of each conflicting line for each such conflict at the point in which each commit is applied.

This sounds complicated, which is yet another reason to make small PRs that include only a single commit (easier code review is another reason).

Once we resolve all conflicts, we need to git add all files that had conflicts, and run git rebase --continue. A common mistake is committing the changes instead.

Do not use git commit during a rebase process, as we’re not trying to add a new commit, just to fix conflicts on existing commits:

→ git add foo
→ git rebase --continue
...
Successfully rebased and updated refs/heads/foobar.

Now you can force-push the changes and later repeat the same steps if you need to attend to more issues until your code is ready to be merged.
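
The whole conflict scenario, from the failed rebase to git rebase --continue, can be rehearsed in a sandbox before trying it on real work. This is a minimal sketch: the repository names are hypothetical, and GIT_EDITOR=true is an assumption added so that the script does not stop to open an editor for the commit message.

```shell
#!/bin/sh
# Sandbox demo: every path, name, and identity here is hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
git init -q "$sandbox/seed"
git -C "$sandbox/seed" symbolic-ref HEAD refs/heads/main
echo "foo" > "$sandbox/seed/foo"
git -C "$sandbox/seed" add foo
git -C "$sandbox/seed" commit -q -m "add foo"
git clone -q --bare "$sandbox/seed" "$sandbox/upstream.git"
git clone -q -o upstream "$sandbox/upstream.git" "$sandbox/local"

# Our PR branch changes the only line of foo to "bar".
git -C "$sandbox/local" checkout -q -b foobar upstream/main
echo "bar" > "$sandbox/local/foo"
git -C "$sandbox/local" commit -q -a -m "add bar"

# Upstream changes the same line to "baz" in the meantime.
echo "baz" > "$sandbox/seed/foo"
git -C "$sandbox/seed" commit -q -a -m "add baz"
git -C "$sandbox/seed" push -q "$sandbox/upstream.git" main

# The rebase stops because both commits touched the same line.
git -C "$sandbox/local" fetch -q upstream
if git -C "$sandbox/local" rebase -q upstream/main >/dev/null 2>&1; then
    rebase_result=clean
else
    rebase_result=conflict
fi

# Resolve by writing the intended content, stage it, and continue.
# Note: no git commit here, only git add and git rebase --continue.
echo "baz bar" > "$sandbox/local/foo"
git -C "$sandbox/local" add foo
GIT_EDITOR=true git -C "$sandbox/local" rebase --continue >/dev/null

resolved=$(cat "$sandbox/local/foo")
echo "rebase first reported: $rebase_result"
echo "foo now contains: $resolved"
git -C "$sandbox/local" log --format=%s
```

The final history has "add bar" on top of "add baz", with no extra commit recording the conflict resolution.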

Tips to track your progress

It is easy to lose track of your current step, so follow these tips.

  • Use git status to see your staged and unstaged changes and the branch you have checked out.
  • Use git log (for example, git log --oneline) to convince yourself that your Git history makes sense. Each commit line includes all branches pointing to this commit. After rebasing on top of a branch, you should see your commit(s) at the top, and the target branch just underneath. If that is not the case, then you need to figure out what went wrong.
    In this example, we’re on branch foobar, and we rebased the “add bar” commit on top of upstream/main:
    af2e399 (HEAD -> foobar) add bar
    5ba3bf7 (upstream/main) add baz
    3b3b1bb add foo
  • At any step before the rebase process is done, you can abort it and return to the state before the process started with git rebase --abort.
  • If you realize you made a mistake, there are still a few ways to go back. One such option is to override your local branch with the last version you pushed to the remote (losing all local progress):
    → git fetch origin
    → git checkout my-pr-branch
    → git reset --hard origin/my-pr-branch
  • Use git show before you push your changes to the remote to show the content of your last commit in order to convince yourself it makes sense. Until you push your changes, you still have the remote branch as backup.
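
Another way to go back after a rebase that has already finished, assuming nothing was pushed in between, is to reset the branch to its previous position as recorded in the reflog. The sketch below is an illustration rather than part of the workflow above; the repository names and files are hypothetical, and my-pr-branch@{1} refers to where the branch pointed before its most recent update.

```shell
#!/bin/sh
# Sandbox demo: every path, name, and identity here is hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
git init -q "$sandbox/seed"
git -C "$sandbox/seed" symbolic-ref HEAD refs/heads/main
echo "base" > "$sandbox/seed/base.txt"
git -C "$sandbox/seed" add base.txt
git -C "$sandbox/seed" commit -q -m "add base"
git clone -q --bare "$sandbox/seed" "$sandbox/upstream.git"
git clone -q -o upstream "$sandbox/upstream.git" "$sandbox/local"

# Our PR commit, followed by new work landing upstream.
git -C "$sandbox/local" checkout -q -b my-pr-branch upstream/main
echo "our change" > "$sandbox/local/pr.txt"
git -C "$sandbox/local" add pr.txt
git -C "$sandbox/local" commit -q -m "our pr commit"
before=$(git -C "$sandbox/local" rev-parse HEAD)

echo "more" > "$sandbox/seed/more.txt"
git -C "$sandbox/seed" add more.txt
git -C "$sandbox/seed" commit -q -m "upstream work"
git -C "$sandbox/seed" push -q "$sandbox/upstream.git" main

# Rebase, then decide we want the pre-rebase branch back.
git -C "$sandbox/local" fetch -q upstream
git -C "$sandbox/local" rebase -q upstream/main
after_rebase=$(git -C "$sandbox/local" rev-parse HEAD)

# The branch reflog lists its previous positions, newest first.
git -C "$sandbox/local" reflog show my-pr-branch

# Move the branch back to where it pointed before the rebase finished.
git -C "$sandbox/local" reset -q --hard 'my-pr-branch@{1}'
restored=$(git -C "$sandbox/local" rev-parse HEAD)
echo "restored to pre-rebase commit: $restored"
```

Like the reset shown in the tips, this discards the current state of the branch, so it is worth checking the reflog output before resetting.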

Summary

Contrary to this somewhat opinionated text, I honestly believe that everyone should do what works for them. In my opinion, using git pull is problematic, and I have explained why and offered alternatives. What is important to keep in mind is that while Git is powerful, it is not forgiving. You can use git reflog to undo many mistakes, but it’s not easy to use. For this reason, when we do something, we need to know the expected result, such as which branch or remote will be affected and what is changing. Hopefully, you will also know how the chosen steps will take you there.