Notes on using Git

Git resources
Git concepts
Git version
The Git Repository
Getting Started
Branches and Rebasing
Names
Backing Out Commits
Examining History
Other Git Commands

frysk has changed from CVS to Git for its source control system. This page is meant to ease the transition for new Git users and point the way towards the more advanced usage that motivated the switch in the first place.

Git Resources

The Git manual pages can be a bit opaque, but there are several good sources of information about Git on the Web.

A summary of useful commands and workflows, from basic to advanced.
The Wine project has a useful Wiki page about Git, describing many common operations.
The Git user's manual is more of an exotic cookbook than a proper user's manual, but it is useful once you've learned the basics elsewhere.

Git Concepts for CVS Refugees

You can use Git in a similar way to CVS and ignore many of its features, but if you know a little background information about Git and its repository structure you will be able to start doing more advanced operations and get out of trouble if you mess something up. This chapter in the Git manual tells the whole story, but a quick introduction will get you started.

The Git repository stores the history of a project as a chain of revisions which describe changes (file and directory additions, deletions, and modifications) to the project's source tree. These revisions, called commit objects, are named by an SHA1 hash over the contents of the commit, the date and author, and the SHA1 hashes of the commit's parents. It is important to note that two commits whose contents (diff) are identical, and that were based on identical "snapshots" of the source tree, will have different SHA1 names if the history that preceded them is different.
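The dependence of a commit's name on its history can be checked in a scratch repository. The sketch below is only an illustration: the file name, branch names, and pinned dates are made up, and it uses the `git <command>` spelling that current Git releases expect. Both histories end in the same snapshot, yet the final commits get different names because their parents differ.

```shell
set -e
repo=$(mktemp -d)          # throwaway repository, unrelated to frysk
cd "$repo"
git init -q
git config user.name "Example Hacker"            # hypothetical identity
git config user.email "hacker@example.invalid"

# Pin the dates so that the parent is the only input that differs.
export GIT_AUTHOR_DATE="2007-11-22 12:00:00 +0000"
export GIT_COMMITTER_DATE="2007-11-22 12:00:00 +0000"

echo "base" > file.txt
git add file.txt
git commit -q -m "base"
base=$(git rev-parse HEAD)

# History A: make the change directly on top of the base commit.
git checkout -q -b history-a "$base"
echo "change" >> file.txt
git commit -q -a -m "change"
commit_a=$(git rev-parse HEAD)

# History B: an empty commit first (same snapshot, longer history),
# then the identical change.
git checkout -q -b history-b "$base"
git commit -q --allow-empty -m "no-op"
echo "change" >> file.txt
git commit -q -a -m "change"
commit_b=$(git rev-parse HEAD)

# The snapshots (tree objects) match; the commit names do not.
tree_a=$(git rev-parse "$commit_a^{tree}")
tree_b=$(git rev-parse "$commit_b^{tree}")
echo "tree A:   $tree_a"
echo "tree B:   $tree_b"
echo "commit A: $commit_a"
echo "commit B: $commit_b"
```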

When you start work on a project that is in a remote Git repository, you clone that repository; that is, you copy all the commit objects to your machine. The "checked out" copy of the sources where you do your work is called the working tree. As you work on the project you create new commit objects with your changes. You update the remote repository by publishing your new commit objects to it. One way to publish is by emailing your new commit objects to a central maintainer (e.g., Linus) who either commits them to the remote repository, rejects them, or hacks on them locally and then commits new ones. In the frysk project we allow developers to have write access to the remote repository and publish the commits themselves.

The chain of commits is actually a directed acyclic graph, supporting branches of development. A commit object can have multiple children at a branch point; similarly, a commit can have multiple parents where branches merge back together. In Git a branch name is not much more than a nickname for the commit object on the end of a chain of commit objects.

A Git repository contains other auxiliary data structures to speed up common operations. The most important one is the index, which supports very fast diff operations over the entire tree. When you create a new commit, you first have to update the index with the content of the new commit. In CVS you have to cvs add new files, but not new changes; in Git you have to git-add every change, including file modifications. In practice you don't have to deal with it that much because the git-commit command has a flag to automatically add changed files to the index before a commit, though you do need to explicitly add new files. Also, when the Git commands that do merges fail because of a conflict, they put conflict markers in the offending files and exit. After editing the files and resolving the conflict, you signal that the conflict is resolved by adding the changes to the index. You can then restart the merge command and continue.
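A small experiment shows how the index sits between the working tree and a commit. This is a sketch in a scratch repository with made-up file contents, written with the `git <command>` spelling of current Git. It shows that an edit is not staged until it is added, and that `commit -a` picks up the modified tracked file but leaves a brand-new file alone.

```shell
set -e
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git config user.name "Example Hacker"            # hypothetical identity
git config user.email "hacker@example.invalid"

echo "first version" > ChangeLog
git add ChangeLog
git commit -q -m "Add ChangeLog"

echo "second version" > ChangeLog        # modify a tracked file
echo "class NewFile {}" > newfile.java   # create a new, untracked file

# Nothing is staged yet: the index still matches the last commit.
staged_before=$(git diff --cached --name-only)

# -a adds modifications of tracked files to the index, then commits.
git commit -q -a -m "Update ChangeLog"
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)

# The new file was never added, so it is still untracked.
untracked=$(git ls-files --others)

echo "staged before commit: '$staged_before'"
echo "committed:            $committed"
echo "still untracked:      $untracked"
```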

Git Version

Version 1.5.2.2 of Git is currently available in yum for Fedora Core 6, but it is recommended that you download 1.5.3.2 or later from kernel.org.

The frysk Git Repository

You can see a view of the frysk Git repository here. Like most Git repositories, it contains a branch called master which is the official "mainline" of development. It also has a branch called cvs-sync which is updated semi-automatically from CVS checkins. When we switch over from CVS to Git for real, the cvs-sync branch will stay frozen at the last commit derived from CVS. Developers are encouraged to create their own branches for learning Git and for publishing long-lived lines of development that aren't ready for master.

Getting Started

Let's start hacking. Clone the repository with

    git-clone ssh://sources.redhat.com/git/frysk.git

If you're not a frysk developer, you can use the URL git://sources.redhat.com/git/frysk.git instead.

In the frysk subdirectory you will find a checked-out frysk tree. The Git repository is in the hidden directory .git. What state are we in?

$ git-status
# On branch master
nothing to commit (working directory clean)

The gitk --all command is useful for getting a graphic representation of the branch and history structure of the repository. Using it we see that our master branch is branched from remotes/origin/master, which is a remote branch. We have a copy of all the commit objects from that remote branch in our repository and we can fetch updates to it, but we can never switch to a remote branch and make actual changes there.

Make some changes and commit them locally:

$ edit newfile.java (1)
$ edit ChangeLog (2)
$ git-add newfile.java (3)
$ git-commit -a (4)
$ git-pull origin (5)
$ git-push origin (6)

(1) Create a new file.
(2) Edit an existing file.
(3) Add the new file to the index.
(4) Commit the changes. The -a option to git-commit causes it to add the changes to ChangeLog to the index.
(5) Update our repository with any changes from the remote repository.
(6) Push our changes to the remote repository.

In step 5 we could have gotten a merge conflict if we had touched files that had been updated in the remote repository. In that case we would look for the conflict markers (like in CVS), edit them, add our changes with git-add, and make a commit with git-commit. Git will fashion a special merge commit for us.

In step 6 the push would have failed if our copy of the master branch wasn't up to date with the remote branch. In that case we would have done a git-pull, resolved any merge conflicts, and tried again.
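The rejected push and the pull that fixes it can be rehearsed without touching the real server. The sketch below assumes a throwaway bare repository on local disk standing in for sources.redhat.com and two hypothetical developers, alice and bob. It uses the `git <command>` spelling, and passes `--no-rebase` and `--no-edit` so that current Git versions merge without prompting.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git                     # stand-in for the shared repository
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# Alice creates the first commit and publishes it.
git clone -q remote.git alice 2>/dev/null         # silences the "empty repository" warning
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.invalid"
echo "initial entry" > ChangeLog
git add ChangeLog
git commit -q -m "Initial import"
git push -q origin master
cd ..

# Bob clones and commits locally.
git clone -q remote.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.invalid"
echo "class NewFile {}" > newfile.java
git add newfile.java
git commit -q -m "Add newfile.java"
cd ..

# Meanwhile Alice publishes another change.
cd alice
echo "second entry" >> ChangeLog
git commit -q -a -m "Update ChangeLog"
git push -q origin master
cd ..

# Bob's master is now behind the remote, so his push is refused.
cd bob
if git push -q origin master 2>/dev/null; then
    first_push=accepted
else
    first_push=rejected
fi
echo "first push: $first_push"

# Pull (creating a merge commit), then push again.
git pull -q --no-rebase --no-edit origin master
git push -q origin master
git log --oneline --graph
```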

Branches and Rebasing

In the above example we made changes directly to the master branch, but it's generally desirable to create a local branch for all but the most trivial changes. In this way you can work on several bug fixes and other development at once. Also, in the above example, if there had been a conflict at the pull step, our merge commit would have been pushed to the remote repository where it would be an ugly distraction. By doing work in a local branch and then rebasing that onto the current upstream state we can commit a very clean series of commits to the upstream repository.

First, let's make our change on the master branch and rebase that onto the remote branch:

$ edit frysk-core/frysk/hpd/BreakpointCommand.java 
$ git-commit -a (1)
$ git-fetch origin (2)
$ git-rebase origin (3)
$ git-add frysk-core/frysk/hpd/BreakpointCommand.java (4)
$ git-rebase --continue
$ git-push origin

(1) Commit changes to the master branch.
(2) Fetch updates from the remote repository without updating local branches.
(3) Rewind this branch to the branch point, merge in updates from the remote repository, and then replay the new commits on top of that.
(4) If any conflicts are encountered during rebasing, edit them, indicate that they are resolved with git-add, and continue the rebase operation with git-rebase --continue.

Here is the same work, done on a branch:

$ git-branch topic/fixbreakpoints master (1)
$ git-checkout topic/fixbreakpoints (2)
$ edit frysk-core/frysk/hpd/BreakpointCommand.java
$ git-commit -a
$ git-checkout master (3)
$ git-pull origin
$ git-rebase master topic/fixbreakpoints (4)
$ git-checkout master (5)
$ git-merge topic/fixbreakpoints (6)
$ git-push origin

(1) Create a new branch from the head of master. You can use a path-like syntax in the branch names; it is common practice to put short-lived local branches under the topic path.
(2) Switch to the new branch. If you are creating a new branch from the head of an existing branch, as we just did, you can do it all with

    git checkout -b topic/fixbreakpoints master

which creates a new branch and checks it out.
(3) Switch back to master and pull in updates.
(4) Rebase the topic/fixbreakpoints branch onto the head of master. This checks out topic/fixbreakpoints.
(5) Get back to master.
(6) Merge changes from the topic branch to master. Because the topic branch is rooted at the current head of master, the merge can be done simply by changing the name master to point to the commit object at the head of the topic branch. This is a special kind of merge called a fast-forward merge that is guaranteed not to cause any merge conflicts.
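The topic-branch sequence can be tried end to end in a scratch repository. In this sketch the file names are invented and a local commit on master stands in for the updates that git-pull would bring in. The `--ff-only` flag (available in current Git) makes the merge fail rather than create a merge commit, which confirms that the rebase left a purely linear history.

```shell
set -e
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master          # make sure the first branch is "master"
git config user.name "Example Hacker"            # hypothetical identity
git config user.email "hacker@example.invalid"

echo "initial" > README
git add README
git commit -q -m "initial"

# Work on a topic branch.
git checkout -q -b topic/fixbreakpoints master
echo "fix" > BreakpointCommand.java
git add BreakpointCommand.java
git commit -q -m "Fix breakpoints"

# Meanwhile master moves on (stand-in for "git pull origin").
git checkout -q master
echo "upstream work" >> README
git commit -q -a -m "Upstream change"

# Replay the topic commits on top of the new master, then fast-forward.
git rebase -q master topic/fixbreakpoints
git checkout -q master
git merge -q --ff-only topic/fixbreakpoints

git log --oneline --graph
merge_commits=$(git rev-list --merges master)
```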

Uncommitted work is not saved on the branch you leave when you switch to another: git-checkout either carries the changes along into the new branch or refuses to switch if they would be overwritten. It's customary to create a quick "work in progress" commit to save that work and then undo it later when continuing with that branch; see the "Interrupted Workflow" example on the git-reset man page. Git versions newer than 1.5.3 have a git-stash command that allows you to save uncommitted work when changing branches without creating a commit.
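As a rough illustration of the stash approach, the following sketch parks an uncommitted edit, switches branches and back, and restores the edit. The repository, branch name, and file contents are invented for the example, and the commands use the `git <command>` spelling.

```shell
set -e
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Hacker"            # hypothetical identity
git config user.email "hacker@example.invalid"

echo "stable" > notes.txt
git add notes.txt
git commit -q -m "initial"

git checkout -q -b topic/wip
echo "work in progress" >> notes.txt             # uncommitted edit

git stash > /dev/null                            # park the edit; working tree is clean again
if git diff --quiet; then clean_after_stash=yes; else clean_after_stash=no; fi

git checkout -q master                           # do something else
git checkout -q topic/wip                        # come back later
git stash pop > /dev/null                        # restore the parked edit

restored=$(tail -n 1 notes.txt)
echo "clean after stash: $clean_after_stash"
echo "restored line:     $restored"
```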

git-rebase is very powerful and can be used to graft chains of commits from one branch to another. It is smart enough to recognize if a commit in the branch being rebased already exists in the target branch and not cause a conflict. However, you shouldn't rebase branches whose head has been pushed to a public repository if you expect others to track that branch: git-rebase rewrites the history in a way that breaks later merges from the branch.

Names

So far we have used branch names, but there are many other ways to name interesting commit objects:

HEAD is the name of the most recent commit object on the current branch.
^ indicates the parent of an object. So HEAD^ is the parent of the most recent commit object on this branch, and HEAD^^ is its grandparent. If HEAD has several parents due to a merge, HEAD^ (or HEAD^1) is the first parent and the others are named HEAD^2, HEAD^3, etc.
~ is used to number previous generations of commits going through the first parent at each step. The direct ancestors of HEAD are HEAD~1, HEAD~2, HEAD~3, etc.
You can create new names for arbitrary commit objects using git-tag.
A few magic names like ORIG_HEAD (set when merging to the HEAD before the merge) exist.

See the documentation for git-rev-parse for the full story.
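These names can be explored with git rev-parse, which prints the commit object a name resolves to. The sketch below builds a tiny history with one merge in a scratch repository; the branch name side and the tag known-good are hypothetical. It checks that HEAD^^ and HEAD~2 agree, and that the two parents of a merge are reachable as HEAD^1 and HEAD^2.

```shell
set -e
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Hacker"            # hypothetical identity
git config user.email "hacker@example.invalid"

for n in 1 2 3; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

# Two spellings of "grandparent of HEAD".
grandparent_caret=$(git rev-parse 'HEAD^^')
grandparent_tilde=$(git rev-parse 'HEAD~2')

# A tag is just another name for a commit object.
git tag known-good 'HEAD~1'
tagged_subject=$(git log -1 --format=%s known-good)

# Build a merge so that HEAD has two parents.
main_tip=$(git rev-parse HEAD)
git checkout -q -b side 'HEAD~2'
echo "side" > side.txt
git add side.txt
git commit -q -m "side work"
side_tip=$(git rev-parse HEAD)
git checkout -q master
git merge -q -m "merge side" side

first_parent=$(git rev-parse 'HEAD^1')
second_parent=$(git rev-parse 'HEAD^2')
echo "known-good points at: $tagged_subject"
```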

Backing Out Commits

You often need to "uncommit" commits, either because they were a bad idea, or they need further work. There's a local way that resets the names of local branches and the index, and a more formal way that creates the inverse of a particular commit.

To reset to two revisions back, blowing away all newer changes:

$ git-reset --hard HEAD~2

A more likely scenario is to undo a commit but leave the changes in your working tree so you can work on them further. git-reset with no options does that:

$ git-reset HEAD^

You can collapse several commits on a branch into one commit using git-reset in this way.

To create a commit that reverts another commit, and is suitable for pushing to a public repository, use the git-revert command.
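Both styles of backing out can be compared side by side in a scratch repository. This sketch, with made-up file contents and the `git <command>` spelling, first uses a plain reset to collapse two commits into one, then uses revert to undo that commit with a new, publishable commit.

```shell
set -e
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git config user.name "Example Hacker"            # hypothetical identity
git config user.email "hacker@example.invalid"

echo "one" > file.txt
git add file.txt
git commit -q -m "one"
echo "two" >> file.txt
git commit -q -a -m "two"
echo "three" >> file.txt
git commit -q -a -m "three"

# Move the branch back two commits; the working tree keeps all three lines.
git reset -q 'HEAD~2'
after_reset=$(cat file.txt)

# Commit the surviving changes as a single commit.
git commit -q -a -m "two and three, collapsed"
count_after_collapse=$(git rev-list --count HEAD)

# Undo that commit by adding its inverse, leaving history intact.
git revert --no-edit HEAD > /dev/null
after_revert=$(cat file.txt)
count_after_revert=$(git rev-list --count HEAD)

git log --oneline
```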

Examining History

git-log prints out the commit log messages for an entire tree or a specific file.

git-diff is used to examine differences between the working tree and the index, as well as between arbitrary branches and commits. For example:

$ git-diff ChangeLog (1)
$ git-diff HEAD^..HEAD ChangeLog (2)
$ git-diff HEAD^ ChangeLog (3)
$ git-diff master...topic/newbreakpt (4)

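(1) Differences between ChangeLog in the working directory and the index.
(2) Differences in ChangeLog between the most recent revision and its predecessor.
(3) A shorter way of writing nearly the same thing: it compares HEAD^ with the working directory, so it matches (2) only when ChangeLog has no uncommitted changes.
(4) Changes in the topic/newbreakpt branch since it was forked from the master branch.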
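The difference between the three-dot and two-dot forms is easiest to see once master has moved on after the fork. The following sketch uses a scratch repository with invented file names: the three-dot diff lists only what the topic branch changed, while the two-dot diff also reports master's later edit.

```shell
set -e
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Hacker"            # hypothetical identity
git config user.email "hacker@example.invalid"

echo "base" > ChangeLog
git add ChangeLog
git commit -q -m "base"

# Topic branch adds a file.
git checkout -q -b topic/newbreakpt
echo "class Breakpoint {}" > Breakpoint.java
git add Breakpoint.java
git commit -q -m "Add Breakpoint.java"

# master changes ChangeLog after the fork.
git checkout -q master
echo "more" >> ChangeLog
git commit -q -a -m "Update ChangeLog"

# Three dots: from the fork point to the topic tip.
three_dot=$(git diff --name-only master...topic/newbreakpt)
# Two dots: from the master tip to the topic tip.
two_dot=$(git diff --name-only master..topic/newbreakpt)

echo "three-dot: $three_dot"
echo "two-dot:   $two_dot"
```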

Other Git Commands

git-cherry-pick picks a commit from anywhere and commits it to the current branch.

git-whatchanged can, among other things, search for commits that match a string or regular expression.

git-remote configures access to remote repositories and default actions when pulling from and pushing to them. You would use this command to track branches in repositories other than the one from which yours is cloned.

git-bisect will do a binary search through the revision history to zero in on the first "bad" commit. At each stage you tell it whether its current choice is "good" or "bad" before it chooses the next.

Importing on vendor branches and merging upstream sources

This is an example of importing a new libunwind version from upstream by putting it on the frysk vendor branch in git and then merging it to trunk.

Get the libunwind vendor branch
    $ git-checkout vendor/LIBUNWIND
Remove old stuff that needs to be replaced
    $ rm -rf frysk-import/libunwind
Get upstream stuff and put it in (make sure you don't accidentally copy over the upstream .git dir)
    $ cp -r ~/upstream/libunwind frysk-import/
Commit (use a commit message like "Import of libunwind version libunwind-20071122") and push
    $ git-commit -a && git-push



The vendor branch is now updated; next we merge the result into the trunk.


Switch to trunk
    $ git-checkout master
Merge new vendor branch to the trunk
    $ git-merge vendor/LIBUNWIND
Resolve the conflicts: git-rm files that are not needed,
    edit files with conflict markers, and git-add each one
    after resolving the issues.
Do a full clean build and double check the test results.
    [.. lots of time passes...]
Double check your patches (all should now be staged in the git index)
    with git-diff --cached
Commit and push the result (git will have generated a merge message
    for you already that you can use in the commit).
    $ git commit && git push


If too much time has passed since the start, the push will fail because
someone else will have pushed something already. Trying to do
a git fetch origin; git rebase origin seems to fail, so
you have to just do a git pull && git push instead (which
creates an extra merge commit, but that seems just fine).

Comments on how to improve this process are appreciated.