Notes on using Git
frysk has changed from CVS to Git for its source control system. This page is meant to ease the transition for new Git users and point the way towards the more advanced usage that motivated the switch in the first place.
The Git manual pages can be a bit opaque, but there are several good sources of information about Git on the Web.
Git Concepts for CVS Refugees
You can use Git in a similar way to CVS and ignore many of its features, but if you know a little background information about Git and its repository structure you will be able to start doing more advanced operations and get out of trouble if you mess something up. This chapter in the Git manual tells the whole story, but a quick introduction will get you started.
The Git repository stores the history of a project as a chain of revisions which describe changes (file and directory additions, deletions, and modifications) to the project's source tree. These revisions, called commit objects, are named by an SHA1 hash over the contents of the commit, the date and author, and the SHA1 hashes of the commit's parents. It is important to note that two commits whose contents (diff) are identical and that were based on identical "snapshots" of the source tree will still have different SHA1 names if the history that preceded them is different.
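The effect of the parent hashes on a commit's name can be checked in a throwaway repository. The following is only a sketch: the file name, the message, and the fixed timestamp are made up for illustration, and it uses the spaced `git <command>` spelling because newer Git releases no longer put the dashed `git-<command>` forms on the PATH.

```shell
# Throwaway identity and a pinned timestamp (illustrative values), so that
# the parent is the only thing that differs between the commits below.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_DATE='1167609600 +0000'
export GIT_COMMITTER_DATE='1167609600 +0000'

cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q .

echo hello > file.txt
git add file.txt
tree=$(git write-tree)    # the snapshot of the source tree

# Same tree, same author, same date, same message ...
root=$(echo 'same message' | git commit-tree "$tree")
again=$(echo 'same message' | git commit-tree "$tree")
# ... and once more, but with a parent commit.
child=$(echo 'same message' | git commit-tree "$tree" -p "$root")

echo "no parent:        $root"
echo "no parent, again: $again"
echo "with parent:      $child"
```

The two parentless commits get the same name because every input to the hash is identical; the third differs only in its history and therefore gets a different name.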
When you start work on a project that is in a remote Git repository, you clone that repository; that is, you copy all the commit objects to your machine. The "checked out" copy of the sources where you do your work is called the working tree. As you work on the project you create new commit objects with your changes. You update the remote repository by publishing your new commit objects to it. One way to publish is by emailing your new commit objects to a central maintainer (e.g., Linus) who either commits them to the remote repository, rejects them, or hacks on them locally and then commits new ones. In the frysk project we allow developers to have write access to the remote repository and publish the commits themselves.
The chain of commits is actually a directed acyclic graph, supporting branches of development. A commit object can have multiple children at a branch point; similarly, a commit can have multiple parents where branches merge back together. In Git a branch name is not much more than a nickname for the commit object on the end of a chain of commit objects.
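To see that a branch name is little more than a pointer to a commit, one can compare what the names resolve to. This is a minimal sketch in a scratch repository; the branch name `topic/demo` and the file are hypothetical, and the spaced `git <command>` spelling is used so that it runs on current Git releases.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q .
echo one > file.txt
git add file.txt
git commit -q -m 'first commit'

# A new branch is just another name for the commit we are on.
git branch topic/demo
master_id=$(git rev-parse master)
topic_id=$(git rev-parse topic/demo)
echo "master     -> $master_id"
echo "topic/demo -> $topic_id"

# Committing on master moves only the master name forward.
echo two >> file.txt
git commit -q -a -m 'second commit'
new_master_id=$(git rev-parse master)
echo "master     -> $new_master_id"
echo "topic/demo -> $(git rev-parse topic/demo)"
```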
A Git repository contains other auxiliary data structures to speed up common operations. The most important one is the index, which supports very fast diff operations over the entire tree. When you create a new commit, you first have to update the index with the content of the new commit. In CVS you have to cvs add new files, but not new changes; in Git you have to git-add every change, including file modifications. In practice you don't have to deal with it that much because the git-commit command has a flag to automatically add changed files to the index before a commit, though you do need to explicitly add new files. Also, when the Git commands that do merges fail because of a conflict, they put conflict markers in the offending files and exit. After editing the files and resolving the conflict, you signal that the conflict is resolved by adding the changes to the index. You can then restart the merge command and continue.
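The role of the index can be observed directly. The sketch below assumes a scratch repository with made-up file names; it shows that a modification is not staged until it is added, and that `commit -a` picks up modified files but not brand-new ones.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q .
echo 'line 1' > tracked.txt
git add tracked.txt
git commit -q -m 'add tracked.txt'

echo 'line 2' >> tracked.txt     # modify a file Git already knows about
echo 'new'    >  untracked.txt   # create a brand-new file

# Nothing is staged yet: the index still matches the last commit.
staged_before=$(git diff --cached --name-only)

# Stage the modification explicitly.
git add tracked.txt
staged_after=$(git diff --cached --name-only)
echo "staged after git add: $staged_after"

# Unstage again, then let 'commit -a' do the staging of modified files.
git reset -q
git commit -q -a -m 'modify tracked.txt'

# The new file was never added, so it is still reported as untracked.
git status --short
```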
Version 220.127.116.11 of Git is currently available in yum for Fedora Core 6, but it is recommended that you download 18.104.22.168 or later from kernel.org.
The frysk Git Repository
You can see a view of the frysk Git repository here. Like most Git repositories, it contains a branch called master which is the official "mainline" of development. It also has a branch called cvs-sync which is updated semi-automatically from CVS checkins. When we switch over from CVS to Git for real, the cvs-sync branch will stay frozen at the last commit derived from CVS. Developers are encouraged to create their own branches for learning Git and for publishing long-lived lines of development that aren't ready for master.
Let's start hacking. Clone the repository with
If you're not a frysk developer, you can use the URL git://sources.redhat.com/git/frysk.git instead.
In the frysk subdirectory you will find a checked-out frysk tree. The Git repository is in the hidden directory .git. What state are we in?
$ git-status
# On branch master
nothing to commit (working directory clean)
The gitk --all command is useful for getting a graphic representation of the branch and history structure of the repository. Using it we see that our master branch is branched from remotes/origin/master, which is a remote branch. We have a copy of all the commit objects from that remote branch in our repository and we can fetch updates to it, but we can never switch to a remote branch and make actual changes there.
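The relationship between the local master and the remote branch can also be inspected without a GUI. This sketch does not touch the real frysk repository: it clones from a local stand-in called `upstream` (a hypothetical name), and it uses the spaced `git <command>` spelling.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)
cd "$top"

# Stand-in for the shared repository.
git -c init.defaultBranch=master init -q upstream
cd upstream
echo 'initial' > README
git add README
git commit -q -m 'initial commit'
cd "$top"

# Clone it; 'work' plays the role of the frysk subdirectory.
git clone -q upstream work
cd work

ls -d .git                              # the repository itself lives here
git branch -r                           # remote branches, including origin/master
git log --oneline --decorate --all      # text-mode view of branches and history

local_id=$(git rev-parse master)
remote_id=$(git rev-parse origin/master)
```

Right after a clone the local master and origin/master name the same commit; they drift apart as soon as either side gains new commits.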
Make some changes and commit them locally:
$ edit newfile.java       (1)
$ edit ChangeLog          (2)
$ git-add newfile.java    (3)
$ git-commit -a           (4)
$ git-pull origin         (5)
$ git-push origin         (6)
In step 5 we could have gotten a merge conflict if we had touched files that had been updated in the remote repository. In that case we would look for the conflict markers (like in CVS), edit the affected files, add our changes with git-add, and make a commit with git-commit. Git will fashion a special merge commit for us.
In step 6 the push would have failed if our copy of the master branch wasn't up to date with the remote branch. In that case we would have done a git-pull, resolved any merge conflicts, and tried again.
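The rejected push and the pull-then-push recovery can be rehearsed locally. This is a sketch under several assumptions: `shared.git` stands in for the remote repository, `alice` and `bob` are two hypothetical developers, and the `--no-rebase` and `--no-edit` options are there only so that a current Git performs the merge without prompting.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)
cd "$top"
git -c init.defaultBranch=master init -q --bare shared.git

# Alice creates the project and publishes it.
git -c init.defaultBranch=master init -q alice
cd "$top/alice"
echo base > base.txt
git add base.txt
git commit -q -m 'base'
git remote add origin "$top/shared.git"
git push -q origin master

# Bob clones it and publishes a change first.
cd "$top"
git clone -q shared.git bob
cd "$top/bob"
echo 'from bob' > bob.txt
git add bob.txt
git commit -q -m 'bob: add bob.txt'
git push -q origin master

# Alice commits without Bob's change, so her push is rejected.
cd "$top/alice"
echo 'from alice' > alice.txt
git add alice.txt
git commit -q -m 'alice: add alice.txt'
if git push -q origin master 2>/dev/null; then
    first_push=ok
else
    first_push=rejected
fi
echo "first push: $first_push"

# Pull (fetch plus merge), then push again. The files do not overlap,
# so the merge needs no manual conflict resolution.
git pull -q --no-rebase --no-edit origin master
git push -q origin master
git log --oneline --graph
```

The graph printed at the end shows the merge commit that the pull created, which is the commit the next section suggests avoiding by rebasing.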
Branches and Rebasing
In the above example we made changes directly to the master branch, but it's generally desirable to create a local branch for all but the most trivial changes. In this way you can work on several bug fixes and other development at once. Also, in the above example, if there had been a conflict at the pull step, our merge commit would have been pushed to the remote repository where it would be an ugly distraction. By doing work in a local branch and then rebasing that onto the current upstream state we can commit a very clean series of commits to the upstream repository.
First, let's make our change on the master branch and rebase that onto the remote branch:
$ edit frysk-core/frysk/hpd/BreakpointCommand.java
$ git-commit -a                                          (1)
$ git-fetch origin                                       (2)
$ git-rebase origin                                      (3)
$ git-add frysk-core/frysk/hpd/BreakpointCommand.java    (4)
$ git-rebase --continue
$ git-push origin
Here is the same work, done on a branch:
$ git-branch topic/fixbreakpoints master        (1)
$ git-checkout topic/fixbreakpoints             (2)
$ edit frysk-core/frysk/hpd/BreakpointCommand.java
$ git-commit -a
$ git-checkout master                           (3)
$ git-pull origin
$ git-rebase master topic/fixbreakpoints        (4)
$ git-checkout master                           (5)
$ git-merge topic/fixbreakpoints                (6)
$ git-push origin
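The branch-based sequence can be tried end to end against a local stand-in for the shared repository. In this sketch `shared.git`, `seed`, `other`, and `mine` are hypothetical directory names, the edited files are placeholders, and `--ff-only` is added to the pull so that a current Git fast-forwards without asking how to reconcile branches.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)
cd "$top"
git -c init.defaultBranch=master init -q --bare shared.git
git -c init.defaultBranch=master init -q seed
cd "$top/seed"
echo base > base.txt
git add base.txt
git commit -q -m 'base'
git push -q "$top/shared.git" master
cd "$top"
git clone -q shared.git other
git clone -q shared.git mine

# Our work happens on a topic branch.
cd "$top/mine"
git checkout -q -b topic/fixbreakpoints master
echo fix > fix.txt
git add fix.txt
git commit -q -m 'fix breakpoints'

# Meanwhile someone else publishes to the shared repository.
cd "$top/other"
echo other > other.txt
git add other.txt
git commit -q -m 'other work'
git push -q origin master

# Bring master up to date, replay the topic branch on top, then publish.
cd "$top/mine"
git checkout -q master
git pull -q --ff-only origin master
git rebase -q master topic/fixbreakpoints
git checkout -q master
git merge -q topic/fixbreakpoints      # a fast-forward, no merge commit
git push -q origin master
git log --oneline
```

The resulting history is a straight line of three commits with our fix on top, rather than a merge commit joining two lines of development.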
When you switch from one branch to another you can lose uncommitted work. It's customary to create a quick "work in progress" commit to save that work and then undo it later when continuing with that branch; see the "Interrupted Workflow" example on the git-reset man page. Git versions newer than 1.5.3 have a git-stash command that allows you to save uncommitted work when changing branches without creating a commit.
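A stash round trip might look like the following. It is a sketch in a scratch repository; the branch name `other` and the file contents are invented for the example.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q .
echo 'committed line' > file.txt
git add file.txt
git commit -q -m 'initial commit'
git branch other

# Start some work but do not commit it.
echo 'half-finished edit' >> file.txt

# Put the uncommitted change aside; the working tree is clean again.
git stash
clean=$(git status --short)

# Switch away, do something else, and come back.
git checkout -q other
git checkout -q master

# Restore the saved work and drop it from the stash.
git stash pop
cat file.txt
```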
git-rebase is very powerful and can be used to graft chains of commits from one branch to another. It is smart enough to recognize if a commit in the branch being rebased already exists in the target branch and not cause a conflict. However, you shouldn't rebase branches whose head has been pushed to a public repository if you expect others to track that branch. rebase changes the history in a way that screws up later merges from the branch.
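Both points can be observed in a scratch repository: a commit whose change is already present in the target is skipped, and the commits that are replayed get new names. The sketch below uses hypothetical commits A, B, and M on made-up files, and it uses cherry-pick to copy A's change onto master before the rebase.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q .
echo base > base.txt
git add base.txt
git commit -q -m 'base'

# The topic branch carries two commits, A and B.
git checkout -q -b topic
echo a > a.txt
git add a.txt
git commit -q -m 'A: add a.txt'
a_id=$(git rev-parse HEAD)
echo b > b.txt
git add b.txt
git commit -q -m 'B: add b.txt'
old_tip=$(git rev-parse topic)

# master moves on, and also picks up the same change as A.
git checkout -q master
echo m > m.txt
git add m.txt
git commit -q -m 'M: add m.txt'
git cherry-pick "$a_id" > /dev/null

# Rebase topic onto master: A is recognized as already applied,
# so only B is replayed.
git rebase -q master topic
new_tip=$(git rev-parse topic)

git log --format=%s topic
echo "old tip: $old_tip"
echo "new tip: $new_tip"
```

The tip of topic has a different name after the rebase, which is why anyone who had already fetched the old tip would run into trouble merging from it later.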
So far we have used branch names, but there are many other ways to name interesting commit objects:
Backing Out Commits
You often need to "uncommit" commits, either because they were a bad idea or because they need further work. There's a local way that resets the names of local branches and the index, and a more formal way that creates the inverse of a particular commit.
To reset to two revisions back, blowing away all newer changes:
$ git-reset --hard HEAD~2
A more likely scenario is to undo a commit but leave the changes in your working tree so you can work on them further. git-reset with no options does that:
$ git-reset HEAD^
You can collapse several commits on a branch into one commit using git-reset in this way.
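Collapsing commits this way could look like the following sketch. The repository, the file `work.txt`, and the commit messages are placeholders; the idea is to move the branch back three commits while keeping the changes, then commit them once.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q .
echo 'start' > work.txt
git add work.txt
git commit -q -m 'base'

# Three small commits that we would rather publish as one.
for step in 1 2 3; do
    echo "step $step" >> work.txt
    git commit -q -a -m "wip $step"
done

# Move the branch back three commits. Without --hard the working tree
# is left alone, so work.txt still contains all three steps.
git reset -q HEAD~3

# Commit the accumulated changes as a single commit.
git commit -q -a -m 'one combined commit'
git log --oneline
cat work.txt
```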
git-log prints out the commit log messages for an entire tree or a specific file.
git-diff is used to examine differences between the working tree and the index, as well as between arbitrary branches and commits. For example:
$ git-diff ChangeLog                     (1)
$ git-diff HEAD^..HEAD ChangeLog         (2)
$ git-diff HEAD^ ChangeLog               (3)
$ git-diff master...topic/newbreakpt     (4)
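What each of these four forms compares can be checked in a scratch repository. The sketch below builds a tiny history with invented ChangeLog entries and a hypothetical topic branch, then runs the same four comparisons with the spaced `git <command>` spelling.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q .
echo 'entry 1' > ChangeLog
git add ChangeLog
git commit -q -m 'first'
echo 'entry 2' >> ChangeLog
git commit -q -a -m 'second'

# A topic branch with one commit of its own.
git checkout -q -b topic/newbreakpt
echo 'topic' > topic.txt
git add topic.txt
git commit -q -m 'topic work'
git checkout -q master

# An uncommitted, unstaged edit on master.
echo 'entry 3' >> ChangeLog

# Working tree against the index: shows entry 3 only.
git diff ChangeLog
# The last commit against its parent: shows entry 2 only.
git diff HEAD^..HEAD ChangeLog
# Working tree against the parent of HEAD: shows entries 2 and 3.
git diff HEAD^ ChangeLog
# What the topic branch changed since it forked from master.
git diff --stat master...topic/newbreakpt
```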
Other Git Commands
git-cherry-pick picks a commit from anywhere and commits it to the current branch.
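As a small illustration, the sketch below copies one finished commit from a hypothetical topic branch onto master while leaving the unfinished one behind. The branch, file names, and messages are made up.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q .
echo base > base.txt
git add base.txt
git commit -q -m 'base'

# Two commits on a topic branch: one ready, one not.
git checkout -q -b topic
echo fix > fix.txt
git add fix.txt
git commit -q -m 'important fix'
fix_id=$(git rev-parse HEAD)
echo wip > wip.txt
git add wip.txt
git commit -q -m 'unfinished work'

# Apply only the ready commit to master.
git checkout -q master
git cherry-pick "$fix_id" > /dev/null
git log --oneline
ls
```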
git-whatchanged can, among other things, search for commits that match a string or regular expression.
git-remote configures access to remote repositories and default actions when pulling from and pushing to them. You would use this command to track branches in repositories other than the one from which yours is cloned.
git-bisect will do a binary search through the revision history
to zero in on the first "bad" commit. At each stage you tell it whether its
current choice is "good" or "bad" before it chooses the next.
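The search can also be automated with the `run` subcommand, which asks a command instead of you: exit status 0 means "good" and a non-zero status means "bad". The sketch below is an assumption-laden toy: it builds ten commits in a scratch repository, plants a hypothetical bug in the seventh, and uses `grep` on a made-up status file as the test.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git -c init.defaultBranch=master init -q .

# Ten commits; from commit 7 onwards status.txt says "broken".
for i in 1 2 3 4 5 6 7 8 9 10; do
    if [ "$i" -ge 7 ]; then
        echo broken > status.txt
    else
        echo ok > status.txt
    fi
    echo "$i" > counter.txt
    git add status.txt counter.txt
    git commit -q -m "commit $i"
done
first=$(git rev-list HEAD | tail -n 1)

# The current tip is known bad, the very first commit is known good.
git bisect start HEAD "$first" > /dev/null

# Let grep answer "good" (pattern found) or "bad" (not found) at each step.
git bisect run grep -q ok status.txt > /dev/null

culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $culprit"

# Leave bisect mode and return to the original branch.
git bisect reset > /dev/null 2>&1
```

When no such test command exists, the manual form described above works the same way, with `git bisect good` and `git bisect bad` typed at each step.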
This is an example of importing a new libunwind version from upstream by putting it on the frysk vendor branch in git and then merging it to trunk.
The vendor branch is now updated; next we merge the result to the trunk.
If too much time has passed since the start the push will fail because
someone else will have pushed something already. Trying to do
Comments on how to improve this process appreciated.