Oct 24 2010

Multiple working folders with single GIT repository

The problem

As much as I love Git, several of its design decisions make my everyday work really difficult. One such design flaw is the tight coupling between the repository and the working folder. It is a serious problem for any developer who frequently switches between several product or topic branches of the same project.

Git gives you two options to deal with this:

Create a separate local clone of the repository for each branch and check out a different branch in each one.
Do all your work in one local repository: on every task switch, either stash or commit your changes on the current branch and check out a different one.

As I quickly learned, neither solution was very practical for what I was doing. As a Linux kernel developer in a company that ships multiple Linux-based products, I eventually found myself working on and supporting several product branches, sometimes based on different versions of the kernel. One branch could be a plain vanilla kernel.org kernel based on 2.6.32, another could be an old product running off 2.6.24 with tons of additional architecture-specific code that came from a vendor, etc. Both approaches had very serious drawbacks:

Cloning the kernel repository for each product or branch takes more disk space: every copy of the kernel repository is about 500MB. Even though disk space is cheap today, you have to admit it doesn’t make sense to keep several copies of the same binary blob on your disk (UPDATE 10/25/2010: That is actually not true. Cloning locally from an existing repository is done via hard links, so no extra space is wasted). But the biggest problem is synchronizing all these local repositories. If I need to merge branches or cherry-pick individual commits between them, I end up pushing and pulling the changes through a shared remote repository, or connecting these local repositories as remotes to one another.
Doing all your work in one repository is not an option either. If you have ever tried to check out a 2.6.24 Linux kernel branch while already having 2.6.35 in your folder, you understand why: it takes time. When you are multitasking you usually try to minimize the overhead of each switch, and having to stash your current work or create dummy commits every time you switch doesn’t seem like the right way to do it. Another drawback of this approach is that there is no convenient way to do a nice visual diff between files and folders on two branches. Yes, it’s doable, but it is not as easy as running “meld folderA folderB”.
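To make the synchronization overhead of the first approach concrete, here is a minimal sketch of moving one commit between two separate clones. Everything in it is made up for illustration: the upstream, cloneA and cloneB folders, the sibling remote name and the demo identity are assumptions, and a real kernel tree would of course be much larger.

```shell
#!/bin/sh
# Hypothetical demo of the "one clone per branch" approach: moving a single
# commit from cloneA to cloneB means wiring one clone up as a remote of the other.
work=$(mktemp -d)
cd "$work" || exit 1

# Identity for the demo commits, so the sketch does not depend on global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# A stand-in for the shared upstream repository, with one commit in it.
mkdir upstream
(
    cd upstream && git init -q || exit 1
    echo base > base.txt
    git add base.txt
    git commit -q -m "base"
)

# Two independent local clones, one per product branch.
git clone -q upstream cloneA
git clone -q upstream cloneB

# A fix is committed in cloneA only.
cd cloneA
echo fix > fix.txt
git add fix.txt
git commit -q -m "fix"
fix=$(git rev-parse HEAD)

# To get it into cloneB we first have to make cloneA's history reachable there.
cd ../cloneB
git remote add sibling ../cloneA
git fetch -q sibling
git cherry-pick "$fix" >/dev/null
```

With only two clones this is tolerable, but every additional clone needs its own remote and its own fetch before a commit from elsewhere can be used.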

The solution

I’m not going to take credit for the method I am about to show you. I wasn’t the one who came up with it; Ed, one of the guys on my team, did. Ed actually developed a whole set of useful scripts for kernel development involving frequent switching between product branches, but the part about branching and multiple working folders is applicable to other projects as well, so that’s what I want to share with you.
The trick is actually quite simple: we use our knowledge of git internals and unix symbolic links to achieve what we want. All we wanted was one repository with multiple working folders associated with it, so let’s do just that. This is what our final folder structure will look like:

project
+-.repo
  +-.git
    +-branches
    +-config
    +-description
    +-HEAD
    +-hooks
    +-objects
    +-info
    +-packed-refs
    +-refs
+-branchA
  +-.git
    +-branches -> ../../.repo/.git/branches
    +-config -> ../../.repo/.git/config
    +-description
    +-HEAD
    +-index
    +-hooks -> ../../.repo/.git/hooks
    +-objects -> ../../.repo/.git/objects
    +-info -> ../../.repo/.git/info
    +-packed-refs -> ../../.repo/.git/packed-refs
    +-refs -> ../../.repo/.git/refs
+-branchB
+-branchC
...
etc

We created one wrapper “project” folder, one master Git repository in .repo and a bunch of branchX folders. Each of these folders is a valid Git repository by itself. However, it doesn’t keep its own config, objects or refs; it links to the ones in .repo instead. It does keep a private local version of HEAD, the index (staging area) and the whole working folder. Every change to the shared database made in any of the branch folders (commits, branches, tags, remote changes) is immediately visible in the others, so context switching is now just a matter of changing folders.
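The sharing described above can be checked with a small throwaway experiment. This is only a sketch under a few assumptions: it runs in a temporary directory instead of a real project, it inlines the linking step in a hypothetical make_workdir helper, and the topicA branch name and demo identity are made up for the example.

```shell
#!/bin/sh
# Hypothetical demo: build a throwaway copy of the layout in a temp directory
# and check that a commit made in branchA is visible from branchB.
demo=$(mktemp -d)
cd "$demo" || exit 1

# Identity for the demo commit, so the sketch does not depend on global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# The master repository that owns the real objects and refs.
mkdir .repo
(cd .repo && git init -q)

# Linking step, inlined to keep the demo self-contained (hypothetical helper).
make_workdir() {
    mkdir "$1"
    (
        cd "$1" && git init -q && cd .git || exit 1
        for i in branches config hooks info objects refs packed-refs; do
            rm -rf "$i"
            ln -s "../../.repo/.git/$i" "$i"
        done
    )
}
make_workdir branchA
make_workdir branchB

# Commit on a new branch from inside branchA.
cd branchA
git symbolic-ref HEAD refs/heads/topicA   # point branchA's private HEAD at topicA
echo hello > file.txt
git add file.txt
git commit -q -m "first commit on topicA"
made=$(git rev-parse HEAD)

# Look the branch up from inside branchB: same refs, same objects.
cd ../branchB
seen=$(git rev-parse refs/heads/topicA)

[ "$made" = "$seen" ] && echo "branchB sees the commit made in branchA"
```

Note that file.txt exists only in branchA: the history is shared, but each folder keeps its own checked-out files, HEAD and index.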

This is how we initialize this structure for the first time:

init.sh

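#!/bin/bash
# Run from the "project" folder. Assumes newfolder.sh is on your PATH.

# Create the master repository and fetch everything from the remote.
mkdir .repo
pushd .repo
git init
git remote add origin git://some.remote.url.../project.git
git fetch origin
popd

# One working folder per branch, each tracking its remote counterpart.
newfolder.sh branchA
pushd branchA
git checkout -b branchA origin/branchA
popd
newfolder.sh branchB
pushd branchB
git checkout -b branchB origin/branchB
popd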

We used the following script (newfolder.sh) to automate the creation of a new working folder. We can call it as many times as we want, whenever we need a new folder.

newfolder.sh

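#!/bin/bash
# Create a working folder that shares the repository stored in .repo.
# Run from the "project" folder, next to .repo.
if [ -z "$1" ]; then
	echo "Need to specify target." >&2
	exit 1
fi
TARGET=$1
mkdir "$TARGET" || exit 1
pushd "$TARGET"
git init
pushd .git
# Replace the shared parts of the fresh .git with links into the master repository.
# HEAD and index are left alone, so they stay private to this working folder.
for i in branches config hooks info objects refs packed-refs ; do
	rm -rf "$i"
	ln -sf "../../.repo/.git/$i" "$i"
done
popd # .git
popd # $TARGET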

Enjoy! I already find this method very useful, but I’ll be glad to hear some feedback, since I am sure someone will come up with a way to polish it and make it even better.

Update 10/25/2010: I wrote the post yesterday, and today I accidentally found out that the trick described above has been part of the standard git distribution for a while and can be found in contrib/workdir/git-new-workdir :)