Wondering how to migrate a Subversion repository to a Git repository? If so, this is the blog for you. We will cover the migration itself and various issues you may encounter along the way. The steps discussed here are basically a summary of Bitbucket’s step-by-step migration procedure and rely on some tools/scripts provided by Bitbucket (BB). However, in some situations an SVN repository needs additional preparation before the BB steps will “just work”. That is discussed towards the end of this article.
NOTE: The migration must be done on a case-sensitive file system. Linux is fine, Windows is out, and OSX is do-able. For OSX, you can use the BB scripts to mount a case-sensitive file system by calling: java -jar svn-migration-scripts.jar create-disk-image <X> GitMigration, where <X> is the size of the disk image to create, in gigabytes (make sure it’s big enough).
NOTE: You can have multiple branch paths specified.
NOTE: This differs from the instructions on BB, which do not include specifying an empty prefix. Git changed its default behavior from 1.x to 2.x. In 1.x, git svn clone used an empty prefix by default. In 2.x, the default is to add a prefix named “origin”. This does 2 things. First, it screws up later BB tools for converting SVN branches to Git branches (although there is technically a different way to fix that problem). Second, it adds the prefix “origin/” to the names of all of your branches and tags once they are converted to Git branches and tags. Whether or not the “origin” prefix is desirable is up to you, but I decided I would rather not have the extra clutter.
NOTE: Remember to put the git repo in a case-sensitive file system (see above).
The git utilities turn your SVN branches and tags into git remote branches and tags pointing to a remote with the name specified in the prefix (see above). This is useful if you want to keep maintaining the SVN repository, but not as useful for a 1-way conversion, where the prefix just adds extra clutter. That is why I dropped it. The BB utilities will help you convert the remote branches and tags (pointing back to SVN) into local branches and tags that then look like they were done in Git to begin with.
NOTE: If you didn’t specify an empty prefix in the git svn clone step, you will see all your branches converted/created first, and then at the end the script will delete all of the created branches. It also seems to screw up the tags. You can get around the deletion by passing the “--no-delete” option after the force option. However, this still doesn’t fix the issues with tags. Also, the conversion will keep all branches ever made. You’ll then have to go into Git and manually delete all branches that were already merged back and deleted in the SVN repository.
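As a rough sketch of the clone step these notes refer to, the command might look like the following. The `--stdlayout` and `--prefix` options are standard `git svn` options; the helper name `clone_svn_no_prefix` and its two arguments are placeholders made up for this illustration and are not part of the BB instructions.

```shell
#!/bin/sh
# Sketch: clone an SVN repository with an empty remote prefix.
# clone_svn_no_prefix is a hypothetical helper name.
clone_svn_no_prefix() {
    svn_url=$1      # URL of the SVN project to convert
    target_dir=$2   # new git repo; must be on a case-sensitive file system

    # --stdlayout assumes the standard trunk/branches/tags layout.
    # For a non-standard layout, use --trunk, --branches (repeatable)
    # and --tags instead.
    # --prefix="" avoids the "origin/" prefix that Git 2.x adds by default.
    git svn clone --stdlayout --prefix="" "$svn_url" "$target_dir"
}
```

You would call it as `clone_svn_no_prefix <svn-repo-url> <git-repo-name>`, substituting your own values.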
The above steps should work fine for both standard SVN layouts and simple, non-standard layouts where the location of the trunk, branches, and tags folders never changed. If you ever changed, moved, or renamed the trunk, branches, or tags folders, you are going to have serious issues with the conversion. At this point, it is a good idea to consider whether you really need to maintain the history, or whether it would be ok to just start a fresh git repository from the latest SVN version as if it were the first commit into git. However, if you really want to maintain history, there is a way to do it, described here.
Basically, you need to “massage” your old SVN repository data into a new SVN repository structure that looks as if you started with the standard trunk, branches, and tags layout and never changed it. The basic outline goes as follows:
When you use an include filter, the process will end up with empty revisions for the commits that didn’t affect your project. It’s generally better to get rid of those. The catch is that the original revision numbers are then no longer preserved, so any references to specific SVN revision numbers (e.g. in commit messages or documentation) will stop matching. I would still recommend dropping the empty revisions if you can live without the old numbers.
First, make a backup of the starting dumpfile. You will likely need to make multiple attempts at correcting all the issues you want, and creating the dumpfile can sometimes take a long time, depending on the size of your repository.
Next, start writing a script that uses the Linux/Unix “sed” command to find and replace text in the dumpfile, so that it looks like your repository always had the plain old trunk, branches, and tags structure.
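A minimal sketch of such a filtering step, assuming the full dumpfile was created earlier with `svnadmin dump`. The `svndumpfilter` options shown are standard ones; the helper name `filter_project_dump` and the file and path arguments are placeholders for illustration.

```shell
#!/bin/sh
# Sketch: keep only one project's paths and drop the revisions left empty.
# filter_project_dump is a hypothetical helper name.
filter_project_dump() {
    full_dump=$1       # dumpfile of the whole repository
    project_path=$2    # path prefix to keep, e.g. Project
    filtered_dump=$3   # output dumpfile

    # --drop-empty-revs removes revisions emptied by the filter.
    # --renumber-revs closes the resulting gaps, so old numbers change.
    svndumpfilter include --drop-empty-revs --renumber-revs \
        "$project_path" < "$full_dump" > "$filtered_dump"
}
```

If you need the old revision numbers to stay recognizable, leaving out both options keeps the empty revisions in place instead.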
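One possible way to handle the backup is to keep a pristine copy next to the working dumpfile and restore it before each new attempt. This is only a sketch; the helper names and the `.orig` suffix are made up for this example.

```shell
#!/bin/sh
# Sketch: keep a pristine copy of the dumpfile and restore it on demand.
# backup_dumpfile and restore_dumpfile are hypothetical helper names.
backup_dumpfile() {
    dumpfile=$1
    backup="$dumpfile.orig"

    # Refuse to overwrite a backup left over from an earlier attempt,
    # because the working file may already have been edited.
    if [ -e "$backup" ]; then
        echo "backup already exists: $backup" >&2
        return 1
    fi
    cp -p "$dumpfile" "$backup"
}

restore_dumpfile() {
    dumpfile=$1
    # Replace the edited working file with the pristine copy.
    cp -p "$dumpfile.orig" "$dumpfile"
}
```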
There are 2 types of lines we need to replace in the SVN dumpfile: “Node-path” lines and “Node-copyfrom-path” lines.
The first type (Node-path) tells SVN where in the repository an operation is occurring.
An example of how this might get used in the dump file is:
A section like this is what you would see for an initial commit that creates the trunk folder in the root directory structure.
The second type of line (the copyfrom) occurs when you move a file or folder in SVN. You might see something like:
A section like this is what you would see when you moved file1.h from folder2 to folder1. The SVN dumpfile keeps track of where and at what revision the file came from.
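For illustration, the two kinds of records described above might look like the fragment written out below. This is a hand-written sketch in the style of the dumpfile format, not output from a real repository: the revision number and the `sample.dump` file name are made up, and headers such as content lengths are left out. The `grep` at the end lists every line a path-rewriting script would need to touch.

```shell
#!/bin/sh
# Write a small hand-made fragment in SVN dumpfile style (illustrative only).
workdir=$(mktemp -d)
cat > "$workdir/sample.dump" <<'EOF'
Node-path: trunk
Node-kind: dir
Node-action: add

Node-path: folder1/file1.h
Node-kind: file
Node-action: add
Node-copyfrom-rev: 7
Node-copyfrom-path: folder2/file1.h

Node-path: folder2/file1.h
Node-action: delete
EOF

# List both kinds of path lines, prefixed with their line numbers.
grep -n -E '^Node-(copyfrom-)?path: ' "$workdir/sample.dump"
```

The first record creates the trunk folder. The second and third together represent moving file1.h from folder2 to folder1: an add that records where the file was copied from, followed by a delete of the old location.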
Let’s say that the project you want to migrate is in a subfolder of a “master” repository. In this particular case, you could use the built-in “git svn clone” options to just specify the paths to the trunk, branches, and tags, but let’s instead edit the dumpfile to achieve the same thing. Let’s suppose that in your repository you have the following structure:
You have already created a dumpfile that only includes these folders (see above). Now we want to edit the dumpfile so it looks like none of the commits ever knew about the top-level folder.
We do that by doing the following in the script file:
sed -i 's/Node-path: Project\/trunk/Node-path: trunk/g' <dumpfile>
sed -i 's/Node-copyfrom-path: Project\/trunk/Node-copyfrom-path: trunk/g' <dumpfile>
sed -i 's/Node-path: Project\/branches/Node-path: branches/g' <dumpfile>
sed -i 's/Node-copyfrom-path: Project\/branches/Node-copyfrom-path: branches/g' <dumpfile>
sed -i 's/Node-path: Project\/tags/Node-path: tags/g' <dumpfile>
sed -i 's/Node-copyfrom-path: Project\/tags/Node-copyfrom-path: tags/g' <dumpfile>
What we are doing is replacing the paths to every single folder/file in every single commit in the dumpfile with a new path that strips the first folder out. We need to also include the “copyfrom” versions so that when things are moved, it matches the new directory structures we are creating.
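To try the replacements out before touching a real dumpfile, a sketch like the one below runs the same six substitutions on a tiny hand-made sample. Two things differ from the commands above and are my own assumptions rather than part of the original procedure: each pattern is anchored to the start of the line with `^`, so that text inside file contents is less likely to be rewritten by accident, and `|` is used as the sed delimiter so the slashes need no escaping. The helper name `strip_project_prefix` and the sample paths are made up.

```shell
#!/bin/sh
# Sketch: strip a leading "Project/" folder from both kinds of path lines.
# strip_project_prefix is a hypothetical helper; it prints the rewritten
# text to standard output and leaves the input file untouched.
strip_project_prefix() {
    sed \
        -e 's|^Node-path: Project/trunk|Node-path: trunk|' \
        -e 's|^Node-copyfrom-path: Project/trunk|Node-copyfrom-path: trunk|' \
        -e 's|^Node-path: Project/branches|Node-path: branches|' \
        -e 's|^Node-copyfrom-path: Project/branches|Node-copyfrom-path: branches|' \
        -e 's|^Node-path: Project/tags|Node-path: tags|' \
        -e 's|^Node-copyfrom-path: Project/tags|Node-copyfrom-path: tags|' \
        "$1"
}

# Hand-made sample lines; the last one stands in for file content.
workdir=$(mktemp -d)
cat > "$workdir/sample.dump" <<'EOF'
Node-path: Project/trunk/main.c
Node-copyfrom-path: Project/trunk/main.c
Node-path: Project/branches/feature/main.c
see Node-path: Project/trunk in the docs
EOF

strip_project_prefix "$workdir/sample.dump" > "$workdir/fixed.dump"
cat "$workdir/fixed.dump"
```

The first three lines lose their `Project/` prefix, while the last line is left alone because it does not start with `Node-path:`.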
Now, that is a relatively trivial example, but you can fix much less trivial issues the same way. Let’s say you actually started with a root trunk/branches/tags and then moved it into a sub-folder. So this:
Got converted to this with some folder SVN-moves:
This is where things start to get complicated. Using “git svn clone”, you cannot specify that the trunk got moved around, so you must fix it in the dumpfile first. The good news is that you can do this. The script to do this is actually the same as the script above for the trivial case. However, there are now some additional tweaks we need to make to the dumpfile.
Things can get even more complicated than this as well. Imagine if you moved a file to a location that was filtered out in the original dump because it’s not part of the project you want to recreate. But then the file got moved back into the project that you do want to recreate. In those cases, you actually need to create a dump that contains the parts that you don’t (eventually) want, at least for now. One suggestion is to move those folder locations using the tricks above to a temporary folder in the new “trunk”, as in “trunk/svn-git-migration/”, which is a folder you will delete later. This will allow you to preserve the file moves during the recreation of the repo. After you make the new repo (see below), you can then SVN-delete the temporary folder right before you convert to git.
You can also use these tricks to deal with externals. This section describes how to handle the case where you moved some stuff around in a “master” repo to share it between some projects. Git has submodules and subtrees if you want to preserve an SVN “externals” as its own repo, but let’s suppose that you instead want to make it look like the contents of the externals never moved around at all, so you can create one new repo with all the history of the externals too. Let’s suppose you started with:
Then, you took some stuff from Project1 and started moving it to a shared section in your repo (to share with Project2) and added an externals folder in your trunk to get back into Project1. So now your repo looks like:
You can use the same path replacement techniques to make it look like all you did was reorganize some files that were in other areas of “trunk” into a new subfolder you named “Library”. Just do the following with some sed commands:
sed -i 's/Node-path: Library\/trunk/Node-path: trunk\/Library/g' <dumpfile>
sed -i 's/Node-copyfrom-path: Library\/trunk/Node-copyfrom-path: trunk\/Library/g' <dumpfile>
There is one other thing that you need to do though. Since “trunk/Library” is created via an externals, the actual “Library” folder is never created in the dumpfile. So when recreating the repo, it’s going to fail when it tries to move stuff into the Library folder because it doesn’t exist yet. You will need to go into the dumpfile and manually “add” a directory in an appropriate existing commit (probably the one right before you need to use it). This is similar to how we had to manually delete some parts of the dumpfile discussed above. You can do this with something like:
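As a hedged sketch of what such a manually added record could look like, the helper below prints a node record that adds an empty directory, in the form `svnadmin dump` normally writes for a directory with no properties. The helper name `print_dir_add_record` is made up, and where exactly to paste the output depends on your dumpfile.

```shell
#!/bin/sh
# Sketch: print a dumpfile node record that adds an empty directory.
# print_dir_add_record is a hypothetical helper name.
print_dir_add_record() {
    dir_path=$1
    printf 'Node-path: %s\n' "$dir_path"
    printf 'Node-kind: dir\n'
    printf 'Node-action: add\n'
    # The property block is just "PROPS-END" plus a newline: 10 bytes.
    printf 'Prop-content-length: 10\n'
    printf 'Content-length: 10\n'
    printf '\n'
    printf 'PROPS-END\n'
    # Node records are separated by blank lines.
    printf '\n\n'
}

print_dir_add_record trunk/Library
```

The printed record would go inside the revision you picked, before the first node record that puts anything under trunk/Library. Loading the dumpfile into a scratch repository is a quick way to check that the edit is accepted.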
You generally only need to do this for the “trunk”. However, this may not completely preserve the state of the Library in any branches you created, since you probably pinned the revision of Library in those branches. You could (if you want) continue using these tricks to fix it all up.
Things can also get tricky sometimes regarding the order of the path replacement operations. For example, consider a folder structure that evolved as such:
If you first try to replace “Project” with trunk, it will also replace “Project-trunk” with “trunk-trunk”. That will screw up future replacement operations. So, generally speaking, do the replacement operation from most-specific to least-specific. In this example, the order might be:
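The ordering rule can be checked on a two-line sample. In this sketch both “Project-trunk” and “Project” are mapped to “trunk”, which is an assumption made only for the demonstration; the helper names are made up, and the second helper exists purely to reproduce the “trunk-trunk” problem.

```shell
#!/bin/sh
# Sketch: apply the more specific replacement ("Project-trunk") before the
# less specific one ("Project").
rewrite_specific_first() {
    sed \
        -e 's|^Node-path: Project-trunk|Node-path: trunk|' \
        -e 's|^Node-copyfrom-path: Project-trunk|Node-copyfrom-path: trunk|' \
        -e 's|^Node-path: Project|Node-path: trunk|' \
        -e 's|^Node-copyfrom-path: Project|Node-copyfrom-path: trunk|' \
        "$1"
}

# The wrong order, shown only to reproduce the problem.
rewrite_general_first() {
    sed \
        -e 's|^Node-path: Project|Node-path: trunk|' \
        -e 's|^Node-path: Project-trunk|Node-path: trunk|' \
        "$1"
}

workdir=$(mktemp -d)
cat > "$workdir/sample.dump" <<'EOF'
Node-path: Project-trunk/main.c
Node-path: Project/main.c
EOF

echo "specific first:"
rewrite_specific_first "$workdir/sample.dump"
echo "general first:"
rewrite_general_first "$workdir/sample.dump"
```

With the specific pattern first, both lines end up as `Node-path: trunk/main.c`. With the general pattern first, the first line becomes `Node-path: trunk-trunk/main.c`, and the later, more specific pattern no longer matches it.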