Thorsten Stark

Migrating a large git repository to GitHub

We recently wanted to move our repository to GitHub. In theory, that should be as simple as clicking the "Import Repository" button. So we clicked that button and entered the necessary information for the migration tool. After only 5 hours of migration, the process failed with "Error uploading commits". No further information.

Migrating the repository manually on the command line revealed at least one reason why the upload failed: our repository contained some files larger than 100 MB. GitHub has a hard limit on file sizes. Only files smaller than 100 MB may be stored in the git repository itself. Larger files must be stored in Git LFS (Large File Storage). Usually, the GitHub migration tool should recognize those files and offer to move them to LFS or remove them from the repository. For unknown reasons, this didn't work for us. If it works for you: lucky you. After a few days of struggling with the migration and some email exchange with GitHub support, we figured out a way to migrate our large repository to GitHub.

The short version:

1. Clone your entire repository to your computer
2. Move all large files to LFS
3. Create a local working copy from the modified repository
4. Upload every branch you need to the new remote (GitHub)

Prerequisite: Git LFS isn't part of git itself, so you have to install it manually on your machine from the Git LFS page.

For everyone who doesn't know the concrete steps to do this (like me before), here is the more detailed version:

Step 1 – Clone the complete repository

git clone --bare <url-of-your-old-repository>

This is nearly the same as what you do with every git repository you clone, but here the parameter --bare makes an important difference. A normal git clone creates a so-called working copy of the repository on your computer and checks out the main branch, e.g. master or develop. Adding --bare results in a full copy of the repository on your computer without a checked-out working tree. Usually, the resulting folder also has the extension .git. This is the same format as on the remote host. But you can't work in this repository, as you can only work in working copies (as the name suggests).

Attention: After cloning the repository, it would be wise to make a backup of it in case anything goes wrong, especially because we are going to rewrite a large part of your git history in a few moments. An easy way to back up is to simply archive the folder called <your-repository-name>.git you just cloned.
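As a rough sketch of this step, the bare clone, a quick check that it really is bare, and the backup archive could be combined in one small helper. The function name clone_and_backup, its arguments and the .backup.tar.gz suffix are made up for this illustration; only git clone --bare, git rev-parse --is-bare-repository and tar are assumed to be available.

```shell
# Hypothetical helper: bare-clone a repository and archive the result.
#   $1 = URL or local path of the old repository
#   $2 = target folder for the bare clone, e.g. repository.git
clone_and_backup() {
  source_url="$1"
  target_dir="$2"

  # Full copy of the repository without a working tree
  git clone --quiet --bare "$source_url" "$target_dir" || return 1

  # Prints "true" if the clone is a bare repository
  git -C "$target_dir" rev-parse --is-bare-repository

  # Backup next to the clone, taken before any history is rewritten
  tar -czf "$target_dir.backup.tar.gz" \
    -C "$(dirname "$target_dir")" "$(basename "$target_dir")"
}
```

If something goes wrong later, unpacking the archive restores the untouched clone, so the old host does not have to be contacted again.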

Step 2 – Move big files to Git LFS

Download BFG Repo Cleaner from its website. Among other features, the one we are interested in is its ability to convert files so that they are stored in Git LFS. This covers all matching files in the entire history of your git repository. But be careful: as mentioned earlier, this rewrites the history of your repository, making it incompatible with the original version. In this example, we want all files with the extensions jpg or mov to be moved to LFS:

java -jar path/to/bfg-x.y.z.jar --convert-to-git-lfs "{*.jpg,*.mov}" --no-blob-protection path/to/local/repository.git

After BFG has done its job, we need to clean up a bit. The following commands expire all reflog entries and then run git's garbage collector to remove the now unreferenced objects from the repository.

cd path/to/local/repository.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
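To see which files actually exceed the size limit, both before choosing the patterns for BFG and after the cleanup, the blobs of the whole history can be listed by size. The following is only a minimal sketch and not part of our original workflow: the function name list_large_blobs and its arguments are made up for illustration, and it relies on nothing but git rev-list, git cat-file, awk and sort.

```shell
# Hypothetical helper: list all blobs in the history that are at least
# a given size, largest first, as "<size in bytes><TAB><path>".
#   $1 = path to the repository (bare or working copy)
#   $2 = minimum size in bytes
list_large_blobs() {
  repo="$1"
  min_size="$2"

  # rev-list prints "<object id> <path>" for every reachable object;
  # cat-file adds type and size and passes the path through as %(rest).
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v min="$min_size" '
      $1 == "blob" && $2 >= min {
        size = $2
        sub(/^[^ ]+ [^ ]+ /, "")   # strip type and size, keep the full path
        print size "\t" $0
      }' |
    sort -rn
}
```

Assuming the limit is counted as 100 MiB, a call like list_large_blobs path/to/local/repository.git 104857600 should print nothing once every large file has been converted, because only the small LFS pointer files remain in the history.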

Next thing to do is to set up Git LFS and the required hooks by calling:

git lfs install

Step 3 – Upload modified repository

The best way now would be to just copy the whole modified repo to its new host. The required command would be this one here:

git push --mirror <url-of-your-new-repository>

Unfortunately, this didn't work for us, so we had to come up with another solution.

Remember that the repository.git folder has exactly the same format as a repository stored on a remote host. This means we can create a working copy from this folder:

git clone path/to/local/repository.git

Now we have a checked-out version of our repository and can work on it like in any other cloned repository. We can change files, make commits or change/add the remote. Let's add a new remote pointing to our desired new repository. In this example, we name this new remote just newOrigin as the old remote is still there and is called origin by default. But you can choose any name you like. Just keep the old origin. We will soon see why...

git remote add newOrigin <url-of-your-new-repository>

The last step needs some more manual work. We will check out all relevant branches from origin and push/publish them to newOrigin – one by one.
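The manual work could look roughly like the loop below. It is a sketch under assumptions, not the exact commands we typed: the function name push_branches and its arguments are hypothetical, and the branch names you pass in are whatever you consider relevant.

```shell
# Hypothetical helper: check out branches from the old remote and push
# them to the new remote, one by one, in the order given.
#   $1 = path to the working copy
#   $2 = name of the old remote, e.g. origin
#   $3 = name of the new remote, e.g. newOrigin
#   remaining arguments = branch names
push_branches() {
  repo="$1"
  old_remote="$2"
  new_remote="$3"
  shift 3

  for branch in "$@"; do
    # Create (or reset) the local branch from the old remote's branch
    git -C "$repo" checkout --quiet -B "$branch" "$old_remote/$branch" || return 1
    # Publish it to the new remote; stop at the first failure
    git -C "$repo" push --quiet "$new_remote" "$branch" || return 1
  done
}
```

A call such as push_branches path/to/working/copy origin newOrigin master develop pushes master first and develop second. Because the loop stops at the first failing branch, it is easy to see which one caused trouble.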

Congratulations! 🎉 You just transferred your too large repository to GitHub!

Notice: Some branches may not have been modified correctly by BFG. This may be caused by special characters in your branch names. You should not upload these branches to newOrigin, because their content is no longer compatible with the other, modified branches. You can recognize these branches because they take notably longer to upload than the others. Therefore, it helps to upload the master branch first. That way, all other branches build on commits already pushed with master, except for the branches that BFG did not modify.
