Migrating a large git repository to GitHub
We recently wanted to move our repository to GitHub. In theory, that should be as simple as clicking the "Import Repository" button. So we clicked that button and entered the necessary information for the migration tool. After only 5 hours of migration, the process failed with an "Error uploading commits" message and no further information.
Using the command line to migrate the repository manually showed at least one reason why the upload failed: our repository contained some files larger than 100 MB. GitHub has a hard limit when it comes to file sizes. Only files smaller than 100 MB are allowed to be stored in the git repository. Larger files must be stored in Git LFS (Large File Storage). Usually, the GitHub migration tool should recognize those files and offer to move them to LFS or remove them from the repository. For unknown reasons, this didn't work for us. If this works for you: lucky you. After a few days of struggling with the repository migration and some email exchanges with GitHub support, we figured out a way to migrate our large repository to GitHub.
The short version:
1. Clone your entire repository to your computer
2. Move all large files to LFS
3. Create a local working copy from the modified repository
4. Upload every branch you need to the new remote (GitHub)
Prerequisite: As Git LFS isn't part of git itself, you have to install it manually on your machine from the Git LFS page.
For everyone who doesn't know the concrete steps to do this (like me before) here comes the more detailed version:
Step 1 – Clone the complete repository
git clone --bare git@hoster.com:username/repository.git
This is nearly the same as cloning any other git repository, but the parameter --bare makes an important difference. A normal git clone creates a so-called working copy of the repository on your computer and checks out the main branch, e.g. master or develop. Adding --bare results in a full copy of the repository on your computer without a checked-out working tree. Usually, the folder also has the extension .git. This is the same format as on the remote host. But you can't work in this repository directly, as you can only work in working copies (as the name suggests).
Attention: After cloning the repository, it would be wise to make a backup of it in case anything goes wrong, especially because we are going to rewrite a large part of your git history in a few moments. An easy way to back up is to simply archive the folder called <your-repository-name>.git that you just cloned.
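A minimal sketch of such a backup, assuming tar is available on your machine. The function name backup_bare_repo and the archive name are made up for this example and are not part of git:

```shell
# Hypothetical helper (not a git command): archive a bare clone next to
# itself before the history gets rewritten.
backup_bare_repo() {
  repo_path="${1%/}"                      # drop a trailing slash, if any
  parent_dir="$(dirname "$repo_path")"
  repo_name="$(basename "$repo_path")"
  # -C keeps the paths inside the archive relative to the parent folder
  tar -czf "$parent_dir/$repo_name.backup.tar.gz" -C "$parent_dir" "$repo_name"
}
```

You would call it as backup_bare_repo path/to/local/repository.git. If anything goes wrong later, unpacking the archive restores the untouched clone.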
Step 2 – Move big files to Git LFS
Download BFG Repo-Cleaner from its website. Among other features, the one we are interested in is its ability to convert files to be stored in Git LFS, and this covers all files in the entire history of your git repository. But be careful: as mentioned earlier, this rewrites the history of your repository, making it incompatible with the original version. In this example, we want all files with the extensions jpg or mov to be moved to LFS:
java -jar path/to/bfg-x.y.z.jar --convert-to-git-lfs "{*.jpg,*.mov}" --no-blob-protection path/to/local/repository.git
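To decide which patterns to pass to --convert-to-git-lfs, it can help to list the files in your history that exceed the size limit. The following is only a sketch built from standard git plumbing commands. The function name list_large_blobs is made up, and the default threshold of 104857600 bytes (100 MiB) is an assumption about how the limit is counted:

```shell
# Hypothetical helper: print "blob <size-in-bytes> <path>" for every file
# version in the history that is at least min_bytes large, biggest first.
list_large_blobs() {
  repo_path="$1"
  min_bytes="${2:-104857600}"   # assumed default: 100 MiB
  # rev-list prints "<object-id> <path>"; cat-file looks up type and size
  # and passes the path through unchanged via %(rest).
  git -C "$repo_path" rev-list --objects --all |
    git -C "$repo_path" cat-file \
      --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v min="$min_bytes" '$1 == "blob" && $2 + 0 >= min + 0' |
    sort -k2,2nr
}
```

Calling list_large_blobs path/to/local/repository.git on the bare clone shows which extensions are worth converting. Running it again after the cleanup in this step is one way to check that no oversized blobs remain.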
After BFG has done its job, we need to clean up a bit. The following commands expire the reflog and run git's garbage collector to remove objects that are no longer referenced from the repository.
cd path/to/local/repository.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
The next thing to do is to set up Git LFS and the required hooks by calling:
git lfs install
Step 3 – Upload modified repository
The best way now would be to just copy the whole modified repo to its new host. The required command would be this one here:
git push --mirror git@github.com:username/new-repository.git
Unfortunately, this didn't work for us, so we had to come up with another solution.
Remember that the repository.git folder has the exact same format as your repository stored on a remote host. This means we can create a working copy from this folder:
git clone path/to/local/repository.git
Now we have a checked-out version of our repository and can work on it like in any other cloned repository. We can change files, make commits, or change/add remotes. Let's add a new remote pointing to our desired new repository. In this example, we simply name the new remote newOrigin, as the old remote is still there and is called origin by default. You can choose any name you like. Just keep the old origin; we will soon see why.
git remote add newOrigin git@github.com:username/new-repository.git
The last step needs some more manual work. We will check out all relevant branches from origin and push/publish them to newOrigin, one by one.
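If you have many branches, the check-out-and-push routine could be scripted. The following is a rough sketch, not the exact commands we ran. The function name push_all_branches is made up, and it assumes the remotes are called origin and newOrigin as above and that you run it inside the working copy:

```shell
# Hypothetical helper: check out every branch of origin and push it to
# another remote, starting with the main branch.
push_all_branches() {
  target_remote="${1:-newOrigin}"
  first_branch="${2:-master}"

  # All branch names on origin, without the symbolic HEAD entry and
  # without the branch that is pushed first.
  other_branches="$(git for-each-ref --format='%(refname:lstrip=3)' \
    refs/remotes/origin | grep -v -x -e HEAD -e "$first_branch" || true)"

  # Git branch names cannot contain spaces, so word splitting is safe here.
  for branch in "$first_branch" $other_branches; do
    git checkout -q -B "$branch" "origin/$branch" || return 1
    git push "$target_remote" "$branch" || return 1
  done
}
```

Called as push_all_branches newOrigin master, it pushes master first and then every other branch found on origin. The loop stops at the first failing push, so you can inspect the problematic branch before continuing.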
Congratulations! 🎉 You just transferred your too large repository to GitHub!
Notice: Some branches may not have been correctly modified by BFG. This may be caused by special characters in your branch names. You should not upload these branches to newOrigin, because their content is no longer compatible with the other, modified branches. You can recognize these branches because they take notably longer to upload than the others. Therefore, it helps to upload the master branch first: all other branches then build on commits already pushed with master, except for the branches that BFG did not modify.