Rewriting history (of flashrom)

Sorry to interrupt the stream of GSoC posts! 😉 Some of you may have wondered why there have been no flashrom commits at all since the 0.9.9 release in March (i.e. for about 5 months). Apart from Hatim’s GSoC efforts, of course, there was not much direct development going on in the vanilla branch of flashrom.

Instead I’ve been working on moving our repository to git. With the help of Patrick, I have created some client- and server-side hooks and other infrastructure that will be used to host the vanilla flashrom repository in the future. This was actually the smaller part of the time I spent on flashrom in the last months (which was not as much as I would have wished, obviously). More on this and on future development later on the flashrom mailing list…

The remaining time I’ve been working on a script that rewrites history. Not all of history, only the flashrom repository’s history. The new git repo will include not only the commits from the trunk branch of the well-known Subversion repository but also the little-known flashrom-related commits from the original coreboot v1 (then LinuxBIOS) repository, where flashrom was initially conceived. These comprise about 30 commits from 2002 to fall 2003. The first commit message (by Ron Minnich) was “Trying to make this general purpose user-land flash burner” on 2002-01-29.

Thanks to git-svn (which I’ve been using for almost all my flashrom development from the start) I already had all of flashrom’s Subversion repository in a local git database. Prepending that other tree was quite easy with git:
Suppose you have checked out the tree you want to append to the old tree, and the old tree lives in a branch called flashrom-cbv1/master. Then the following will replace the initial commit of the current branch with the tip of the other tree, so that the old history becomes the ancestry of the current branch (faster than svn can show a single commit message):

git replace $(git rev-list --max-parents=0 HEAD) flashrom-cbv1/master

So that was relatively easy…
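To try this without touching a real repository, here is a minimal sketch that builds a throwaway repository with two unrelated histories and grafts one below the other. The function name demo_graft, the branch names old-history and new-history, the commit messages and the identity are all made up for illustration. As far as the git documentation describes it, the replacement is only recorded under refs/replace/ and becomes permanent once the history is rewritten, e.g. by git filter-branch.

```shell
#!/bin/sh
# Sketch: graft an older, unrelated history below the current branch.
# All names here are hypothetical and exist only in a temporary repo.
demo_graft() {
    tmp=$(mktemp -d) || return 1
    (
        cd "$tmp" || exit 1
        git init -q .
        git config user.name "Example User"
        git config user.email "user@example.org"

        # The "old" history (stands in for flashrom-cbv1/master).
        git commit -q --allow-empty -m "old 1"
        git commit -q --allow-empty -m "old 2"
        git branch old-history

        # The "new" history with a root commit of its own.
        git checkout -q --orphan new-history
        git commit -q --allow-empty -m "new root"
        git commit -q --allow-empty -m "new 2"

        # Replace the root commit of the current branch with the tip of
        # the old history.
        git replace "$(git rev-list --max-parents=0 HEAD)" old-history

        # Print the number of commits now reachable from HEAD.
        git rev-list --count HEAD
    )
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Calling demo_graft prints the number of commits reachable from HEAD after the replacement. The replaced root commit is swapped for the tip of the old history rather than kept alongside it, so the total is one less than the sum of both histories.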

But there were some problems. Most importantly, git stores an author and a committer for each commit. These were not set correctly in my tree but were derived from the svn committer’s user name (e.g. stefanct). To correct this one can either use an authors file that maps the svn committers to names and email addresses in the form git uses, or one can use the git filter-branch command to change existing commits. The latter also allows deriving more precise information… my final script parses the commit message for Signed-off-by lines and uses them to set the author to someone other than the svn committer. This will dramatically change the perception of flashrom, as it changes both the number of commits by the regulars and the total number of contributors as counted e.g. by GitHub or Open Hub, cf. https://www.openhub.net/p/flashrom/contributors
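The sign-off idea can be sketched as follows. This is a simplified illustration, not the code from svn2git.sh: the function names first_signoff, signoff_name and signoff_email are made up, and it assumes sign-off lines of the usual form “Signed-off-by: Name <email>”.

```shell
#!/bin/sh
# Print the text after the first "Signed-off-by:" in the message on stdin
# (prints nothing if there is no such line).
first_signoff() {
    sed -n 's/^[Ss]igned-[Oo]ff-[Bb]y:[[:space:]]*//p' | head -n 1
}

# Split a "Name <email>" string into its two parts.
signoff_name() {
    printf '%s\n' "$1" | sed 's/[[:space:]]*<.*$//'
}
signoff_email() {
    printf '%s\n' "$1" | sed -n 's/^.*<\([^<>]*\)>.*$/\1/p'
}

# Possible use inside "git filter-branch --env-filter" (assumption: the
# functions above are defined or sourced within the filter itself):
#
#   so=$(git log -1 --format=%B "$GIT_COMMIT" | first_signoff)
#   if [ -n "$so" ] && [ -n "$(signoff_email "$so")" ]; then
#       GIT_AUTHOR_NAME=$(signoff_name "$so")
#       GIT_AUTHOR_EMAIL=$(signoff_email "$so")
#       export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL
#   fi
```

Taking only the first sign-off is a design choice of this sketch; commits with several sign-offs or malformed lines would need extra handling.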

If you look at the very old commit logs of flashrom in the Subversion tree you will notice many that, well… offer some room for improvement. For example, all of the commits converted from the coreboot v2 tree start with “Original v2 revision:” followed by the coreboot svn revision number – not that helpful or pretty. The quality of the actual contents varies a lot. Many don’t have a sign-off, many are just bad commit messages like “flasrom update from Stefan, resovle issue 21”, and some are actually wrong: they describe another commit, or something that happened in the same commit of the coreboot tree and did not touch flashrom but got committed together with changes to flashrom. Even newer flashrom commits are far from perfect and could be improved, some even automatically.

Perfectionist as I tend to be, I could not resist seizing the opportunity to correct these problems. So I began to work on a commit filter script, to be used with git filter-branch, that improved things quite a bit. Due to my lack of awk knowledge I resorted to using sed, lots of greps and even pcregrep and perl in some circumstances to parse and change the commit messages. Eventually I came up with a script that fixes lots of typos automatically, sets the git author and committer correctly and preserves the knowledge about cb v1, v2 and flashrom svn revisions. It also fixes many individual commit messages “manually” by rewriting and/or deleting parts of them. All in all the script has over 1000 lines of shell code and runs for about 500 seconds (roughly 8 minutes) when rewriting the whole flashrom history.
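A message filter for automatic typo fixes could look roughly like this. It is only an illustration built on the two typos quoted earlier; the function name fix_message is made up, and the real script does far more than this.

```shell
#!/bin/sh
# Read a commit message on stdin and write the corrected message to stdout,
# which is the interface "git filter-branch --msg-filter" expects.
fix_message() {
    sed -e 's/flasrom/flashrom/g' \
        -e 's/resovle/resolve/g' \
        -e 's/[[:space:]]*$//'    # also strip trailing whitespace
}

# Possible use (shown with the sed call inlined, so that nothing has to be
# sourced inside the filter):
#
#   git filter-branch --msg-filter \
#       'sed -e "s/flasrom/flashrom/g" -e "s/resovle/resolve/g"' -- --all
```

Plain substitutions like these are only safe for strings that cannot occur legitimately; anything more ambiguous is better handled per commit.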

Sorry that it took so long, but the result is quite nice IMHO. Take a look at https://www.flashrom.org/git/flashrom.git/ and please let me know if you spot any obvious mistakes my script made!

Since this was quite some effort and the script contains some perhaps non-obvious tricks for this kind of task, I have attached the current version to this post: svn2git.sh