Visualising SVN

We've been doing some work recently to be a bit smarter about which areas of the product we focus our testing on for each release.  That meant we needed to see which areas of the product had been worked on during the release cycle, and how many files had been changed.  Thanks to the power of Google I found a couple of open source projects that let you visualise your SVN repository and the changes made to it: Gource and Code Swarm.

Gource is a tool written by Andrew Caudwell for visualising your SVN check-ins.  Here's a video showing what you can expect to get; it was run against the Flickr codebase and shows 7 years of commits up until 2010.  It focuses on what has changed as opposed to who changed it.

Code Swarm, on the other hand, focuses on who is making the changes and what kinds of files they are changing, with swarms of file changes forming around the users that made them.  Again, quite a few big names have released examples of this.  I've picked one by Twitter which includes a number of projects written in different languages, and it clearly shows the changes to different code file types by different programmers.

Here's how to create your own visualisations in both Gource and Code Swarm.  As we use SVN here I'll be walking through how to do this with an SVN repository, but it's possible to do this with other version control systems such as Git.

First up Gource.

You will need to get an XML log of the repository changes that you want to visualise.  You can do this by running the following command:

svn log http://your-repositoy/svn -r 1:HEAD --verbose --xml > somefile.xml

You can choose to look at only a particular range of revisions, or only a particular subset of your repository, by changing the revision range or the URL in this command.  You will then need to install Gource from http://code.google.com/p/gource/.  Just download the zip file and unzip it into a location of your choice.  Once this is done, take the SVN XML file you created earlier, navigate to the folder you unzipped Gource to and run this command:

gource.exe somefile.xml

This should open up your own visualisation.  Sit back and enjoy or try getting Code Swarm working.
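
To limit the log to a single release cycle or one part of the repository, as mentioned above, the svn log call can be wrapped in a small script.  This is a minimal Python sketch rather than anything Gource needs: the function names, and the folder and revision numbers in the usage note, are made up for illustration, and it assumes the svn client is on your PATH.

```python
import subprocess


def build_svn_log_command(url, start="1", end="HEAD", xml=True):
    """Return the argument list for an `svn log` call over one revision range."""
    command = ["svn", "log", url, "-r", f"{start}:{end}", "--verbose"]
    if xml:
        # Gource reads the XML form of the log directly.
        command.append("--xml")
    return command


def write_svn_log(url, output_path, start="1", end="HEAD", xml=True):
    """Run `svn log` and save its output, like `> somefile.xml` on the command line."""
    with open(output_path, "wb") as output_file:
        subprocess.run(
            build_svn_log_command(url, start, end, xml),
            stdout=output_file,
            check=True,  # raise if svn exits with an error
        )
```

For example, write_svn_log("http://your-repositoy/svn/trunk/billing", "somefile.xml", start=1200, end=1350) would log only that (hypothetical) folder between those two revisions.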

The only other flag that I've found I use a lot is the -i option.  This sets the amount of time that a file will remain in the visualisation before it is removed.  Because of the way we want to use Gource we set this to 0 so that once files are modified they won't fade out.  There are a number of other settings that you might find useful; they are documented in the readme found in the root of the Gource folder.
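
Since the point of the exercise was to see which areas had the most changes, it can also help to put some numbers next to the animation.  The sketch below is an optional extra and not part of Gource: it reads the same XML log with Python's standard library and counts changed paths per top-level folder.  The function name and the choice to group by the first path segment are illustrative assumptions; adjust the depth to suit your repository layout.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_changes_by_area(xml_log, depth=1):
    """Count changed paths in an `svn log --verbose --xml` file, grouped by folder.

    `xml_log` may be a file name or an open file object.
    """
    counts = Counter()
    root = ET.parse(xml_log).getroot()
    # Each <logentry> is one revision; its <paths> element lists every changed path.
    for path_element in root.iter("path"):
        segments = [s for s in (path_element.text or "").split("/") if s]
        # Keep only the first `depth` segments, e.g. "trunk" or "trunk/billing".
        area = "/".join(segments[:depth]) or "/"
        counts[area] += 1
    return counts
```

Calling count_changes_by_area("somefile.xml", depth=2).most_common(10) would then list the ten busiest second-level folders in the log.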

Now onto Code Swarm.

Download Code Swarm from http://code.google.com/p/codeswarm/.  You will need to install Ant and Java in order to run it, and Python if you want to run any of the helper scripts.  Specifically you want a Python 2 release rather than Python 3, as Python 3 is not backwards compatible with 2, which we won't get into here.  I also won't go through how to install these, as there are plenty of resources available describing the process.

The simplest way of running Code Swarm is to run runrepositoryfetch.bat, which allows you to enter a repository name and credentials and then kicks off a video representing that repository.  Just hit enter when it asks which config file to use.  If that's all you want then again sit back and enjoy, but if you want to get slightly more involved and choose specific revisions or areas of the repository then read on.

Again you want to get a log from your repository; however, this time you do not need SVN's XML log.  Well, actually you do need XML, but Code Swarm expects it to be in its own particular format, so we first have to get a plain SVN log file and then convert it.  Start by getting a log file from SVN with the following command:

svn log http://your-repositoy/svn -r 1:HEAD --verbose > somefile.log

To convert the log file into the XML format that Code Swarm expects, they provide a Python script.  Run this command from the convert_logs folder, giving it the file that you just created and the name of the output file, and it should go ahead and spit out an XML file for you:

python convert_logs.py -s somefile.log -o somefile.xml
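
Before moving on it's worth a quick check that the conversion produced something sensible.  The sketch below assumes the converted file has the same layout as the data/sample-repevents.xml file that ships with Code Swarm, which as far as I can tell is a file_events root containing event elements with date, filename and author attributes; if your copy differs, adjust the names.  The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def events_per_author(repevents_xml):
    """Count <event> elements per author in a Code Swarm style events file.

    Assumes each event carries an `author` attribute; this is based on the
    bundled sample file, not on a documented guarantee.
    `repevents_xml` may be a file name or an open file object.
    """
    root = ET.parse(repevents_xml).getroot()
    return Counter(
        event.get("author", "unknown") for event in root.iter("event")
    )
```

If events_per_author("somefile.xml") comes back empty, the conversion probably didn't pick up any changes and it's worth re-checking that the log was produced with --verbose.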

Now that we have the SVN log in the format that Code Swarm expects we're almost there.  You just need to create a config file to tell Code Swarm to use the file that you've just created.  The easiest way to do this is to take a copy of the sample.config file in the data folder and rename it.  Open it up and at a minimum you will want to change the line which tells Code Swarm which log file to use.  It should look something like this:

InputFile=data/sample-repevents.xml

Replace data/sample-repevents.xml with the location of your file, then save and close the file.  Now simply run run.bat from the root of the Code Swarm directory and it should use your newly generated XML file to create your very own code swarm.  Now you really can sit back and enjoy, unless of course you're wondering what all the other settings that we just skipped over were...  The only one I'll mention briefly here, and one you will probably want to change, is the types of code files that are visualised.  You may have seen some lines like this:

ColorAssign1="Docs",".*doc.*", 0,0,255, 0,0,255

This tells Code Swarm how to colour code your documents.  So if you want to separate your Ruby from your PHP or HTML then you will want to edit these lines.  The simplest change is to replace the doc in this line with the file type of your choosing.  Here we'll do php to keep it simple, so the previous line becomes this:

ColorAssign1="Docs",".*php.*", 0,0,255, 0,0,255

Now we can also see the changes to our PHP files separately from the changes to all our other files.
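
If you end up producing a swarm for every release, the copy-and-edit step for the config file can be scripted.  This is only a sketch under a few assumptions: the config is plain key=value text as in the lines shown above, the InputFile line starts with InputFile=, and the function name and the file names in the usage note are placeholders.

```python
def set_input_file(config_text, new_input_file):
    """Return the config text with its InputFile line pointing at a new log file."""
    updated_lines = []
    replaced = False
    for line in config_text.splitlines():
        if line.strip().startswith("InputFile="):
            line = f"InputFile={new_input_file}"
            replaced = True
        updated_lines.append(line)
    if not replaced:
        # Fail loudly rather than silently writing a config without an input file.
        raise ValueError("no InputFile= line found in config")
    return "\n".join(updated_lines) + "\n"
```

For example, read the text of data/sample.config, pass it through set_input_file(text, "data/somefile.xml"), and write the result to a new file such as data/myrepo.config (a made-up name) before running Code Swarm against it.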