On a software project, it’s sometimes useful to get an overview of some project metrics. For small or short-lived projects it isn’t much trouble to look at a few columns of numbers, but on a larger project it can be hard to see what’s going on with only the numbers. If only we could use our species’ well-developed visual reasoning instead.
This is where treemaps can help.
I Think I See The Problem
Most people I’ve talked to about treemaps were introduced to them through a disk utility that shows the relative size of files on a hard disk, and that’s an easy way to understand them. Each file is drawn as a rectangle whose area is proportional to the number of bytes the file takes on disk, so it’s very easy to see where your disk storage is being used. Swap in a different metric and the same picture shows where the most bugs are being found, or where you need to beef up the source code commenting, and so on. Adding color lets you view even more information. Just map size to line count and degree of color to bug count and look for the large red boxes. If your data has a natural hierarchy, the levels of that hierarchy can be represented in the treemap. The gray boxes in the image up top represent a level. Anything below a gray box is a child of a node in that level. In other words, trees.
The Tools
I found a good way to produce a treemap in Google’s Visualization: TreeMap API. With some boilerplate JavaScript and HTML you can create an interactive treemap in your browser, and best of all your data never leaves the security of your own computer. (Some other chart types may send the data to Google; check the docs if you are concerned.)
It helps to have some support in calculating the metrics. I tried CCCC, which has a promising complexity metric, but it had trouble parsing my modern C++ files. (I’m still looking for an analysis tool. Hasn’t someone done this with LLVM/clang yet?) I have also mined the information in my project’s version control system. I always put task or bug numbers in my commit messages, and I used those to count the number of bugs per file.
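To make the commit-message mining concrete, here is a minimal sketch of how bug numbers might be counted per file from the text that `svn log -v` prints. It is not my actual script: the function name count_bugs_per_file is made up for illustration, and it assumes bug references look like "#123", which may not match your own convention.

```python
import re
from collections import defaultdict

# Assumption: commit messages reference bugs as "#123".
BUG_PATTERN = re.compile(r"#(\d+)")
# A changed-path line in `svn log -v` output, e.g. "   M /trunk/Firmware.cpp".
# Paths containing spaces are not handled by this simple pattern.
PATH_PATTERN = re.compile(r"^\s+[AMDR]\s+(/\S+)")
# Log entries are separated by a line made only of dashes.
SEPARATOR = re.compile(r"^-{20,}[ \t]*$", re.MULTILINE)


def count_bugs_per_file(log_text, extensions=(".cpp", ".h")):
    """Map each matching file path to its number of distinct bug ids."""
    bugs_by_file = defaultdict(set)
    for entry in SEPARATOR.split(log_text):
        # Header and changed paths come first, then a blank line, then the message.
        head, _, message = entry.strip().partition("\n\n")
        bug_ids = set(BUG_PATTERN.findall(message))
        for line in head.splitlines():
            match = PATH_PATTERN.match(line)
            if match and match.group(1).endswith(extensions):
                bugs_by_file[match.group(1)].update(bug_ids)
    return {path: len(ids) for path, ids in bugs_by_file.items()}
```

To feed it real data, something like `subprocess.run(["svn", "log", "-v", repo_path], capture_output=True, text=True).stdout` should produce the log text, where repo_path is a placeholder for your working copy or repository URL.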
An Example
To do my bug analysis I wrote a Python script that scrapes Subversion’s log output for bug numbers, counts the unique numbers per file, and does a line count with ‘wc -l’ on each .cpp and .h file in my repository. That work happens in a function called CountBugs, and I print its output to inject the data into some HTML that produces the chart.
import json

# Start the HTML and JavaScript code
print('''
<html>
  <head>
    <script type="text/javascript" src="https://www.google.com/jsapi"></script>
    <script type="text/javascript">
      google.load("visualization", "1", {packages:["treemap"]});
      google.setOnLoadCallback(drawChart);
      function drawChart() {
        // Create and populate the data table.
        var data = google.visualization.arrayToDataTable(
''')
# Insert the table entries: the list of lists returned from CountBugs,
# serialized as JSON so that Python's None is written as JavaScript's null
print(json.dumps(CountBugs('path/to/my/repo')))
# Finish the HTML and JavaScript
print('''
        );
        var tree = new google.visualization.TreeMap(document.getElementById('chart_div'));
        tree.draw(data, {
          maxDepth: 2,
          minColor: 'YellowGreen',
          midColor: 'LightGoldenRodYellow',
          maxColor: 'Red',
          headerHeight: 15,
          fontColor: 'black',
          showScale: true});
      }
    </script>
  </head>
  <body>
    <div id="chart_div" style="width: 900px; height: 500px;"></div>
  </body>
</html>
''')
Below is some example data you could plug in if you want to get experimenting quickly. The first item contains the labels for each column. Each row after that needs a name and a parent, followed by two data points, one for size and one for color. One tricky thing to remember is that the list must include an entry for each level in the data hierarchy, not just for the files. I’ve limited the depth of my script’s search for source files to one subdirectory, so in my example that means entries for the root and for each subdirectory.
The following example shows just a few lines of my analysis. Root is the top level of the tree. Screens is a directory. Firmware.cpp is in the top level of the repo, and AboutScreen.cpp is in the Screens directory.
[['File', 'Directory', 'LOC (size)', 'Bugs (color)'],
['root', null, 0, 0],
['Screens', 'root', 0, 0],
['Firmware.cpp', 'root', 1258, 23],
['AboutScreen.cpp', 'Screens', 116, 0]]
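If you already have per-file numbers, building a table in this shape might look like the sketch below. The names build_treemap_rows and rows_as_javascript are hypothetical, and the sketch assumes at most one directory level and file names that do not repeat across directories, since as far as I can tell each node name in the first column has to be unique.

```python
import json
import posixpath


def build_treemap_rows(metrics, root="root"):
    """Build treemap rows from {relative_path: (line_count, bug_count)}.

    Assumes paths like 'Firmware.cpp' or 'Screens/AboutScreen.cpp',
    that is, at most one directory level below the root.
    """
    rows = [["File", "Directory", "LOC (size)", "Bugs (color)"],
            [root, None, 0, 0]]
    # Every directory needs its own row, with the root as its parent.
    directories = sorted({posixpath.dirname(path) for path in metrics} - {""})
    for directory in directories:
        rows.append([directory, root, 0, 0])
    # Each file hangs off its directory, or off the root if it has none.
    for path in sorted(metrics):
        loc, bugs = metrics[path]
        parent = posixpath.dirname(path) or root
        rows.append([posixpath.basename(path), parent, loc, bugs])
    return rows


def rows_as_javascript(rows):
    """Serialize rows as JSON, which turns Python's None into null."""
    return json.dumps(rows)
```

Calling build_treemap_rows with {'Firmware.cpp': (1258, 23), 'Screens/AboutScreen.cpp': (116, 0)} gives the same five rows as the example data, and rows_as_javascript produces text that can be dropped straight into the arrayToDataTable call.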
There are a few options in the treemap API, such as the depth of the tree and the color ranges to use. You can see my choices in the script above. Of course you can add JavaScript to handle clicks on the map with your own code. I think it’s fun to play with this stuff, and you may find some other chart types will work as well for you and your data.