Heat Check

I Think I See The Problem

Most people I’ve talked to about treemaps were introduced to them through a disk utility that shows the relative size of the files on your hard disk, and that’s an easy way to understand them. Each file is represented in the treemap by a rectangle whose area is proportional to the number of bytes that file takes on disk, so it’s easy to see where your disk storage is going. The same picture works for other metrics: where the most bugs are being found, where the source code needs more comments, and so on.

Adding color lets you show a second metric at the same time. For example, map size to line count and color to bug count, then look for the large red boxes.

If your data has a natural hierarchy, the hierarchy’s levels can be represented in the treemap. The gray boxes in the image up top represent a level, and anything below a gray box is a child of a node in that level. In other words, trees.
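To make the “area proportional to size” idea concrete, here’s a rough sketch of the classic slice-and-dice layout: it cuts a rectangle into strips whose areas match each child’s share of the total size, and alternates the cut direction at each level of the tree. It’s only an illustration. The dictionary node format (`name`, `size`, `children`) and the toy file names are assumptions for this sketch, and Google’s chart may use a different layout algorithm.

```python
def total_size(node):
    """Size of a leaf, or the sum of its descendants' sizes for a directory."""
    children = node.get("children", [])
    if not children:
        return node["size"]
    return sum(total_size(child) for child in children)


def slice_and_dice(node, rect, depth=0, out=None):
    """Map each leaf name to an (x, y, width, height) rectangle.

    Each child gets a share of its parent's area proportional to the
    child's size. The cut direction alternates with depth.
    """
    if out is None:
        out = {}
    children = node.get("children", [])
    if not children:
        out[node["name"]] = rect
        return out
    x, y, w, h = rect
    total = total_size(node)
    offset = 0.0  # fraction of the parent's area already handed out
    for child in children:
        frac = total_size(child) / total if total else 0.0
        if depth % 2 == 0:
            # Even depths: side-by-side strips (split the width).
            child_rect = (x + offset * w, y, frac * w, h)
        else:
            # Odd depths: stacked strips (split the height).
            child_rect = (x, y + offset * h, w, frac * h)
        slice_and_dice(child, child_rect, depth + 1, out)
        offset += frac
    return out


# A toy tree: one file at the top level and one in a subdirectory.
tree = {
    "name": "root",
    "children": [
        {"name": "main.cpp", "size": 3},
        {"name": "lib", "children": [{"name": "util.cpp", "size": 1}]},
    ],
}
print(slice_and_dice(tree, (0, 0, 4, 2)))
```

With sizes 3 and 1, main.cpp ends up with three quarters of the 4×2 rectangle and util.cpp with the remaining quarter. Slice-and-dice is simple but tends to produce long, thin rectangles, which is why other layouts exist.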

The Tools

I found a good way to produce a treemap in Google’s Visualization API, using its TreeMap chart. With some boilerplate JavaScript and HTML you can create an interactive treemap in your browser, and, best of all, your data never leaves the security of your own computer. (Some other chart types may send the data to Google, so check the docs if you are concerned.)

It helps to have some tool support for calculating the metrics. I tried CCCC, which has a promising complexity metric, but it had trouble parsing my modern C++ files. (I’m still looking for an analysis tool. Hasn’t someone done this with LLVM/clang yet?) I have also mined the information in my project’s version control system: I always put task or bug numbers in my commit messages, and I used those to count the number of bugs per file.

An Example

To do my bug analysis I wrote a Python script that scrapes Subversion’s log output for bug numbers, counts the unique bug numbers per file, and runs a line count with ‘wc -l’ on each .cpp and .h file in my repository. A function in that script, CountBugs, returns the results as a list of lists, and I print that output into some HTML that produces the chart.
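Here’s a minimal sketch of the log-scraping step, assuming the text output of `svn log -v` (which lists the changed paths for each revision) and a hypothetical commit-message convention like “bug 42” or “task #17”. `bugs_per_file` and `BUG_RE` are names I made up for this sketch, not part of my actual script; adjust the pattern to however you write bug numbers.

```python
import re
from collections import defaultdict

# Hypothetical convention: messages mention "bug 42", "Bug #42" or "task 17".
BUG_RE = re.compile(r"\b(?:bug|task)\s*#?\s*(\d+)", re.IGNORECASE)
# `svn log` separates revisions with a line of 72 dashes.
SEPARATOR = "-" * 72


def bugs_per_file(log_text, extensions=(".cpp", ".h")):
    """Return {path: number of unique bug numbers} from `svn log -v` text."""
    bugs = defaultdict(set)
    for entry in log_text.split(SEPARATOR):
        lines = entry.strip("\n").splitlines()
        if "Changed paths:" not in lines:
            continue  # empty chunk, or a log produced without -v
        i = lines.index("Changed paths:") + 1
        paths = []
        # Changed-path lines look like "   M /trunk/Firmware.cpp", possibly
        # followed by " (from /old/path:123)" for copies; a blank line ends them.
        while i < len(lines) and lines[i].strip():
            paths.append(lines[i].strip()[2:].split(" (from ")[0])
            i += 1
        # Everything after the changed paths is the commit message.
        numbers = set(BUG_RE.findall("\n".join(lines[i:])))
        for path in paths:
            if path.endswith(extensions):
                # Files touched only by bug-free commits still get an entry (0).
                bugs[path].update(numbers)
    return {path: len(numbers) for path, numbers in bugs.items()}
```

Using a set per file means a bug that took several commits to fix is only counted once. Note that the keys are repository paths such as /trunk/Screens/AboutScreen.cpp, so they have to be mapped to working-copy paths before you can pair them with the `wc -l` counts.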

    import json

    # Start the HTML and JavaScript code
    print('''<html>
      <head>
        <script type="text/javascript" src="https://www.google.com/jsapi"></script>
        <script type="text/javascript">
          google.load("visualization", "1", {packages:["treemap"]});
          google.setOnLoadCallback(drawChart);
          function drawChart() {
            // Create and populate the data table.
            var data = google.visualization.arrayToDataTable(''')

    # Insert the table entries by printing the list of lists returned from
    # CountBugs. json.dumps writes Python's None as null, which JavaScript
    # understands (a plain print would write None and break the page).
    print(json.dumps(CountBugs('path/to/my/repo')))

    # Finish the HTML and JavaScript
    print(''');
            var tree = new google.visualization.TreeMap(document.getElementById('chart_div'));
            tree.draw(data, {
              maxDepth: 2,
              minColor: 'YellowGreen',
              midColor: 'LightGoldenRodYellow',
              maxColor: 'Red',
              headerHeight: 15,
              fontColor: 'black',
              showScale: true});
          }
        </script>
      </head>
      <body>
        <div id="chart_div" style="width: 900px; height: 500px;"></div>
      </body>
    </html>''')

Below is some example data you could plug in if you want to start experimenting quickly. The first row contains the labels for each column. I’ve limited my script’s search for source files to one subdirectory deep. Each row after that needs a node name (here, the file name) and the name of its parent node; the treemap uses these same two columns, an ID and a parent ID, no matter how deep the hierarchy goes. They’re followed by two data points, one for size and one for color. One tricky thing to remember is that every level above the files, in my example the root and one subdirectory, also needs its own row.

The following example shows just a few lines of my analysis. Root is the top level of the tree, Screens is a directory, Firmware.cpp is in the top level of the repo, and AboutScreen.cpp is in the Screens directory.

    [['File', 'Directory', 'LOC (size)', 'Bugs (color)'],
     ['root', null, 0, 0],
     ['Screens', 'root', 0, 0],
     ['Firmware.cpp', 'root', 1258, 23],
     ['AboutScreen.cpp', 'Screens', 116, 0]]
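If you’d rather generate these rows than type them, here’s a minimal sketch. `build_rows` is a hypothetical helper, not my CountBugs function: it assumes you already have a dictionary mapping each relative file path to a (line count, bug count) pair, that files sit at most one directory below the root, and that every file and directory name is unique, since the first column is each node’s ID.

```python
import json
import posixpath


def build_rows(stats):
    """Turn {relative_path: (loc, bugs)} into rows for arrayToDataTable."""
    rows = [["File", "Directory", "LOC (size)", "Bugs (color)"],
            ["root", None, 0, 0]]  # None becomes null in the JSON output
    seen_dirs = set()
    for path, (loc, bugs) in sorted(stats.items()):
        directory, name = posixpath.split(path)
        if directory and directory not in seen_dirs:
            # Each directory needs its own row, parented to the root.
            rows.append([directory, "root", 0, 0])
            seen_dirs.add(directory)
        rows.append([name, directory or "root", loc, bugs])
    return rows


stats = {"Firmware.cpp": (1258, 23), "Screens/AboutScreen.cpp": (116, 0)}
print(json.dumps(build_rows(stats)))
```

This prints the same rows as the example above, with Firmware.cpp listed before the Screens row, double-quoted strings, and null for the root’s parent, so the output can go straight into arrayToDataTable.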

The treemap API has a few options, such as the depth of the tree and the color range to use; you can see my choices in the script above. Of course, you can also add your own JavaScript to handle clicks on the map. I think it’s fun to play with this stuff, and you may find that some other chart types work just as well for you and your data.
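If it helps to picture what minColor, midColor and maxColor do, here’s a rough sketch of a three-stop gradient using the same named colors as my script. It assumes plain linear interpolation in RGB, with midColor at the halfway value and the range running from the smallest to the largest color value in the data; that’s my simplification for illustration, not a description of how Google actually blends the colors.

```python
# RGB values of the CSS named colors used in the script above.
MIN_COLOR = (154, 205, 50)   # YellowGreen
MID_COLOR = (250, 250, 210)  # LightGoldenRodYellow
MAX_COLOR = (255, 0, 0)      # Red


def lerp(a, b, t):
    """Linearly interpolate between RGB triples a and b, with t in [0, 1]."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))


def color_for(value, lo, hi):
    """Map value in [lo, hi] onto a min/mid/max gradient (assumed linear RGB)."""
    if hi == lo:
        return MID_COLOR
    t = min(1.0, max(0.0, (value - lo) / (hi - lo)))  # clamp to [0, 1]
    if t <= 0.5:
        return lerp(MIN_COLOR, MID_COLOR, t * 2)
    return lerp(MID_COLOR, MAX_COLOR, (t - 0.5) * 2)


def to_hex(rgb):
    """Format an RGB triple as a #RRGGBB string."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


# Bug counts from 0 to 23, the range in the example data.
for bugs in (0, 5, 23):
    print(bugs, to_hex(color_for(bugs, 0, 23)))
```

A file with zero bugs comes out YellowGreen and the buggiest file comes out pure Red, which is why the large red boxes are the ones to look at first.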
