11 May 2016

So I got an assignment in my Computer Architecture class: Write a method in C to perform a specific matrix operation, causing as few cache misses as possible. The first question I had was of course, “how do you measure cache misses?”

Memory Tracing with Valgrind

It turns out that we won’t be measuring cache misses on any kind of real cache — instead, we’re running Valgrind’s memory access tracer (valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes /path/to/binary). The tracer produces a file which has each memory address the program accesses, in order, as well as keeping track of whether it was a read, write, modify (i.e. read followed by write), or instruction load. The trace file looks like this:

I 0400d7d4,8
 M 0421c7f0,4
 L 04f6b868,8
 S 7ff0005c8,9

Each line starts with a symbol denoting the kind of access (Instruction load, Modify, Load, Store), followed by an address, and the amount of memory accessed. This trace is then fed to a cache simulator, which simulates a simplistic cache and tells us how many hypothetical cache misses we had.

Grading Process

To submit our code for grading (or to grade locally), we write code that is then compiled an executable binary that the main grader calls; the binary does not consist wholly of student code, and there is some boilerplate to make grading easier. This is very, very important, as you’ll see in a moment. The main grader program calls Valgrind’s tracer on the secondary binary (with the student code in it), and the secondary binary does the following:

  1. Check to see if the student’s method produces the correct output.
  2. Write to a marker memory location.
  3. Call the student’s method again.
  4. Write to a different marker location.
  5. Pass back to the main grader the correctness result, via environment variables.

The main grader then looks for the beginning and ending marker locations in the memory trace, and feeds just that segment to the cache simulator, using the rsults of that for grading.

Being Naughty With My Grader

So let’s look at the source for the secondary binary, shall we? In the header file that student code #includes for some convinience methods:

/* Markers used to bound trace regions of interest */

/*Later on...*/
for (i=0; i < func_counter; i++) {
		(*func_list[i].func_ptr)(M, N, A, B);
		MARKER_END = 34;

Quick, what’s wrong with this code, security-wise?

The markers aren’t static! If I declare some volatile char’s with the right names in my code, they will point to the same symbols as the markers, so if I write to MARKER_END on the first line of my turned-in method, it looks like I had almost zero cache misses.

And that’s how I created the ‘magic’ algorithm that achieved a record-setting 2 cache misses over an entire 64×64 array.