Recently I was asked to look into an issue related to the performance of QEMU’s I/O subsystem. Specifically, I was looking for differences in performance between glibc’s malloc and other malloc implementations. After a good deal of benchmarking, I was unable to see a clear difference between our malloc (glibc) and the others, not because the implementations were similar, but because there was too much noise in the system; the benchmarks were being polluted by other activity, not just in QEMU but elsewhere in the operating system. I really just wanted to see how much time malloc was using, but it was a small signal in a large, noisy system.
To get the results I needed, I had to isolate malloc’s activity so I could measure it more accurately. In this post, I’ll document how I extracted the malloc workload and how I isolated that workload from other system activity so it could be measured more effectively.