Dive deeper into NUMA systems
A common performance-related issue we keep seeing is certain instructions causing bottlenecks. Sometimes it just doesn't make sense, especially when lots of threads or shared memory on NUMA systems are involved. For quite a while, a bunch of us have been writing tools that exploit CPU features to give us insight into not only the instruction at the bottleneck but the data address too. See, the instruction is only half the picture. Having the data...