Optimizing Code For Modern Processors (William Cohen)
Programmers typically reason about a program using a simple sequential model of how the processor executes each step, but the actual implementation is far more complex. Modern processors exploit typical characteristics of code to run it much faster than the simple model suggests, reducing the cost of some individual instructions by a factor of 10 to 100. However, the processor must preserve the behavior the programmer expects, so it falls back to slower methods whenever an optimization would yield results that differ from the programmer's model. Caches, pipelines, branch prediction, and threading are the mechanisms commonly used in modern processors to improve performance. In this session, we'll explain the performance implications of these mechanisms, show how to identify specific performance issues such as poor cache behavior and poor branch prediction using the tools available on Linux, and present some optimization techniques that better match code to hardware capabilities.