Programmer’s Model of a Processor Executing Instructions Versus Reality

Everything on a computer system eventually ends up being run as a sequence of machine instructions. People want to keep things simple and understandable even if that is not really the way that things work. The simple programmer’s model of a Reduced Instruction Set Computer (RISC) processor executing those machine language instructions is a loop of the following steps, with each step finished before moving on to the next:

  1. Fetch instruction
  2. Decode instruction and fetch register operands
  3. Execute arithmetic computation
  4. Possible memory access (read or write)
  5. Writeback results to register
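
The five-step loop can be sketched as a tiny interpreter. This is only an illustration of the model, not any real instruction set: the encoding, the register names, and the two operations (`add` and `load`) are all invented for the example.

```python
def run(program, registers, memory):
    """Execute a toy RISC program following the five-step loop."""
    pc = 0
    while pc < len(program):
        # 1. Fetch instruction
        instruction = program[pc]
        # 2. Decode instruction and fetch register operands
        op, dest, src1, src2 = instruction
        a, b = registers[src1], registers[src2]
        # 3. Execute arithmetic computation
        if op == "add":
            result = a + b
        # 4. Possible memory access (read or write)
        elif op == "load":
            result = memory[a + b]  # address = register a + register b
        # 5. Writeback results to register
        registers[dest] = result
        pc += 1
    return registers

regs = run([("add", "r2", "r0", "r1"),    # r2 = r0 + r1
            ("load", "r3", "r2", "r0")],  # r3 = memory[r2 + r0]
           {"r0": 0, "r1": 4, "r2": 0, "r3": 0},
           {4: 42})
# regs["r2"] is 4 and regs["r3"] is 42
```

Each trip through the `while` loop completes all five steps for one instruction before the next instruction is fetched, which is exactly the sequential behavior the simple model promises.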

The steps above are simple. However, even the simplest step on the processor takes a minimum of one clock cycle, and some of the above steps may take multiple clock cycles. On a machine that has a 2GHz (2×10^9 cycles per second) clock, each clock cycle is 0.5 nanoseconds. Thus, it would take 2.5 nanoseconds to execute one instruction, for a maximum of only 400 million instructions per second. Like the astronomer Carl Sagan, you want to say your processor is executing “billions and billions” of instructions per second, not mere millions.
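The arithmetic behind those figures can be checked in a few lines, assuming the best case of one clock cycle per step:

```python
clock_hz = 2e9                        # 2 GHz clock
cycle_ns = (1 / clock_hz) * 1e9      # 0.5 ns per clock cycle
steps = 5                             # five steps, one cycle each (best case)

ns_per_instruction = steps * cycle_ns         # 2.5 ns per instruction
instructions_per_second = clock_hz / steps    # 400 million per second
```

To reach "billions and billions" of instructions per second, the processor clearly cannot spend five full cycles on every instruction one at a time.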

Originally, processor designers had a very limited budget for transistors, so the processors pretty much implemented the above steps as simply as possible. However, as designers were allowed to use more transistors, they realized that they had some flexibility in how the processors executed the instructions. They only needed to make it appear that the processor followed those steps. Behind the scenes the processor could take all sorts of optimizations and shortcuts, so long as the results looked the same as the simple model to the programmer.

Ideally, the optimizations and shortcuts taken by the processor are transparent to the application program. Most of the time that is the case; the program just runs faster on that processor due to those hardware enhancements. However, there are cases where the application developer can see that performance is poor. This series of blog entries will talk about the various performance optimizations that processors implement, how they work, their pitfalls, ways to identify whether there is a problem, and how the developer might resolve it.

The next blog entry will discuss how processors speed up memory accesses.


Join the Red Hat Developer Program (it’s free) and get access to related cheat sheets, books, and product downloads.



  1. Looking forward to this series! I think the new C11 memory model will really help users write correctly threaded C code that works on modern optimizing compilers and hardware. For a long time in glibc we’ve had difficulty maintaining both x86_64 and ppc64, two architectures which at the hardware level couldn’t be more different when it comes to out of order execution and atomicity. The C11 memory model has allowed us to unify a lot of what we do under the hood and prepared people to talk about concurrency in constructive and helpful ways.
