William Cohen

William Cohen has been a developer of performance tools at Red Hat for over a decade and has worked on a number of the performance tools in Red Hat Enterprise Linux and Fedora such as OProfile, PAPI, SystemTap, and Dyninst.

William Cohen's contributions

How to avoid wasting megabytes of memory a few bytes at a time

William Cohen

Maybe you have so much memory in your computer that you never have to worry about it. Then again, maybe you find that some C or C++ application is using more memory than expected. This could be preventing you from running as many containers on a single system as you expected, it could be causing performance bottlenecks, and it could even be forcing you to pay for more memory in your servers. You do some quick "back of the...

Instruction-level Multithreading to improve processor utilization

William Cohen

No one wants the hardware in their computer sitting idle; we all want to get as much useful work out of our hardware as possible. Mechanisms such as caches and branch prediction have been incorporated into processors to minimize the amount of processor idle time caused by memory accesses and changes in program flow; however, these mechanisms are not perfect. There are still times that the processor could be idle waiting for data or computational results to become available...

"Don't cross the streams": Thread safety and memory accesses at the speed of light

William Cohen

The classic 1984 movie Ghostbusters offered an important safety tip for all of us: "Don't cross the streams." - "Why not?" - "It would be bad." - "I'm fuzzy on the whole good/bad thing. What do you mean, 'bad'?" - "Try to imagine all life as you know it stopping instantaneously and every molecule in your body exploding at the speed of light." - "Right. That's bad. Okay. All right. Important safety tip. Thanks..."...

Superscalar Execution

William Cohen

In the traditional processor pipeline model, under ideal circumstances, one new instruction enters the processor's pipeline and one instruction completes execution each cycle. Thus, in the best case the processor can have an average execution rate of one clock per instruction. A superscalar processor allows multiple unrelated instructions to start on the same clock cycle on separate hardware units or pipelines. Under ideal conditions a superscalar processor could have an average clocks per instruction (CPI) of less than one, meaning your 2GHz...
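The CPI arithmetic in this teaser can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the instruction counts, and the two-wide issue width are hypothetical and do not come from the article, and pipeline fill time and stalls are ignored.

```python
def average_cpi(cycles, instructions):
    """Average clocks per instruction over a run."""
    return cycles / instructions


def instructions_per_second(clock_hz, cpi):
    """Ideal instruction throughput for a given clock rate and CPI."""
    return clock_hz / cpi


if __name__ == "__main__":
    clock_hz = 2_000_000_000  # 2GHz clock, as in the text

    # Ideal scalar pipeline: one instruction completes per cycle.
    scalar_cpi = average_cpi(cycles=1000, instructions=1000)

    # Hypothetical two-wide superscalar: two unrelated instructions
    # complete per cycle, so the same work takes half the cycles.
    superscalar_cpi = average_cpi(cycles=500, instructions=1000)

    print("scalar CPI:", scalar_cpi)
    print("superscalar CPI:", superscalar_cpi)
    print("superscalar instructions/s:",
          instructions_per_second(clock_hz, superscalar_cpi))
```

Real code rarely sustains the ideal rate, because instructions that depend on each other cannot start in the same cycle.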

Quickly determine which instruction comes next with Branch Prediction

William Cohen

A pipelined processor requires a steady stream of instructions to be fed into the pipeline. Any delay in feeding instructions into the pipeline will hurt performance. For a sequence of instructions without branches it is relatively easy to determine the next instruction to feed into the pipeline, particularly for processors with fixed-size instructions. Variable-size instructions might complicate finding the start of each instruction, but it is still a contiguous, linear stream of bytes from memory. Keeping the processor pipeline...

Assembly Line for Computations

William Cohen

The simple programmer's model of a processor executing machine language instructions is a loop of the following steps, with each step finished before moving on to the next step:

1. Fetch instruction
2. Decode instruction and fetch register operands
3. Execute arithmetic computation
4. Possible memory access (read or write)
5. Writeback results to register

As mentioned in the introduction blog article, even if the processor can get each step down to a single cycle, that would be 2.5ns (5*0.5ns) for a 2GHz (2x10^9...
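The 2.5ns figure quoted in this teaser can be checked with a small Python sketch. It assumes, as the text does, five steps that each take one cycle of a 2GHz clock and never overlap; the function and constant names are made up for this illustration.

```python
CLOCK_HZ = 2_000_000_000  # 2GHz clock, from the text
STEPS = 5                 # fetch, decode, execute, memory access, writeback


def cycle_time_ns(clock_hz):
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def unpipelined_instruction_time_ns(clock_hz, steps):
    """Time for one instruction when every step takes one cycle
    and each step finishes before the next one starts."""
    return steps * cycle_time_ns(clock_hz)


if __name__ == "__main__":
    print("cycle time (ns):", cycle_time_ns(CLOCK_HZ))
    print("time per instruction (ns):",
          unpipelined_instruction_time_ns(CLOCK_HZ, STEPS))
```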

Reducing Memory Access Times with Caches

William Cohen

The simple programmer's model of a processor executing machine language instructions is a loop of the following steps, with each step finished before moving on to the next step:

1. Fetch instruction
2. Decode instruction and fetch register operands
3. Execute arithmetic computation
4. Possible memory access (read or write)
5. Writeback results to register

At a minimum it takes one processor clock cycle to do each step. However, for steps 1 and 4, accessing main memory may take much longer than one cycle. Modern processors typically...

Programmer's Model of a Processor Executing Instructions Versus Reality

William Cohen

Everything on a computer system eventually ends up being run as a sequence of machine instructions. People want to keep things simple and understandable even if that is not really the way that things work. The simple programmer's model of a Reduced Instruction Set Computer (RISC) processor executing those machine language instructions is a loop of the following steps, with each step finished before moving on to the next step:

1. Fetch instruction
2. Decode instruction and fetch register operands
3. Execute arithmetic computation...