Red Hat Enterprise Linux 7 toolchain a major performance boost for C/C++ developers

Now that Red Hat Enterprise Linux 7 is publicly available, we thought RHEL application developers would be interested in seeing how the new C/C++ toolchain compares to the equivalent in Red Hat Enterprise Linux 6 in terms of raw performance. The numbers are pretty surprising, so stay tuned. But first, a little introduction to set the scene.



Profiling Ruby Programs

The Ruby interpreter includes a profiling tool, which is invoked with the -rprofile option on the command line. Below is an example running the Ruby Fibonacci program (fib.rb) included in the Ruby documentation samples. The list of functions is sorted from most to least time spent exclusively in the function (self seconds). The first column provides the percentage of self seconds for each function. The cumulative seconds indicates the amount of time spent in that function and the functions it calls directly and indirectly. The calls, self ms/call, and total ms/call columns indicate whether a function is called frequently and the average cost of each call to it.



Profiling Python Programs

On RHEL 6 and newer distributions, tools are available to profile Python code and to generate dynamic call graphs of a program’s execution. Flat profiles can be obtained with the cProfile module, and dynamic call graphs can be obtained with pycallgraph.

The cProfile Python module records information about each Python function that is run. For older versions of Python that do not include the cProfile module, you can use the higher-overhead profile module. Profiling is fairly simple with the cProfile module.
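As a minimal sketch of what that can look like, the snippet below profiles a small recursive Fibonacci function with cProfile and formats a flat profile with the standard pstats module. The fib function, its argument, and the profile_fib helper are illustrative assumptions, not code from the original post.

```python
import cProfile
import io
import pstats


def fib(n):
    """Naive recursive Fibonacci, used here only as a workload to profile."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def profile_fib(n=15):
    """Run fib(n) under cProfile and return (result, flat-profile text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = fib(n)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by time spent exclusively in each function and show the top 5 rows.
    stats.sort_stats("tottime").print_stats(5)
    return result, out.getvalue()


if __name__ == "__main__":
    value, report = profile_fib()
    print("fib(15) =", value)
    print(report)
```

An unmodified script can also be profiled from the command line with `python -m cProfile -s tottime script.py`, where script.py stands in for your own program.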



Probing Java Methods with SystemTap

Today we’ll be looking at SystemTap’s latest native Java probing capabilities. These go beyond SystemTap’s existing HotSpot-based probe points to probes on the entry, exit, and specific line numbers of a given Java method. This allows for pinpoint probing of a Java application, without the need to place probes on the underlying JVM itself.

How to install (if running RHEL 7 Beta)

# yum install systemtap systemtap-runtime-java

Basic Usage

How do I use systemtap to probe a java method?

Below we have a simple threaded Java program, which waits for our input on the command line. Given the input ‘int’ or ‘long’, the program will print out a predetermined variable whose value we would like to know.



Performance Regression Analysis with Performance Co-Pilot [video]

In an earlier post we looked into using the Performance Co-Pilot toolkit to explore performance characteristics of complex systems.  While surprisingly rewarding, and often unexpectedly insightful, this kind of analysis can be rightly criticized for being “hit and miss”.  When a system has many thousands of metric values it is not feasible to manually explore the entire metric search space in a short amount of time.  Or the problem may be less obvious than the example shown – perhaps we are looking at a slow degradation over time.

There are other tools that we can use to help us quickly reduce the search space and find interesting nuggets.  To illustrate, here’s a second example from our favorite ACME Co. production system. 
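The general idea of shrinking the search space can be sketched in a few lines of Python: average each metric over a baseline window and a comparison window, then rank the metrics whose averages changed the most. This is only an illustration of the technique under stated assumptions. It does not use the Performance Co-Pilot API, the rank_metric_changes helper is hypothetical, and the metric names and sample values are made up.

```python
from statistics import mean


def rank_metric_changes(baseline, current, min_ratio=2.0):
    """Return (name, baseline_mean, current_mean, ratio) for every metric whose
    average changed by at least min_ratio in either direction, largest first."""
    changes = []
    for name in baseline.keys() & current.keys():
        before, after = mean(baseline[name]), mean(current[name])
        if before <= 0 or after <= 0:
            continue  # a ratio is undefined or meaningless here, so skip it
        ratio = after / before
        magnitude = max(ratio, 1 / ratio)  # treat 5x up and 5x down alike
        if magnitude >= min_ratio:
            changes.append((magnitude, name, before, after, ratio))
    # Largest change first; ties are broken by metric name for stable output.
    changes.sort(key=lambda c: (-c[0], c[1]))
    return [(name, before, after, ratio)
            for _, name, before, after, ratio in changes]


if __name__ == "__main__":
    # Hypothetical samples for two time windows (illustrative values only).
    baseline = {
        "disk.all.read": [100, 120, 110],
        "kernel.all.load": [1.0, 1.2, 1.1],
        "network.in.bytes": [5000, 5200, 5100],
    }
    current = {
        "disk.all.read": [900, 1100, 1000],
        "kernel.all.load": [1.1, 1.0, 1.2],
        "network.in.bytes": [1000, 1100, 900],
    }
    for name, before, after, ratio in rank_metric_changes(baseline, current):
        print(f"{name}: {before:.1f} -> {after:.1f} ({ratio:.2f}x)")
```

With thousands of metrics, a ranked list like this gives a short set of candidates to examine by hand, instead of browsing every metric in turn.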



Exploratory Performance Analysis with Performance Co-Pilot [video]

Investigating performance in a complex system is a fascinating undertaking.  When that system spans multiple, closely-cooperating machines and has open-ended input sources (shared storage, Internet-facing services, etc.), the degree of difficulty of such investigations ratchets up quickly.  There are often many confounding factors, with many things going on all at the same time.

The observable behaviour of the system as a whole can be changing frequently even while at a micro level things may appear the same.  Or vice-versa – the system may appear healthy, with average and 95th percentile response times in excellent shape, yet a small subset of tasks is taking an unusually long time to complete, just today perhaps.  Fascinating stuff!

Let’s first consider endearing characteristics of the performance tools we’d want to have at our disposal for exploring performance in this environment. 



NUMA – Verifying it’s not hurting your application performance [video]

As I mentioned here, Joe Mario and I delivered this session at Red Hat’s Developer Exchange in Boston.  There were a lot of great questions and we hope you’ll find this video-recorded session useful.


Now that you have followed all the steps to make your application NUMA-aware, how do you know if you got it right, or if you shifted your performance problem elsewhere?

In this session, Don and Joe will:

  • discuss initial high level steps to verify correct memory and cpu-process placement, including:
    • showing how performance can easily suffer with incorrect placement.
    • describing available options to correct placement.
  • discuss the open source tools, both available now and in development, which use the hardware’s performance counters to more accurately pinpoint:
    • where your program is making costly remote NUMA memory accesses,
    • if and where other programs are inflicting NUMA-related performance penalties on your program,
    • how much those remote accesses are hurting your performance.
  • discuss various approaches for resolving these low-level issues.
