If an important task is processor-bound, you want to make sure that it gets as much processor time as possible and that other tasks are not delaying its execution. The SystemTap example script cycle_thief.stp lists the interrupts and other tasks that run on the same processor as the important task. The script provides the following information:

  • the number of times the monitored task migrated
  • a histogram of the duration of time spent running on the processor
  • a histogram of the duration of time spent off the processor
  • a list of the other processes taking CPU time from the monitored process
  • a list of hardware interrupts handled on the same processor as the monitored task

Each time a task migrates to another processor, the new processor needs to load the task's data into its local caches and Translation Lookaside Buffers (TLBs), which adds overhead and can slow the application. Similarly, when the processor runs a different process, the cache and TLB entries for the important process are replaced with entries for that other process. The histograms show whether the monitored task stays on the processor for long durations, or whether it is off the processor for long durations, which delays its execution and increases the probability that its data will need to be reloaded into the caches and TLBs.
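The histograms that cycle_thief.stp prints use power-of-two buckets, in the style of SystemTap's @hist_log aggregate, and each row label is the lower bound of its bucket. As a minimal sketch (the helper names hist_log_bucket and bucket_range are illustrative, not part of SystemTap), this is how a non-negative duration in microseconds maps to a row:

```python
from typing import Tuple


def hist_log_bucket(value: int) -> int:
    """Row label that a non-negative value falls into in a power-of-two
    histogram: 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 4, and so on."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    if value == 0:
        return 0
    # The label is the largest power of two that is <= value.
    return 1 << (value.bit_length() - 1)


def bucket_range(label: int) -> Tuple[int, int]:
    """Inclusive range of values counted in the row with this label."""
    if label == 0:
        return (0, 0)
    return (label, 2 * label - 1)


if __name__ == "__main__":
    for duration_us in (0, 1, 3, 5000, 9000):
        label = hist_log_bucket(duration_us)
        low, high = bucket_range(label)
        print(f"{duration_us:>5} us -> row {label:>4} ({low}..{high} us)")
```

Because bucket widths double, a single row near the bottom of a histogram can represent a much wider span of time than all the rows above it combined.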

The cycle_thief.stp script lists the PIDs of tasks that run on the same processor as the monitored process, sorted from most to least frequently running. On a multiprocessor machine, tasks could be assigned to specific processors with the taskset command to minimize migration and interference between processes, or in some cases other processes (such as an unneeded daemon) can be safely stopped.
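If you save the report to a file, a few lines of Python can pull the ranked PIDs out of it, for example to feed them to ps. This is a sketch that assumes the section layout shown in the example report later in this article; parse_other_pids is a hypothetical helper. It skips PID 0 by default because PID 0 is the kernel's per-CPU idle task, which runs whenever the monitored task sleeps.

```python
# Excerpt of the "other pids" section from the example report.
SAMPLE_REPORT = """\
other pids taking processor from task 5918
     0      10535
  5160        870
  6005        614
  4974        604

irq taking processor from task 5918
"""


def parse_other_pids(report, skip_idle=True):
    """Return (pid, count) pairs from the "other pids taking processor"
    section of a cycle_thief.stp report, most frequent first."""
    pairs = []
    in_section = False
    for line in report.splitlines():
        fields = line.split()
        if line.strip().startswith("other pids taking processor"):
            in_section = True
            continue
        if not in_section:
            continue
        if not fields:
            if pairs:
                break  # a blank line after the entries ends the section
            continue
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            break  # reached the next section
        pid, count = int(fields[0]), int(fields[1])
        if skip_idle and pid == 0:
            continue  # PID 0 is the idle task, not a competing process
        pairs.append((pid, count))
    # The script already prints this list sorted; sorting again is cheap.
    pairs.sort(key=lambda pc: pc[1], reverse=True)
    return pairs


if __name__ == "__main__":
    top = parse_other_pids(SAMPLE_REPORT)[:3]
    print("ps p " + " ".join(str(pid) for pid, _ in top))
```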

The cycle_thief.stp script also lists how many times each hardware interrupt took the processor from the monitored process, along with the minimum, average, and maximum time, in microseconds, spent servicing each interrupt. In some cases these interrupts can be assigned to particular processors to minimize interruptions for the monitored process.
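The per-interrupt summary is a simple aggregation. The sketch below reproduces the same columns from a list of (irq, duration) samples; it is not the script's own implementation, the helper names are hypothetical, the sample durations are made up, and the average is truncated to whole microseconds.

```python
from collections import defaultdict


def summarize_irqs(samples):
    """Group (irq, duration_us) samples into {irq: (count, min, avg, max)}.

    The average is truncated to a whole number of microseconds."""
    durations = defaultdict(list)
    for irq, duration_us in samples:
        durations[irq].append(duration_us)
    return {
        irq: (len(v), min(v), sum(v) // len(v), max(v))
        for irq, v in durations.items()
    }


def print_irq_table(summary):
    """Print the summary in the same column order as cycle_thief.stp."""
    print(f"{'irq':>6} {'count':>10} {'min(us)':>10} {'avg(us)':>10} {'max(us)':>10}")
    by_count = sorted(summary.items(), key=lambda item: item[1][0], reverse=True)
    for irq, (count, lo, avg, hi) in by_count:
        print(f"{irq:>6} {count:>10} {lo:>10} {avg:>10} {hi:>10}")


if __name__ == "__main__":
    # Made-up durations, for illustration only.
    samples = [(42, 10), (42, 20), (40, 3), (40, 9), (40, 4)]
    print_irq_table(summarize_irqs(samples))
```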

Follow the instructions in the SystemTap Beginners Guide to install SystemTap. The cycle_thief.stp script needs the PID of the process to monitor, specified with the SystemTap -x option. Below is an example of cycle_thief.stp monitoring the Firefox browser process, PID 5918. Ctrl-C is pressed to stop the data collection and produce the summary:
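The -x option takes a numeric PID, so if you know only the program's name you need to look up its PID first, much as pgrep does. This sketch, with the hypothetical helper find_pids, scans /proc/<pid>/comm for an exact match. The kernel truncates comm to 15 characters, so longer names must be shortened to match.

```python
import os


def find_pids(name, proc_root="/proc"):
    """Return the sorted PIDs whose /proc/<pid>/comm equals name.

    The kernel truncates comm to 15 characters, so pass at most that many."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/sys
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # the process exited, or comm is unreadable
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)
```

For example, find_pids("firefox") may return more than one PID; choose the one you want to monitor and pass it to -x.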

# /usr/share/doc/systemtap-*/examples/process/cycle_thief.stp -x 5918
^C
task 5918 migrated: 5

task 5918 on processor (us):
value |-------------------------------------------------- count
    0 |                                                      8
    1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  3226
    2 |@@@@@@@@@@@@@@@@                                   1093
    4 |@@@@@@@@@@@@@                                       888
    8 |@@@@@@@@@@@@@@@@@                                  1131
   16 |@@@@@@@@@@@@@                                       891
   32 |@@@@@@@@@@@                                         748
   64 |@@@@@@@@@@@                                         715
  128 |@@@@@@                                              427
  256 |@@@                                                 200
  512 |@@                                                  150
 1024 |@@                                                  171
 2048 |@@@                                                 219
 4096 |@@@@                                                303
 8192 |@@@@@@@@@@@@@@@@@@@@@                              1402
16384 |                                                      0
32768 |                                                      0

task 5918 off processor (us)
  value |-------------------------------------------------- count
      0 |                                                      0
      1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                 3439
      2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  5009
      4 |@@@@@@@@@@@@@@                                     1450
      8 |@@                                                  295
     16 |@@                                                  216
     32 |@                                                   189
     64 |@                                                   173
    128 |@@@@@                                               582
    256 |@                                                   136
    512 |                                                     22
   1024 |                                                      3
   2048 |                                                      5
   4096 |                                                     13
   8192 |                                                     11
  16384 |                                                     10
  32768 |                                                      3
  65536 |                                                      5
 131072 |                                                      4
 262144 |                                                      7
 524288 |                                                      0
1048576 |                                                      0

other pids taking processor from task 5918
     0      10535
  5160        870
  6005        614
  4974        604
     3        231
    19        119
  6012        112
  5163         52
    39         11
  5632          7
   705          5
   356          3
    11          2
    17          2
    10          2
    27          2
   551          1
  5937          1
   501          1
  1346          1
  2745          1

irq taking processor from task 5918
   irq      count    min(us)    avg(us)    max(us)
    42        191          6         17         37
    40          8          3          5          9

This example of cycle_thief.stp output shows that the process migrated between processors 5 times. It might be desirable to pin the process to a particular processor to avoid some of the overhead of refilling caches and TLBs. The Red Hat Realtime Reference Guide chapter on Affinity describes how to use the taskset command and the sched_setaffinity() function to pin a task to a particular processor.
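The same pinning can be done programmatically. In this sketch, os.sched_setaffinity() from the Python standard library wraps the sched_setaffinity() system call, and the hypothetical parse_cpu_list helper accepts CPU lists in the style of taskset -c. Note that the call changes the affinity of only the single thread whose ID is given; for a multithreaded process such as Firefox, taskset -a applies the mask to all of its threads.

```python
import os


def parse_cpu_list(spec):
    """Parse a CPU list in the style of taskset -c, e.g. "0-2,5"."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin(pid, spec):
    """Restrict thread pid (0 = the caller) to the CPUs in spec and
    return the affinity mask the kernel reports afterwards."""
    os.sched_setaffinity(pid, parse_cpu_list(spec))
    return os.sched_getaffinity(pid)
```

For example, pin(5918, "2") restricts thread 5918 to CPU 2, roughly what taskset -pc 2 5918 does; changing another user's process requires appropriate privileges.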

The first histogram, the "on processor" histogram, shows that the longest intervals during which the process held the processor fall in the row labeled 8192. Each row label is the lower bound of a power-of-two range, so those intervals lasted between 8192 and 16383 microseconds, approximately 8 to 16 milliseconds. The second histogram, the "off processor" histogram, shows that most delays caused by other processes are relatively short. However, seven delays were between 262144 and 524287 microseconds (approximately 260 to 520 milliseconds), and four more were between 131072 and 262143 microseconds (approximately 130 to 260 milliseconds). If the application were latency sensitive, these delays would need further investigation to ensure that they were not unduly delaying the application.
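To check a histogram mechanically, the sketch below parses rows copied from the report and returns the range of the highest non-empty bucket, treating each row label as the lower bound of a power-of-two range. The helper names parse_histogram and longest_bucket are hypothetical.

```python
# Rows copied from the two histograms in the example report.
ON_CPU_EXCERPT = """\
value |-------------------------------------------------- count
 4096 |@@@@                                                303
 8192 |@@@@@@@@@@@@@@@@@@@@@                              1402
16384 |                                                      0
32768 |                                                      0
"""

OFF_CPU_EXCERPT = """\
  value |-------------------------------------------------- count
 131072 |                                                      4
 262144 |                                                      7
 524288 |                                                      0
"""


def parse_histogram(text):
    """Return [(label, count), ...] for each data row of the histogram."""
    rows = []
    for line in text.splitlines():
        label, sep, rest = line.partition("|")
        label = label.strip()
        # Drop the @ bar so only the count remains on the right.
        fields = rest.replace("@", " ").split()
        if not sep or not label.isdigit() or len(fields) != 1 or not fields[0].isdigit():
            continue  # header line or anything else that is not a row
        rows.append((int(label), int(fields[0])))
    return rows


def longest_bucket(rows):
    """(low, high, count) for the highest non-empty row, treating each
    label as the lower bound of a power-of-two range; None if all empty."""
    nonempty = [(label, count) for label, count in rows if count > 0]
    if not nonempty:
        return None
    label, count = max(nonempty)
    high = 0 if label == 0 else 2 * label - 1
    return (label, high, count)


if __name__ == "__main__":
    print(longest_bucket(parse_histogram(ON_CPU_EXCERPT)))
    print(longest_bucket(parse_histogram(OFF_CPU_EXCERPT)))
```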

The list of PIDs gives some indication of what other tasks are competing for processor time. To get an idea of what those other processes are doing, use the ps command to get information about them:

$ ps p  5160 6005 4974 3 6013 6163 39 5632 705 356
  PID TTY      STAT   TIME COMMAND
    3 ?        S      0:00 [ksoftirqd/0]
   39 ?        S      0:00 [ksoftirqd/6]
  356 ?        S<     0:00 [kworker/4:1H]
  705 ?        Ss     0:02 /sbin/rngd -f
 4974 ?        S      0:00 [kworker/2:1]
 5160 ?        S      0:00 [kworker/0:1]
 5632 ?        Sl     0:03 /usr/bin/gnome-shell
 6005 ?        S      0:00 [kworker/4:1]

On Red Hat Enterprise Linux 7 beta, you may see kernel task names in square brackets ("[]"), as in the ps output above. The ksoftirqd entries are threads handling soft interrupts, and the kworker threads are handling other kernel work. The kernel ftrace subsystem and trace-cmd can provide more details on the code that the kworker threads are running. The SystemTap periodic.stp example script can be used to determine how frequently these tasks run.
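ps prints a task's name in square brackets when its command line is unavailable, which is the case for kernel threads because /proc/<pid>/cmdline is empty for them. The sketch below, with the hypothetical helper describe_pid, reproduces that rule so you can label the PIDs from the cycle_thief.stp list yourself; it returns None for a PID that no longer exists.

```python
import os


def describe_pid(pid, proc_root="/proc"):
    """Return a ps-like COMMAND string: the full command line, or the
    task name in square brackets when the command line is empty, as it
    is for kernel threads such as ksoftirqd and kworker."""
    base = os.path.join(proc_root, str(pid))
    try:
        with open(os.path.join(base, "cmdline"), "rb") as f:
            raw = f.read()
        with open(os.path.join(base, "comm")) as f:
            comm = f.read().strip()
    except OSError:
        return None  # the PID no longer exists (or is unreadable)
    if not raw:
        return f"[{comm}]"
    # Arguments in cmdline are separated (and terminated) by NUL bytes.
    args = raw.rstrip(b"\0").split(b"\0")
    return " ".join(a.decode(errors="replace") for a in args)
```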

The last part of the cycle_thief.stp output is a list of hardware interrupts that took the processor from the monitored task. In this case, two interrupts occurred: 42 and 40. Running cat /proc/interrupts shows a description of each interrupt. IRQ 42 is related to the i915 graphics device on the machine and took an average of 17 microseconds to service. IRQ 40 is the Advanced Host Controller Interface (AHCI) for the disk drive. In some cases it may be desirable to direct particular interrupts to specific processors to tune performance, for example by writing a CPU mask to /proc/irq/<irq>/smp_affinity or by configuring the irqbalance service.
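The two steps, identifying an IRQ and choosing a CPU mask for it, can be sketched in Python. The sample below is an invented /proc/interrupts excerpt for a two-CPU machine (the real column layout varies between kernels), and parse_interrupts and cpus_to_mask are hypothetical helpers. The mask is written in hexadecimal, with a comma between each group of 32 CPUs.

```python
# Illustrative /proc/interrupts excerpt for a 2-CPU machine; the counts
# and device columns are invented, and the layout varies by kernel.
SAMPLE_INTERRUPTS = """\
           CPU0       CPU1
  0:         17          0   IO-APIC   2-edge      timer
 40:        100         23   PCI-MSI 512000-edge      ahci[0000:00:1f.2]
 42:        500          7   PCI-MSI 32768-edge      i915
NMI:          0          0   Non-maskable interrupts
"""


def parse_interrupts(text):
    """Map numeric IRQ -> (total count over all CPUs, description)."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        irq = irq.strip()
        if not sep or not irq.isdigit():
            continue  # skip named rows such as NMI and LOC
        fields = rest.split()
        counts = [int(field) for field in fields[:ncpus]]
        table[int(irq)] = (sum(counts), " ".join(fields[ncpus:]))
    return table


def cpus_to_mask(cpus):
    """Hex CPU mask for /proc/irq/<irq>/smp_affinity, with a comma
    between each group of 32 CPUs (8 hex digits)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    digits = format(mask, "x")
    groups = []
    while digits:
        groups.insert(0, digits[-8:])
        digits = digits[:-8]
    return ",".join(groups)


if __name__ == "__main__":
    for irq, (total, desc) in sorted(parse_interrupts(SAMPLE_INTERRUPTS).items()):
        print(irq, total, desc)
    print(cpus_to_mask({2, 3}))
```

Writing the mask requires root, for example echo c > /proc/irq/42/smp_affinity to allow only CPUs 2 and 3; a running irqbalance may move the interrupt again unless it is configured to leave it alone.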

If you are new to SystemTap, see my earlier article, Starting with SystemTap.

Last updated: January 10, 2023