If an important task is processor limited, one would like to make sure that the task is getting as much processor time as possible and other tasks are not delaying the execution of the important task. The SystemTap example script, cycle_thief.stp
, lists what interrupts and other tasks run on the same processor as the important task. The cycle_thief.stp
script provides the following pieces of information:
- the number of times the monitored task migrated
- a histogram of the duration of time spent running on the processor
- a histogram of the duration of time spent off the processor
- a list of the other processes taking CPU time from the monitored process
- a list of hardware interrupts handled on the same processor as the monitored task
Each time a task migrates to another processor the new processor needs to load its local caches and Translation Lookaside Buffers (TLBs), which adds overhead to the application's execution and can slow the application. Similarly, when the processor runs a different process the cache and TLBs entries for the important process are replaced with entries for that new process. The histograms show whether the monitored task is on the processor for long durations or whether it is off the processor for long duration delaying the execution and increasing the probability the items will need to be reloaded into the cache and TLBs.
The cycle_thief.stp
script lists the PID of tasks that run on the same processor as the monitored process sorted from most frequently to least frequently running. On a multiprocessor machine tasks could be assigned to various processors with the taskset
command to minimize migration and interference between processes or in some case other processes (such as an unneeded daemon) can be safely stopped.
The cycle_thief.stp
script also lists the number of times hardware interrupts occur for the monitored process and provides the minimum, average, and maximum time in microseconds for each type of interrupt. In some cases these interrupts can be assigned to particular processors to minimize interruptions for the monitored process.
Follow the instructions in the SystemTap Beginners Guide to install SystemTap. The cycle_thief.stp
script needs the PID of process to monitor specified via the SystemTap -x
option. Below is an example of cycle_thief.stp
monitoring the Firefox browser process PID 5918. Cntl-C is pressed to stop the data collection and produce the summary:
# /usr/share/doc/systemtap-*/examples/process/cycle_thief.stp -x 5918 ^C task 5918 migrated: 5 task 5918 on processor (us): value |-------------------------------------------------- count 0 | 8 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3226 2 |@@@@@@@@@@@@@@@@ 1093 4 |@@@@@@@@@@@@@ 888 8 |@@@@@@@@@@@@@@@@@ 1131 16 |@@@@@@@@@@@@@ 891 32 |@@@@@@@@@@@ 748 64 |@@@@@@@@@@@ 715 128 |@@@@@@ 427 256 |@@@ 200 512 |@@ 150 1024 |@@ 171 2048 |@@@ 219 4096 |@@@@ 303 8192 |@@@@@@@@@@@@@@@@@@@@@ 1402 16384 | 0 32768 | 0 task 5918 off processor (us) value |-------------------------------------------------- count 0 | 0 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3439 2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 5009 4 |@@@@@@@@@@@@@@ 1450 8 |@@ 295 16 |@@ 216 32 |@ 189 64 |@ 173 128 |@@@@@ 582 256 |@ 136 512 | 22 1024 | 3 2048 | 5 4096 | 13 8192 | 11 16384 | 10 32768 | 3 65536 | 5 131072 | 4 262144 | 7 524288 | 0 1048576 | 0 other pids taking processor from task 5918 0 10535 5160 870 6005 614 4974 604 3 231 19 119 6012 112 5163 52 39 11 5632 7 705 5 356 3 11 2 17 2 10 2 27 2 551 1 5937 1 501 1 1346 1 2745 1 irq taking processor from task 5918 irq count min(us) avg(us) max(us) 42 191 6 17 37 40 8 3 5 9
This example of cycle_thief.stp
output shows that the process migrated between different processors 5 times. It might be desirable to pin the process to a particular processor to avoid some of the overhead of refilling caches and TLBs. The Red Hat Realtime Reference Guide chapter on Affinity describes how to use the taskset
command and sched_getaffinity()
function to pin a task to a particular processor.
The first histogram, the "on processor" histogram, shows that the longest duration that the process has the processor is between 4096 and 8192 microseconds, approximately 4 and 8 milliseconds. The second histogram, the "off processor" histogram, shows that most delays due to other processes is relatively short. However, there are seven that were between 131072 and 262144 microseconds or 130 and 260 milliseconds. If the application was latency sensitive this would need further investigations to ensure that those delays were not unduly delaying the application.
The list of PIDs give some indications of what other tasks are competing for processor time. To get some idea of what those other processes are doing, use ps
command to get information about them:
$ ps p 5160 6005 4974 3 6013 6163 39 5632 705 356 PID TTY STAT TIME COMMAND 3 ? S 0:00 [ksoftirqd/0] 39 ? S 0:00 [ksoftirqd/6] 356 ? S< 0:00 [kworker/4:1H] 705 ? Ss 0:02 /sbin/rngd -f 4974 ? S 0:00 [kworker/2:1] 5160 ? S 0:00 [kworker/0:1] 5632 ? Sl 0:03 /usr/bin/gnome-shell 6005 ? S 0:00 [kworker/4:1]
On Red Hat Enterprise Linux 7 beta you may see kernel tasks names in "[]
" as in the above ps output. The ksoftirqd
entries are threads handling soft interrupts and the kworker
threads are handing other kernel work. The kernel ftrace
subsystem and trace-cmd
can provide more details on the code that the kworker
threads are running. The SystemTap periodic.stp
example script can be used to determine the frequency that these tasks are running.
The last part of the cycle_thief.stp
output is a list of hardware interrupts that occurred while the task was running. In this case there were two types of interrupts occurred: 42 and 40. cat /proc/interrupts
will provide a description of the interrupts. The irq 42 is related to the i915 graphics device on the machine and took an average of 17 microseconds to service. The irq 40 is the Advanced Host Controller Interface (AHCI) for the disk drive. In some cases it may be desirable to direct particular interrupts with the irqbalance
command to specific processors to tune performance.
If you are new to SystemTap, see my earlier article, Starting with SystemTap.
Last updated: January 10, 2023