Red Hat Developer

The Linux perf tool was originally written to allow access to the performance monitoring hardware that counts hardware events, such as instructions executed, processor cycles, and cache misses. However, it can also be used to count software events, which can be useful in gauging how frequently some part of the system software is executed.

Recently, someone at Red Hat asked whether there was a way to get a count of system calls being executed on the system. The kernel has a predefined software trace point, raw_syscalls:sys_enter, which collects that exact information. It counts each time a system call is made. To use the trace point events, the perf command needs to be run as root.

The following command gives a system-wide count (-a option) of system calls (-e raw_syscalls:sys_enter) once per second (-I 1000, an interval in milliseconds):

# perf stat -a -e raw_syscalls:sys_enter -I 1000
#           time             counts unit events
     1.000640941              1,250      raw_syscalls:sys_enter                                      
     2.001183785              1,901      raw_syscalls:sys_enter                                      
     3.001601593              1,922      raw_syscalls:sys_enter   
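
Because the intervals are never exactly one second long, it can help to normalize each count by the actual elapsed time. The following is a minimal sketch, not part of perf itself: perf_rate is a hypothetical helper name, and it assumes the human-readable layout shown above, with an empty unit column so that the event name is the third field.

```shell
# Hypothetical helper: turn `perf stat -I` interval lines into events per second.
# Assumes the layout shown above: time, count, event name (empty unit column).
perf_rate() {
    awk '
        /^#/ || NF < 3 { next }            # skip the header and short lines
        {
            dt = $1 - prev                 # length of this interval in seconds
            prev = $1
            n = $2
            gsub(/,/, "", n)               # drop thousands separators
            if (n !~ /^[0-9]+$/ || dt <= 0) next   # skip "<not counted>" rows
            printf "%.0f %s\n", n / dt, $3
        }'
}

# Usage (as root); perf stat writes its counts to stderr, hence the 2>&1:
#   perf stat -a -e raw_syscalls:sys_enter -I 1000 2>&1 | perf_rate
```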

The raw_syscalls:sys_enter trace point is just one of the predefined trace point events in the kernel. To list the more than 1,000 others, run the following as root:

# perf list tracepoint

List of pre-defined events (to be used in -e):

  block:block_bio_backmerge                          [Tracepoint event]
  block:block_bio_bounce                             [Tracepoint event]
  block:block_bio_complete                           [Tracepoint event]
  block:block_bio_frontmerge                         [Tracepoint event]
  block:block_bio_queue                              [Tracepoint event]
  block:block_bio_remap                              [Tracepoint event]
  block:block_dirty_buffer                           [Tracepoint event]
  block:block_getrq                                  [Tracepoint event]
  block:block_plug                                   [Tracepoint event]
  ...
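
With that many entries, a per-subsystem summary is often easier to scan than the full list. The sketch below is one way to get it; tracepoints_per_subsystem is a hypothetical helper name, and it assumes the "subsystem:event [Tracepoint event]" line format shown above.

```shell
# Hypothetical helper: count trace point events per subsystem from
# `perf list tracepoint` output on stdin, largest subsystem first.
tracepoints_per_subsystem() {
    awk '/\[Tracepoint event\]/ {
             split($1, part, ":")      # part[1] is the subsystem, part[2] the event
             count[part[1]]++
         }
         END { for (s in count) print count[s], s }' | sort -rn
}

# Usage (as root):
#   perf list tracepoint | tracepoints_per_subsystem
```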

You may want a counter for some arbitrary kernel function that does not yet have a trace point. No problem. You can define your own probe points and then use them in the perf stat command to monitor functions that implement expensive operations. For example, clearing a 2MB huge page takes approximately 500 times longer than clearing a traditional 4KB page, because it covers 512 times as much memory. These latencies can be noticeable, and you might want to know when a significant number of these delays occur.

The following command sets up a probe point in the clear_huge_page function and makes it accessible to perf:

# perf probe --add clear_huge_page
Added new event:
  probe:clear_huge_page (on clear_huge_page)

You can now use it in all perf tools, such as:

	perf record -e probe:clear_huge_page -aR sleep 1

The following command reports the count every 10 seconds (10,000 milliseconds):

# perf stat -a -e probe:clear_huge_page -I 10000
#           time             counts unit events
    10.000241215                 73      probe:clear_huge_page                                       
    20.001129381                  4      probe:clear_huge_page                                       
    30.001567364                  3      probe:clear_huge_page                                       
    40.002202895                  2      probe:clear_huge_page                                       
    50.003554968                  1      probe:clear_huge_page                                       
    50.316752807                  0      probe:clear_huge_page
    ...

When you no longer need the probe point for the clear_huge_page function, remove it as shown below:

# perf probe --del=probe:clear_huge_page
Removed event: probe:clear_huge_page
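
The add, count, and remove steps above can be wrapped in one shell function so that the probe is not left behind after a measurement. This is only a sketch: count_kernel_probe is a hypothetical name, it must run as root, and it assumes perf names the new event probe:<function>, as it did for clear_huge_page above (perf may pick a different name if an event with that name already exists).

```shell
# Hypothetical wrapper: add a kernel probe, count hits system-wide for a
# fixed number of seconds, then remove the probe again. Run as root.
# Assumes the event is named probe:<function>, as in the example above.
count_kernel_probe() {
    func=$1
    secs=${2:-10}                      # default to a 10-second measurement
    perf probe --add "$func" || return 1
    perf stat -a -e "probe:$func" sleep "$secs"
    status=$?
    perf probe --del "probe:$func"     # clean up even if perf stat failed
    return "$status"
}

# Usage:
#   count_kernel_probe clear_huge_page 30
```

Unlike the -I form used earlier, this prints a single total when the sleep command exits.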

Probe points can also be placed in user-space executables. You may need to compile the code with debuginfo enabled (GCC's -g option) or install the debuginfo RPMs so that perf can find the location of the functions. To place a probe on the malloc function in the glibc library, specify the library file with the --exec option.

# perf probe --exec=/lib64/libc-2.17.so --add malloc
Added new event:
  probe_libc:malloc    (on malloc in /usr/lib64/libc-2.17.so)

You can now use it in all perf tools, such as:

	perf record -e probe_libc:malloc -aR sleep 1
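
The library path used above is specific to that machine and glibc version. One way to find the right file on another system is to ask ldd which libc a dynamically linked program resolves to. The helper below is a sketch under that assumption: libc_from_ldd is a hypothetical name, and it expects the usual "libc.so.6 => /path (address)" line in glibc's ldd output.

```shell
# Hypothetical helper: print the resolved libc path from `ldd` output on stdin.
# Assumes a glibc-style line such as: libc.so.6 => /lib64/libc.so.6 (0x...)
libc_from_ldd() {
    awk '$1 ~ /^libc\.so/ && $2 == "=>" { print $3; exit }'
}

# Usage (as root); readlink -f follows the symlink to the real library file:
#   perf probe --exec="$(readlink -f "$(ldd /bin/ls | libc_from_ldd)")" --add malloc
```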

Using probe_libc:malloc, you can count the number of malloc calls occurring every 10 seconds. Below is the output from a machine that sits idle for the first 20 seconds. After 20 seconds, a parallel kernel build is started, and the number of malloc calls increases dramatically.

# perf stat -a -e probe_libc:malloc -I 10000
#           time             counts unit events
    10.000900150                  2      probe_libc:malloc                                           
    20.001803180                  0      probe_libc:malloc                                           
    30.002286255          1,829,385      probe_libc:malloc                                           
    40.002442647         12,553,306      probe_libc:malloc                                           
    50.002578104         15,579,692      probe_libc:malloc
    ...

Once you're done with the user-space probe, it can be deleted:

# perf probe --exec=/lib64/libc-2.17.so --del malloc
Removed event: probe_libc:malloc

Using perf stat with the software probe points can help you answer the question of how frequently some code is being executed. For more information about setting up software probe points, take a look at the perf-probe man page.

Last updated: October 12, 2022