One common idiom in performance monitoring is how long did it take for a program to do something. For example you may want to know the time taken for database queries in PostgreSQL or just-in-time translations in a Java Virtual Machine. SystemTap and user-space markers in Linux packages make it much easier to determine the duration of those operations.
The user-space markers compiled into Linux packages mark key points in the code where particular actions occur. The user-space markers also provide arguments that provide additional information about the action. For example, the markers and the available arguments in PostgreSQL can be listed using using the SystemTap command:
$ stap -L 'process("postgres").mark("*")'
The two user-space markers related to the start and completion of a query are:
process("postgres").mark("query__start") $arg1:long
process("postgres").mark("query__done") $arg1:long
The $arg1 is a pointer to the string describing the query. There could be multiple queries being processed at the same time and $arg1 is useful in matching up the query__start
and query__done
events.
The postgresql-query.stp
below is a short SystemTap script that produces statistics on the minimum, average, and maximum time for database queries for each Thread ID. The query__start
probe records the time in the query_time
global array using the Thread ID and pointer $arg1
as keys. When the query__done
probe executes it will search the query_time
global array for query start time. If a time is found, the elapsed time is computed and that sample is added to the query_summary statistics for the Thread ID. When the script exits the end
probe executes and prints out the number of queries, the minimum time, average time, and maximum time for each thread ID.
global query_time, query_summary
probe process("postgres").mark("query__start") {
query_time[tid(), $arg1] = gettimeofday_us();
}
probe process("postgres").mark("query__done") {
p = tid()
t = query_time[p, $arg1]; delete query_time[p, $arg1]
if (t) {
query_summary[p] <<< (gettimeofday_us() - t);
}
}
probe end {
printf("ntid count min(us) avg(us) max(us)n");
foreach (p in query_summary) {
printf("%d %d %d %d %dn", p, @count(query_summary[p]),
@min(query_summary[p]), @avg(query_summary[p]), @max(query_summary[p]));
}
}
Assuming that PostgreSQL has already been setup to run pgbench, the script is started in one terminal window with:
$ stap postgresql-query.stp
In another terminal window pgbench is run:
# su postgres -c "pgbench -T 10"
When the pgbench
completes the SystemTap script in the previous terminal widow is stopped with a ctl-C yielding the following output:
tid count min(us) avg(us) max(us)
4205 4 699 7049 15276
4206 128072 8 55 240476
It shows there were two threads with those events, thread 4205 and 4206. The bulk of the queries were done in thread 4206 with an average time of 55 microseconds and a maximum of 240,476 microseconds.
Additional information about the PostgreSQL user-space markers is available at http://www.postgresql.org/docs/8.4/static/dynamic-trace.html .
The IcedTea JVM also has similar user-space markers in the hotspot just-in-time translator and SystemTap tapsets which provide additional variables. With the java-1.6.0-openjdk-devel-1.6.0.0 RPM installed to provide the tapsets the available probes can be listed out with:
$ stap -L "hotspot.*,hotspot.*.*"
The postgresql-query.stp
could be adapted to provide information on how many just-in-time translations are performed and how long they take on average. The two probes of interest from the "stap -L
" above are:
hotspot.method_compile_begin name:string compiler:string class:string method:string
sig:string probestr:string $arg1:long $arg2:long $arg3:long $arg4:long
$arg5:long $arg6:long $arg7:long $arg8:long
hotspot.method_compile_end name:string compiler:string class:string method:string
sig:string probestr:string $arg1:long $arg2:long $arg3:long $arg4:long $arg5:long
$arg6:long $arg7:long $arg8:long $arg9:long
We end up with a similar script, icedtea-jit.stp
, below:
global jit_time, jit_summary
probe hotspot.method_compile_begin {
jit_time[tid(), class, method] = gettimeofday_us();
}
probe hotspot.method_compile_end {
p = tid()
t = jit_time[p, class, method]; delete jit_time[p, class, method]
if (t) {
jit_summary[p] <<< (gettimeofday_us() - t);
}
}
probe end {
printf("ntid count min(us) avg(us) max(us)n");
foreach (p in jit_summary) {
printf("%d %d %d %d %dn", p, @count(jit_summary[p]),
@min(jit_summary[p]), @avg(jit_summary[p]), @max(jit_summary[p]));
}
}
In one terminal window start the icedtea-jit.stp
with:
$ stap icedtea-jit.stp
In another window start up a Java application such as Eclipse:
$ eclipse
After the Eclipse Java application is exited with icedtea-jit.stp
with a ctrl-C. This results in something like the following output:
tid count min(us) avg(us) max(us)
4736 501 123 18601 901362
4735 481 131 21573 2097815
This shows that there were two threads doing just-in-time translation when eclipse was run and each translated approximately 500 methods. The minimum times were fairly short around 120 to 130 microseconds, but the average times were much larger on the order of 20 milliseconds. It would be nice to see what the distributions of times were. The @hist_log()
operator can be used in the end
probe to print histograms of the times spent doing just-in-time translation. The probe handlers for the start and end of just-in-time translation remain the same and the end
probe is changed to the following:
probe end {
foreach (p in jit_summary) {
printf("tid = %dn", p)
print(@hist_log(jit_summary[p]))
}
}
Doing the same experiment the new script now better shows the distributions of time:
tid = 5975 value |-------------------------------------------------- count 32 | 0 64 | 0 128 |@@@@@@@@@@@@@@@ 31 256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 60 512 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 60 1024 |@@@@@@@@@@@@@@@@@@@@@@@@ 48 2048 |@@@@@@@@@@@@@@@@ 33 4096 |@@@@@@@@@@@@@@@@@@@@@@@@ 48 8192 |@@@@@@@@@@@@@@@@@@ 36 16384 |@@@@@@@@@@@@@@@@@@@ 39 32768 |@@@@@@@@@ 18 65536 |@@@@@ 11 131072 |@@ 5 262144 |@@ 5 524288 | 1 1048576 | 0 2097152 | 1 4194304 | 0 8388608 | 0 tid = 5976 value |-------------------------------------------------- count 16 | 0 32 | 0 64 | 1 128 |@@@@@@@@@@@@@@@@@ 34 256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 69 512 |@@@@@@@@@@@@@@@@@@@@@@@@@ 51 1024 |@@@@@@@@@@@@@@@@@@@@@ 43 2048 |@@@@@@@@@@@@@@@@@@@@@@@@@ 51 4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 74 8192 |@@@@@@@@@@@@@@@@@@@@@@@@ 48 16384 |@@@@@@@@@@@@@@@@@@@@@@@@@ 50 32768 |@@@@@@@@@@@@@@@ 31 65536 |@@@@ 8 131072 |@@@ 6 262144 |@ 2 524288 | 1 1048576 | 0 2097152 | 0
The plots show most of the translations take less than 65536 micro seconds, but there are a few slow translations that are approaching a second of wall clock time. One could further modify the probe handlers to show precisely which classes and methods those slow translations were in.
Last updated: January 9, 2023