I had an interesting question from one of our developers here at Red Hat:
"When I was investigating a performance issue in our project after  switched to oracle's jdk7u40, I found a performance regression in  class sun.net.www.protocl.http.HttpURLConnection.getOutputStream(). This method takes more cpu time than with jdk7u25."
And it does, much more time. In fact, when fixedLengthStreamingMode is enabled HttpURLConnection.getOutputStream() takes ten times as long: about 1.2 milliseconds versus 47 microseconds.
So, I traced through getOutputStream() deep into the class library, but it's hard to tell exactly how long a chunk of Java code takes to execute. I wanted to be able to add log points so that I could measure intervals between points in the code. Ideally, the interface would be something like
logEvent(int n, String s)
where n is some sort of counter, s is recorded in the log, and when the program ended the log would be printed to a file somewhere. I didn't want to modify the test program any more than entirely necessary, just to insert log points into the Java library around whatever I wanted timing.
It is possible to write this in pure Java, but there is a problem: the logging itself takes quite a long time. Creating a log item like this
    static class LogItem {
        public int n;
        public String s;
        public long time;
        LogItem(int n, String s) {
            this.n = n; this.s = s;
            this.time = System.nanoTime();
        }
    }
takes more than 600 nanoseconds, mostly because of the call to System.nanoTime(). This is much too intrusive. What I wanted was a flyweight logger that can be called from anywhere and affects running programs as little as possible.
This is where one of the great advantages of Free and Open Source software comes into play. Freedom 1 in the Free Software Definition is the ability to study how a program works, and change it so it does your computing as you wish. So, how hard would it be to add a very high-precision low-overhead logger to OpenJDK?
Intel Core i7 processors have the RDTSCP instruction, which is a very high-precision timer. It has a few disadvantages, particularly on multi-socket computers where the TSC isn't synchronized, but for measuring very short intervals of time it can't be beat. TSC ticks at some fixed frequency; on my box it's 3.2 GHz. So, to get time in nanoseconds you divide by 3.2.
Using GCC you read the TSC like this:
static __inline__ unsigned long long rdtsc(void)
{
  unsigned hi, lo;
  __asm__ __volatile__ ("rdtscp" : "=a"(lo), "=d"(hi));
  return ( (unsigned long long)lo)|( ((unsigned long long)hi) <<32 );
}
And this is everything I need for a flyweight logger.
This is the log itself. It's a fixed-size array of log records, statically allocated in libjava.so. The idea here is that the actual act of logging won't have to allocate any dynamic memory.
typedef struct {
  unsigned long time;
  int n;
  char str[64];
} log_item;
#define LOG_SIZE 1048576
static log_item a_log[LOG_SIZE];
static unsigned int log_counter = ~0;
I added my logging method to sun.misc.Unsafe, just for convenience:
static void atexit_fn(void);
/*
 * Class:     sun_misc_Unsafe
 * Method:    logEvent
 * Signature: (ILjava/lang/String;)V
 */
JNIEXPORT void JNICALL Java_sun_misc_Unsafe_logEvent
  (JNIEnv *env, jclass cla, jint n, jstring s)
{
  (void)cla;
  if (log_counter == ~0) {
    atexit(atexit_fn);
    log_counter = 0;
  }
  a_log[log_counter].n = n;
  if (s) {
    jsize len = (*env)->GetStringLength(env, s);
    if (len >= 64) len = 63;
    const jchar *carray = (*env)->GetStringCritical(env, s, NULL);
    char *dst = a_log[log_counter].str;
    const jchar *src = carray;
    while (len--)
      *dst++ = *src++;
    *dst = 0;
    (*env)->ReleaseStringCritical(env, s, carray);
  } else {
    a_log[log_counter].str[0] = 0;
  }
  a_log[log_counter].time = rdtsc();
  log_counter++;
  log_counter %= LOG_SIZE;
}
So, the very first time that Unsafe.logEvent() is called, we register an atexit callback function. Then we log the event and its time and (optionally) a string that's passed to the logger. If we overrun our log, we just wrap round. There's no real thread safety: two threads might overwrite the same log entry. This log routine does not do anything that isn't necessary. Also, nothing in the call to Unsafe.logEvent() allocates any memory.
Here's the callback which prints the log when Java exits:
static unsigned long adjust(unsigned long dt) {
  return dt * 10 / 32;   // Convert 3.2GHz ticks to ns
}
static void atexit_fn(void) {
  jlong time0;
  jlong last;
  FILE *outf = fopen("/tmp/deleteme", "w+");
  if (! outf) perror("fopen");
  int i;
  for (i = 0; i < log_counter; i++) {
     log_item *item = &a_log[i];
     if (item->n == 0)
      time0 = item->time;
    fprintf(outf, "%ld (%ld): %d %sn", adjust(item->time - time0),
	    adjust(item->time - last),
	    item->n, item->str);
    last = item->time;
  }
}
So, how long does a log event actually take to execute? We can measure it:
        for (int i = 0; i < 1000000; i++) {
        	sun.misc.Unsafe.logEvent(0, null);
        	sun.misc.Unsafe.logEvent(1, null);
        	sun.misc.Unsafe.logEvent(2, null);
        }
And the log:
0 (27): 0 28 (28): 1 56 (27): 2 0 (28): 0 ...
Oh my goodness that's fast. 28 nanoseconds. 20 times faster than a call to System.nanoTime(). 28 nanoseconds is the time it takes for a photon of light to pass from one end of a London bus to the other.
What happens if we pass a string to Unsafe.logEvent() ?
        for (int i = 0; i < 1000000; i++) {
        	sun.misc.Unsafe.logEvent(20, null);
        	sun.misc.Unsafe.logEvent(25, "One two three four");
        	sun.misc.Unsafe.logEvent(30, null);
0 (21): 0
75 (75): 25 One two three four
103 (27): 30
Hmmm, that's slower: 75 nanoseconds. If you're trying to measure a very small interval that's a problem, but sometimes you need a little bit more information than an integer. It would have been possible to convert this native method call into a HotSpot intrinsic for even lower overhead, but the overhead is already so low that I didn't bother.
With this tool, all that you need to do to measure an interval in the OpenJDK class library (or indeed any Java code) is insert a pair of Unsafe.logEvent() calls around it. There is some small timing effect, but it's really not very much.
The full patch (for OpenJDK 7) is here
Would it have been better if OpenJDK had a lightweight logging facility like this built in? I'm not sure. Maybe, maybe not. It would have to be much more general-purpose, would inevitably have more bells and whistles, and would be more complex. You don't get flyweight anything from generality. Maybe this is one of those cases where a quick hack is just what you want.
This is why Freedom #1 is so vital: without the ability to read the source code for OpenJDK and change it, this task would have been far harder.
Last updated: April 25, 2022