When OpenJDK's Java virtual machine (JVM) runs a Java application, it loads hundreds of classes before it starts the main class. It runs a method several hundred times before it invokes the optimizing compiler on that method. This preparation is a critical component of Java's "write once, run anywhere" power, but it comes at the cost of long startup times.

We've been working on a new approach that allows you to load your classes, warm up your just-in-time (JIT) compiler, and then checkpoint your application. Later, you can restore the application to get it running quickly. With these changes, we have seen applications that took seconds to start come up warm in milliseconds.

In this article, you'll learn how to checkpoint and restore a running Java program from the Linux command line. In an upcoming article, I will introduce a Java Native Interface (JNI) library that lets you checkpoint and restore a Java program from inside your Java code.

Using checkpoints in your Java code

The JNI Checkpoint Restore library is based on Linux Checkpoint/Restore in Userspace (CRIU), which we'll use for the examples in this article. CRIU can save you startup time, but it offers even more possibilities.

If you have a program that runs for a long time, you can regularly checkpoint it. Then, if you get a failure, you can restart the application from the last checkpoint. If the failure is due to a bug, then you can quickly reproduce it. If the failure was caused by an external factor, you could continue from where you left off without losing any work.
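As a rough sketch of what regular checkpointing could look like when driven from a separate Java process, the following builds a criu dump command and runs it on a fixed interval. The class name PeriodicCheckpoint, the one-minute interval, and the checkpoint-N directory layout are all assumptions made for illustration. It relies on CRIU's --leave-running option so the target keeps running after each dump, and on -D to choose the image directory. Like the commands later in this article, it needs to run with root privileges.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper for illustration; not part of CRIU or the JDK.
class PeriodicCheckpoint {

    // Build the criu command line for one checkpoint of the given PID.
    // --leave-running keeps the target alive after the dump so that it
    // can be checkpointed again later.
    static List<String> dumpCommand(long pid, Path imagesDir) {
        return List.of(
                "criu", "dump",
                "-t", Long.toString(pid),
                "--shell-job",
                "--leave-running",
                "-D", imagesDir.toString(),
                "-o", "dump.log");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        long pid = Long.parseLong(args[0]);   // PID of the JVM to checkpoint
        Path baseDir = Path.of(args[1]);      // parent of the checkpoint directories
        long intervalMillis = 60_000;         // assumed interval: one minute

        for (int n = 0; ; n++) {
            // Each checkpoint gets its own directory so older images survive.
            Path imagesDir = baseDir.resolve("checkpoint-" + n);
            Files.createDirectories(imagesDir);

            Process criu = new ProcessBuilder(dumpCommand(pid, imagesDir))
                    .inheritIO()
                    .start();
            if (criu.waitFor() != 0) {
                System.err.println("criu dump failed for " + imagesDir);
                break;
            }
            Thread.sleep(intervalMillis);
        }
    }
}
```

After a failure, you would restore from the most recent checkpoint-N directory that was written completely.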

As another example, say that you want to take a heap dump at several points in the program, but stopping to walk the heap perturbs the execution. Inserting checkpoints lets you run the program to completion and then go back and restart to do a heap dump. That way, you can see the memory layout at the point in time that interests you, but with an order of program execution that is very similar to the original program.

Does that sound good? Let's go through an example.

Checkpointing from outside of Java

In this example, you'll learn how to checkpoint and restore a running Java program from the command line. To start, let's say that we are running a Java program called Scooby.
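The source of Scooby is not shown here, and any long-running program will do. If you want something to experiment with, a minimal stand-in (an assumption, not the original program) could print its own process ID and a counter once per second. The tickLine helper is a hypothetical name introduced only for this sketch, and ProcessHandle requires Java 9 or later.

```java
// Hypothetical stand-in for the Scooby program; the real one is not shown.
class Scooby {

    // Format one progress line; kept separate so it is easy to test.
    static String tickLine(long pid, long tick) {
        return "Scooby pid=" + pid + " tick=" + tick;
    }

    public static void main(String[] args) throws InterruptedException {
        // The PID printed here is the one to hand to criu when dumping.
        long pid = ProcessHandle.current().pid();

        // Loop forever so there is a live process to checkpoint.
        for (long tick = 0; ; tick++) {
            System.out.println(tickLine(pid, tick));
            Thread.sleep(1000);
        }
    }
}
```

Because the counter lives in the process, a restored copy should carry on from the tick at which it was checkpointed rather than starting again from zero.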

From terminal one, enter:

% setsid java -XX:-UsePerfData -XX:+UseSerialGC Scooby

From another directory in another terminal, enter the following, replacing <pid> with the process ID of the running JVM:

% sudo criu dump -t <pid> --shell-job -o dump.log

If you now run ps, you will see that your Java program is no longer running. The directory now contains a number of image files, and dump.log records everything that CRIU did to checkpoint your code.

Now, from the directory where you dumped the image, do the following:

% sudo criu restore --shell-job -d -vvv -o restore.log

You should see your Java program running again. You can check restore.log to see what the restore did. You will notice that, by default, CRIU restores the JVM to the same process ID (PID). Only one process can hold that PID at a time, so if you want to run several copies restored from the same image, you can give each one its own PID namespace (virtual PIDs):

% sudo unshare -p -m -f bash
# mount -t proc none /proc/
# criu restore --shell-job

In another window, but in the same directory, you can do the same:

% sudo unshare -p -m -f bash
# mount -t proc none /proc/
# criu restore --shell-job
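Each restore gets its own PID namespace because CRIU wants to give the JVM its original PID back, and that PID can belong to only one process per namespace. If you plan to restore without a fresh namespace, a quick check like the following can tell you whether the PID is already taken. This is a small sketch assuming Java 9 or later; PidCheck and pidInUse are hypothetical names, and the check only sees processes in the namespace where it runs.

```java
// Hypothetical helper for illustration; not part of CRIU or the JDK.
class PidCheck {

    // True if a process with this PID is visible to the current JVM.
    static boolean pidInUse(long pid) {
        return ProcessHandle.of(pid).isPresent();
    }

    public static void main(String[] args) {
        long pid = Long.parseLong(args[0]);   // the PID recorded at checkpoint time
        System.out.println("PID " + pid + (pidInUse(pid) ? " is in use" : " looks free"));
    }
}
```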

Conclusion

Checkpointing has a few issues. For now, you need to turn off the JVM's performance data (-XX:-UsePerfData) and avoid parallel garbage collection (the example above uses -XX:+UseSerialGC) when using checkpoints. If you have a /var/lib/sss/pipes/nss file, you will have to remove it. You will also need root access to run a restore operation because you need to be able to choose a specific PID. The CRIU team is currently working on this issue.

Stay tuned for my next article, where I'll show you how to use the JNI Checkpoint Restore library to checkpoint Java from inside of Java.

Last updated: February 11, 2024