In 2015, an issue was opened against Kubernetes about supporting container migration. The problem description mentioned Checkpoint/Restore In Userspace (CRIU) on Linux as a possible basis for a solution. Around the same time, I started to look into how to integrate CRIU into the container stack.
Note: This article is a preview of my upcoming session at KubeCon + CloudNativeCon North America 2021, happening October 11 to 15. See the end of this article for more about my session.
Checkpoint and restore in the container stack
The basic steps to migrate a running container from one node to another, which could also be called stateful migration, are to checkpoint the container on the source node, transfer the checkpoint image to the destination node, and restore the container on the destination node. This way, the container is migrated without losing its state.
In 2015, however, the container stack was not ready to support checkpoint and restore in the orchestration layer (Kubernetes). The container runtime layer, runc, offered limited support for checkpointing and restoring containers, but that support was not yet available in the higher layers of the container stack.
Over the years, I was involved in bringing checkpoint and restore support to these upper layers of the container stack. Around 2018, I implemented checkpoint and restore support in Podman. Bringing checkpoint and restore support, and thus migration support, to Podman required many changes in runc and CRIU. Both had to support the different Linux security techniques used in containers, including SELinux, AppArmor, and seccomp, before Podman could migrate a container from one node to another without losing any of its state.
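As a rough illustration of the checkpoint, transfer, and restore steps described earlier, the following Python sketch builds the command lines for each step using Podman's `--export` and `--import` options. The function name, the container name `web`, the archive path, and the host name `node2` are hypothetical placeholders. Using `scp` for the transfer is only an assumption; any way of copying the archive would do.

```python
import shlex


def migration_commands(container, archive, dest_host):
    """Return (where, argv) pairs for the checkpoint, transfer, and restore steps.

    All three arguments are hypothetical placeholders supplied by the caller.
    """
    return [
        # 1. Checkpoint the container on the source node into an archive.
        ("source", ["podman", "container", "checkpoint",
                    f"--export={archive}", container]),
        # 2. Copy the archive to the destination node (scp is an assumption;
        #    any file transfer mechanism would work).
        ("source", ["scp", archive, f"{dest_host}:{archive}"]),
        # 3. Restore the container on the destination node from the archive.
        ("destination", ["podman", "container", "restore",
                         f"--import={archive}"]),
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for where, argv in migration_commands("web", "/tmp/web.tar.gz", "node2"):
        print(f"[{where}] {shlex.join(argv)}")
```

The sketch only prints the commands. Running them for real typically requires root privileges and CRIU installed on both nodes.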
Checkpointing a container out of a pod
Eventually, it was possible to migrate containers from one node to another with a few simple commands. But at this point, it was still not possible to integrate checkpoint and restore into Kubernetes. One big remaining barrier to adding support for container checkpoint and restore in Kubernetes was that, until then, no one had looked into how to combine the concept of pods in Kubernetes with CRIU and the whole container stack.
A container in Linux is usually one or more processes that use Linux namespaces to create boundaries between processes in different containers. (See Demystifying namespaces and containers in Linux for an introduction to Linux namespaces.) In Kubernetes, containers run in pods, and the containers in a pod share some, but not all, of their namespaces. Before a container could be checkpointed out of a pod and restored into another pod, it was first necessary to enable pod support in CRIU and in the container stack layers below Kubernetes; specifically, to enable checkpointing a container out of a pod and restoring the container into an existing pod. In addition to enabling the sharing of namespaces, we also needed to join existing SELinux contexts upon restore.
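To make the idea of partially shared namespaces more concrete, here is a minimal Python sketch that compares the namespaces of two processes. On Linux, each entry under `/proc/<pid>/ns` is a symbolic link whose target identifies a namespace, so two processes share a namespace when the targets match. The function names and the sample identifiers in the demo are made up for illustration. The demo assumes a common arrangement in which two containers share the network and IPC namespaces but keep separate mount namespaces.

```python
import os


def read_namespaces(pid):
    """Map namespace type to its identifier, e.g. 'net' -> 'net:[4026531992]'.

    Linux only; reading another process's entries may require privileges.
    """
    ns_dir = f"/proc/{pid}/ns"
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in os.listdir(ns_dir)}


def shared_namespaces(a, b):
    """Return the sorted namespace types whose identifiers match in both maps."""
    return sorted(name for name in a.keys() & b.keys() if a[name] == b[name])


if __name__ == "__main__":
    # Made-up identifiers standing in for two containers of the same pod.
    container_a = {"net": "net:[1]", "ipc": "ipc:[2]", "mnt": "mnt:[3]"}
    container_b = {"net": "net:[1]", "ipc": "ipc:[2]", "mnt": "mnt:[4]"}
    print(shared_namespaces(container_a, container_b))
```

On a Linux host, passing the results of `read_namespaces` for two real process IDs to `shared_namespaces` shows which boundaries the two processes have in common. A restore into an existing pod has to rejoin exactly those shared namespaces.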
Use cases for checkpoint and restore in Kubernetes
Before integrating checkpoint and restore into Kubernetes, we thought about possible use cases and came up with the following:
- Reboot without losing state: Sometimes, it is necessary to reboot a node for important security updates. With the help of checkpoint and restore, a slow-starting container can be checkpointed before the reboot. Then, after the reboot, the container can be restored from the checkpoint without losing any state and without a long service downtime.
- Quick startup: Similar to the first use case, one might want a slow-starting container to start faster. For containers that require a long time to initialize, checkpoint and restore can be used to create checkpoints of a container after the long initialization phase. Then the system can quickly spin up additional copies based on the checkpoint, which is already initialized.
- Container migration: Checkpointing a container on one node and restoring it on another node constitutes container migration and would provide what was requested in the ticket from 2015.
- Forensic container checkpointing: This use case checkpoints a container without stopping it and without the container knowing that it was checkpointed. The checkpointed container can be restored in a sandboxed environment for further threat analysis.
One of the challenges we faced when we thought about introducing checkpoint and restore into Kubernetes was how to do it in a minimal way, with as little impact as possible on anything else. The forensic container checkpointing use case was simple yet useful, which made it a good way to test that requirement. After we implemented this use case, it became possible to see how checkpointing can be used in Kubernetes without breaking anything else.
Learn more at KubeCon + CloudNativeCon North America 2021
At KubeCon + CloudNativeCon North America 2021, I will present more details about Kubernetes and checkpoint and restore. I will present additional use cases for checkpoint and restore in combination with containers. There will also be a live demo of all the use cases I present. I will give technical details about how CRIU enables checkpointing and restoring of containers, and an overview of how CRIU enables container migration in different container engines. Join my session on October 14 and I will be happy to answer any related questions.
Last updated: October 8, 2021