Checkpoint and restore in Kubernetes

October 7, 2021
Adrian Reber
Related topics:
Containers, Linux, Kubernetes, Security
Related products:
Red Hat OpenShift

    In 2015, an issue was opened against Kubernetes about supporting container migration. The problem description mentioned Checkpoint/Restore In Userspace (CRIU) on Linux as a possible basis for a solution. Around the same time, I started to look into how to integrate CRIU into the container stack.

    Note: This article is a preview of my upcoming session at KubeCon + CloudNativeCon North America 2021, happening October 11 to 15. See the end of this article for more about my session.

    Checkpoint and restore in the container stack

    The basic steps to migrate running containers from one node to another—which could also be called stateful migration—are to checkpoint the container on the source node, transfer the checkpoint image to the destination node, and restore the container on the destination node. This way, the container is migrated without losing its state.

    In 2015, however, the container stack was not ready to support checkpoint and restore in the orchestration layer (Kubernetes). The container runtime layer, runc, offered limited support for checkpointing and restoring containers, but that support was not yet available in the higher layers of the container stack.

    Over the years, I have been involved in bringing checkpoint and restore support to these upper layers of the container stack. Around 2018, I implemented checkpoint and restore support in Podman. Bringing checkpoint and restore support, and thus migration support, to Podman required many changes in runc and CRIU. Before Podman could migrate a container from one node to another without losing any of its state, CRIU and runc had to support the different Linux security techniques used in containers, including SELinux, AppArmor, and seccomp.
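
    The three migration steps described earlier (checkpoint, transfer, restore) can be sketched with Podman's command line. The following is a minimal illustration, not the author's tooling: it only builds the command lines and does not run them. The container name, archive path, and destination host are hypothetical, and using scp and ssh for the transfer and the remote restore is an assumption; any file transfer mechanism would do. The `--export` and `--import` options are Podman's options for writing a checkpoint to an archive and restoring from one. Checkpointing usually requires root privileges.

```python
import shlex
from typing import List


def migration_commands(container: str, archive: str, dest_host: str) -> List[List[str]]:
    """Build (but do not run) the commands for a checkpoint/transfer/restore migration."""
    # Step 1: checkpoint the container on the source node into an archive.
    checkpoint = ["podman", "container", "checkpoint", f"--export={archive}", container]
    # Step 2: copy the checkpoint archive to the destination node (scp is an assumption).
    transfer = ["scp", archive, f"{dest_host}:{archive}"]
    # Step 3: restore the container on the destination node from the archive.
    restore = ["ssh", dest_host, "podman", "container", "restore", f"--import={archive}"]
    return [checkpoint, transfer, restore]


if __name__ == "__main__":
    # All three values below are hypothetical placeholders.
    for cmd in migration_commands("webserver", "/tmp/checkpoint.tar.gz", "node2.example.com"):
        print(shlex.join(cmd))
```

    Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` later without shell quoting problems.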

    Checkpointing a container out of a pod

    Eventually, it was possible to migrate containers from one node to another with a few simple commands. But at this point, it was still not possible to integrate checkpoint and restore into Kubernetes. One big remaining barrier to adding support for container checkpoint and restore in Kubernetes was that, until then, no one had looked into how to combine the concept of pods in Kubernetes with CRIU and the whole container stack.

    A container in Linux is usually one or more processes using Linux namespaces to create boundaries between processes in different containers. (See Demystifying namespaces and containers in Linux for an introduction to Linux namespaces.) In Kubernetes, containers run in pods, and a pod shares some, but not all, of its namespaces with the containers in it. Before a container could be checkpointed out of a pod and restored into another pod, it was first necessary to enable pod support in CRIU and the container stack layers below Kubernetes; specifically, to enable checkpointing a container out of a pod and restoring the container into an existing pod. In addition to enabling the sharing of namespaces, we also needed to join existing SELinux contexts upon restore.
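
    To make the idea of partially shared namespaces concrete, here is a small sketch that compares the namespaces of two processes. It relies on the Linux convention that each entry under /proc/<pid>/ns is a symbolic link whose target identifies the namespace, so two processes are in the same namespace when the targets match. The helper names and the sample identifiers are made up for illustration, and reading another process's namespace links may require elevated privileges.

```python
import os
from typing import Dict, Set


def read_namespaces(pid: int) -> Dict[str, str]:
    """Map each namespace type of a process to its identifier (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    # Each entry is a symlink such as 'net' -> 'net:[4026531992]'.
    return {name: os.readlink(os.path.join(ns_dir, name)) for name in os.listdir(ns_dir)}


def shared_namespaces(a: Dict[str, str], b: Dict[str, str]) -> Set[str]:
    """Return the namespace types whose identifiers are equal in both mappings."""
    return {name for name in a.keys() & b.keys() if a[name] == b[name]}


if __name__ == "__main__":
    # Made-up identifiers for two containers in one pod: some namespaces match, some do not.
    container_a = {"net": "net:[4026532001]", "ipc": "ipc:[4026532002]",
                   "uts": "uts:[4026532003]", "mnt": "mnt:[4026532010]"}
    container_b = {"net": "net:[4026532001]", "ipc": "ipc:[4026532002]",
                   "uts": "uts:[4026532003]", "mnt": "mnt:[4026532020]"}
    print(sorted(shared_namespaces(container_a, container_b)))  # ['ipc', 'net', 'uts']
```

    On a real node, `read_namespaces` could be called with the process IDs of two containers in the same pod to see which namespaces a restored container would have to join.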

    Use cases for checkpoint and restore in Kubernetes

    Before integrating checkpoint and restore into Kubernetes, we thought about possible use cases and came up with the following:

    • Reboot without losing state: Sometimes, it is necessary to reboot a node for important security updates. With the help of checkpoint and restore, a slow-starting container can be checkpointed before the reboot. Then, after the reboot, the container can be restored from the checkpoint without losing any state and without long service downtime.
    • Quick startup: Similar to the first use case, one might want a slow-starting container to start faster. For containers that require a long time to initialize, checkpoint and restore can be used to create checkpoints of a container after the long initialization phase. Then the system can quickly spin up additional copies based on the checkpoint, which is already initialized.
    • Container migration: Checkpointing a container on one node and restoring it on another node constitutes container migration and would provide what was requested in the ticket from 2015.
    • Forensic container checkpointing: This use case checkpoints a container without stopping it and without the container knowing that it was checkpointed. The checkpointed container can be restored in a sandboxed environment for further threat analysis.
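
    Outside Kubernetes, two of these use cases map onto existing Podman options, which the following sketch illustrates. It is not the Kubernetes integration discussed in this article; it only builds command lines without running them. `--leave-running` asks Podman to keep the container running after the checkpoint is written, and `--name` (valid together with `--import`) gives each restored copy its own name. The function names, container name, archive path, and name prefix are hypothetical.

```python
from typing import List


def forensic_checkpoint_command(container: str, archive: str) -> List[str]:
    """Build a checkpoint command that leaves the container running."""
    return ["podman", "container", "checkpoint", "--leave-running",
            f"--export={archive}", container]


def quick_start_commands(archive: str, copies: int, prefix: str = "copy") -> List[List[str]]:
    """Build restore commands that start several named copies from one checkpoint."""
    return [
        ["podman", "container", "restore", f"--import={archive}", f"--name={prefix}-{i}"]
        for i in range(1, copies + 1)
    ]


if __name__ == "__main__":
    # Hypothetical container name and archive path.
    print(" ".join(forensic_checkpoint_command("suspect", "/tmp/suspect.tar.gz")))
    for cmd in quick_start_commands("/tmp/initialized.tar.gz", copies=2):
        print(" ".join(cmd))
```

    Restoring under a new name matters for the quick-startup case because several copies created from the same checkpoint cannot all reuse the original container's name.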

    One of the challenges we faced when we thought about introducing checkpoint and restore into Kubernetes was how to do it in a minimal way, with as little impact as possible on anything else. The forensic container checkpointing use case was a useful but simple way to test that requirement. After we implemented this use case, it became possible to see how checkpointing can be used in Kubernetes without breaking anything else.

    Learn more at KubeCon + CloudNativeCon North America 2021

    At KubeCon + CloudNativeCon North America 2021, I will present more details about checkpoint and restore in Kubernetes, along with additional use cases for checkpoint and restore in combination with containers. There will also be a live demo of all the use cases I present. I will give technical details about how CRIU enables checkpointing and restoring of containers, and an overview of how CRIU enables container migration in different container engines. Join my session on October 14, and I will be happy to answer any related questions.

    Last updated: September 20, 2023