Checkpoint and restore in Kubernetes

October 7, 2021
Adrian Reber
Related topics: Containers, Linux, Kubernetes, Security
Related products: Red Hat OpenShift

    In 2015, an issue was opened against Kubernetes about supporting container migration. The problem description mentioned Checkpoint/Restore In Userspace (CRIU) on Linux as a possible basis for a solution. Around the same time, I started to look into how to integrate CRIU into the container stack.

    Note: This article is a preview of my upcoming session at KubeCon + CloudNativeCon North America 2021, happening October 11 to 15. See the end of this article for more about my session.

    Checkpoint and restore in the container stack

    The basic steps to migrate a running container from one node to another, which could also be called stateful migration, are to checkpoint the container on the source node, transfer the checkpoint image to the destination node, and restore the container on the destination node. This way, the container is migrated without losing its state.

    In 2015, however, the container stack was not ready to support checkpoint and restore in the orchestration layer (Kubernetes). The container runtime layer, runc, offered limited support for checkpointing and restoring containers, but that support was not yet available in the higher layers of the container stack.

    Over the years, I was involved in bringing checkpoint and restore support to these upper layers of the container stack. Around 2018, I implemented checkpoint and restore support in Podman. Bringing checkpoint and restore support, and thus migration support, to Podman required many changes in runc and CRIU. It was necessary to support different Linux security techniques used in containers, including SELinux, AppArmor, and seccomp, before Podman could migrate a container from one node to another without losing any of its state.
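The three migration steps described earlier (checkpoint, transfer, restore) can be sketched with Podman's checkpoint and restore commands. This is a minimal illustration, not the exact procedure used by any particular deployment: the container name, host name, and archive path are hypothetical, `scp` and `ssh` are assumed as the transfer mechanism, and the commands are only built and printed rather than executed, since checkpointing typically needs root privileges and CRIU installed on both nodes.

```python
import shlex
from typing import List


def migration_commands(container: str, dest_host: str,
                       archive: str = "/tmp/checkpoint.tar.gz") -> List[List[str]]:
    """Return the three commands for a checkpoint/transfer/restore migration.

    The commands are only built, not executed, so the sketch can be
    inspected without root privileges or a running container.
    """
    return [
        # 1. Checkpoint the container on the source node and export it to an archive.
        ["podman", "container", "checkpoint", container, f"--export={archive}"],
        # 2. Transfer the checkpoint archive to the destination node (scp is an assumption).
        ["scp", archive, f"{dest_host}:{archive}"],
        # 3. Restore the container on the destination node from the archive.
        ["ssh", dest_host, "podman", "container", "restore", f"--import={archive}"],
    ]


if __name__ == "__main__":
    # "webserver" and "node2.example.com" are hypothetical names.
    for cmd in migration_commands("webserver", "node2.example.com"):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` later, once the names and paths match a real environment.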

    Checkpointing a container out of a pod

    Eventually, it was possible to migrate containers from one node to another with a few simple commands. But at this point, it was still not possible to integrate checkpoint and restore into Kubernetes. One big remaining barrier to adding support for container checkpoint and restore in Kubernetes was that, until then, no one had looked into how to combine the concept of pods in Kubernetes with CRIU and the whole container stack.

    A container in Linux is usually one or more processes using Linux namespaces to create boundaries between processes in different containers. (See Demystifying namespaces and containers in Linux for an introduction to Linux namespaces.) In Kubernetes, containers run in pods, and a pod shares some, but not all, of its namespaces with every container in the pod. Before being able to checkpoint a container out of a pod and restore it into another pod, it was first necessary to enable pod support in CRIU and the container stack layers below Kubernetes; specifically, to enable checkpointing a container out of a pod and restoring the container into an existing pod. In addition to enabling the sharing of namespaces, we also needed to join existing SELinux contexts upon restore.
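To make the idea of partially shared namespaces concrete, the following sketch compares the namespaces of two processes. On Linux, each entry under `/proc/<pid>/ns` is a symbolic link whose target (for example `net:[4026531992]`) identifies a namespace; two processes with the same target are in the same namespace. The identifiers in the example are hypothetical, and the split shown (network, IPC, and UTS shared; mount not shared) is a common default for pods rather than something stated in this article.

```python
import os
from typing import Dict, Set


def read_namespaces(pid: int) -> Dict[str, str]:
    """Map namespace type to its identifier for one process (Linux only).

    Reading another process's namespace links may require elevated privileges.
    """
    ns_dir = f"/proc/{pid}/ns"
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in os.listdir(ns_dir)}


def shared_namespaces(a: Dict[str, str], b: Dict[str, str]) -> Set[str]:
    """Return the namespace types for which both processes use the same namespace."""
    return {name for name in a.keys() & b.keys() if a[name] == b[name]}


if __name__ == "__main__":
    # Hypothetical identifiers for two containers running in the same pod.
    container_a = {"net": "net:[4026532001]", "ipc": "ipc:[4026532002]",
                   "uts": "uts:[4026532003]", "mnt": "mnt:[4026532010]"}
    container_b = {"net": "net:[4026532001]", "ipc": "ipc:[4026532002]",
                   "uts": "uts:[4026532003]", "mnt": "mnt:[4026532020]"}
    print(sorted(shared_namespaces(container_a, container_b)))
    # prints ['ipc', 'net', 'uts']
```

On a real node, `read_namespaces` could be called with the process IDs of two containers from the same pod to see which namespaces a restored container would have to join.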

    Use cases for checkpoint and restore in Kubernetes

    Before integrating checkpoint and restore into Kubernetes, we thought about possible use cases and came up with the following:

    • Reboot without losing state: Sometimes, it is necessary to reboot a node for important security updates. With the help of checkpoint and restore, a slow-starting container can be checkpointed before the reboot. Then, after the reboot, the container can be restored from the checkpoint without losing any state and without a long service downtime.
    • Quick startup: Similar to the first use case, one might want a slow-starting container to start faster. For containers that require a long time to initialize, checkpoint and restore can be used to create checkpoints of a container after the long initialization phase. Then the system can quickly spin up additional copies based on the checkpoint, which is already initialized.
    • Container migration: Checkpointing a container on one node and restoring it on another node constitutes container migration and would provide what was requested in the ticket from 2015.
    • Forensic container checkpointing: This use case checkpoints a container without stopping it and without the container knowing that it was checkpointed. The checkpointed container can be restored in a sandboxed environment for further threat analysis.

    One of the challenges we faced when we thought about introducing checkpoint and restore into Kubernetes was how to do it in a minimal way with as little impact as possible on anything else. The forensic container checkpointing use case was a useful but simple way to test that requirement. After we implemented this use case, it became possible to see how checkpointing can be used in Kubernetes without breaking anything else.
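Two of the use cases above can be illustrated at the Podman level, which is where checkpoint and restore already worked at the time. The sketch below is an assumption-laden illustration and not the Kubernetes integration itself: it builds (but does not run) a checkpoint command that leaves the container running, as in the forensic use case, and a set of restore commands that create several named copies from one checkpoint, as in the quick startup use case. The container name, archive path, and copy prefix are hypothetical.

```python
from typing import List


def forensic_checkpoint_command(container: str, archive: str) -> List[str]:
    """Build a checkpoint command that leaves the container running (forensic use case)."""
    return ["podman", "container", "checkpoint", container,
            "--leave-running", f"--export={archive}"]


def quick_startup_commands(archive: str, copies: int,
                           prefix: str = "copy") -> List[List[str]]:
    """Build restore commands for several named copies of one checkpoint (quick startup)."""
    return [
        ["podman", "container", "restore", f"--import={archive}", f"--name={prefix}-{i}"]
        for i in range(1, copies + 1)
    ]


if __name__ == "__main__":
    # "webserver" and the archive path are hypothetical.
    print(" ".join(forensic_checkpoint_command("webserver", "/tmp/webserver.tar.gz")))
    for cmd in quick_startup_commands("/tmp/webserver.tar.gz", copies=3):
        print(" ".join(cmd))
```

Whether several copies can actually run side by side depends on the container's configuration, for example published ports or fixed addresses, so the restore commands may need adjusting for a real workload.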

    Learn more at KubeCon + CloudNativeCon North America 2021

    At KubeCon + CloudNativeCon North America 2021, I will present more details about Kubernetes and checkpoint and restore, including additional use cases for checkpoint and restore in combination with containers. There will also be a live demo of all the use cases I present. I will give technical details about how CRIU enables checkpointing and restoring of containers, and an overview of how CRIU enables container migration in different container engines. Join my session on October 14 and I will be happy to answer any related questions.

    Last updated: September 20, 2023
