Enable etcd backups for OpenShift clusters in hybrid cloud environments

September 26, 2024
Saurabh Kumar Ghoshal
Related topics:
Hybrid cloud
Related products:
Red Hat OpenShift Container Platform

    This article discusses etcd backups for Red Hat OpenShift 4.x clusters in hybrid cloud scenarios. Backing up etcd is crucial for disaster recovery and for recovering from node failure: etcd is the primary datastore of Kubernetes, so its backup is what lets you restore the state of the master nodes and of the cluster. Store the backup outside the cluster so that it remains accessible for restoration even if node access, or the nodes themselves, become unavailable.

    When to back up

    Ideally, you should back up the cluster’s etcd data regularly and store it in a secure location outside the OpenShift cluster. On a new OpenShift cluster, the first certificate rotation happens 24 hours after installation; do not take an etcd backup before this rotation completes, because the backup would contain expired certificates. It is also recommended to take etcd backups during non-peak hours, as an etcd snapshot has a high I/O cost. Also, be sure to back up etcd before and after any cluster upgrade.

    How to back up

    In an OpenShift cluster, an automated script for backing up the etcd database is already provided on each master node at /usr/local/bin/cluster-backup.sh. To access it, you need to start a debug session with the OpenShift CLI.

    oc debug node/<master_node_name> opens a debug session on the master node. When you then run the script with a target directory, it creates the backup in that directory. In the following sections, we will explain how to automate this process using a CronJob. The CronJob runs on the OpenShift cluster itself and backs up etcd on the master nodes on a regular schedule. Make sure the backups created on the master nodes are cleaned up regularly so that they don’t fill the disk.
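
    As a rough sketch of the manual procedure described above, the hypothetical helper below wraps the debug session and the script call in a single command. The function name backup_node and the default target directory /home/core/backup are illustrative assumptions (the directory is the one the CronJobs in this article use), and the sketch assumes oc is already logged in as a cluster-admin user.

```shell
# Hypothetical helper (not part of OpenShift): run the bundled backup
# script on one master node through a debug session.
# $1 = node name, $2 = target directory on the node (assumed default).
backup_node() {
  node="$1"
  dir="${2:-/home/core/backup}"
  # chroot /host switches into the node's root filesystem, which is
  # where cluster-backup.sh lives.
  oc debug "node/${node}" -- chroot /host /usr/local/bin/cluster-backup.sh "${dir}"
}
```

    For example, backup_node <master_node_name> would write the backup files under /home/core/backup on that node.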

    Where to store the backup?

    The backup can be stored on any storage outside the cluster, as long as it is reachable from the cluster. In this article, we explore storing the etcd backup on cloud object storage such as S3. Similarly, it can be stored in the object stores of other clouds, or on NFS and other file shares available on those clouds.

    Execution 

    The next section details the steps required to store the etcd backup in an AWS S3 bucket.

    Prerequisites

    • You have access to the cluster as a user with the cluster-admin role.
    • You have created an S3 bucket that is accessible from the cluster.

    We will create the following in the OpenShift cluster:

    • Namespace.
    • Service account.
    • Cluster role.
    • Cluster role binding.
    • AWS S3 key.
    • CronJob.

    You can create the namespace from the console or with the OpenShift CLI.

    To schedule the etcd backup as a daily CronJob, it is important to create a dedicated namespace. Make sure only cluster-admins have access to this namespace; other team members do not need access to it. See below:

    oc new-project etcd-bkp --description "OpenShift ETCD Backup" --display-name "ETCD Backup to S3"

    Service account

    We will create a service account under which the etcd backup CronJob runs:

    kind: ServiceAccount
    apiVersion: v1
    metadata: 
      name: cronjob-etcd-bkp-sa
      namespace: etcd-bkp
      labels:
        app: cronjob-etcd-backup

    Save the manifest as service_account.yaml and apply it:

    oc apply -f service_account.yaml

    Cluster role

    A cluster role is required to run the pod with the proper privileges. The following YAML creates the cluster role:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: cronjob-etcd-bkp-cr
    rules:
    - apiGroups: [""]
      resources:
        -  "nodes"
      verbs: ["get","list"]
    - apiGroups: [""]
      resources:
        - "pods"
        - "pods/log"
      verbs: ["get","list","create","delete","watch"]

    Save the manifest as cluster_role.yaml and apply it:

    oc apply -f cluster_role.yaml

    Cluster role binding

    After creating the role, we need to bind it to the service account we just created. Here is the YAML file to create the cluster role binding:

    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name:  cronjob-etcd-bkp-crb
      labels:
        app: cronjob-etcd-backup
    subjects:
      - kind: ServiceAccount
        name: cronjob-etcd-bkp-sa
        namespace: etcd-bkp
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: cronjob-etcd-bkp-cr

    Save the manifest as cluster_role_binding.yaml and apply it:

    oc apply -f cluster_role_binding.yaml

    AWS S3 secret 

    We need to store the AWS access key ID and the AWS secret access key for the S3 bucket in a secret. The CronJob refers to this secret to access the S3 bucket. Below is a sample manifest for creating the secret:

    apiVersion: v1
    kind: Secret
    metadata:
      name: aws-s3-etcd-key
      namespace: etcd-bkp
    type: Opaque
    data: 
      aws_access_key_id: <key_id | base 64>
      aws_secret_access_key: <access_key | base 64>
      region: <bucket_region | base 64>

    Save the manifest as s3_secret.yaml and apply it:

    oc apply -f s3_secret.yaml
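
    The values under data in the secret must be base64-encoded. A minimal sketch of producing them with standard shell tools follows; the helper name b64 and the sample input are illustrative, not part of the original procedure.

```shell
# Hypothetical helper: base64-encode a value for the Secret's data field.
# printf avoids the trailing newline that echo would add, and tr strips
# the line wrapping that some base64 implementations apply to long output.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}
```

    For example, b64 "us-east-1" prints dXMtZWFzdC0x, which is the form the region entry expects.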

    CronJob

    We can run the CronJob in two ways.

    In the first approach, the job runs on a master node, takes the backup on the node itself, and pushes it to the S3 bucket. Let’s explore the CronJob provided below.

    In this CronJob, the task runs on a master node because of the node selector:

    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ''

    It uses the CLI image to initiate the backup:

    image: registry.redhat.io/openshift4/ose-cli

    It then invokes the backup script:

    chroot /host /usr/local/bin/cluster-backup.sh

    It creates the backup under /home/core/backup, in a directory named after the current date and time:

    chroot /host /usr/local/bin/cluster-backup.sh /home/core/backup/$(date "+%F_%H%M%S")

    It cleans up the older backups:

    chroot /host find /home/core/backup -mindepth 1 -type d -mmin +2 -exec rm -rf {} \;

    Finally, a second container based on the AWS CLI image pushes the backup to the AWS S3 bucket:

    aws s3 cp /host/home/core/backup/ s3://ocp-etcd-sync --recursive

    Here is the complete CronJob manifest:

    kind: CronJob
    apiVersion: batch/v1
    metadata:
      name: cronjob-etcd-backup
      namespace: etcd-bkp
      labels:
        app.kubernetes.io/name: cronjob-etcd-backup
    spec:
      schedule: "* * * * *"  # every minute, as in this sample; use e.g. "0 0 * * *" for a daily run
      concurrencyPolicy: Forbid
      suspend: false
      jobTemplate:
        metadata:
          labels:
            app.kubernetes.io/name: cronjob-etcd-backup
        spec:
          backoffLimit: 0
          template:
            metadata:
              labels:
                app.kubernetes.io/name: cronjob-etcd-backup
            spec:
              nodeSelector:
                node-role.kubernetes.io/master: ''
              restartPolicy: Never
              activeDeadlineSeconds: 500
              serviceAccountName: cronjob-etcd-bkp-sa
              hostPID: true
              hostNetwork: true
              enableServiceLinks: true
              schedulerName: default-scheduler
              terminationGracePeriodSeconds: 30
              securityContext: {}
              containers:
                - name: cronjob-etcd-backup
                  image: registry.redhat.io/openshift4/ose-cli
                  terminationMessagePath: /dev/termination-log
                  command:
                  - /bin/bash
                  - '-c'
                  - >-
                    echo -e '\n\n---\nCreate etcd backup local to master\n' &&
                    chroot /host /usr/local/bin/cluster-backup.sh /home/core/backup/$(date "+%F_%H%M%S") &&
                    echo -e '\n\n---\nCleanup old local etcd backups\n' &&
                    chroot /host find /home/core/backup/ -mindepth 1 -type d -mmin +2 -exec rm -rf {} \;
                  securityContext:
                    privileged: true
                    runAsUser: 0
                    capabilities:
                      add:
                        - SYS_CHROOT
                  imagePullPolicy: Always
                  volumeMounts:
                    - name: host
                      mountPath: /host
                  terminationMessagePolicy: File
                - name: aws-cli
                  image: amazon/aws-cli:latest
                  command:
                  - /bin/bash
                  - '-c'
                  - >-
                    while true; do if [[ $(find /host/home/core/backup/ -type d -cmin -1) ]]; then aws s3 cp /host/home/core/backup/ s3://ocp-etcd-sync --recursive; break; fi; sleep 5; done
                  env:
                  - name: AWS_ACCESS_KEY_ID
                    valueFrom:
                      secretKeyRef:
                        name: aws-s3-etcd-key
                        key: aws_access_key_id
                  - name: AWS_SECRET_ACCESS_KEY
                    valueFrom:
                      secretKeyRef:
                        name: aws-s3-etcd-key
                        key: aws_secret_access_key
                  - name: AWS_DEFAULT_REGION
                    valueFrom:
                      secretKeyRef:
                        name: aws-s3-etcd-key
                        key: region
                  volumeMounts:
                    - name: host
                      mountPath: /host
              volumes:
              - name: host
                hostPath:
                  path: /
                  type: Directory
              dnsPolicy: ClusterFirst
              tolerations:
              - key: node-role.kubernetes.io/master
      successfulJobsHistoryLimit: 5
      failedJobsHistoryLimit: 5
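
    The cleanup step can be tried safely outside the cluster. The sketch below, which assumes GNU find, wraps the same age-based deletion in a hypothetical prune_backups function. The function name, its arguments, and the added -maxdepth 1 (which keeps find from descending into directories that are about to be removed) are illustrative additions rather than part of the CronJob above.

```shell
# Hypothetical helper: delete backup directories older than a given age.
# $1 = backup root, $2 = age threshold in minutes.
prune_backups() {
  root="$1"
  max_age_min="$2"
  # -mindepth 1 protects the root itself; -maxdepth 1 limits the match to
  # the per-run timestamp directories directly below it.
  find "$root" -mindepth 1 -maxdepth 1 -type d -mmin +"$max_age_min" \
    -exec rm -rf {} +
}
```

    Calling prune_backups /home/core/backup 2 mirrors the two-minute threshold used in the CronJob; a longer threshold is likely more appropriate when the job runs less often.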

    Another way to initiate the backup is to schedule the CronJob on a worker node; the job then needs to access the master nodes to take the backup and push it to S3. Below is an example that goes to every master node and takes the backup on that node. The backup can then be moved to S3 or to any other storage, such as NFS or volumes that are themselves backed up.

    This backup is scheduled to run every 12 hours. During the backup process, the CronJob also tries to delete older backups that are no longer required, to avoid filling up storage. It uses the image registry.redhat.io/openshift4/ose-cli. Alternatively, you can create your own image using Red Hat Universal Base Image (UBI) as the base image and install the oc CLI in it.

    The job below creates the backup at /home/core/backup/ on the master nodes:

    ---
    kind: CronJob
    apiVersion: batch/v1
    metadata:
      name: cronjob-etcd-backup
      namespace: etcd-bkp
      labels:
        app: ocp-etcd-bkp
    spec:
      concurrencyPolicy: Forbid
      schedule: "0 */12 * * *"
      failedJobsHistoryLimit: 5
      successfulJobsHistoryLimit: 5
      jobTemplate:
        metadata:
          labels:
            app: ocp-etcd-bkp
        spec:
          backoffLimit: 0
          template:
            metadata:
              labels:
                app: ocp-etcd-bkp
            spec:
              containers:
                - name: etcd-backup
                  image: "registry.redhat.io/openshift4/ose-cli"
                  command:
                    - "/bin/bash"
                    - "-c"
                    - oc get no -l node-role.kubernetes.io/master --no-headers -o name | xargs -I {} --  oc debug {}  --to-namespace=etcd-bkp -- bash -c 'chroot /host sudo -E /usr/local/bin/cluster-backup.sh /home/core/backup/ && chroot /host sudo -E find /home/core/backup/ -type f -mmin +"1" -delete'
              restartPolicy: Never  # Job pods must use Never or OnFailure; the default (Always) is rejected
              serviceAccountName: "cronjob-etcd-bkp-sa"
              serviceAccount: "cronjob-etcd-bkp-sa"

    This job differs from the previous one: it runs on a worker node, lists all master nodes, and uses oc debug to access each of them and take the backup one by one, using the following command:

    oc get no -l node-role.kubernetes.io/master --no-headers -o name | xargs -I {} -- oc debug {}
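
    The one-line pipeline can be easier to follow when unrolled into an explicit loop. The following sketch restates it under the assumption that oc is authenticated with the service account’s permissions; the function name backup_all_masters is hypothetical, and the cleanup step is left out for brevity.

```shell
# Hypothetical restatement of the xargs pipeline as an explicit loop.
backup_all_masters() {
  # -o name prints one resource per line, e.g. node/<master_node_name>.
  oc get nodes -l node-role.kubernetes.io/master --no-headers -o name |
  while read -r node; do
    # </dev/null keeps oc debug from consuming the node list on stdin.
    oc debug "$node" --to-namespace=etcd-bkp -- \
      chroot /host /usr/local/bin/cluster-backup.sh /home/core/backup/ </dev/null
  done
}
```

    Because the loop handles one node at a time, a failure on one master node is visible in the job log next to that node’s name.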

