
Enable etcd backups for OpenShift clusters in hybrid cloud environments

September 26, 2024
Saurabh Kumar Ghoshal
Related topics:
Hybrid Cloud
Related products:
Red Hat OpenShift Container Platform


    This article discusses etcd backups for Red Hat OpenShift 4.x clusters in hybrid scenarios. Backing up etcd is a crucial activity for disaster recovery and node failure. Because etcd is the primary datastore of Kubernetes, its backups are what allow you to recover the state of the master nodes and the cluster as a whole. We recommend storing the backup externally, which ensures it remains accessible for node restoration even if node access, or the nodes themselves, become unavailable.

    When to back up

    Ideally, you should back up the cluster's etcd data regularly and store it in a secure location outside the OpenShift cluster. After a new OpenShift cluster is created, the first certificate rotation happens 24 hours after installation; do not take an etcd backup before this operation completes, as the backup would contain expired certificates. Additionally, it is recommended to initiate etcd backups during non-peak hours, because taking an etcd snapshot has a high I/O cost. Also, be sure to take an etcd backup before and after any cluster upgrade.
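
    A quick way to confirm the cluster is past this initial 24-hour rotation window is to check the age of the nodes (a simple sanity check, not part of the original procedure):

    oc get nodes   # the AGE column should show more than 24h before you take the first backup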

    How to back up

    In an OpenShift cluster, an automated backup script is already provided on the master nodes at /usr/local/bin/cluster-backup.sh. To access it, you need to start a debug session with the OpenShift CLI.

    oc debug node/<master_node_name> logs you in to a master node. From the debug session, you can run the backup script, which creates a backup at the folder location you pass to it. In the following sections, we will explain how to automate this process using a CronJob. This CronJob runs on the OpenShift cluster itself and takes the etcd backup on the master nodes in a timely manner. Make sure the backups created on the master nodes are cleaned up daily so they don't fill up the disk.
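
    For reference, a one-off manual backup session looks like the following sketch (the target directory here is just an example):

    $ oc debug node/<master_node_name>
    sh-4.4# chroot /host
    sh-4.4# /usr/local/bin/cluster-backup.sh /home/core/backup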

    Where to store the backup?

    This backup can be stored in any storage outside the cluster, as long as it is reachable from the cluster. In this article, we will explore storing the etcd backup in cloud object storage such as S3. Similarly, it can be stored in object stores on other clouds, or on NFS and other file shares available in the cloud.

    Execution 

    The next section details the steps required to store the etcd backup in IBM Cloud Object Storage, which exposes an S3-compatible API; the examples therefore use AWS-style credentials and the AWS CLI.

    Prerequisites

    • You have access to the cluster as a user with the cluster-admin role.
    • You have created an S3 bucket that is accessible from the cluster.

    We will create the following in the OpenShift cluster:

    • Namespace
    • Service account
    • Cluster role
    • Cluster role binding
    • AWS S3 secret
    • CronJob

    To schedule the etcd backup as a daily CronJob, it is important to create a dedicated namespace. Make sure only cluster admins have access to this namespace; other team members should not need access to it. You can create the namespace from the console or from the OpenShift CLI, as shown below:

    oc new-project etcd-bkp --description "Openshift ETCD Backup" --display-name "ETCD Backup to S3"

    Service account

    We will create a service account under which the etcd backup CronJob will run:

    kind: ServiceAccount
    apiVersion: v1
    metadata: 
      name: cronjob-etcd-bkp-sa
      namespace: etcd-bkp
      labels:
        app: cronjob-etcd-backup

    Save this as service_account.yaml and apply it:

    oc apply -f service_account.yaml
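
    You can confirm the service account exists before moving on (an optional check, not in the original article):

    oc get sa cronjob-etcd-bkp-sa -n etcd-bkp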

    Cluster role

    A cluster role is required to run the pod with the proper privileges. The following YAML creates it:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: cronjob-etcd-bkp-cr
    rules:
    - apiGroups: [""]
      resources:
        - "nodes"
      verbs: ["get","list"]
    - apiGroups: [""]
      resources:
        - "pods"
        - "pods/log"
      verbs: ["get","list","create","delete","watch"]

    Save this as cluster_role.yaml and apply it:

    oc apply -f cluster_role.yaml

    Cluster role binding

    After creating the role, we need to bind it to the service account we just created. Here is the YAML file to create the cluster role binding:

    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: cronjob-etcd-bkp-crb
      labels:
        app: cronjob-etcd-backup
    subjects:
      - kind: ServiceAccount
        name: cronjob-etcd-bkp-sa
        namespace: etcd-bkp
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: cronjob-etcd-bkp-cr

    Save this as cluster_role_binding.yaml and apply it:

    oc apply -f cluster_role_binding.yaml
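
    To verify the RBAC wiring, you can impersonate the service account and confirm it has the permissions the CronJob needs (an optional sanity check, not part of the original article):

    oc auth can-i list nodes --as=system:serviceaccount:etcd-bkp:cronjob-etcd-bkp-sa
    oc auth can-i create pods --as=system:serviceaccount:etcd-bkp:cronjob-etcd-bkp-sa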

    AWS S3 secret 

    We need to store the AWS access key ID and AWS secret access key for the S3 bucket in a secret. This secret is referenced in the CronJob to access the S3 bucket. Below is a sample manifest for the secret:

    apiVersion: v1
    kind: Secret
    metadata:
      name: aws-s3-etcd-key
      namespace: etcd-bkp
    type: Opaque
    data:
      aws_access_key_id: <key_id | base64>
      aws_secret_access_key: <access_key | base64>
      region: <bucket_region | base64>

    Save this as s3_secret.yaml and apply it:

    oc apply -f s3_secret.yaml
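
    The values under data: must be base64 encoded. You can encode each value yourself, or skip the manifest entirely and let oc encode them for you (an equivalent alternative, not in the original article):

    echo -n '<key_id>' | base64
    oc create secret generic aws-s3-etcd-key -n etcd-bkp \
      --from-literal=aws_access_key_id='<key_id>' \
      --from-literal=aws_secret_access_key='<access_key>' \
      --from-literal=region='<bucket_region>'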

    CronJob

    We can run the CronJob in two ways.

    The first approach schedules the CronJob so that the job runs on a master node, takes the backup on the node itself, and pushes it to the S3 bucket. Let's explore the CronJob provided below.

    In this CronJob, the task runs on a master node because of the node selector:

    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ''

    It uses the CLI image to initiate the backup:

    image: registry.redhat.io/openshift4/ose-cli

    It then invokes the backup script:

    chroot /host /usr/local/bin/cluster-backup.sh

    It creates the backup under /home/core/backup, in a directory named with the current date and time:

    chroot /host /usr/local/bin/cluster-backup.sh /home/core/backup/$(date "+%F_%H%M%S")
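
    For example, a run on September 26, 2024 at 12:00:00 would write the backup to /home/core/backup/2024-09-26_120000 (an illustrative timestamp).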

    It cleans up older backups; directories more than two minutes old are removed, which matches the every-minute demonstration schedule used below:

    chroot /host find /home/core/backup -mindepth 1 -type d -mmin +2 -exec rm -rf {} \;

    Finally, it pushes the backup to the AWS S3 bucket using the AWS CLI image:

    aws s3 cp /host/home/core/backup/ s3://ocp-etcd-sync --recursive

    Here is the complete CronJob manifest:
    kind: CronJob
    apiVersion: batch/v1
    metadata:
      name: cronjob-etcd-backup
      namespace: etcd-bkp
      labels:
        app.kubernetes.io/name: cronjob-etcd-backup
    spec:
      schedule: "* * * * *"
      concurrencyPolicy: Forbid
      suspend: false
      jobTemplate:
        metadata:
          labels:
            app.kubernetes.io/name: cronjob-etcd-backup
        spec:
          backoffLimit: 0
          template:
            metadata:
              labels:
                app.kubernetes.io/name: cronjob-etcd-backup
            spec:
              nodeSelector:
                node-role.kubernetes.io/master: ''
              restartPolicy: Never
              activeDeadlineSeconds: 500
              serviceAccountName: cronjob-etcd-bkp-sa
              hostPID: true
              hostNetwork: true
              enableServiceLinks: true
              schedulerName: default-scheduler
              terminationGracePeriodSeconds: 30
              securityContext: {}
              containers:
                - name: cronjob-etcd-backup
                  image: registry.redhat.io/openshift4/ose-cli
                  terminationMessagePath: /dev/termination-log
                  command:
                  - /bin/bash
                  - '-c'
                  - >-
                    echo -e '\n\n---\nCreate etcd backup local to master\n' &&
                    chroot /host /usr/local/bin/cluster-backup.sh /home/core/backup/$(date "+%F_%H%M%S") &&
                    echo -e '\n\n---\nCleanup old local etcd backups\n' &&
                    chroot /host find /home/core/backup/ -mindepth 1 -type d -mmin +2 -exec rm -rf {} \;
                  securityContext:
                    privileged: true
                    runAsUser: 0
                    capabilities:
                      add:
                        - SYS_CHROOT
                  imagePullPolicy: Always
                  volumeMounts:
                    - name: host
                      mountPath: /host
                  terminationMessagePolicy: File
                - name: aws-cli
                  image: amazon/aws-cli:latest
                  command:
                  - /bin/bash
                  - '-c'
                  - >-
                    while true; do if [[ -n $(find /host/home/core/backup/ -type d -cmin -1) ]]; then aws s3 cp /host/home/core/backup/ s3://ocp-etcd-sync --recursive; break; fi; done
                  env:
                  - name: AWS_ACCESS_KEY_ID
                    valueFrom:
                      secretKeyRef:
                        name: aws-s3-etcd-key
                        key: aws_access_key_id
                  - name: AWS_SECRET_ACCESS_KEY
                    valueFrom:
                      secretKeyRef:
                        name: aws-s3-etcd-key
                        key: aws_secret_access_key
                  - name: AWS_DEFAULT_REGION
                    valueFrom:
                      secretKeyRef:
                        name: aws-s3-etcd-key
                        key: region
                  volumeMounts:
                    - name: host
                      mountPath: /host
              volumes:
              - name: host
                hostPath:
                  path: /
                  type: Directory
              dnsPolicy: ClusterFirst
              tolerations:
              - key: node-role.kubernetes.io/master
      successfulJobsHistoryLimit: 5
      failedJobsHistoryLimit: 5
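
    Note that the sample schedule ("* * * * *") runs every minute, which pairs with the two-minute cleanup above for demonstration purposes; in production you would typically use a less frequent schedule. Once applied, you can verify the CronJob, trigger a run manually instead of waiting for the schedule, and then check the bucket (standard oc and aws commands, not from the original article):

    oc get cronjob -n etcd-bkp
    oc create job --from=cronjob/cronjob-etcd-backup manual-etcd-backup -n etcd-bkp
    aws s3 ls s3://ocp-etcd-sync/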

    Another way to initiate the backup is to schedule the CronJob on a worker node; the job then accesses each master node to take the backup and push it to S3. Below is an example that goes to every master node and takes a backup on that node. The backup can then be moved to S3 or to any other file storage, such as NFS or volumes that are themselves backed up.

    This backup is scheduled to run every 12 hours. During the backup process, the CronJob also deletes older backups that are no longer required, to avoid filling up storage. It uses the image registry.redhat.io/openshift4/ose-cli. Alternatively, you can create your own image using the Red Hat Universal Base Image (UBI) as the base image and install the oc CLI into it.

    The job below creates the backup at /home/core/backup/ on the master nodes:

    ---
    kind: CronJob
    apiVersion: batch/v1
    metadata:
      name: cronjob-etcd-backup
      namespace: etcd-bkp
      labels:
        app: ocp-etcd-bkp
    spec:
      concurrencyPolicy: Forbid
      schedule: "0 */12 * * *"
      failedJobsHistoryLimit: 5
      successfulJobsHistoryLimit: 5
      jobTemplate:
        metadata:
          labels:
            app: ocp-etcd-bkp
        spec:
          backoffLimit: 0
          template:
            metadata:
              labels:
                app: ocp-etcd-bkp
            spec:
              containers:
                - name: etcd-backup
                  image: "registry.redhat.io/openshift4/ose-cli"
                  command:
                    - "/bin/bash"
                    - "-c"
                    - oc get no -l node-role.kubernetes.io/master --no-headers -o name | xargs -I {} -- oc debug {} --to-namespace=etcd-bkp -- bash -c 'chroot /host sudo -E /usr/local/bin/cluster-backup.sh /home/core/backup/ && chroot /host sudo -E find /home/core/backup/ -type f -mmin +"1" -delete'
              serviceAccountName: "cronjob-etcd-bkp-sa"
              serviceAccount: "cronjob-etcd-bkp-sa"
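
    Once this CronJob has run, you can inspect its output through the pod logs; the selector below matches the app: ocp-etcd-bkp label from the manifest (standard oc usage, not from the original article):

    oc get pods -n etcd-bkp -l app=ocp-etcd-bkp
    oc logs -n etcd-bkp -l app=ocp-etcd-bkp --tail=50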

    This job differs from the previous one: it runs on a worker node, lists all the master nodes, and uses oc debug to log in to each master node and take a backup, one node at a time, using the following command:

    oc get no -l node-role.kubernetes.io/master --no-headers -o name | xargs -I {} -- oc debug {}
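
    The first half of that pipeline emits one line per master node, for example (hypothetical node names):

    node/master-0.example.com
    node/master-1.example.com
    node/master-2.example.com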

    References

    • OCP Disaster Recovery Part 1 - How to Create Automated ETCD Backup in Openshift 4.x
    • Documentation: Backing up etcd
    • Solution: Automate syncing RHOCP's etcd-db backups to an AWS S3 bucket
