Automate disaster recovery

March 24, 2026

Zheng Wang (George)

This learning path covered a disaster recovery strategy for virtual machines (VMs) that handles Kubernetes metadata separately from the data held in persistent volume claims (PVCs) and persistent volumes (PVs). The strategy uses OpenShift APIs for Data Protection (OADP) for metadata operations and relies on the storage system's native snapshot and replication capabilities for data volumes. A key component is the OADP plugin, which modifies PV metadata during the restore process so that the data volumes point to the correct storage location at the disaster recovery site.

Prerequisites:

In this lesson, you will:

Review the disaster recovery strategy
Outline the high-level logic for a playbook that automates the strategy

Review

This learning path covered a disaster recovery strategy for virtual machines (VMs) that handles Kubernetes metadata separately from the data held in persistent volume claims (PVCs) and persistent volumes (PVs). The strategy uses OADP for metadata operations and relies on the storage system's native snapshot and replication capabilities for data volumes. A key component is the OADP plugin, which modifies PV metadata during the restore process so that the data volumes point to the correct storage location at the disaster recovery site.
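To make the metadata rewrite concrete, here is a minimal Python sketch of the idea. It is not the plugin's actual code: OADP plugins are Velero plugins and are normally written in Go, and this lesson does not show which PV fields the plugin changes. The sketch assumes a CSI-backed PV whose spec.csi.volumeHandle identifies the volume on the storage array. The function name retarget_pv, the handle_map lookup table, and the example handle strings are all hypothetical.

```python
import copy


def retarget_pv(pv: dict, handle_map: dict[str, str]) -> dict:
    """Return a copy of a PV manifest whose CSI volume handle points at the DR-site volume.

    Assumption: the storage location is carried in spec.csi.volumeHandle and
    handle_map maps each primary-site handle to its replicated DR-site handle.
    """
    csi = pv.get("spec", {}).get("csi")
    if csi is None:
        raise ValueError("PV is not CSI-backed; this sketch only handles CSI volumes")

    primary_handle = csi["volumeHandle"]
    if primary_handle not in handle_map:
        # Refuse to guess: a PV with no known replica must not be restored blindly.
        raise KeyError(f"no DR-site volume known for handle {primary_handle!r}")

    patched = copy.deepcopy(pv)  # leave the backed-up manifest untouched
    patched["spec"]["csi"]["volumeHandle"] = handle_map[primary_handle]
    return patched


# Hypothetical PV manifest and handle mapping, for illustration only.
example_pv = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "pv-vm-disk-1"},
    "spec": {
        "persistentVolumeReclaimPolicy": "Retain",
        "csi": {"driver": "example.csi.vendor.com", "volumeHandle": "primary-array/vol-001"},
    },
}

restored = retarget_pv(example_pv, {"primary-array/vol-001": "dr-array/vol-001"})
print(restored["spec"]["csi"]["volumeHandle"])
```

Working on a deep copy keeps the backed-up manifest unchanged, so the same backup can be restored again with a different mapping.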

Fully automated disaster recovery

The manual steps outlined in this learning path form the basis for a fully automated disaster recovery workflow. An automation tool like Red Hat Ansible Automation Platform can orchestrate the entire process. The high-level logic for an automation playbook would include:

Primary site cleanup: On the primary site, a periodic task should run to clean up stale resources. If a snapshot is deleted and the PVC created from it is no longer referenced by any VM, the PVC should be automatically deleted to reclaim storage.
Backup validation: The automation should check the status of each OADP backup. Due to the plugin's logic, a backup might be marked as PartiallyFailed if the plugin finds that a PVC from a snapshot has already been created by a previous run. The automation should verify that this is the only cause of failure and that no other critical resources failed to back up.
Failover execution: In a disaster scenario, the playbook would trigger the final OADP Restore on the disaster recovery cluster. This restore brings the VMs online, connecting them to the already replicated and pre-staged persistent volumes.
Post-failover cleanup: After a successful restore, the automation should perform a reconciliation. If a PVC was deleted on the primary site before the disaster, it should also be removed from the disaster recovery site to maintain consistency.
PV garbage collection: Because the StorageClass uses the Retain reclaim policy, PVs are not deleted automatically when their PVCs are removed. The automation should include steps to safely identify and remove orphaned PVs on both the primary and disaster recovery sites after failback and cleanup operations are complete.
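Two of the steps above, backup validation and PV garbage collection, come down to decisions over objects the playbook has already fetched. The Python sketch below shows one way to express them as pure functions. The phase names Completed and PartiallyFailed, and the PV fields spec.persistentVolumeReclaimPolicy and status.phase, are standard Velero and Kubernetes values. Everything else is an assumption made for illustration: the EXPECTED_ERROR_MARKER text, the function names, and the idea that error messages arrive as a list of strings. The real error text produced by the plugin is not given in this lesson.

```python
# Assumption: substring that identifies the plugin's "PVC already created" error.
# The real message text is not documented here and must be confirmed first.
EXPECTED_ERROR_MARKER = "already exists"


def backup_is_acceptable(phase: str, error_messages: list[str],
                         marker: str = EXPECTED_ERROR_MARKER) -> bool:
    """Decide whether a backup can be trusted for failover.

    Completed backups pass. PartiallyFailed backups pass only when every
    reported error is the expected one; with no error details available the
    cause cannot be verified, so the backup is rejected.
    """
    if phase == "Completed":
        return True
    if phase != "PartiallyFailed":
        return False
    return bool(error_messages) and all(marker in msg for msg in error_messages)


def orphaned_pv_candidates(pvs: list[dict]) -> list[str]:
    """Return names of PVs that were retained after their PVC was deleted.

    A PV with the Retain reclaim policy moves to the Released phase when its
    claim is removed. These are candidates only; a real playbook would still
    confirm replication state before deleting anything.
    """
    return [
        pv["metadata"]["name"]
        for pv in pvs
        if pv.get("spec", {}).get("persistentVolumeReclaimPolicy") == "Retain"
        and pv.get("status", {}).get("phase") == "Released"
    ]


# Hypothetical inputs, for illustration only.
sample_pvs = [
    {"metadata": {"name": "pv-a"},
     "spec": {"persistentVolumeReclaimPolicy": "Retain"},
     "status": {"phase": "Bound"}},
    {"metadata": {"name": "pv-b"},
     "spec": {"persistentVolumeReclaimPolicy": "Retain"},
     "status": {"phase": "Released"}},
]

ok = backup_is_acceptable("PartiallyFailed", ["pvc vm-disk-1 already exists"])
candidates = orphaned_pv_candidates(sample_pvs)
print(ok, candidates)
```

Keeping these checks free of API calls makes them easy to test on their own. An Ansible playbook could gather the backup status and PV list with its Kubernetes modules and pass the results to logic like this.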

