Red Hat OpenShift Virtualization disaster recovery

Implement disaster recovery with storage replication and OpenShift APIs for Data Protection (OADP) for Red Hat OpenShift Virtualization environments that use third-party storage. Efficiently and consistently recover your data across disparate clusters. 

Prerequisites: 

  • N/A 

In this lesson, you will:

  • Review the disaster recovery strategy
  • Outline the high-level logic for an automation playbook that could automate the strategy

Review

This learning path covered a disaster recovery strategy for virtual machines (VMs) that handles Kubernetes metadata separately from the data in the VMs' PersistentVolumeClaims (PVCs) and PersistentVolumes (PVs). The strategy uses OADP for metadata operations and relies on the storage system's own snapshot and replication capabilities for the data volumes. A key component is the OADP plugin, which modifies PV metadata during the restore process so that the data volumes point to the correct storage location at the disaster recovery site.
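
To make the PV rewrite idea concrete, the following is a minimal Python sketch of that kind of transformation, not the plugin's actual code. It assumes the PV is CSI-backed and that the storage location is carried in spec.csi.volumeHandle. The function name retarget_pv and the handle_map lookup table (primary-site handle to disaster recovery handle) are hypothetical and do not come from this learning path.

```python
import copy


def retarget_pv(pv, handle_map):
    """Return a copy of a PV manifest that points at the DR-site volume.

    Assumption: the PV is CSI-backed, so the storage location lives in
    spec.csi.volumeHandle. handle_map is a hypothetical lookup table from
    primary-site volume handles to their replicated DR-site counterparts.
    """
    new_pv = copy.deepcopy(pv)  # never mutate the manifest taken from the backup

    old_handle = new_pv["spec"]["csi"]["volumeHandle"]
    new_pv["spec"]["csi"]["volumeHandle"] = handle_map[old_handle]

    # The claimRef uid and resourceVersion belong to the source cluster.
    # Dropping them lets the PV bind to the restored PVC by name and namespace.
    claim_ref = new_pv["spec"].get("claimRef", {})
    claim_ref.pop("uid", None)
    claim_ref.pop("resourceVersion", None)
    return new_pv
```

A missing entry in handle_map raises KeyError here on purpose: a volume without a known replica at the disaster recovery site should stop the restore rather than bind to the wrong storage.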

Fully automated disaster recovery 

The manual steps outlined in this learning path form the basis for a fully automated disaster recovery workflow. An automation tool like Red Hat Ansible Automation Platform can orchestrate the entire process. The high-level logic for an automation playbook would include:

  1. Primary site cleanup: On the primary site, a periodic task should run to clean up stale resources. If a snapshot is deleted and the PVC created from it is no longer referenced by any VM, the PVC should be automatically deleted to reclaim storage.
  2. Backup validation: The automation should check the status of each OADP backup. Because of the plugin's logic, a backup might be marked as PartiallyFailed when the plugin finds that a PVC from a snapshot was already created by a previous run. The automation should verify that this is the only cause of the failure and that no other critical resources failed to back up.
  3. Failover execution: In a disaster scenario, the playbook would trigger the final OADP Restore on the disaster recovery cluster. This restore brings the VMs online, connecting them to the already replicated and pre-staged persistent volumes.
  4. Post-failover cleanup: After a successful restore, the automation should perform a reconciliation. If a PVC was deleted on the primary site before the disaster, it should also be removed from the disaster recovery site to maintain consistency.
  5. PV garbage collection: Because the StorageClass uses the Retain reclaim policy, PVs are not deleted automatically when their PVCs are removed. The automation should include steps to safely identify and remove orphaned PVs on both the primary and disaster recovery sites after failback and cleanup operations are complete.
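
The decision logic behind steps 1, 2, and 5 can be sketched as plain Python functions that work on resource manifests already fetched from the cluster as dictionaries. This is a rough illustration under stated assumptions, not a finished playbook. All function names are hypothetical. It assumes a single namespace, that PVCs created from snapshots carry spec.dataSource with kind VolumeSnapshot, and that the text of the plugin's "PVC already exists" error is supplied by the caller, because this learning path does not state it.

```python
def pvc_names_used_by_vms(vms):
    """Collect the PVC names referenced by VirtualMachine manifests."""
    used = set()
    for vm in vms:
        pod_spec = vm.get("spec", {}).get("template", {}).get("spec", {})
        for vol in pod_spec.get("volumes", []):
            if "persistentVolumeClaim" in vol:
                used.add(vol["persistentVolumeClaim"]["claimName"])
            elif "dataVolume" in vol:
                # A DataVolume is backed by a PVC with the same name.
                used.add(vol["dataVolume"]["name"])
    return used


def stale_snapshot_pvcs(pvcs, vms, snapshot_names):
    """Step 1: PVCs whose source snapshot is gone and that no VM references."""
    used = pvc_names_used_by_vms(vms)
    stale = []
    for pvc in pvcs:
        source = pvc["spec"].get("dataSource") or {}
        if source.get("kind") != "VolumeSnapshot":
            continue  # only PVCs created from snapshots are candidates
        name = pvc["metadata"]["name"]
        if source.get("name") not in snapshot_names and name not in used:
            stale.append(name)
    return stale


def backup_is_acceptable(phase, errors, tolerated_marker):
    """Step 2: accept PartiallyFailed only if every error is the tolerated one.

    tolerated_marker is a caller-supplied substring identifying the plugin's
    "PVC already created" error; its real text is an assumption here.
    """
    if phase == "Completed":
        return True
    if phase != "PartiallyFailed":
        return False
    # With no error details to inspect, treat the backup as not validated.
    return bool(errors) and all(tolerated_marker in err for err in errors)


def orphaned_pvs(pvs):
    """Step 5: Retain-policy PVs left in the Released phase."""
    return [
        pv["metadata"]["name"]
        for pv in pvs
        if pv["spec"].get("persistentVolumeReclaimPolicy") == "Retain"
        and pv.get("status", {}).get("phase") == "Released"
    ]
```

These functions only identify candidates. Keeping the decision separate from the deletion makes it easy to log or review the result before anything is removed, which matters most for PVs, where the underlying storage volume may also need to be cleaned up on the storage system.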
