As customer demand for Red Hat OpenShift Virtualization increases, Red Hat is receiving many questions about integrating customers’ new and existing storage solutions as backing storage for virtual machines. The goal of this article is to help the Red Hat partner ecosystem understand what it means for a storage solution to support OpenShift Virtualization and the virtual machines that it enables.
While many storage partners in the Red Hat partner ecosystem have certified their Container Storage Interface (CSI) plug-ins for use with Red Hat OpenShift, support for OpenShift Virtualization has some unique requirements. The backend storage must provide a standard set of features for backing a virtual machine (VM), live migration, and various lifecycle management operations. When applicable, it should support backing up and restoring virtual machines. Finally, it may also offer a disaster recovery solution, with an automated or semi-automated means of failover. This article also describes the testing that should be done to validate that support.
What is Red Hat OpenShift Virtualization?
Red Hat OpenShift Virtualization, based on the upstream KubeVirt project, is included with all three Red Hat OpenShift offerings (Red Hat OpenShift Kubernetes Engine, Red Hat OpenShift Container Platform (RHOCP), and Red Hat OpenShift Platform Plus) and does not require any additional subscriptions. It allows OpenShift cluster users to run virtual machines in the same environment in which containerized applications are run, even on the same nodes. This allows administrators and users to use a common platform, user interface, and development processes to manage containers and VMs side by side.
Like the containers they may coexist with, virtualized applications can take advantage of the many features of Kubernetes, including declarative configurations, cluster networking, storage constructs, health probes, high availability, and much more. While some applications may live on indefinitely in virtual machines, others may eventually be refactored into container-native applications on the application owner’s timeline. OpenShift Virtualization also includes entitlements for Red Hat Enterprise Linux (RHEL)-based virtual machines.
Storage backing for virtual machines
The first step is to onboard the storage provider with OpenShift Virtualization. This will allow it to be specified as the default storage class. The storage integration guide explains how to properly onboard the storage provider.
There are several fundamental features that a storage solution, through its CSI plug-in, should provide in order to properly support OpenShift Virtualization. The following list, though not exhaustive, covers the features considered fundamental to a positive experience:
- Standard support for dynamically provisioned persistent volumes (PVs) and persistent volume claims (PVCs).
- A virtual machine can be created and started from a golden image.
- A virtual machine can be cloned.
- Virtual machine snapshots can be created via the CSI plug-in.
- Virtual machine live migration works, requiring that the backend storage supports ReadWriteMany (RWX).
- Virtual machine volumes can be hot-plugged and unplugged.
- Multiple virtual machines can be booted concurrently to validate scale.
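To make the live migration requirement concrete, the following is a minimal sketch of a persistent volume claim that requests ReadWriteMany access, built as a plain Python dictionary and printed as JSON. The claim name, the storage class name `example-rwx-block`, the 30Gi size, and the choice of `volumeMode: Block` are assumptions made for illustration only; the right values depend on what a given CSI plug-in supports.

```python
import json


def build_vm_disk_pvc(name, storage_class, size="30Gi"):
    """Return a PVC manifest (as a dict) that requests shared RWX storage.

    The storage class name is supplied by the caller; nothing here is
    specific to any particular CSI plug-in.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class,
            # ReadWriteMany lets source and destination nodes attach the
            # volume at the same time during live migration.
            "accessModes": ["ReadWriteMany"],
            # Assumption: block mode; some backends offer RWX only in
            # Filesystem mode, in which case this field would change.
            "volumeMode": "Block",
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical names, used only to show the resulting manifest.
    manifest = build_vm_disk_pvc("vm-disk-example", "example-rwx-block")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied to a test cluster to confirm that the storage class actually provisions a volume with the requested access mode.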
The OpenShift Virtualization team has created an easy-to-run test suite known as the kubevirt-storage-checkup that storage vendors can use to validate their solution’s support for OpenShift Virtualization. The test suite will run through the above tests, as well as several others, and report on the results. See the kubevirt-storage-checkup readme file for a full list of tests. Starting with the OpenShift 4.17 release, this test suite is one of the mandatory tests for CSI plug-in certification for Red Hat OpenShift.
It is also recommended that the CSI plug-in offers support for volume groups and snapshot changed blocks as they continue to mature and gain adoption in the Kubernetes community.
Performance and scale
While support for the features listed in the previous section is essential, the storage solution must also be able to provide storage for virtual machines that is performant at scale. There are many scenarios in which a large number of virtual machines may be brought up in parallel, such as site recovery or a large number of VM migrations into OpenShift Virtualization.
The kubevirt-storage-checkup will perform a basic test of scale by booting a small number of virtual machines concurrently (10 by default). It is highly recommended that our storage partners validate a higher number of concurrent VM boot-up scenarios.
Starting with OpenShift 4.17, the CSI plug-in certification will also perform a scale test of pods and PVCs. While the default maximum number of LUN IDs in Red Hat Enterprise Linux CoreOS is 255 (it is configurable), we need to make sure that the storage provider will be tested beyond that number. A test has been added that, by default, creates 260 pods with PVCs on a single node. This number is configurable, as is the timeout for the test, which could take a while depending on the number of pods and PVCs created.
Another test that partners should consider is the use of a high number of paths in their storage multi-path configurations. While this is not specific to OpenShift Virtualization and VMs, as it will also apply in container-only environments, it must also be accounted for. For example, if a host has 8 paths to a storage device hosting 200 persistent volumes, that would mean up to 1,600 paths would need to be supported by the environment.
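The arithmetic in the example above can be sketched as a small helper for sizing a multi-path test. The function name `total_paths` is hypothetical and exists only for this illustration; it assumes every persistent volume is reachable over the same number of paths.

```python
def total_paths(paths_per_volume, volume_count):
    """Return the number of device paths a host must handle.

    Assumes each volume is exposed over the same number of paths.
    """
    if paths_per_volume < 0 or volume_count < 0:
        raise ValueError("counts must be non-negative")
    return paths_per_volume * volume_count


if __name__ == "__main__":
    # The case described in the text: 8 paths, 200 persistent volumes.
    print(total_paths(8, 200))  # prints 1600
```

Substituting the planned volume count and path count for a target environment gives a rough upper bound to test against.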
Storage partners are encouraged to evaluate their storage provider's performance and scale by implementing any additional tests needed to ensure the storage can meet the varying needs of our joint customers. For comprehensive guidance on scale testing, see the article on how to use kube-burner to measure Red Hat OpenShift VM and storage deployment at scale.
Virtual machine backup and restore
Any comprehensive data protection solution will offer a backup and restore process. In OpenShift Virtualization, snapshots are used to back up and restore virtual machines. Virtual machine snapshots are supported by any CSI plug-in that supports the Kubernetes Volume Snapshot API.
A snapshot is a saved representation of the state and data of a virtual machine at the time the snapshot was created. OpenShift Virtualization supports the creation of a snapshot of a virtual machine that is running or stopped. If the VM is running and the QEMU guest agent is installed in the VM, then the filesystem of the VM will be frozen until the snapshot process is complete. The snapshot will include each of the CSI volumes attached to the VM as well as a copy of the virtual machine metadata and specification.
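As an illustration of what requesting such a snapshot can look like, the following sketch builds a VirtualMachineSnapshot manifest as a Python dictionary and prints it as JSON. The snapshot and VM names are hypothetical, and the `snapshot.kubevirt.io/v1beta1` API version is an assumption that may differ between OpenShift Virtualization releases, so check the version served by the target cluster before using it.

```python
import json


def build_vm_snapshot(snapshot_name, vm_name):
    """Return a VirtualMachineSnapshot manifest (as a dict) for one VM."""
    return {
        # Assumption: API version; older releases may serve v1alpha1.
        "apiVersion": "snapshot.kubevirt.io/v1beta1",
        "kind": "VirtualMachineSnapshot",
        "metadata": {"name": snapshot_name},
        "spec": {
            # The source identifies the VM whose volumes and
            # specification are captured by the snapshot.
            "source": {
                "apiGroup": "kubevirt.io",
                "kind": "VirtualMachine",
                "name": vm_name,
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical names, used only to show the resulting manifest.
    print(json.dumps(build_vm_snapshot("example-snap", "example-vm"), indent=2))
```

Creating this object against a running VM and then against a stopped VM is a simple way to exercise both cases described above.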
Storage vendors can follow the backup and restore integration guide to include this functionality. The OpenShift API for Data Protection (OADP) may also be used for backup and restore operations, though the use of OADP is not a requirement. OADP is an optional component, included with all OpenShift product offerings, that provides a set of Kubernetes-native APIs for backup and restore operations inclusive of virtual machine snapshots. OpenShift Virtualization ships with a Velero plug-in that eases the integration with OADP for backup and restore.
It is also recommended that backup and restore storage solutions include support for off-site storage and retrieval of the virtual machine snapshots. This is commonly done by storing the snapshots in an off-site S3 storage bucket. Customers also commonly request support for volume group backup and restore and for incremental backups.
Disaster recovery
A disaster recovery solution is essential to maintaining operations during a site outage and for site recovery. Red Hat currently offers two methods of disaster recovery: Metro-DR uses synchronous replication to ensure both sites are always synchronized, while Regional-DR uses asynchronous replication whereby data is periodically synchronized between sites. The solutions consist of OpenShift clusters, Red Hat OpenShift Data Foundation, and Red Hat Advanced Cluster Management for orchestration.
Storage vendors are encouraged to implement their own disaster recovery solution for OpenShift clusters, and in the context of this article, for VMs running in OpenShift Virtualization. The OpenShift Virtualization team has created a comprehensive Red Hat OpenShift Virtualization disaster recovery guide for storage vendors to use for guidance during the creation of their own solutions.
Summary
Storage vendors that provide backing storage for Kubernetes and Red Hat OpenShift have some additional considerations when providing the backing storage for KubeVirt and OpenShift Virtualization. Features such as VM cloning, snapshots, live migration, volume attachment and detachment, backup and restore, and disaster recovery are often requested, and sometimes required, by customers deploying or migrating to OpenShift Virtualization. The following is a summarized list of features to support and/or tests to validate in support of OpenShift Virtualization:
- Execute the kubevirt-storage-checkup suite of tests (mandatory for RHOCP 4.17 CSI certification).
- Perform additional scale and performance testing, including multi-paths.
- Backup and restore of virtual machine snapshots.
- Off-site storage and retrieval of virtual machine snapshots.
- Incremental backups.
- Synchronous and asynchronous replication for multi-site disaster recovery scenarios.
- Automated and/or semi-automated orchestration of multi-site disaster recovery.
The following two features are still in early stages of Kubernetes adoption, but should be considered as they mature:
- Support for volume group snapshots.
- Support for snapshot changed blocks.
Reach out to your Red Hat contacts if you have any questions.
Thanks to Peter Lauterbach, Adam Litke, Andrew Sullivan, and Gregory Charot for reviewing this article.