Choosing the right persistent storage for a cloud development environment (CDE)—such as Eclipse Che or Red Hat OpenShift Dev Spaces—is an architectural decision that directly affects developer productivity, cost, and application resilience. The correct solution depends on matching the application's needs with the storage's capabilities.
This guide covers the key factors that influence performance and resiliency in CDE storage solutions and explains the trade-offs between different options. We will explore the architectural decisions involved in configuring a reliable, Multi-AZ storage backend for a CDE on a Red Hat OpenShift Service on AWS (ROSA) cluster using FSx for NetApp ONTAP.
We analyze architectural trade-offs and establish recommended storage configurations by examining the relationship between different storage concepts. We will discuss:
- volumeMode (FileSystem compared with Block)
- Persistent Volume Claim (PVC) strategies (per-workspace compared with per-user)
- Storage protocols (iSCSI compared with NFS)
- Access modes (RWO compared with RWX)
- Container Storage Interface (CSI) drivers and the volumeMode lifecycle
- Recommended configurations based on PVC strategy
After reading this guide, you will be able to select the FSx configuration that best meets your team's requirements for performance, cost, and resilience.
volumeMode (FileSystem vs. Block)
Developers using CDEs need a standardized storage solution for their project's source code, configuration files, and build artifacts. This storage must act as a normal directory structure (a hierarchical file system).
Red Hat OpenShift Dev Spaces relies on a basic Kubernetes storage concept: volumeMode. The application requires volumeMode: FileSystem because it expects the PVC to be mounted as a directory (such as /projects). This allows for standard file operations—like create, read, write, and delete—and directory navigation. The alternative, volumeMode: Block, exposes the storage as a raw device. This is incompatible with general-purpose source code storage and is used only for specialized applications, such as databases.
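To make this concrete, here is a minimal sketch of a PVC that requests file-based storage. The claim and StorageClass names are hypothetical examples, and note that the Kubernetes API spells the value `Filesystem`:

```yaml
# Hypothetical PVC requesting file-based storage for a workspace.
# The storageClassName is an example; use the class provided by your CSI driver.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-projects        # example name
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem          # the Kubernetes API value is "Filesystem"
  resources:
    requests:
      storage: 10Gi
  storageClassName: trident-csi   # example StorageClass name
```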
How volumeMode relates to the PVC strategy
The volumeMode: FileSystem parameter is necessary for both storage strategies used in CDEs. When you use the per-workspace PVC strategy, each workspace receives a dedicated PVC mounted as a standard directory (/projects) inside the container. This directory structure is a basic requirement for source code storage, so FileSystem is the appropriate choice.
In the per-user PVC strategy, a single volume is shared among a user's multiple workspaces. The DevWorkspace Operator manages separate subdirectories, or subPaths, on that single volume for each workspace. Because managing subdirectories is a file system operation, this strategy depends on the underlying storage using volumeMode: FileSystem.
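To illustrate the subPath mechanism, the sketch below shows how a single shared per-user volume could be mounted into a workspace container at /projects with a dedicated subdirectory. All names here are hypothetical; in practice the DevWorkspace Operator generates these mounts for you:

```yaml
# Hypothetical pod fragment: one shared per-user PVC,
# mounted at /projects using a workspace-specific subPath.
apiVersion: v1
kind: Pod
metadata:
  name: workspace-a                           # example name
spec:
  containers:
    - name: dev-tools
      image: registry.example.com/udi:latest  # example developer image
      volumeMounts:
        - name: user-storage
          mountPath: /projects
          subPath: workspace-a                # each workspace gets its own subdirectory
  volumes:
    - name: user-storage
      persistentVolumeClaim:
        claimName: claim-devworkspace         # example name for the shared per-user PVC
```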
How volumeMode relates to protocol and access mode
Using volumeMode: FileSystem narrows your storage options by excluding solutions designed exclusively for raw block devices. This requirement also shapes which access modes the Red Hat OpenShift Dev Spaces PVC strategies need. Storage protocols fall into two main categories: file storage (such as NFS or CephFS) and block storage (such as iSCSI or Amazon EBS). File storage is well suited to the ReadWriteMany (RWX) access mode because it is designed for concurrent, shared directory access from multiple nodes. This RWX capability is often required for the per-user PVC strategy, which mounts a single volume to multiple workspace pods across different cluster nodes.
Block storage providers typically support only ReadWriteOnce (RWO), which limits the volume to a single node. RWO is often sufficient for the per-workspace strategy because each PVC is dedicated to a single pod. However, using RWO block storage for the per-user strategy can lead to pod scheduling failures because the Kubernetes scheduler cannot mount the volume to a second node. This limitation exists because protocols like iSCSI are block-level. They do not support concurrent access from multiple nodes without a specialized cluster-aware file system. Therefore, the underlying protocol and its support for RWX are critical to the scalability of the multi-workspace, per-user model.
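With Trident as the CSI driver, this file/block split typically surfaces as two StorageClasses. The sketch below uses hypothetical class names; the provisioner and backendType values follow Trident's documented conventions, but verify them against your Trident version:

```yaml
# Hypothetical Trident StorageClasses: NAS (NFS, RWX-capable) and SAN (iSCSI, RWO).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: trident-nas            # example name for file storage
provisioner: csi.trident.netapp.io
parameters:
  backendType: "ontap-nas"     # NFS-backed, suitable for ReadWriteMany
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: trident-san            # example name for block storage
provisioner: csi.trident.netapp.io
parameters:
  backendType: "ontap-san"     # iSCSI-backed, typically ReadWriteOnce
  fsType: "ext4"               # filesystem applied when volumeMode is Filesystem
```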
AccessMode relevance (RWO vs. RWX)
The pvcStrategy determines the AccessMode:
- per-workspace strategy (RWO):
  - Because each workspace receives a dedicated PVC, the volume mounts only to the single node running the workspace pod.
  - ReadWriteOnce (RWO) is the recommended choice and is easier to provision with cloud block storage.
- per-user strategy (RWX):
  - This strategy shares a single PVC across a user's multiple workspaces.
  - If a user runs two workspaces concurrently, or if a single workspace includes multiple pods that share the volume, the volume must be mounted to multiple nodes.
  - This concurrency requires ReadWriteMany (RWX).
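As a concrete contrast to the RWO claim shown earlier, a per-user volume would be requested with ReadWriteMany. A minimal sketch with hypothetical names:

```yaml
# Hypothetical per-user PVC: one shared volume, mountable from multiple nodes.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-devworkspace       # example name for a shared per-user claim
spec:
  accessModes:
    - ReadWriteMany              # required when workspaces can land on different nodes
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
  storageClassName: trident-nas  # example NFS-backed StorageClass
```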
CSI drivers and the volumeMode lifecycle
CSI drivers, such as NetApp Trident, abstract storage complexity and bridge the gap between the Kubernetes storage request and the physical hardware.
Fulfilling a volumeMode: FileSystem request involves three stages: request, provisioning, and preparation.
- First, the application environment initiates the process by requesting storage with the FileSystem mode via the Kubernetes PVC object.
- Second, the CSI driver acts as the provisioner. It communicates with the external storage array, such as NetApp, to create the underlying physical volume. This volume can be a raw block volume or a preconfigured file share.
- Third, in the preparation stage, the CSI driver works with the kubelet on the worker node. If a raw block volume was provisioned, the driver ensures the raw device is attached via iSCSI, formatted with a usable file system (such as ext4), and mounted as a directory at the container's mount point (such as /projects). The CSI driver responds to the volumeMode request and ensures that a file system is delivered to the application.
The CSI driver handles the provisioning and formatting details. It ensures that even a block-backed volume is converted into the required FileSystem format before it reaches the workspace pod.
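If you want to observe this lifecycle on a live cluster, you can inspect the bound PersistentVolume. A hedged sketch using standard oc commands (the field paths apply to CSI-provisioned volumes; the PVC, namespace, and PV names are examples):

```bash
# Find the PV bound to the workspace PVC (example names).
oc get pvc workspace-projects -n my-namespace \
  -o jsonpath='{.spec.volumeName}{"\n"}'

# Confirm which CSI driver provisioned it and which filesystem was applied.
oc get pv <pv-name> -o jsonpath='{.spec.csi.driver}{"\n"}{.spec.csi.fsType}{"\n"}'

# Check the volumeMode recorded on the PV.
oc get pv <pv-name> -o jsonpath='{.spec.volumeMode}{"\n"}'
```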
Recommended FSx for ONTAP configurations by CDE PVC strategy
The right storage solution depends on matching the CDE PVC strategy with the storage protocol and access mode. This section analyzes the architectural needs and trade-offs of combining the per-workspace and per-user strategies with iSCSI or NFS protocols. This combination achieves the required balance of isolation, performance, and sharing capability.
Per-workspace PVC strategy: High-performance isolation
This strategy creates one dedicated volume per workspace, prioritizing isolation, security, and performance. It is suitable for complex, I/O-heavy individual tasks.
| Recommended FSx setup | Protocol: iSCSI (ONTAP-SAN); access mode: RWO (ReadWriteOnce) |
|---|---|
| Why this combination? | This is the recommended technical match. The iSCSI block protocol delivers the low latency that a dedicated, isolated RWO volume requires. |
| Technical breakdown | Every workspace receives a unique iSCSI logical unit number (LUN). The RWO mode enforces single-node attachment, satisfying the isolation requirement. |
| Resiliency detail | Multi-AZ data is synchronously replicated. Node high availability (HA) is provided by iSCSI multipathing on the worker node. The kernel's DM-Multipath daemon detects and consolidates the multiple network paths to the single LUN, ensuring a continuous connection and transparent failover to the standby FSx server without requiring volume remounts. |
| Trade-offs | High performance and low administrative complexity after initial setup, at the cost of a higher total volume count than the per-user strategy. |
Use cases:
- High-I/O workloads: Compiling large projects or running local databases (such as PostgreSQL or MongoDB) inside the CDE.
- Strict data segregation: Teams that require optimized performance and strong isolation between developer environments.
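A hedged configuration sketch for this combination, assuming a Trident SAN-backed StorageClass named trident-san and the perWorkspaceStrategyPvcConfig fields of the CheCluster v2 API (verify the exact field names against your Dev Spaces version):

```yaml
# Hypothetical CheCluster fragment: per-workspace strategy backed by iSCSI (RWO).
apiVersion: org.eclipse.che/v2
kind: CheCluster
metadata:
  name: devspaces
  namespace: openshift-devspaces        # example namespace
spec:
  devEnvironments:
    storage:
      pvcStrategy: per-workspace
      perWorkspaceStrategyPvcConfig:
        claimSize: 10Gi                 # example size
        storageClass: trident-san       # example ONTAP-SAN (iSCSI) StorageClass
```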
Per-user PVC strategy: Shared persistence and efficiency
This strategy creates one volume per user, consolidating all of that user's workspaces onto a single shared storage resource. It is the right fit when you want to share common assets and reduce the number of provisioned volumes.
| Recommended FSx setup | Protocol: NFS (ONTAP-NAS); access mode: RWX (ReadWriteMany) |
|---|---|
| Why this combination? | This is a viable technical configuration for consolidation. If a user runs two workspaces on two different nodes, both nodes must mount the same volume simultaneously, which is exactly what NFS and RWX provide. |
| Technical breakdown | The operator uses subPath mounting to create separate directories for each workspace on the single shared NFS volume. The RWX mode allows multiple pods on different nodes to safely access this volume. |
| Resiliency detail | Multi-AZ data is synchronously replicated. The NFS service is exposed via multiple data logical interfaces (LIFs) across AZs. |
| Trade-offs | Higher latency than iSCSI due to protocol overhead, in exchange for better storage efficiency and a lower cost of operation. |
Use cases:
- Shared user assets: Accessing common configuration files across all of the user's workspaces.
- Cost efficiency: Consolidating storage to reduce the total volume count and improve storage use across large user bases with low-I/O requirements.
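A hedged configuration sketch for this combination, assuming a Trident NAS-backed StorageClass named trident-nas and the perUserStrategyPvcConfig fields of the CheCluster v2 API (verify the exact field names against your Dev Spaces version):

```yaml
# Hypothetical CheCluster fragment: per-user strategy backed by NFS (RWX).
apiVersion: org.eclipse.che/v2
kind: CheCluster
metadata:
  name: devspaces
  namespace: openshift-devspaces      # example namespace
spec:
  devEnvironments:
    storage:
      pvcStrategy: per-user
      perUserStrategyPvcConfig:
        claimSize: 10Gi               # example size
        storageClass: trident-nas     # example ONTAP-NAS (NFS) StorageClass
```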
Storage resiliency
In cloud storage, resiliency is the system's ability to maintain data integrity and accessibility despite component failures. Resiliency is characterized by two distinct components:
- High availability (HA) focuses on access and minimizing downtime. It measures how quickly a workload can resume operation after a failure, such as node loss. For OpenShift Dev Spaces, multi-node HA requires ReadWriteMany (RWX) capability so that a user's PVC can be remounted on a different worker node without manual intervention.
- Data durability focuses on data preservation and integrity. It ensures that data is safely maintained and recoverable in the event of hardware failure. This applies primarily to large-scale catastrophic events such as zonal outages (the complete loss of an Availability Zone).
This section focuses on data durability, which is handled by the storage infrastructure itself.
FSx ONTAP implementation of resiliency
FSx for NetApp ONTAP provides resiliency, particularly when deployed in a Multi-Availability Zone (Multi-AZ) configuration. This configuration is the standard choice for production durability.
Data durability
FSx for ONTAP ensures data durability through synchronous replication across two Availability Zones (AZs):
- Dual AZ replication: All data written to the active storage environment is synchronously replicated to a passive failover environment in a separate AZ. This ensures that no data is lost during a total infrastructure failure in the primary AZ.
- Snapshots: The platform supports efficient, point-in-time snapshots of the file system, enabling rapid recovery from accidental data deletion or logical corruption.
High availability (HA)
FSx uses protocol-specific features to maintain volume access during localized failures:
- iSCSI (block) HA: Achieved through multipathing. The single Kubernetes worker node connects to the iSCSI target using multiple redundant network paths. If one path fails, such as a network interface error, I/O continues over the remaining paths. This maintains the connection without violating the ReadWriteOnce (RWO) constraint.
- NFS (file) HA: Achieved through the protocol's native support for ReadWriteMany (RWX). Because the volume can be shared across multiple worker nodes, the Kubernetes scheduler can immediately bring up a new workspace pod on any other healthy node to re-establish access if the primary node or the network path to the volume fails.
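If you are using the iSCSI path, you can verify that multipathing is active on a worker node. A hedged sketch (it assumes device-mapper-multipath tooling is enabled on the node, which is typically required for Trident's iSCSI backends; the node name is an example):

```bash
# Open a debug shell on a worker node and inspect multipath state.
oc debug node/ip-10-0-1-23.ec2.internal -- chroot /host multipath -ll

# Healthy output lists one multipath device per LUN with multiple active paths;
# if one path fails, I/O continues over the remaining paths.
```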
Performance and resiliency trade-offs
Choosing between block and file protocols involves balancing I/O performance with the ease of automated recovery after a failure. While both protocols offer high data durability through Multi-AZ replication, their differing access modes create a trade-off in administrative overhead during an outage.
| Protocol | Primary strength | Trade-off (Administrative requirements) |
|---|---|---|
| iSCSI (block) | Performance. Offers higher I/O per second (IOPS) and lower latency because its block-level access is efficient for data transfer. | Manual recovery overhead. Supports only RWO. If the host node fails (for example, due to a zonal outage), the volume remains logically "attached" to the failed node. This requires manual intervention or significant timeouts before the new workspace pod can mount the volume on a healthy node. |
| NFS (file) | Resilience in recovery. Supports RWX access, allowing the volume to be mounted by multiple nodes. | Performance overhead. Incurs higher network latency and file locking overhead necessary for shared access. This overhead is accepted to gain automated recovery. |
iSCSI offers the fastest runtime performance but a slower, more complex failure recovery. NFS offers a faster failure recovery at the expense of a small performance decrease during normal operation.
Making the right choice
Your choice of persistent storage for Red Hat OpenShift Dev Spaces on ROSA shapes the reliability and efficiency of your development environment. Success comes from aligning your protocol choice with your workspace strategy:
- Choose iSCSI (RWO) with the per-workspace strategy to maximize performance and isolation for high-intensity workloads.
- Choose NFS (RWX) with the per-user strategy to enable resource sharing, cost efficiency, and automated failure recovery.
FSx for NetApp ONTAP provides a flexible foundation for either approach, allowing you to build a storage backend that meets your specific operational goals.
Read part 2: Set up FSx for NetApp ONTAP on Red Hat OpenShift Service on AWS