The first time I watched a node run out of disk space while pulling a 6 GB GPU PyTorch image, I knew we needed a better way to handle container image storage. In my work with AI/ML teams running workloads on Red Hat OpenShift Container Platform, disk space management has become one of the most common pain points, especially as model sizes continue to grow.
Split disk configuration solves this problem by directing newly pulled container images to a dedicated filesystem while keeping container runtime data on the boot disk. This approach gives you better control over disk space allocation and separates the concerns of image storage from container operations. In this article, I'll walk you through setting up a split disk on Red Hat OpenShift 4.22 and later for AWS and Google Cloud (GCP) platforms. Split disk is currently a developer preview feature.
How container storage works
Before we dive into the configuration, it's worth understanding how CRI-O, the container runtime that powers OpenShift, manages storage. This context will help you appreciate why split disk works the way it does.
CRI-O handles two fundamentally different types of data. First, there are container images (the read-only layers that form the base of your containers). Think of these as the immutable foundation: your application code, dependencies, and runtime environment all packaged together. Second, there's container runtime data (the read-write state that includes active container processes, writable layers, and runtime metadata). By default, CRI-O stores everything under /var/lib/containers/storage on the node's boot disk.
This works fine for typical workloads, but it breaks down when you start pulling large images. I've seen production clusters where data scientists were deploying TensorFlow containers that consumed 5 GB each, and suddenly the boot disk was full. The traditional solution has been to mount an entirely separate filesystem at /var/lib/containers, but that moves everything (images and runtime data) to the secondary disk, and OpenShift remains unaware of the underlying storage.
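To see how close a node is to that situation, one option is to check how full the filesystem behind the storage path is. The following is a minimal sketch: the percent_used function name is my own, and it assumes a POSIX-style df.

```bash
#!/bin/bash
# Hypothetical helper: print the used-capacity percentage (digits only)
# of the filesystem that holds the given path.
percent_used() {
  # -P forces the portable one-line-per-filesystem format; column 5 is "Capacity".
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# On a node you would pass /var/lib/containers/storage.
# "/" is used here so the sketch runs anywhere.
percent_used /
```

A value that keeps climbing after each large image pull is a reasonable hint that image storage is what fills the boot disk.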
How split disk works
Split disk takes a more surgical approach. Instead of moving everything, we configure CRI-O to store only newly pulled images on a separate disk. The key phrase here is "newly pulled": the pre-baked images that ship with the Red Hat Enterprise Linux CoreOS Amazon Machine Image (AMI) stay exactly where they are on the boot disk. This distinction is important because it means you're not disrupting the foundational system images; you're simply redirecting where workload images land going forward.
Here's what remains on the boot disk: all the pre-baked images included in the Red Hat Enterprise Linux CoreOS AMI, system and base images that OpenShift depends on, container runtime state, and the writable layers where your containers make changes. Meanwhile, the split disk (which you might mount at /var/lib/images or another location) receives newly pulled images and their associated overlay directories.
CRI-O achieves this through its imagestore configuration option, which tells the runtime to use an alternate location for image storage. It's an elegant solution because it doesn't require migrating existing data—you simply point future image pulls to the new location, and everything continues to work.
The following remain in /var/lib/containers/storage/ on the boot disk:
- Pre-baked images included in the Red Hat Enterprise Linux CoreOS AMI
- System and base images
- Container runtime state
- Container writable layers
The following are stored on the secondary disk (for example, under /var/lib/images/):
- New images pulled for workloads running as pods
- Images pulled after the node boots
- Image overlay directories: overlay/, overlay-images/, overlay-layers/
Split disk does not migrate or move existing images. It only directs where CRI-O stores newly pulled images going forward. Pre-baked AMI images remain on the original disk and are still accessible.
Use cases
Let me give you a concrete example from my own experience with large AI/ML container images. A team running PyTorch model training on OpenShift was using the official pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime image, which weighs in at around 5.2 GB. They had a cluster with 10 worker nodes, and each node might run multiple training jobs throughout the day. Without split disk, every image pull consumed precious boot disk space. With split disk configured, they provisioned a 100 GB secondary disk for images, and the boot disk remained stable even under heavy workload churn.
The benefits extend beyond just avoiding disk pressure. By dedicating storage to images, you can size that disk independently based on your workload patterns. If you're running image-intensive workloads, you can provision larger secondary disks without over-provisioning the boot disk. If you need higher IOPS for image pulls, you can configure that specifically for the image disk. The separation gives you flexibility.
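As a rough way to reason about sizing, the sketch below estimates a secondary disk size from the number of distinct images you expect on a node, an average image size in whole GB, and a headroom percentage. The function name and the example numbers are illustrative assumptions, not OpenShift guidance.

```bash
#!/bin/bash
# Hypothetical helper: estimate the secondary disk size in GB.
# Arguments: <image_count> <avg_image_size_gb> <headroom_percent>
estimate_image_disk_gb() {
  local count=$1 avg_gb=$2 headroom_pct=$3
  local raw=$(( count * avg_gb ))
  # Add headroom and round up to the next whole GB using integer math.
  echo $(( (raw * (100 + headroom_pct) + 99) / 100 ))
}

# Example: 12 images of roughly 6 GB each, with 30% headroom.
estimate_image_disk_gb 12 6 30   # prints 94
```

Layers shared between images make the real footprint smaller than this estimate, so treat the result as an upper bound.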
Split disk provides:
- Dedicated storage for large images: Image files are stored on a separate, larger disk.
- Optimized boot disk usage: Container runtime data stays on the fast boot disk.
- Better resource allocation: Manage disk space effectively for image-heavy workloads.
This configuration is particularly beneficial for:
- AI/ML workloads with large base images
- Environments pulling multiple large container images
- Clusters with limited boot disk space
Note: The split disk approach described in this article is a dev preview.
Prerequisites
Before you begin, make sure you have the OpenShift installer binary (openshift-install) for version 4.22 or later, your pull secret from the Red Hat Customer Portal, and cloud provider credentials configured. For AWS, that means your access key and secret are set up via aws configure or environment variables. For GCP, you'll need a service account key configured. You'll also need the MachineConfig file provided later in this article.
Creating a cluster with split disk
Split disk configuration needs to happen at cluster installation time. I'm going to walk you through the full process, explaining not just what commands to run, but why each step matters.
Step 1: Create the install configuration
The installer will prompt you for your platform (choose AWS or GCP), cluster name, pull secret, region, base domain, and other parameters. This interactive process creates install-config.yaml in your working directory. This file defines the high-level characteristics of your cluster, but we need to go deeper to inject our split disk configuration.
./openshift-install create install-config --dir ./ocp-split-disk-cluster
This creates an install-config.yaml in ./ocp-split-disk-cluster/.
Step 2: Generate manifests
This is where things get interesting. The installer takes your install-config.yaml and generates a full set of manifests, not just generic Kubernetes resources, but also OpenShift-specific configurations. You'll find manifest files in ./ocp-split-disk-cluster/manifests/ and OpenShift resources in ./ocp-split-disk-cluster/openshift/. Most importantly for our purposes, this process creates worker machineset files with names like 99_openshift-cluster-api_worker-machineset-0.yaml. These machinesets define how worker nodes are provisioned, and we need to modify them to attach secondary volumes.
./openshift-install create manifests --dir ./ocp-split-disk-cluster
Step 3: Add split disk MachineConfig
Navigate to the openshift directory and copy the MachineConfig into it. The complete MachineConfig YAML is provided in the MachineConfig details section later in this article. By placing it in the openshift directory, we ensure the installer reads it and applies it to worker nodes during cluster creation.
cd ocp-split-disk-cluster/openshift
cp ~/path/to/98-config-split-disk.yaml .
Step 4: Edit worker machinesets
The installer creates multiple worker machineset files, typically one per availability zone.
ls 99_openshift-cluster-api_worker-machineset-*
Expected output:
99_openshift-cluster-api_worker-machineset-0.yaml
99_openshift-cluster-api_worker-machineset-1.yaml
99_openshift-cluster-api_worker-machineset-2.yaml
99_openshift-cluster-api_worker-machineset-3.yaml
99_openshift-cluster-api_worker-machineset-4.yaml
You must edit each file individually to ensure consistent configuration across all zones. Let me show you what to add for each platform.
For AWS, add the following under spec.template.spec.providerSpec.value.blockDevices:
- ebs:
    encrypted: true
    volumeSize: 100
    volumeType: gp3
  deviceName: /dev/xvdb
This is the complete example:
spec:
  template:
    spec:
      providerSpec:
        value:
          blockDevices:
          - ebs:
              encrypted: true
              iops: 0
              volumeSize: 120
              volumeType: gp3
            deviceName: /dev/sda1 # Boot disk (already present)
          - ebs:
              encrypted: true
              volumeSize: 100
              volumeType: gp3
            deviceName: /dev/xvdb # Secondary disk for split disk
For GCP, add the following under spec.template.spec.providerSpec.value.disks:
- autoDelete: true
  boot: false
  sizeGb: 100
  type: pd-balanced
The following is the complete example:
spec:
  template:
    spec:
      providerSpec:
        value:
          disks:
          - autoDelete: true
            boot: true
            sizeGb: 128
            type: pd-standard # Boot disk (already present)
          - autoDelete: true
            boot: false
            sizeGb: 100
            type: pd-balanced # Secondary disk for split disk
Remember to edit all machineset files. Missing even one means nodes in that availability zone won't have secondary disks, and the split disk configuration will fail on those nodes.
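Because it is easy to miss a file, a quick scripted check can help before you create the cluster. The sketch below lists machineset files that do not contain a marker string you choose. The function name is hypothetical, and the marker is an assumption about your edits: for example "/dev/xvdb" on AWS or "boot: false" on GCP.

```bash
#!/bin/bash
# Hypothetical helper: print machineset files that do NOT contain a marker string.
# Arguments: <directory> <marker>
find_unedited_machinesets() {
  local dir=$1 marker=$2 file
  for file in "$dir"/99_openshift-cluster-api_worker-machineset-*.yaml; do
    [ -e "$file" ] || continue            # the glob matched nothing
    # -F treats the marker as a fixed string, -q suppresses output.
    grep -qF -- "$marker" "$file" || echo "$file"
  done
}

# Run from the directory that holds ocp-split-disk-cluster.
find_unedited_machinesets ./ocp-split-disk-cluster/openshift "/dev/xvdb"
```

Empty output means every machineset file contains the marker. The check is textual only, so it does not validate the YAML structure.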
Step 5: Create the cluster
With manifests prepared and machinesets edited, you're ready to create the cluster.
cd ../..
./openshift-install create cluster --dir ./ocp-split-disk-cluster
The installer now orchestrates the entire deployment. It reads all your manifests, including the split disk MachineConfig, creates the cloud infrastructure (VPC, subnets, load balancers, and DNS), provisions bootstrap, control plane, and worker nodes, attaches the secondary volumes you defined, and applies the MachineConfig to all workers.
On each worker node, the split disk setup runs automatically through a carefully orchestrated sequence. The find-secondary-device.service systemd unit executes first, running a script that detects your cloud platform, locates the secondary device (NVMe on AWS, SCSI on GCP), and formats it with an XFS filesystem labeled "splitdisk." Next, the var-lib-images.mount unit mounts this device at /var/lib/images. SELinux contexts are then configured to treat this directory equivalently to /var/lib/containers, ensuring container operations work correctly. Finally, CRI-O starts with the imagestore parameter pointing to /var/lib/images.
This installation typically takes 30 to 45 minutes. When it completes, you'll have a fully functional cluster with split disk configured and ready to handle large image pulls.
MachineConfig details
Understanding what's happening under the hood helps you troubleshoot issues and adapt the configuration to your needs. Let me walk through each component of the MachineConfig.
Device detection script
The heart of the split disk setup is a bash script embedded in the MachineConfig. The find-secondary-device script:
- Detects the cloud platform (AWS or GCP)
- Searches for the secondary device:
  - AWS: NVMe devices (/dev/nvme*)
  - GCP: SCSI devices (/dev/sd[b-z])
- Creates an XFS filesystem labeled splitdisk
- Retries up to 30 times with 2-second delays (60 seconds total)
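Two of the ideas in this list, vendor-based platform detection and a bounded retry loop, can be sketched as small functions that you can exercise away from a node. Both function names are my own, and this is a simplified illustration of the pattern rather than the script the MachineConfig ships.

```bash
#!/bin/bash
# Hypothetical pure-function version of the vendor check, so it can be
# exercised without reading /sys/class/dmi/id/sys_vendor.
platform_from_vendor() {
  local vendor=$1
  if [[ "$vendor" == *"Google"* ]]; then
    echo "gcp"
  elif [[ "$vendor" == *"Amazon"* ]] || [[ "$vendor" == *"EC2"* ]]; then
    echo "aws"
  fi
}

# Hypothetical bounded retry: run a command up to <max> times,
# sleeping <delay> seconds between failed attempts.
retry() {
  local max=$1 delay=$2 attempt
  shift 2
  for (( attempt = 1; attempt <= max; attempt++ )); do
    "$@" && return 0
    if (( attempt < max )); then
      sleep "$delay"
    fi
  done
  return 1
}

platform_from_vendor "Amazon EC2"   # prints aws
platform_from_vendor "Google"       # prints gcp
retry 30 2 test -e /dev/null && echo "device present"
```

On a real node, you would pass the contents of /sys/class/dmi/id/sys_vendor to platform_from_vendor. An unknown vendor prints nothing, which mirrors the empty PLATFORM value in the script.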
CRI-O configuration
The MachineConfig creates the drop-in file /etc/crio/crio.conf.d/99-imagestore.conf as follows:
[crio]
imagestore = "/var/lib/images"
This tells CRI-O to store newly pulled images on the split disk instead of the default location.
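The MachineConfig embeds this drop-in as a base64 data URL, so it is worth decoding the payload to confirm what lands on the node. Here is a minimal sketch, assuming a base64 command that accepts -d (as GNU coreutils does). The decode_data_url name is my own, and the data URL is the one from the complete MachineConfig in this article.

```bash
#!/bin/bash
# Hypothetical helper: decode the base64 payload of a data URL.
decode_data_url() {
  local url=$1
  # Strip everything up to and including "base64," and decode the rest.
  printf '%s' "${url#*base64,}" | base64 -d
}

decode_data_url 'data:text/plain;charset=utf-8;base64,W2NyaW9dCmltYWdlc3RvcmUgPSAiL3Zhci9saWIvaW1hZ2VzIg=='
```

The same helper works for the other embedded files, which makes it easier to review a MachineConfig before you hand it to the installer.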
Systemd units
The find-secondary-device.service executes the device detection script and has a condition that prevents it from running if the marker file /etc/var-lib-split-disk-mount already exists. This makes it a one-time initialization service. It's ordered before local-fs-pre.target, ensuring the device is ready before the system attempts to mount filesystems.
#!/bin/bash

# Detect cloud platform - check sys_vendor first (more reliable than product_name)
PLATFORM=""
if [ -f /sys/class/dmi/id/sys_vendor ]; then
  VENDOR=$(cat /sys/class/dmi/id/sys_vendor 2>/dev/null || echo "")
  if [[ "$VENDOR" == *"Google"* ]]; then
    PLATFORM="gcp"
  elif [[ "$VENDOR" == *"Amazon"* ]] || [[ "$VENDOR" == *"EC2"* ]]; then
    PLATFORM="aws"
  fi
fi

# Fallback to product_name if sys_vendor didn't work
if [ -z "$PLATFORM" ] && [ -f /sys/class/dmi/id/product_name ]; then
  PRODUCT=$(cat /sys/class/dmi/id/product_name 2>/dev/null || echo "")
  if [[ "$PRODUCT" == *"Google"* ]]; then
    PLATFORM="gcp"
  elif [[ "$PRODUCT" == *"Amazon"* ]]; then
    PLATFORM="aws"
  fi
fi

# If platform detection failed, try metadata services
if [ -z "$PLATFORM" ]; then
  if curl -s -f -m 1 http://169.254.169.254/computeMetadata/v1/ -H "Metadata-Flavor: Google" &>/dev/null; then
    PLATFORM="gcp"
  elif curl -s -f -m 1 http://169.254.169.254/latest/meta-data/ &>/dev/null; then
    PLATFORM="aws"
  fi
fi

echo "Detected platform: ${PLATFORM}"

# Retry up to 30 times with 2 second delay (60 seconds total)
MAX_RETRIES=30
RETRY_DELAY=2

for attempt in $(seq 1 $MAX_RETRIES); do
  echo "Attempt $attempt of $MAX_RETRIES to find secondary device"

  # Search for secondary device based on platform
  if [ "$PLATFORM" == "aws" ]; then
    # AWS uses NVMe devices
    for device in /dev/nvme*[0-9]*n*; do
      /usr/sbin/blkid "${device}" &> /dev/null
      if [ $? == 2 ]; then
        echo "secondary device found ${device}"
        echo "creating filesystem for containers mount"
        if ! mkfs.xfs -L splitdisk -f "${device}" &> /dev/null; then
          echo "Failed to create filesystem on ${device}" >&2
          exit 1
        fi
        udevadm settle
        touch /etc/var-lib-split-disk-mount
        exit 0
      fi
    done
  elif [ "$PLATFORM" == "gcp" ]; then
    # GCP uses SCSI devices (skip sda which is boot disk)
    for device in /dev/sd[b-z]; do
      if [ -b "${device}" ]; then
        /usr/sbin/blkid "${device}" &> /dev/null
        if [ $? == 2 ]; then
          echo "secondary device found ${device}"
          echo "creating filesystem for containers mount"
          if ! mkfs.xfs -L splitdisk -f "${device}" &> /dev/null; then
            echo "Failed to create filesystem on ${device}" >&2
            exit 1
          fi
          udevadm settle
          touch /etc/var-lib-split-disk-mount
          exit 0
        fi
      fi
    done
  fi

  # Device not found yet, wait before retry
  if [ $attempt -lt $MAX_RETRIES ]; then
    echo "Device not found, waiting ${RETRY_DELAY}s before retry..."
    sleep $RETRY_DELAY
  fi
done

echo "Couldn't find secondary block device after ${MAX_RETRIES} attempts!" >&2
exit 77
var-lib-images.mount:
- Mounts /dev/disk/by-label/splitdisk to /var/lib/images
- Uses XFS filesystem
- Waits for device detection to complete
selinux-splitdisk-policy.service:
- Sets SELinux file context equivalence to /var/lib/containers
- Command:
```bash
/usr/sbin/semanage fcontext -a -e /var/lib/containers /var/lib/images
```
restorecon-var-lib-splitdisk.service:
- Restores SELinux contexts recursively
- Runs before CRI-O starts
- Command:
```bash
/sbin/restorecon -R /var/lib/images
```
CRI-O service dependency:
- CRI-O waits for split disk mount and SELinux configuration
- Ensures split disk is ready before container operations begin
```ini
[Unit]
After=restorecon-var-lib-splitdisk.service
Requires=restorecon-var-lib-splitdisk.service var-lib-images.mount
```
Complete MachineConfig YAML:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 98-config-split-disk
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKCiMgRGV0ZWN0IGNsb3VkIHBsYXRmb3JtIC0gY2hlY2sgc3lzX3ZlbmRvciBmaXJzdCAobW9yZSByZWxpYWJsZSB0aGFuIHByb2R1Y3RfbmFtZSkKUExBVEZPUk09IiIKaWYgWyAtZiAvc3lzL2NsYXNzL2RtaS9pZC9zeXNfdmVuZG9yIF07IHRoZW4KICBWRU5ET1I9JChjYXQgL3N5cy9jbGFzcy9kbWkvaWQvc3lzX3ZlbmRvciAyPi9kZXYvbnVsbCB8fCBlY2hvICIiKQogIGlmIFtbICIkVkVORE9SIiA9PSAqIkdvb2dsZSIqIF1dOyB0aGVuCiAgICBQTEFURk9STT0iZ2NwIgogIGVsaWYgW1sgIiRWRU5ET1IiID09ICoiQW1hem9uIiogXV0gfHwgW1sgIiRWRU5ET1IiID09ICoiRUMyIiogXV07IHRoZW4KICAgIFBMQVRGT1JNPSJhd3MiCiAgZmkKZmkKCiMgRmFsbGJhY2sgdG8gcHJvZHVjdF9uYW1lIGlmIHN5c192ZW5kb3IgZGlkbid0IHdvcmsKaWYgWyAteiAiJFBMQVRGT1JNIiBdICYmIFsgLWYgL3N5cy9jbGFzcy9kbWkvaWQvcHJvZHVjdF9uYW1lIF07IHRoZW4KICBQUk9EVUNUPSQoY2F0IC9zeXMvY2xhc3MvZG1pL2lkL3Byb2R1Y3RfbmFtZSAyPi9kZXYvbnVsbCB8fCBlY2hvICIiKQogIGlmIFtbICIkUFJPRFVDVCIgPT0gKiJHb29nbGUiKiBdXTsgdGhlbgogICAgUExBVEZPUk09ImdjcCIKICBlbGlmIFtbICIkUFJPRFVDVCIgPT0gKiJBbWF6b24iKiBdXTsgdGhlbgogICAgUExBVEZPUk09ImF3cyIKICBmaQpmaQoKIyBJZiBwbGF0Zm9ybSBkZXRlY3Rpb24gZmFpbGVkLCB0cnkgbWV0YWRhdGEgc2VydmljZXMKaWYgWyAteiAiJFBMQVRGT1JNIiBdOyB0aGVuCiAgaWYgY3VybCAtcyAtZiAtbSAxIGh0dHA6Ly8xNjkuMjU0LjE2OS4yNTQvY29tcHV0ZU1ldGFkYXRhL3YxLyAtSCAiTWV0YWRhdGEtRmxhdm9yOiBHb29nbGUiICY+L2Rldi9udWxsOyB0aGVuCiAgICBQTEFURk9STT0iZ2NwIgogIGVsaWYgY3VybCAtcyAtZiAtbSAxIGh0dHA6Ly8xNjkuMjU0LjE2OS4yNTQvbGF0ZXN0L21ldGEtZGF0YS8gJj4vZGV2L251bGw7IHRoZW4KICAgIFBMQVRGT1JNPSJhd3MiCiAgZmkKZmkKCmVjaG8gIkRldGVjdGVkIHBsYXRmb3JtOiAke1BMQVRGT1JNfSIKCiMgUmV0cnkgdXAgdG8gMzAgdGltZXMgd2l0aCAyIHNlY29uZCBkZWxheSAoNjAgc2Vjb25kcyB0b3RhbCkKTUFYX1JFVFJJRVM9MzAKUkVUUllfREVMQVk9MgoKZm9yIGF0dGVtcHQgaW4gJChzZXEgMSAkTUFYX1JFVFJJRVMpOyBkbwogIGVjaG8gIkF0dGVtcHQgJGF0dGVtcHQgb2YgJE1BWF9SRVRSSUVTIHRvIGZpbmQgc2Vjb25kYXJ5IGRldmljZSIKICAKICAjIFNlYXJjaCBmb3Igc2Vjb25kYXJ5IGRldmljZSBiYXNlZCBvbiBwbGF0Zm9ybQogIGlmIFsgIiRQTEFURk9STSIgPT0gImF3cyIgXTsgdGhlbgogICAgIyBBV1MgdXNlcyBOVk1lIGRldmljZXMKICAgIGZvciBkZXZpY2UgaW4gL2Rldi9udm1lKlswLTldKm4qOyBkbwogICAgICAvdXNyL3NiaW4vYmxraWQgIiR7ZGV2aWNlfSIgJj4gL2R
ldi9udWxsCiAgICAgIGlmIFsgJD8gPT0gMiBdOyB0aGVuCiAgICAgICAgZWNobyAic2Vjb25kYXJ5IGRldmljZSBmb3VuZCAke2RldmljZX0iCiAgICAgICAgZWNobyAiY3JlYXRpbmcgZmlsZXN5c3RlbSBmb3IgY29udGFpbmVycyBtb3VudCIKICAgICAgICBpZiAhIG1rZnMueGZzIC1MIHNwbGl0ZGlzayAtZiAiJHtkZXZpY2V9IiAmPiAvZGV2L251bGw7IHRoZW4KICAgICAgICAgIGVjaG8gIkZhaWxlZCB0byBjcmVhdGUgZmlsZXN5c3RlbSBvbiAke2RldmljZX0iID4mMgogICAgICAgICAgZXhpdCAxCiAgICAgICAgZmkKICAgICAgICB1ZGV2YWRtIHNldHRsZQogICAgICAgIHRvdWNoIC9ldGMvdmFyLWxpYi1zcGxpdC1kaXNrLW1vdW50CiAgICAgICAgZXhpdCAwCiAgICAgIGZpCiAgICBkb25lCiAgZWxpZiBbICIkUExBVEZPUk0iID09ICJnY3AiIF07IHRoZW4KICAgICMgR0NQIHVzZXMgU0NTSSBkZXZpY2VzIChza2lwIHNkYSB3aGljaCBpcyBib290IGRpc2spCiAgICBmb3IgZGV2aWNlIGluIC9kZXYvc2RbYi16XTsgZG8KICAgICAgaWYgWyAtYiAiJHtkZXZpY2V9IiBdOyB0aGVuCiAgICAgICAgL3Vzci9zYmluL2Jsa2lkICIke2RldmljZX0iICY+IC9kZXYvbnVsbAogICAgICAgIGlmIFsgJD8gPT0gMiBdOyB0aGVuCiAgICAgICAgICBlY2hvICJzZWNvbmRhcnkgZGV2aWNlIGZvdW5kICR7ZGV2aWNlfSIKICAgICAgICAgIGVjaG8gImNyZWF0aW5nIGZpbGVzeXN0ZW0gZm9yIGNvbnRhaW5lcnMgbW91bnQiCiAgICAgICAgICBpZiAhIG1rZnMueGZzIC1MIHNwbGl0ZGlzayAtZiAiJHtkZXZpY2V9IiAmPiAvZGV2L251bGw7IHRoZW4KICAgICAgICAgICAgZWNobyAiRmFpbGVkIHRvIGNyZWF0ZSBmaWxlc3lzdGVtIG9uICR7ZGV2aWNlfSIgPiYyCiAgICAgICAgICAgIGV4aXQgMQogICAgICAgICAgZmkKICAgICAgICAgIHVkZXZhZG0gc2V0dGxlCiAgICAgICAgICB0b3VjaCAvZXRjL3Zhci1saWItc3BsaXQtZGlzay1tb3VudAogICAgICAgICAgZXhpdCAwCiAgICAgICAgZmkKICAgICAgZmkKICAgIGRvbmUKICBmaQogIAogICMgRGV2aWNlIG5vdCBmb3VuZCB5ZXQsIHdhaXQgYmVmb3JlIHJldHJ5CiAgaWYgWyAkYXR0ZW1wdCAtbHQgJE1BWF9SRVRSSUVTIF07IHRoZW4KICAgIGVjaG8gIkRldmljZSBub3QgZm91bmQsIHdhaXRpbmcgJHtSRVRSWV9ERUxBWX1zIGJlZm9yZSByZXRyeS4uLiIKICAgIHNsZWVwICRSRVRSWV9ERUxBWQogIGZpCmRvbmUKCmVjaG8gIkNvdWxkbid0IGZpbmQgc2Vjb25kYXJ5IGJsb2NrIGRldmljZSBhZnRlciAke01BWF9SRVRSSUVTfSBhdHRlbXB0cyEiID4mMgpleGl0IDc3Cg==
        mode: 0755
        path: /etc/find-secondary-device
        overwrite: true
      - contents:
          source: data:text/plain;charset=utf-8;base64,W2NyaW9dCmltYWdlc3RvcmUgPSAiL3Zhci9saWIvaW1hZ2VzIg==
        mode: 0644
        path: /etc/crio/crio.conf.d/99-imagestore.conf
        overwrite: true
      - contents:
          source: data:text/plain;charset=utf-8;base64,W1VuaXRdCkFmdGVyPXJlc3RvcmVjb24tdmFyLWxpYi1zcGxpdGRpc2suc2VydmljZQpSZXF1aXJlcz1yZXN0b3JlY29uLXZhci1saWItc3BsaXRkaXNrLnNlcnZpY2UgdmFyLWxpYi1pbWFnZXMubW91bnQK
        mode: 0644
        path: /etc/systemd/system/crio.service.d/99-wait-for-splitdisk.conf
        overwrite: true
    systemd:
      units:
      - name: find-secondary-device.service
        enabled: true
        contents: |
          [Unit]
          Description=Find secondary device
          DefaultDependencies=false
          After=systemd-udev-settle.service
          Before=local-fs-pre.target
          ConditionPathExists=!/etc/var-lib-split-disk-mount
          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecStart=/etc/find-secondary-device
          [Install]
          WantedBy=multi-user.target
      - name: var-lib-images.mount
        enabled: true
        contents: |
          [Unit]
          Description=Mount /var/lib/images
          Requires=find-secondary-device.service
          After=find-secondary-device.service
          Before=local-fs.target
          [Mount]
          What=/dev/disk/by-label/splitdisk
          Where=/var/lib/images
          Type=xfs
          Options=defaults
          TimeoutSec=120s
          [Install]
          WantedBy=local-fs.target
      - name: selinux-splitdisk-policy.service
        enabled: true
        contents: |
          [Unit]
          Description=Set SELinux file context rules for splitdisk
          DefaultDependencies=no
          After=var-lib-images.mount
          Before=restorecon-var-lib-splitdisk.service
          ConditionPathExists=!/var/lib/splitdisk-selinux-configured
          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecStart=/bin/bash -c '/usr/sbin/semanage fcontext -a -e /var/lib/containers /var/lib/images && touch /var/lib/splitdisk-selinux-configured'
          TimeoutSec=0
          [Install]
          WantedBy=multi-user.target
      - name: restorecon-var-lib-splitdisk.service
        enabled: true
        contents: |
          [Unit]
          Description=Restore recursive SELinux security contexts
          DefaultDependencies=no
          After=selinux-splitdisk-policy.service var-lib-images.mount
          Requires=selinux-splitdisk-policy.service
          Before=crio.service
          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecStart=/sbin/restorecon -R /var/lib/images
          TimeoutSec=0
          [Install]
          WantedBy=multi-user.target graphical.target
Verifying split disk configuration
After cluster creation completes, verify that the split disk is configured correctly.
Verify all worker nodes are ready:
oc get nodes
Check the MachineConfig applied:
oc get machineconfig 98-config-split-disk
oc get machineconfigpool worker
The worker pool should show the updated configuration.
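If you want the node readiness check to be scriptable, one option is to filter the oc get nodes output for anything that is not Ready. This is a small sketch: the not_ready_nodes name is hypothetical, and the sample rows are made up so the snippet runs without a cluster.

```bash
#!/bin/bash
# Hypothetical helper: read `oc get nodes --no-headers` output on stdin and
# print the name of every node whose STATUS column is not exactly "Ready".
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Made-up sample rows (NAME STATUS ROLES AGE VERSION) standing in for real output.
printf '%s\n' \
  'worker-a   Ready      worker   20m   v0.0.0' \
  'worker-b   NotReady   worker   20m   v0.0.0' | not_ready_nodes
```

On a live cluster, the equivalent call is oc get nodes --no-headers | not_ready_nodes, and empty output means every node reports Ready.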
Then verify the split disk with a test workload. Deploy a pod with a large image to confirm split disk is working:
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pytorch-interactive
  labels:
    app: pytorch
spec:
  containers:
  - name: pytorch-container
    image: pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime
    command: ["sleep", "infinity"]
    resources:
      requests:
        cpu: "2"
        memory: "4Gi"
      limits:
        cpu: "4"
        memory: "8Gi"
EOF
# Wait for the image to pull (may take several minutes)
oc get pod pytorch-interactive -w
When the pod status shows Running, verify the image location:
# Get the worker node where the pod is running
NODE=$(oc get pod pytorch-interactive -o jsonpath='{.spec.nodeName}')
# Start a debug session on the node
oc debug node/${NODE}
# Inside the debug shell, switch to the host filesystem
chroot /host
# Verify split disk directories exist
ls -la /var/lib/images/
# Expected: overlay/, overlay-images/, overlay-layers/
# Check for the PyTorch image on the split disk
ls -lh /var/lib/images/overlay-images/
# Find the image used by the running container
crictl ps
# Note the IMAGE value for pytorch-container, then confirm it is present in the images folder
ls /var/lib/images/overlay-images
# Check disk usage on the split disk
du -sh /var/lib/images/
# Expected: Several GB (e.g., 5.2G)
du -sh /var/lib/containers/storage/
# Expected: Smaller size, unchanged (e.g., 1.1G)
# View filesystem mount
df -h /var/lib/images
# Expected: Shows split disk mounted at /var/lib/images
# Exit the debug shell
exit
exit
# Clean up the test pod
oc delete pod pytorch-interactive
You should see the PyTorch image (several GB) in /var/lib/images/overlay-images/. The split disk shows increased usage after the image pull, and the pre-baked AMI images remain unchanged in /var/lib/containers/storage/.
This confirms that newly pulled images are stored on the split disk while existing AMI images remain on the original disk.
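To turn that last check into something you can script, you can compare the mount points behind the two paths. The following is a sketch under the assumption that df -P is available. Both function names are my own, and mount points containing spaces are not handled.

```bash
#!/bin/bash
# Hypothetical helper: print the mount point of the filesystem holding a path.
mount_point_of() {
  # With df -P, the last column of the second line is "Mounted on".
  df -P "$1" | awk 'NR == 2 { print $NF }'
}

# Succeeds (exit status 0) when the two paths live on different mounts.
on_separate_mounts() {
  [ "$(mount_point_of "$1")" != "$(mount_point_of "$2")" ]
}

# Inside the node debug shell, after chroot /host:
#   on_separate_mounts /var/lib/images /var/lib/containers/storage && echo "split"
```

If either path is missing, both lookups come back empty and the function reports that the paths are not separate, which is the safer answer for a verification step.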
Troubleshooting
If the secondary device is not found, check the device detection service:
oc debug node/<worker-node-name>
chroot /host
journalctl -u find-secondary-device.service
Verify the secondary volume is attached.
AWS:
lsblk
# Look for the secondary NVMe device
GCP:
lsblk
# Look for /dev/sdb or other SCSI device
If the split disk is not mounted, check the mount status:
systemctl status var-lib-images.mount
journalctl -u var-lib-images.mount
Verify the device has the correct label:
blkid | grep splitdisk
For SELinux issues, verify SELinux contexts:
ls -laZ /var/lib/images/
semanage fcontext -l | grep /var/lib/images
# Expected: an equivalence entry mapping /var/lib/images to /var/lib/containers
Check the SELinux services:
systemctl status selinux-splitdisk-policy.service
systemctl status restorecon-var-lib-splitdisk.service
If CRI-O is not using imagestore, check the CRI-O configuration:
cat /etc/crio/crio.conf.d/99-imagestore.conf
# Should contain: imagestore = "/var/lib/images"
Check the CRI-O service:
systemctl status crio
journalctl -u crio | grep imagestore
Final thoughts
Split disk gives you a practical way to handle large container images on OpenShift without overrunning your boot disk. I've seen it make the difference between clusters that struggle with disk pressure and ones that scale smoothly with image-intensive workloads. The configuration process is straightforward once you understand the function of each piece. You modify your machinesets to provision secondary disks, apply a MachineConfig that detects and mounts those disks, configure SELinux appropriately, and tell CRI-O where to store images. The orchestration happens automatically through systemd, and once it's in place, it just works.
If you're running AI/ML workloads or any scenario where image sizes are measured in gigabytes rather than megabytes, split disk is worth considering. Start with a test cluster, verify the configuration works for your specific images and workflows, then roll it out to production with confidence.