When running artificial intelligence and machine learning (AI/ML) workloads on Red Hat OpenShift, container startup time matters. Image pull operations can account for the majority of container startup time, and with multi-gigabyte model images becoming the norm, that translates to minutes of waiting before a container can serve its first request. Large OCI artifacts like machine learning models are also forced onto the same root file system as everything else, consuming space and preventing the use of faster dedicated storage.
OpenShift 4.22 introduces a new feature (in Technology Preview) that gives cluster administrators control over where CRI-O stores and retrieves container image layers, OCI artifacts, and complete container images. Whether you need dedicated solid-state drive (SSD) storage, shared image caches, or lazy pulling, the new ContainerRuntimeConfig application programming interface (API) fields let you configure it declaratively.
The problem: one storage location for everything
Tools like the Kubernetes Image Puller help by pre-pulling images onto nodes before workloads need them. But for large AI/ML images, this approach has limits: it duplicates every image on every node regardless of whether that node runs the workload, which can cause storage pressure and pod evictions. And pulled images still land on the root file system, so the placement problem remains.
By default, CRI-O stores all container data under a single root directory (/var/lib/containers/storage). This works well for typical workloads, but creates problems for AI/ML scenarios:
- Large artifacts on the wrong disk: A 15 GB ML model stored on the root file system competes for space with the OS and other containers. Dedicated SSD storage would be faster and wouldn't risk filling up the root partition.
- Redundant pulls across nodes: When 50 nodes in a cluster all pull the same 10 GB base image from an external registry, that's 500 GB of network traffic that could be avoided with a shared read-only cache.
- Full download before startup: Containers can't start until 100% of the image is pulled. For large images, this delays application availability and slows autoscaling response times.
- No offline artifact delivery: In air-gapped or edge environments, there's no way to pre-populate artifacts on nodes from external media without pulling from a registry.
Three new storage configuration options
A new Technology Preview feature gate adds three fields to the ContainerRuntimeConfig API. Each addresses a different storage scenario: additional artifact stores, additional image stores, and additional layer stores.
Additional artifact stores
Specify additional read-only locations where CRI-O resolves OCI artifacts, such as ML models pulled as OCI volume images. CRI-O checks them in order before falling back to the default location.
This is the primary use case for Red Hat OpenShift AI teams who want large models on SSD-backed storage, separate from the root file system.
```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: ssd-artifact-stores
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  containerRuntimeConfig:
    additionalArtifactStores:
      - path: /mnt/ssd-artifacts
      - path: /mnt/nfs-shared-artifacts
```

Pre-populate the artifact store using Podman on each node (or on a shared Network File System (NFS) mount), then apply the ContainerRuntimeConfig. CRI-O finds the artifacts locally and skips the download.
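As a sketch of the pre-population step, Podman's global --root flag can direct pulls at the dedicated storage root instead of the default location. The image reference here is a placeholder, and the podman artifact subcommands assume a recent Podman release (5.4 or later); your workflow may differ:

```shell
# Run on each node (or once on a shared NFS mount).
sudo mkdir -p /mnt/ssd-artifacts

# Pull the OCI artifact into the dedicated storage root instead of
# the default /var/lib/containers/storage. Image name is illustrative.
sudo podman --root /mnt/ssd-artifacts artifact pull \
    quay.io/example/llm-model:v1

# Confirm the artifact landed in the expected store.
sudo podman --root /mnt/ssd-artifacts artifact ls
```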
Additional image stores
Specify read-only container image caches on shared or high-performance storage. This builds on a proven, stable pattern already used in the upstream container storage libraries.
When CRI-O needs an image, it checks the additional stores first. If the image exists there, no registry pull happens.
```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: shared-image-cache
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  containerRuntimeConfig:
    additionalImageStores:
      - path: /mnt/nfs-image-cache
      - path: /mnt/ssd-images
```

A typical setup: mount an NFS share with pre-populated images across all worker nodes. Nodes read from the shared cache instead of pulling from an external registry. This is especially useful in air-gapped environments or when many nodes run the same workloads.
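One way to populate the shared cache, sketched below with illustrative paths and image names, is to pull images into the NFS-backed storage root from a single node that mounts the share read-write, then export the mount read-only to the rest of the fleet:

```shell
# On the node with read-write access to the NFS share:
sudo podman --root /mnt/nfs-image-cache pull quay.io/example/base-image:latest
sudo podman --root /mnt/nfs-image-cache pull quay.io/example/ml-runtime:latest

# Verify the cache contents before sharing it read-only with other nodes.
sudo podman --root /mnt/nfs-image-cache images
```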
Additional layer stores
Enable lazy image pulling through storage plug-ins like stargz-store. With lazy pulling, containers start after downloading only the required file chunks. The rest is fetched on demand during runtime.
This is the most impactful option for container startup time with large images, but it requires some setup:
- You install the storage plug-in binary (such as stargz-store) on each node.
- You convert your images to eStargz format.
- You use a container registry that supports HTTP range requests. Most major registries do, including Docker Hub, Quay, GitHub Container Registry, Amazon Elastic Container Registry (Amazon ECR), Harbor, and Google Artifact Registry.
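The conversion step can be done with tooling from the stargz-snapshotter project. As one hedged example, nerdctl can convert an image to eStargz and push the result; the image names are placeholders:

```shell
# Pull the source image locally, then convert it to eStargz format.
nerdctl pull quay.io/example/ml-runtime:latest
nerdctl image convert --estargz --oci \
    quay.io/example/ml-runtime:latest \
    quay.io/example/ml-runtime:latest-esgz

# Push the converted image to a registry that supports HTTP range requests.
nerdctl push quay.io/example/ml-runtime:latest-esgz
```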
```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: lazy-pulling
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  containerRuntimeConfig:
    additionalLayerStores:
      - path: /var/lib/stargz-store
```

When generating storage.conf, the MCO automatically appends the :ref suffix to each layer store path. This suffix tells the container storage library to use reference-based resolution, which is required for lazy pulling plug-ins.
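For reference, the resulting entry in /etc/containers/storage.conf would look roughly like this. This is a sketch based on the upstream container storage library's additionallayerstores option; the real generated file contains other settings as well:

```toml
[storage.options]
# Generated by the MCO from additionalLayerStores; the :ref suffix
# enables reference-based resolution for lazy pulling plug-ins.
additionallayerstores = ["/var/lib/stargz-store:ref"]
```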
How it works under the hood
When you create a ContainerRuntimeConfig with any of these fields, the Machine Config Operator (MCO) translates the API configuration into the appropriate config files:
- Layer stores and image stores: Written into /etc/containers/storage.conf on the target nodes.
- Artifact stores: Written into a CRI-O drop-in file at /etc/crio/crio.conf.d/01-ctrcfg-additionalArtifactStores.
MCO generates a MachineConfig for the matching node pool. Nodes reboot to apply the new configuration. After a reboot, CRI-O reads the updated config and begins resolving storage from the additional locations.
Path validation is enforced at the API level, so invalid paths are rejected before they reach the node.
If an additional store path doesn't exist or is inaccessible at runtime, CRI-O logs a warning and continues with the remaining stores. The default storage location always works as a fallback.
Limits and constraints
All three fields are gated behind the AdditionalStorageConfig feature gate, which requires TechPreviewNoUpgrade. As with all Technology Preview features in OpenShift, this is intended for evaluation and testing. Clusters with TechPreviewNoUpgrade enabled follow a separate lifecycle and are not eligible for minor version upgrades.
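Enabling the feature set is a one-line patch to the cluster FeatureGate resource, sketched below. Note that TechPreviewNoUpgrade cannot be disabled once set:

```shell
# WARNING: this is irreversible and blocks minor version upgrades.
oc patch featuregate cluster --type merge \
    -p '{"spec":{"featureSet":"TechPreviewNoUpgrade"}}'
```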
- additionalLayerStores: maximum 5 entries; written to storage.conf.
- additionalImageStores: maximum 10 entries; written to storage.conf.
- additionalArtifactStores: maximum 10 entries; written to a CRI-O drop-in file.
You can use all three storage types together in a single ContainerRuntimeConfig, or configure them independently depending on your needs.
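For example, a single ContainerRuntimeConfig that uses all three storage types, staying within the per-field limits, might look like this (the paths are illustrative):

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: combined-storage
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  containerRuntimeConfig:
    additionalArtifactStores:
      - path: /mnt/ssd-artifacts
    additionalImageStores:
      - path: /mnt/nfs-image-cache
    additionalLayerStores:
      - path: /var/lib/stargz-store
```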
Getting started
To get started, enable the TechPreviewNoUpgrade feature gate on your cluster. Then follow these steps:
- Prepare the storage on your nodes:
- For artifact stores: Mount the target directory and pre-populate it with artifacts using Podman.
- For image stores: Mount a shared file system (such as NFS) with pre-populated container images.
- For layer stores: Install a storage plug-in binary (such as stargz-store) on each node.
- Create a ContainerRuntimeConfig with the appropriate fields for your use case.
- Wait for the MCO rollout to complete. Nodes reboot with the new configuration.
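The rollout can be watched from the CLI with standard MCO commands; this is a sketch assuming the worker pool targeted by the machineConfigPoolSelector:

```shell
# Watch the worker pool until it reports Updated.
oc get machineconfigpool worker -w

# Inspect the rendered MachineConfigs that carry container runtime settings.
oc get machineconfig | grep containerruntime
```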
Verify your new configuration by opening a debug session to a node and checking the generated config files:
```shell
$ oc debug node/<node-name>
sh-5.1# chroot /host
sh-5.1# cat /etc/containers/storage.conf   # layer and image stores
sh-5.1# cat /etc/crio/crio.conf.d/01-ctrcfg-additionalArtifactStores   # artifact stores
```

Built on upstream contributions
This feature builds on work across several upstream projects:
- CRI-O gained support for additional artifact store configuration, following the same pattern used for shared image caches in the container storage libraries.
- The additional layer store plug-in API in the upstream container storage libraries enables lazy pulling.
- The full design covering all three storage types is captured in an OpenShift enhancement proposal.
What's next
This is a Technology Preview in OpenShift 4.22. Based on feedback and validation results, we plan to:
- Work with the Red Hat OpenShift AI team to validate performance improvements with SSD-backed artifact storage for large ML models.
- Stabilize the Additional Layer Store API in the upstream container storage libraries (currently experimental).
- Gather feedback on the current model where customers install storage plug-in binaries themselves, and evaluate whether to provide supported plug-in images.
- Move toward general availability (GA) in a future OpenShift release.
We want to hear from you. If you're running AI/ML workloads on OpenShift and waiting for large images or artifacts to download, try the new storage configuration options and share your results: what storage setup you chose, what startup time improvements you measured, and what would make the feature more useful for your environment.
For complete details, see the OpenShift 4.22 documentation on additional storage configuration. Send us feedback through your Red Hat contacts, or create an issue on GitHub.