Debugging image mode with Red Hat OpenShift 4.20: A practical guide

Image mode is a powerful capability in Red Hat OpenShift that allows customization of CoreOS-based nodes. While this feature provides unprecedented flexibility, it can also introduce new layers of complexity when things go wrong. This guide walks through common debugging scenarios in OpenShift 4.20 and later, and provides practical troubleshooting steps to get clusters back on track. With the right tools and techniques, debugging image mode becomes less mysterious and more manageable.

Understanding the image mode process

When image mode for OpenShift is enabled, the workflow has three stages, each with its own failure points: MachineOSConfig (MOSC) creation, MachineOSBuild (MOSB) creation and execution, and application of the new image to nodes.

Stage 1: MachineOSConfig creation

The process begins when a MachineOSConfig resource is created targeting a specific MachineConfigPool. This resource acts as the blueprint, defining how the custom OS image is built and where it gets stored.

What to watch for:

Validation errors: Resource creation may fail when required fields are missing or incorrectly configured.
Secret references: Ensure all referenced pull and push secrets exist in the openshift-machine-config-operator namespace.
Registry specifications: Verify that renderedImagePushSpec points to a valid, accessible registry location.

At this stage, issues are typically configuration errors that prevent a resource from being created or accepted by the cluster.
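Because missing secrets are one of the most common Stage 1 problems, a quick pre-flight check can save a trip through the operator logs. The following is a minimal sketch: the function name check_mosc_secrets is hypothetical, and you pass it the secret names referenced by your own MachineOSConfig.

```shell
# Hypothetical helper: report which of the given secrets are missing from the
# openshift-machine-config-operator namespace. Returns non-zero if any are missing.
check_mosc_secrets() {
  local ns=openshift-machine-config-operator missing=0 secret
  for secret in "$@"; do
    # oc get exits non-zero when the secret does not exist (or is not readable).
    if ! oc get secret -n "$ns" "$secret" >/dev/null 2>&1; then
      echo "missing secret: $secret"
      missing=1
    fi
  done
  return "$missing"
}
```

For the MachineOSConfig examples in this guide, the call would be:

$ check_mosc_secrets current-image-pull base-image-pull rendered-image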

If the MOSC resource was successfully created, then the machine-os-builder pod should be healthy and running in the openshift-machine-config-operator namespace.

$ oc get pods -n openshift-machine-config-operator \
-l k8s-app=machine-os-builder
NAME                         READY  STATUS  RESTARTS
machine-os-builder-b8f..h94   1/1   Running   0

If this pod is running, you can proceed to the next stage, in which the image is built.

If this pod does not exist, an error occurred while the MachineOSConfig was being processed.

If any field in the MachineOSConfig has a forbidden value, oc create rejects the resource and prints the reason. For example, given this YAML file:

$ cat ./mosc.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineOSConfig
metadata:
  name: worker
spec:
  machineConfigPool:
    name: infra
  currentImagePullSecret:
    name: current-image-pull
  imageBuilder:
    imageBuilderType: job
  baseImagePullSecret:
    name: base-image-pull
  renderedImagePushSecret:
    name: rendered-image
  renderedImagePushSpec: "quay.io/sregidor/sregidor-os:mco_layering"

The output of oc create is:

$ oc create -f ./mosc.yaml
The MachineOSConfig "worker" is invalid: 
* spec.imageBuilder.imageBuilderType: Unsupported value: "job": supported values: "Job"

The example shows that spec.imageBuilder.imageBuilderType is set to job instead of the required Job (with a capital "J").

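Another example: after imageBuilderType is corrected to Job, the same file is still rejected, because the resource is named worker but targets the infra MachineConfigPool: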

$ oc create -f ./mosc.yaml
The MachineOSConfig "worker" is invalid: <nil>: Invalid value: "object": MachineOSConfig name must match the referenced MachineConfigPool name; can only have one MachineOSConfig per MachineConfigPool
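Both errors above come from API server validation, so running oc create --dry-run=server -f ./mosc.yaml reports them without creating anything. The name rule can also be checked locally before you submit the manifest. The sketch below uses a hypothetical function, mosc_name_matches_pool, and assumes the simple two-space-indented layout used in this guide; it is not a general YAML parser.

```shell
# Hypothetical pre-flight check: compare metadata.name with
# spec.machineConfigPool.name in a MachineOSConfig manifest.
# Assumes the two-space-indented layout used in this guide (not a full YAML parser).
mosc_name_matches_pool() {
  awk '
    /^metadata:/ { section = "metadata"; next }
    /^spec:/     { section = "spec"; next }
    /^[^ ]/      { section = "" }                       # any other top-level key
    section == "metadata" && /^  name:/ { name = $2 }
    section == "spec" && /^  machineConfigPool:/ { inpool = 1; next }
    section == "spec" && /^  [^ ]/ { inpool = 0 }       # next key under spec
    inpool && /^    name:/ { pool = $2 }
    END {
      if (name != "" && name == pool) exit 0
      printf "metadata.name %s does not match machineConfigPool %s\n", name, pool
      exit 1
    }
  ' "$1"
}
```

Running mosc_name_matches_pool ./mosc.yaml against the first example file prints a mismatch between worker and infra and returns a non-zero status.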

If the configured values pass validation but still cause problems, the information you need is in the logs of the machine-config-operator pod (in the openshift-machine-config-operator namespace), in the namespace events, and in the machine-config ClusterOperator. The pod logs contain the most detailed information.

For example, suppose the secrets referenced in this sample configuration have not been created:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineOSConfig
metadata:
  name: infra
spec:
  machineConfigPool:
    name: infra
  currentImagePullSecret:
    name: current-image-pull
  imageBuilder:
    imageBuilderType: Job
  baseImagePullSecret:
    name: base-image-pull
  renderedImagePushSecret:
    name: rendered-image
  renderedImagePushSpec: "quay.io/sregidor/sregidor-os:mco_layering"

Nevertheless, the resource can be created:

$ oc create -f ./mosc.yaml
machineosconfig.machineconfiguration.openshift.io/infra created

However, the builder pod is not created:

$ oc get pods -n openshift-machine-config-operator |grep build

The error can be found in the logs of the machine-config-operator pod:

$ oc logs -n openshift-machine-config-operator machine-config-operator-7498f4576b-h5vzj
...
E1017 08:56:53.431756       1 operator.go:467] "Unhandled Error" err="could not update Machine OS Builder deployment: could not validate renderedImagePushSecret \"rendered-image\" for MachineOSConfig infra: secret rendered-image from infra is not found. Did you use the right secret name?"
...
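The operator can report more than one missing secret, as the events below show. A small filter can pull every missing secret name out of the logs or events at once. This is a sketch: missing_secrets_from_log is a hypothetical name, and the pattern assumes the "secret NAME from POOL is not found" wording shown here.

```shell
# Hypothetical filter: read Machine Config Operator log lines or event messages
# on stdin and print each secret name reported as "not found", once.
missing_secrets_from_log() {
  sed -n 's/.*secret \([^ ]*\) from [^ ]* is not found.*/\1/p' | sort -u
}
```

A possible invocation, assuming the operator runs as the machine-config-operator deployment with a container of the same name (as the pod name above suggests):

$ oc logs -n openshift-machine-config-operator deployment/machine-config-operator -c machine-config-operator | missing_secrets_from_log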

There are also events reporting the error:

$ oc get events  -n openshift-machine-config-operator --sort-by metadata.creationTimestamp  |tail -3
34s         Warning   OperatorDegraded: MachineOSBuilderFailed   /machine-config                                                      Failed to resync 4.20.0-0-2025-10-16-080835-test-ci-ln-bfn63jk-latest because: could not update Machine OS Builder deployment: could not validate renderedImagePushSecret "rendered-image" for MachineOSConfig infra: secret rendered-image from infra is not found. Did you use the right secret name?
11s         Warning   OperatorDegraded: MachineOSBuilderFailed   /machine-config                                                      Failed to resync 4.20.0-0-2025-10-16-080835-test-ci-ln-bfn63jk-latest because: could not update Machine OS Builder deployment: could not validate baseImagePullSecret "base-image-pull" for MachineOSConfig infra: secret base-image-pull from infra is not found. Did you use the right secret name?
96s         Normal    ConfigMapUpdated                           deployment/openshift-machine-config-operator                                   Updated ConfigMap/kube-rbac-proxy -n openshift-machine-config-operator:...

You can get information from the machine-config ClusterOperator, too:

$ oc get co machine-config
NAME             VERSION                                                AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.20.0-0-2025-10-16-080835-test-ci-ln-bfn63jk-latest   True        False         True       76m     Failed to resync 4.20.0-0-2025-10-16-080835-test-ci-ln-bfn63jk-latest because: could not update Machine OS Builder deployment: could not validate renderedImagePushSecret "rendered-image" for MachineOSConfig infra: secret rendered-image from infra is not found. Did you use the right secret name?
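To read just the Degraded condition of the ClusterOperator instead of scanning the wide table, a jsonpath query is enough. The helper name mco_degraded_reason is hypothetical; the filter expression is standard oc jsonpath syntax.

```shell
# Hypothetical helper: print the message of the machine-config ClusterOperator's
# Degraded condition and return non-zero when Degraded=True.
mco_degraded_reason() {
  local jp='{.status.conditions[?(@.type=="Degraded")]'
  local status
  status=$(oc get clusteroperator machine-config -o jsonpath="${jp}.status}")
  if [ "$status" = "True" ]; then
    oc get clusteroperator machine-config -o jsonpath="${jp}.message}"
    echo
    return 1
  fi
  echo "machine-config is not degraded"
}
```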

Stage 2: MachineOSBuild (MOSB) creation and image build process

After the MachineOSConfig has been created and the machine-os-builder pod is running, the Machine Config Operator (through the machine-os-builder pod) automatically generates a MachineOSBuild resource. The MachineOSBuild resource controls an image build job that pulls the base CoreOS image, applies the customizations defined in the Containerfile, and pushes the result to the specified registry.

To execute this process, several auxiliary secrets and configmaps are created in the openshift-machine-config-operator namespace.

What to watch for:

Build status: Monitor the MachineOSBuild resource for conditions showing Succeeded=True or Failed=True.
Job failures: Verify that the build job in the openshift-machine-config-operator namespace completes successfully.
Image pull errors: Authentication failures when pulling the base image indicate problems with baseImagePullSecret.
Build errors: Containerfile syntax issues, missing packages, or failed RUN commands cause build failures.
Image push errors: A problem pushing to the registry suggests an issue with renderedImagePushSecret or registry permissions.

This is where most failures occur, because it involves pulling images, executing build steps, and pushing results, all of which depend on external resources and credentials.
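To follow a build from a script rather than rereading the table, you can query the Succeeded and Failed conditions directly. In this sketch, mosb_state is a hypothetical helper, and the condition type names are assumed to match the SUCCEEDED and FAILED columns of the oc get machineosbuild output.

```shell
# Hypothetical helper: summarize a MachineOSBuild as "succeeded", "failed", or
# "in-progress" from its Succeeded and Failed conditions (type names assumed).
mosb_state() {
  local mosb="$1" ns=openshift-machine-config-operator succeeded failed
  succeeded=$(oc get machineosbuild -n "$ns" "$mosb" \
    -o jsonpath='{.status.conditions[?(@.type=="Succeeded")].status}')
  failed=$(oc get machineosbuild -n "$ns" "$mosb" \
    -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}')
  if [ "$succeeded" = "True" ]; then
    echo succeeded
  elif [ "$failed" = "True" ]; then
    echo failed
  else
    echo in-progress
  fi
}
```

For example, a simple wait loop could poll every 30 seconds until the build leaves the in-progress state:

$ until [ "$(mosb_state infra-b1b93a87b88b18b3ad70e9fb2596b2cd)" != in-progress ]; do sleep 30; done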

While the image is being built, several resources report useful progress information. First, the MachineOSBuild resource:

$ oc -n openshift-machine-config-operator get machineosbuild
NAME                                     PREPARED   BUILDING   SUCCEEDED
infra-b1b93a87b88b18b3ad70e9fb2596b2cd   False      True       False
INTERRUPTED   FAILED   AGE
False         False    108s

Next, the job created to execute the MachineOSBuild:

$ oc -n openshift-machine-config-operator get job
NAME                         STATUS   COMPLETIONS  DURATION  AGE
build-infra-b1b93a87b..b2cd  Running  0/1          105s      105s

The pod controlled by the job, which executes the actual build process:

$ oc -n openshift-machine-config-operator get pods
NAME                               READY STATUS    RESTARTS  AGE
build-infra-b1b93a87b..b2cd-q7tsb  0/1   Init:0/1  0         2m49s
...

Note that only changes to kernel arguments, kernel type, OSImageURL, or extension bundles create a new job and trigger a new image build process. All other MachineConfig changes reuse the existing MOSB and do not trigger a new build.

This stage can be considered successful if:

The MachineOSBuild was created and is reporting Succeeded=True and Failed=False
The job is automatically removed by the machine-os-builder pod

$ oc -n openshift-machine-config-operator get machineosbuild
NAME                 PREPARED  BUILDING SUCCEEDED INTERRUPTED FAILED
infra-f509ba5..e99   False     False    True      False       False

When the MachineOSBuild resource is not created

If the MachineOSBuild resource is never created, the problem occurred before the build could start. The machine-os-builder pod is responsible for creating MachineOSBuild resources. This error is uncommon, but if it happens, read the logs of this pod to find the cause. A successful creation is logged like the last line below:

$ oc -n openshift-machine-config-operator logs machine-os-builder-b8f48488f-nsdbk
....
I1017 09:42:42.524084       1 reconciler.go:634] New MachineOSBuild created: infra-f509ba5b2d76bcc5a113fd81de75ee99
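When the builder log is long, filtering for the creation message shows quickly whether a MachineOSBuild was created at all. A minimal sketch, with created_mosbs as a hypothetical name and the message text taken from the log line above:

```shell
# Hypothetical filter: read machine-os-builder log lines on stdin and print the
# names of the MachineOSBuild resources the builder reports as created.
created_mosbs() {
  sed -n 's/.*New MachineOSBuild created: \([^ ]*\).*/\1/p'
}
```

$ oc -n openshift-machine-config-operator logs machine-os-builder-b8f48488f-nsdbk | created_mosbs

If it prints nothing for your pool, read the error lines logged around the time you created the MachineOSConfig.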

When the MachineOSBuild was created, but failed

A MachineOSBuild usually fails because the job that builds the image failed. If the MachineOSBuild resource fails, start by locating the associated job.

When the job is not created

The machine-os-builder pod is in charge of creating or deleting a job. If the job cannot be found, read the logs in this pod for further information:

$ oc -n openshift-machine-config-operator logs machine-os-builder-b8f48488f-nsdbk
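The following sketch lists the build jobs for a MachineOSConfig together with their pods. It assumes the machineconfiguration.openshift.io/machine-os-config label used later in this guide and the job-name label that the Kubernetes Job controller adds to the pods it creates; build_pods_for_mosc is a hypothetical name.

```shell
# Hypothetical helper: list the build job(s) of a MachineOSConfig and the pods
# each job created.
build_pods_for_mosc() {
  local ns=openshift-machine-config-operator job
  for job in $(oc get job -n "$ns" \
      -l "machineconfiguration.openshift.io/machine-os-config=$1" \
      -o jsonpath='{.items[*].metadata.name}'); do
    echo "job: $job"
    # The Job controller labels its pods with job-name=<job>.
    oc get pods -n "$ns" -l "job-name=$job"
  done
}
```

$ build_pods_for_mosc infra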

Debugging a failed job

Debugging a failed job can take many forms, depending on the problem. Focusing on one problem at a time helps you confirm your theory about the cause of the problem.

For example, suppose the following MOSC is created:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineOSConfig
metadata:
  name: infra
spec:
  machineConfigPool:
    name: infra
  currentImagePullSecret:
    name: current-image-pull
  imageBuilder:
    imageBuilderType: Job
  baseImagePullSecret:
    name: base-image-pull
  renderedImagePushSecret:
    name: rendered-image
  renderedImagePushSpec: "quay.io/sregidor/sregidor-os:mco_layering"
  containerFile:
      - content: |-
          RUN curl --fail -L https://github.com/example/yq/releases/latest/download/yq_linux_amd64_wrong -o /usr/bin/yq && chmod +x /usr/bin/yq

You run the oc create command:

$ oc create -f mosc.yaml
machineosconfig.machineconfiguration.openshift.io/infra created

But the MachineConfigPool shows as degraded:

$ oc get mcp infra
NAME   CONFIG                  UPDATED UPDATING DEGRADED M..COUNT READYMACHINECOUNT
infra  rendered-infra-620..a43 False   False    True     1      0
UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
0                     0                      158m
$ oc get mcp infra -oyaml
...
  - lastTransitionTime: "2025-10-17T10:55:00Z"
    message: 'Failed to build OS image for pool infra (MachineOSBuild: infra-32ef35dea3e553071277954842edb33a):
      Failed: Build Failed'
    reason: BuildFailed
    status: "True"
    type: ImageBuildDegraded
...

The MOSB resource shows as failing:

$ oc -n openshift-machine-config-operator get machineosbuild
NAME           PREPARED BUILDING SUCCEEDED INTERRUPTED FAILED AGE
infra-32e..33a False    False    False     False       True   31m

So you locate the job:

$ oc -n openshift-machine-config-operator get job \
-l machineconfiguration.openshift.io/machine-os-config=infra
NAME                  STATUS  COMPLETIONS  DURATION  AGE
build-infra-32e..33a  Failed  0/1          31m       31m

These are the pods launched by the failed job:

$ oc -n openshift-machine-config-operator get pods
NAME                       READY STATUS     RESTARTS AGE
build-infra-32e..33a-2jg2t  0/1  Init:Error 0        25m
build-infra-32e..33a-bzfcp  0/1  Init:Error 0        29m
build-infra-32e..33a-cndjm  0/1  Init:Error 0        32m
build-infra-32e..33a-lqlk9  0/1  Init:Error 0        22m

Examine the logs of the failed pod to determine the cause. Each build pod has two containers:

image-build builds the image and pushes it to the registry. It runs first, as an init container, which is why the failed pods above report Init:Error.
create-digest-configmap creates an auxiliary configmap containing the digest of the pushed image, which openshift-machine-config-operator reads to update the MOSB and MOSC resources.

To identify errors in the build process, examine the image-build container in the build pod:

$ oc -n openshift-machine-config-operator logs \
build-infra-32ef35dea3e553071277954842edb33a-2jg2t \
-c image-build
...
time="2025-10-17T10:51:32Z" level=debug msg="Running &exec.Cmd{Path:\"/bin/sh\", Args:[]string{\"/bin/sh\", \"-c\", \"curl --fail -L https://github.com/example/yq/releases/latest/download/yq_linux_amd64_wrong -o /usr/bin/yq && chmod +x /usr/bin/yq\"}, Env:[]string{\"HTTP_PROXY=\", \"HTTPS_PROXY=\", \"NO_PROXY=\", \"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\", \"HOSTNAME=0430829320a1\", \"HOME=/root\"}, Dir:\"/\", Stdin:(*os.File)(0xc0001280a0), Stdout:(*os.File)(0xc0001280a8), Stderr:(*os.File)(0xc0001280b0), ExtraFiles:[]*os.File(nil), SysProcAttr:(*syscall.SysProcAttr)(0xc00017c0c0), Process:(*os.Process)(nil), ProcessState:(*os.ProcessState)(nil), ctx:context.Context(nil), Err:error(nil), Cancel:(func() error)(nil), WaitDelay:0, childIOFiles:[]io.Closer(nil), parentIOPipes:[]io.Closer(nil), goroutine:[]func() error(nil), goroutineErr:(<-chan error)(nil), ctxResult:(<-chan exec.ctxResult)(nil), createdByStack:[]uint8(nil), lookPathErr:error(nil), cachedLookExtensions:struct { in string; out string }{in:\"\", out:\"\"}} (PATH = \"\")"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
0     9    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: (22) The requested URL returned error: 404
subprocess exited with status 22
subprocess exited with status 22
time="2025-10-17T10:51:32Z" level=debug msg="Error building at step {Env:[HTTP_PROXY= HTTPS_PROXY= NO_PROXY= PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin] Command:run Args:[curl --fail -L https://github.com/example/yq/releases/latest/download/yq_linux_amd64_wrong -o /usr/bin/yq && chmod +x /usr/bin/yq] Flags:[] Attrs:map[] Message:RUN curl --fail -L https://github.com/example/yq/releases/latest/download/yq_linux_amd64_wrong -o /usr/bin/yq && chmod +x /usr/bin/yq Heredocs:[] Original:RUN curl --fail -L https://github.com/example/yq/releases/latest/download/yq_linux_amd64_wrong -o /usr/bin/yq && chmod +x /usr/bin/yq}: exit status 22"
Error: building at STEP "RUN curl --fail -L https://github.com/example/yq/releases/latest/download/yq_linux_amd64_wrong -o /usr/bin/yq && chmod +x /usr/bin/yq": exit status 22

The logs show that curl failed with curl: (22) The requested URL returned error: 404 when requesting https://github.com/example/yq/releases/latest/download/yq_linux_amd64_wrong. The URL contains a typo; the correct URL is https://github.com/example/yq/releases/latest/download/yq_linux_amd64.
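For long build logs, the failing Containerfile instruction can be extracted directly. This sketch assumes the Error: building at STEP "...": exit status N format shown above; failed_build_step is a hypothetical name.

```shell
# Hypothetical filter: read image-build container logs on stdin and print the
# exit status and the Containerfile instruction that failed.
failed_build_step() {
  sed -n 's/^Error: building at STEP "\(.*\)": \(exit status [0-9]*\)$/\2: \1/p'
}
```

$ oc -n openshift-machine-config-operator logs build-infra-32ef35dea3e553071277954842edb33a-2jg2t -c image-build | failed_build_step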

After you find the error, you can edit the MOSC resource. In this example, using the correct URL in the Containerfile section triggers a new MOSB resource that successfully builds the image and applies the config.

Other kinds of errors are possible. For example, missing permissions to pull or push an image are relatively common in some environments. In the following example, the build pod reports that the configured secret does not have permission to push the image:

$ oc -n openshift-machine-config-operator logs build-infra-5e0c7aaf3cf26e8fab9dd111bb336342-czzjb -c image-build
....
Copying blob sha256:29f46dbdbc11454d191cd70ebbd18aec36bc2afc72757d38f2ad473b6dba1c75
Copying blob sha256:d0a1fe72e3dceadb214f96787144ef31672f2b2a429a3798717d739a55a9b574
Error: pushing image "quay.io/sregidor/sregidor-os:infra-5e0c7aaf3cf26e8fab9dd111bb336342" to "docker://quay.io/sregidor/sregidor-os:infra-5e0c7aaf3cf26e8fab9dd111bb336342": writing blob: initiating layer upload to /v2/sregidor/sregidor-os/blobs/uploads/ in quay.io: unauthorized: access to the requested resource is not authorized

In these examples, the failure happened during the build itself. If the build succeeds but the build pod still fails, examine the logs of the create-digest-configmap container to see whether there was a problem creating the configmap with the digest information.
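For an unauthorized push error like the one above, a first check is whether the push secret contains credentials for the target registry at all. The sketch below takes the renderedImagePushSpec as its argument and reads the secret's .dockerconfigjson content on stdin. has_registry_auth is a hypothetical name, and the check is deliberately crude: it only looks for the registry host as a string, and it cannot tell whether the credentials grant push permission.

```shell
# Hypothetical check: does the docker config JSON on stdin mention the registry
# host of a push spec such as quay.io/sregidor/sregidor-os:mco_layering?
# Only proves credentials exist for the host, not that they allow pushing.
has_registry_auth() {
  local host="${1%%/*}"   # strip everything after the first "/"
  grep -qF "\"$host"
}
```

$ oc extract secret/rendered-image -n openshift-machine-config-operator --keys=.dockerconfigjson --to=- | has_registry_auth quay.io/sregidor/sregidor-os:mco_layering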

Auxiliary resources

To build the image, openshift-machine-config-operator uses several auxiliary resources temporarily stored in the openshift-machine-config-operator namespace. These resources are only present during the build process. However, if the build fails, they remain available for debugging purposes.

Those auxiliary resources are mounted in the build pod, so it can use them. Locate them using the oc get command:

$ oc get cm -n openshift-machine-config-operator \
--sort-by metadata.creationTimestamp
...
additionaltrustbundle-infra-32e..33a   1  47m
etc-policy-infra-32ef35dea3e553..33a   1  47m
mc-infra-32ef35dea3e55307127795..33a   1  47m
containerfile-infra-32ef35dea3e..33a   1  47m
etc-registries-infra-32ef35dea3e..33a  1  47m

$ oc get secret -n openshift-machine-config-operator \
--sort-by metadata.creationTimestamp
NAME                  TYPE                             DATA  AGE
...
global-pull-secret-copy kubernetes.io/dockerconfigjson  1  48m
final-infra-32e..33a    kubernetes.io/dockerconfigjson  1  48m
base-infra-32e..33a     kubernetes.io/dockerconfigjson  1  48m

The additional trust bundle configmap (in this example, additionaltrustbundle-infra-32e…33a) stores the necessary bundles to use Red Hat Enterprise Linux (RHEL) packages in the Containerfile. It must be taken from a copy of the etc-pki-entitlement secret in the openshift-config-managed namespace. If the build is having problems using RHEL packages, then ensure the resource is storing the correct bundles.

The current machine config configmap (mc-infra-32e…33a in this example) stores the MachineConfig resource that must be applied to the nodes in this MachineConfigPool. To see its content:

$ oc get cm -n openshift-machine-config-operator \
mc-infra-32ef35dea3e553071277954842edb33a \
-o jsonpath='{.data.machineconfig\.json\.gz}' | \
base64 -d | gunzip | jq | less

The container file configmap (containerfile-infra-32e…33a in this example) stores the full container file used to build the image. To see its contents:

$ oc -n openshift-machine-config-operator get cm \
containerfile-infra-32ef35dea3e553071277954842edb33a \
-o jsonpath='{.data.Containerfile}'

The etc registries and policies configmaps (etc-registries-infra-32e…33a and etc-policy-infra-32e…33a in this example) contain the registry configuration (registries.conf) and the policies (policy.json) used in the cluster so that they can be used in the build process as well. Look at those resources when there are problems with the container registries:

$ oc -n openshift-machine-config-operator get cm \
-o yaml etc-registries-infra-32ef35dea3e553071277954842edb33a
apiVersion: v1
data:
  registries.conf: |
    unqualified-search-registries = ['registry.access.r.com', 'docker.io']
...

The secrets are the ones configured in the MOSC resource. They contain the credentials to pull and push the necessary images.

If the MOSB fails, these auxiliary resources are not removed so that they can be used for further debugging.
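Because these resources are removed after a successful build, it can help to save them while a failed build is still around. This sketch copies the configmaps listed above into a local directory; save_build_configmaps is a hypothetical name, and the prefixes are the ones observed in this example, which may differ in other versions.

```shell
# Hypothetical helper: save the auxiliary configmaps of a MachineOSBuild as YAML
# files in a directory (default: ./<mosb-name>).
save_build_configmaps() {
  local mosb="$1" dir="${2:-./$1}" ns=openshift-machine-config-operator prefix
  mkdir -p "$dir" || return 1
  for prefix in additionaltrustbundle etc-policy etc-registries mc containerfile; do
    oc get cm -n "$ns" "$prefix-$mosb" -o yaml > "$dir/$prefix.yaml" \
      || echo "could not fetch configmap $prefix-$mosb" >&2
  done
}
```

$ save_build_configmaps infra-32ef35dea3e553071277954842edb33a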

Stage 3: Image applied to nodes

After a successful build, the openshift-machine-config-operator rolls out the new image by updating the machineconfiguration.openshift.io/desiredImage annotation on the nodes, and the machine-config-daemon pods apply the image.

What to watch for:

Pool update status: The MachineConfigPool shows Updating=True as nodes begin updating
Image pull failures: Nodes may fail to download the image if currentImagePullSecret is incorrect
Network connectivity: Nodes must be able to reach the registry where the image is stored
Node degradation: Check nodes that are stuck in a degraded state because of failed updates
Reboot issues: Nodes should successfully reboot into the new OS image
Stalled updates: If the pool remains in the Updating state too long, investigate individual node statuses

In this final stage, issues typically relate to a node's ability to access and apply a layered image.
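One quick way to see which nodes have not yet picked up the new image is to compare each node's desiredImage annotation with the image it currently runs. This sketch assumes a machineconfiguration.openshift.io/currentImage annotation alongside desiredImage, and that the pool's nodes carry a node-role.kubernetes.io/<pool> label; both helper names are hypothetical.

```shell
# Hypothetical filter: read "node|desiredImage|currentImage" lines on stdin and
# print the nodes whose current image does not match the desired image.
nodes_pending_image() {
  awk -F'|' '$2 != $3 { printf "%s\n  desired: %s\n  current: %s\n", $1, $2, $3 }'
}

# Hypothetical wrapper: query the (assumed) image annotations for one pool role.
pool_image_drift() {
  oc get nodes -l "node-role.kubernetes.io/$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"|"}{.metadata.annotations.machineconfiguration\.openshift\.io/desiredImage}{"|"}{.metadata.annotations.machineconfiguration\.openshift\.io/currentImage}{"\n"}{end}' \
    | nodes_pending_image
}
```

$ pool_image_drift infra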

Success

The MCP should report an updated status:

$ oc get mcp infra
NAME   CONFIG                 UPDATED UPDATING DEGRADED MACHINECOUNT
infra  rendered-infra-f47..e74 True   False    False    3            
READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT  AGE
3                 3                   0                     61m

Verify the proper application of the image on the nodes:

$ oc debug -q node/ip-10-0-10-154.compute.example -- chroot \
/host rpm-ostree status
State: idle
Deployments:
ostree-unverified-registry:quay.io/sregidor/sregidor-os@sha256:876..ef13 
       Digest: sha256:8761d4273f3213f2f9c9b4aa9dbe33aa758f17d691f0f53d2b20f55702c9ef13
       Version: 9.6.20251013-1 (2025-10-17T12:09:08Z)
$ oc debug -q node/ip-10-0-10-154.compute.example -- chroot \
/host which yq
/usr/bin/yq

$ oc debug -q node/ip-10-0-10-154.compute.example -- chroot /host yq -h
yq is a portable command-line data file processor (https://github.com/mikefarah/yq/) 
See https://mikefarah.gitbook.io/yq/ for detailed documentation and examples.
Usage:
  yq [flags]
  yq [command]
...
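To compare the booted image on a node with the digest you expect, you can extract just the digest. This sketch assumes the Digest: line format shown above and uses rpm-ostree status --booted to restrict the output to the booted deployment; the helper names are hypothetical.

```shell
# Hypothetical filter: read rpm-ostree status output on stdin and print the
# first "Digest:" value (the booted deployment when --booted is used).
booted_digest() {
  awk '$1 == "Digest:" { print $2; exit }'
}

# Hypothetical wrapper: print the booted image digest of one node.
node_booted_digest() {
  oc debug -q "node/$1" -- chroot /host rpm-ostree status --booted | booted_digest
}
```

$ node_booted_digest ip-10-0-10-154.compute.example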

Error

At this point, debugging is very similar to debugging the rollout of a new MachineConfig. Check the MachineConfigPool status, the information in the MachineConfigNode resources, and the logs of the machine-config-daemon pods.

In case of error, the MCP shows as degraded:

$ oc get mcp infra
NAME  CONFIG                   UPDATED  UPDATING  DEGRADED  MACHINECOUNT
infra rendered-infra-620..a43   False   False     True      3
READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
0                 0                   1                    3h51m

$ oc get mcp infra -oyaml
...
  - lastTransitionTime: "2025-10-17T12:23:48Z"
    message: 'Node ip-10-0-75-69.compute.example is reporting: "Node ip-10-0-75-69.compute.example
      upgrade failure. Failed to update OS to quay.io/sregidor/sregidor-os@sha256:8761d4273f3213f2f9c9b4aa9dbe33aa758f17d691f0f53d2b20f55702c9ef13
      after retries: timed out waiting for the condition", Node ip-10-0-75-69.compute.example
      is reporting: "Failed to update OS to quay.io/sregidor/sregidor-os@sha256:8761d4273f3213f2f9c9b4aa9dbe33aa758f17d691f0f53d2b20f55702c9ef13
      after retries: timed out waiting for the condition"'
    reason: 1 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded

The detailed information is in the logs of the machine-config-daemon pod running on the affected node:

$ oc logs -n openshift-machine-config-operator $(oc get pods \
-n openshift-machine-config-operator -l "k8s-app=machine-config-daemon" \
--field-selector "spec.nodeName=ip-10-0-75-69.compute.example" \
-o jsonpath="{.items[0].metadata.name}") -c machine-config-daemon
...
I1017 12:26:52.042570    2750 update.go:2546] Updating OS to layered image "quay.io/sregidor/sregidor-os@sha256:8761d4273f3213f2f9c9b4aa9dbe33aa758f17d691f0f53d2b20f55702c9ef13"
I1017 12:26:52.042590    2750 image_manager_helper.go:92] Running captured: rpm-ostree --version
I1017 12:26:52.055729    2750 image_manager_helper.go:194] Linking rpm-ostree authfile to /etc/mco/internal-registry-pull-secret.json
I1017 12:26:52.055759    2750 rpm-ostree.go:183] Executing rebase to quay.io/sregidor/sregidor-os@sha256:8761d4273f3213f2f9c9b4aa9dbe33aa758f17d691f0f53d2b20f55702c9ef13
I1017 12:26:52.055764    2750 update.go:2630] Running: rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/sregidor/sregidor-os@sha256:8761d4273f3213f2f9c9b4aa9dbe33aa758f17d691f0f53d2b20f55702c9ef13
Pulling manifest: ostree-unverified-registry:quay.io/sregidor/sregidor-os@sha256:8761d4273f3213f2f9c9b4aa9dbe33aa758f17d691f0f53d2b20f55702c9ef13
W1017 12:26:52.427068    2750 update.go:2591] Failed to update OS to quay.io/sregidor/sregidor-os@sha256:8761d4273f3213f2f9c9b4aa9dbe33aa758f17d691f0f53d2b20f55702c9ef13 (will retry): error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/sregidor/sregidor-os@sha256:8761d4273f3213f2f9c9b4aa9dbe33aa758f17d691f0f53d2b20f55702c9ef13: error: Creating importer: failed to invoke method OpenImage: failed to invoke method OpenImage: reading manifest sha256:8761d4273f3213f2f9c9b4aa9dbe33aa758f17d691f0f53d2b20f55702c9ef13 in quay.io/sregidor/sregidor-os: manifest unknown
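The manifest unknown error means the registry has no manifest with the requested digest, so the next step is to check whether that image still exists in the registry. To pull just the rebase errors out of a long daemon log, a filter like the following can help. It assumes the "Failed to update OS to ... error: ..." wording shown above; rebase_errors is a hypothetical name.

```shell
# Hypothetical filter: read machine-config-daemon log lines on stdin and print
# the distinct rpm-ostree rebase errors (the text after the last ": error: ").
rebase_errors() {
  grep 'Failed to update OS to' | sed -n 's/.*: error: //p' | sort -u
}
```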

Also check the information reported by the MachineConfigNode resources. This is especially important because future versions of OpenShift will add more image mode information to these resources to make debugging easier.

$ oc get machineconfignode -o wide
NAME                             POOLNAME   DESIREDCONFIG                                      CURRENTCONFIG                                      UPDATED   AGE     UPDATEPREPARED   UPDATEEXECUTED   UPDATEPOSTACTIONCOMPLETE   UPDATECOMPLETE   RESUMED   UPDATEDFILESANDOS   CORDONEDNODE   DRAINEDNODE   REBOOTEDNODE   UNCORDONEDNODE
ip-10-0-10-154.compute.example   infra      rendered-infra-620..a43                            rendered-infra-620..a43                            True      4h34m   False            False            False                      False            False     False               False          False         False          False
ip-10-0-22-152.compute.example   master     rendered-master-93a022e91aa2bf815e4efed220ac97ea   rendered-master-93a022e91aa2bf815e4efed220ac97ea   True      4h44m   False            False            False                      False            False     False               False          False         False          False
ip-10-0-41-78.compute.example    infra      rendered-infra-620..a43
...
$ oc get machineconfignode ip-10-0-75-69.compute.example -o yaml
...
  - lastTransitionTime: "2025-10-17T12:22:13Z"
    message: 'Node ip-10-0-75-69.compute.example upgrade failure. Failed
      to update OS to quay.io/sregidor/sregidor-os@sha256:8761d4273f3213f2f9c9b4aa9dbe33aa758f17d691f0f53d2b20f55702c9ef13
      after retries: timed out waiting for the condition'
    reason: NodeDegraded
    status: "True"
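To list only the conditions that are currently True on a MachineConfigNode, a jsonpath filter avoids paging through the full YAML. A sketch, with mcn_true_conditions as a hypothetical name:

```shell
# Hypothetical helper: print the MachineConfigNode conditions whose status is
# "True", one per line, as "<type>: <message>".
mcn_true_conditions() {
  oc get machineconfignode "$1" \
    -o jsonpath='{range .status.conditions[?(@.status=="True")]}{.type}{": "}{.message}{"\n"}{end}'
}
```

$ mcn_true_conditions ip-10-0-75-69.compute.example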

Successful debugging

Debugging image mode doesn't have to be a black box operation. When you understand the three distinct stages (MachineOSConfig validation, MachineOSBuild execution, and image deployment to nodes) of the process, failures can be systematically narrowed down to identify where they occur and what the root cause is. The key is knowing where to look:

openshift-machine-config-operator pod logs for MOSC issues
Build job pod logs for image build failures
machine-config-daemon pod logs for node-level problems

Image mode failures usually happen during the build stage, often caused by pull secret authentication issues, Containerfile errors, or registry permission problems. The debugging techniques in this guide empower you to perform effective troubleshooting and, ultimately, successful deployment of customized node images.

RED HAT DEVELOPER
