In Red Hat OpenShift 4.20, we've introduced oc adm upgrade recommend, a new command designed to streamline your OpenShift updates. With this command (now generally available), you can plan and assess update risks with greater confidence, ensuring a smoother, more accessible update experience.
The update planner
Released as a technology preview in OpenShift 4.18 and now generally available in OpenShift 4.20, this command is your go-to starting point for any update. It’s a read-only command that gives you a clear, focused recommendation for the best version to update to. You no longer have to guess or sort through a long list of possible options. Simply run the command to get a suggestion tailored to your cluster.
The precheck feature
The command includes the precheck feature, which actively checks for important alerts that can get in the way of a smooth update. The command gives you a quick summary of what it checked, then provides details about any concerns. It reports issues such as a firing ClusterOperatorDown alert, which is a critical sign, or a PodDisruptionBudgetAtLimit warning, which could slow things down. Because these details are presented upfront, you have all the information you need to address any issues before the update.
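If you want to post-process the condition names that the command prints (for example, recommended/CriticalAlerts/ClusterOperatorDown/0 in the sample output in this post), a small helper can split them into parts. This is only a sketch: the layout recommended/<group>[/<alert>/<index>] is inferred from that one sample and is an assumption, and the names Condition and parse_condition are hypothetical.

```python
from typing import NamedTuple, Optional


class Condition(NamedTuple):
    group: str             # e.g. "CriticalAlerts"
    alert: Optional[str]   # e.g. "ClusterOperatorDown", or None if absent
    index: Optional[int]   # trailing counter, or None if absent


def parse_condition(name: str) -> Condition:
    """Split a condition name on '/' (assumed layout, inferred from sample output)."""
    parts = name.split("/")
    if parts[0] != "recommended" or len(parts) not in (2, 4):
        raise ValueError(f"unexpected condition name: {name!r}")
    if len(parts) == 2:
        return Condition(parts[1], None, None)
    return Condition(parts[1], parts[2], int(parts[3]))


print(parse_condition("recommended/CriticalAlerts/ClusterOperatorDown/0"))
# Condition(group='CriticalAlerts', alert='ClusterOperatorDown', index=0)
```

The helper raises ValueError on anything that doesn't fit the assumed layout, so a change in the output format fails loudly instead of being silently misread.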
Example output
The following is a sample oc adm upgrade recommend output. You can see how it clearly flags potential problems and gives you a recommended path.
The following conditions found no cause for concern in updating this cluster to later releases: recommended/NodeAlerts (AsExpected), recommended/PodImagePullAlerts (AsExpected)
The following conditions found cause for concern in updating this cluster to later releases: recommended/CriticalAlerts/ClusterOperatorDown/0, recommended/PodDisruptionBudgetAlerts/PodDisruptionBudgetAtLimit/1
recommended/CriticalAlerts/ClusterOperatorDown/0=False:
Reason: Alert:firing
Message: critical alert ClusterOperatorDown firing, suggesting significant cluster issues worth investigating. Cluster operator has not been available for 10 minutes. The alert description is: The monitoring operator may be down or disabled because UpdatingPrometheusFailed, and the components it manages may be unavailable or degraded. Cluster upgrades may not complete. For more information refer to 'oc get -o yaml clusteroperator monitoring' or https://console-openshift-console.apps.ci-ln-4xlhr32-72292.origin-ci-int-gce.dev.rhcloud.com/settings/cluster/. https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/ClusterOperatorDown.md
recommended/PodDisruptionBudgetAlerts/PodDisruptionBudgetAtLimit/1=False:
Reason: Alert:firing
Message: warning alert PodDisruptionBudgetAtLimit firing, which might slow node drains. Namespace=openshift-monitoring, PodDisruptionBudget=prometheus-k8s. The pod disruption budget is preventing further disruption to pods. The alert description is: The pod disruption budget is at the minimum disruptions allowed level. The number of current healthy pods is equal to the desired healthy pods. https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-controller-manager-operator/PodDisruptionBudgetAtLimit.md
Upstream update service: https://api.integration.openshift.com/api/upgrades_info/graph
Channel: candidate-4.18 (available channels: candidate-4.18, candidate-4.19, candidate-4.18, eus-4.18, fast-4.18, fast-4.19, stable-4.18, stable-4.19)
Updates to 4.18:
VERSION   ISSUES
4.18.32   no known issues relevant to this cluster
4.18.30   no known issues relevant to this cluster
Incorporate custom alerts
You can surface alerts that the precheck feature in the recommend command doesn't report, either because they are not critical or because it doesn't detect them, by defining a custom critical alert in platform monitoring. For example, you can match firing non-critical alerts by name and trigger a custom critical alert that highlights the lower-severity underlying alerts.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: critical-alert-escalation-or-custom-rule
  namespace: openshift-monitoring
spec:
  groups:
  - name: critical-alert-escalation-or-custom-rule
    rules:
    - alert: CriticalAlertEscalationOrCustomRule
      expr: ALERTS{alertname=~"AlertA|AlertB|...", alertstate="firing"}
      labels:
        severity: critical
      annotations:
        summary: Alert {{ $labels.alertname }} escalated to 'critical'
        description: Upgrade related issue to highlight to sysadmins.
        runbook_url: https://example.com/fixme/docs/for/your/responding/sysadmins
You can also declare completely new critical alerts, based on any Prometheus metrics in the platform monitoring store. If the alert is firing, it appears in the precheck results when you run the recommend command. For example, we’ve added the following Red Hat OpenShift Data Foundation alert:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-new-odf-alert
  namespace: openshift-monitoring
spec:
  groups:
  - name: ceph.rules
    rules:
    - alert: CephTooFewPGs
      # >0 for at least 5 minutes
      expr: max by (cluster) (ceph_health_detail{name="TOO_FEW_PGS"}) > 0
      for: 5m
      labels:
        severity: critical
        # every label set here is forwarded to Alertmanager
      annotations:
        summary: Ceph cluster {{ $labels.cluster }} has too few PGs
        description: |
          Ceph is reporting TOO_FEW_PGS health warning for more than 5 min.
        runbook_url: https://example.com/fixme/docs/for/sysadmins
The --accept flag policy
Use the --accept flag to pre-approve certain risks for the cluster update.
For example, you might review a known alert and decide it's okay for all clusters to proceed with it. As the cluster administrator, you can use the --accept flag to confirm moving forward while explicitly acknowledging these pre-approved risks:
$ oc adm upgrade recommend --version 4.y.z --accept ClusterOperatorDown,PodDisruptionBudgetAtLimit
If the command finds any unapproved issues, it won’t let you proceed, ensuring that your update aligns with your policy.
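One way to picture the policy is as a set difference: the risks found, minus the risks you accepted, must be empty. The sketch below models only that idea and is not how oc implements the flag. The function name unaccepted is hypothetical, and exact-name matching is an assumption.

```python
def unaccepted(found: set[str], accepted: str) -> list[str]:
    """Return the found risks that a comma-separated --accept value does not cover."""
    allowed = {name.strip() for name in accepted.split(",") if name.strip()}
    return sorted(found - allowed)


found = {"ClusterOperatorDown", "PodDisruptionBudgetAtLimit"}

# Every found risk was pre-approved, so nothing is left over.
print(unaccepted(found, "ClusterOperatorDown,PodDisruptionBudgetAtLimit"))  # []

# Only one risk was pre-approved, so the other one remains.
print(unaccepted(found, "PodDisruptionBudgetAtLimit"))  # ['ClusterOperatorDown']
```

An empty result corresponds to the case where everything found was explicitly acknowledged. Anything left in the list is an issue your policy has not signed off on.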
Summary of workflow components:
- recommend
  - Purpose: discovery and guidance
  - Command: oc adm upgrade recommend
  - Impact: Provides a safe, non-destructive way to plan updates.
- precheck
  - Purpose: validation and risk assessment
  - Command: Part of the recommend command.
  - Impact: Identifies potential alerts without altering the system. Checks for custom critical alerts.
- accept
  - Purpose: explicit authorization
  - Command: --accept flag on the recommend command
  - Impact: Acts as a gatekeeper, requiring a successful precheck before the update can proceed.
Don’t forget to run oc adm upgrade recommend before you start the update with oc adm upgrade --to <recommended-version>.
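To make the order of operations concrete, here is a small sketch that only builds the two command lines mentioned in this post and prints them. It doesn't run anything. The function names recommend_cmd and update_cmd are hypothetical, the flags are the ones shown in this post (--version, --accept, --to), and 4.y.z is a placeholder.

```python
import shlex


def recommend_cmd(version: str, accept: list[str]) -> list[str]:
    """Build the read-only planning command, with optional pre-approved risks."""
    cmd = ["oc", "adm", "upgrade", "recommend", "--version", version]
    if accept:
        cmd += ["--accept", ",".join(accept)]
    return cmd


def update_cmd(version: str) -> list[str]:
    """Build the command that actually starts the update."""
    return ["oc", "adm", "upgrade", "--to", version]


# Plan first, then update: print both commands in that order.
for cmd in (recommend_cmd("4.y.z", ["ClusterOperatorDown"]), update_cmd("4.y.z")):
    print(shlex.join(cmd))
```

Keeping the two builders separate mirrors the split between the planning step and the update itself.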
We'd love to hear from you
We're really excited about the oc adm upgrade recommend command. With its integrated precheck and --accept features, it makes updates more predictable. By separating the planning, validation, and authorization stages, this command gives you more control and confidence at every step.
When you're planning your next cluster update, try the oc adm upgrade recommend command, and let us know what you think.