Red Hat OpenShift Service on AWS with hosted control planes is a fully managed application platform for building, deploying, and scaling applications. OpenShift Service on AWS now enables administrators of clusters with hosted control planes to configure the platform monitoring stack through the cluster monitoring operator, so you can take full advantage of the monitoring and alerting capabilities provided by Red Hat OpenShift.
OpenShift includes a platform monitoring stack to monitor cluster components such as nodes, control plane, and operators, and an optional user workload monitoring stack to monitor user-defined projects. While OpenShift Service on AWS with hosted control planes notifies customers about cluster health issues, previous limitations on configuring the platform monitoring stack created several challenges for administrators. Starting today, OpenShift Service on AWS with hosted control planes introduces the following:
- On new clusters: New clusters are created with monitoring defaults according to the OpenShift version.
- On existing clusters: Existing clusters retain the existing configuration of the cluster monitoring stack.
- On both new and existing clusters: You can use OpenShift monitoring APIs to configure the cluster monitoring stack.
This unlocks a host of improvements and capabilities, including:
- Alert routing: You can now configure alert receivers and route alerts based on platform metrics directly to your desired destinations.
- Operator observability: You can now monitor optional Operator Lifecycle Manager (OLM) operators installed from OperatorHub into openshift-* namespaces.
- Metrics retention: You can either retain platform metrics in-cluster or write them to a remote location.
- Ease of compute configuration: You can now explicitly place the monitoring stack components on worker nodes of your choice to either optimize performance or isolate monitoring components from the rest of the workloads.
- Cost savings: You can entirely remove persistence for platform metrics to save costs on dev and test clusters.
Changes to configuration
New clusters use OpenShift's default monitoring settings, which is a significant change to the cluster monitoring stack configuration compared to clusters created before this change.
Cluster monitoring in the openshift-monitoring namespace
Previously, the cluster monitoring configuration in the openshift-monitoring namespace used a configmap named cluster-monitoring-config, which was created and configured by default. Now, on a new cluster, the configmap is not created by default, and the monitoring stack falls back to the presets of the cluster monitoring operator.
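If you want to customize monitoring on a new cluster, you create this configmap yourself. A minimal sketch of its shape (the config.yaml key is where all the settings shown in the sections below go):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  # The cluster monitoring operator reads its settings
  # from the config.yaml key of this configmap.
  config.yaml: |
    {}
```

Apply it with `oc apply -f <file>`; an empty config.yaml keeps the operator's defaults.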
User workload monitoring
Previously, user workload monitoring was enabled by default, and you could disable it during or after cluster installation. Now, on a new cluster, user workload monitoring is disabled by default and cannot be enabled during cluster installation. Instead, enable or disable it after cluster installation using the cluster-monitoring-config configmap in the openshift-monitoring namespace.
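For example, enabling user workload monitoring after installation comes down to one field in that configmap (a sketch):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # Deploys the user workload monitoring stack into the
    # openshift-user-workload-monitoring namespace.
    enableUserWorkload: true
```

Setting the field back to false disables the stack again.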
Persistence of metrics
For clusters created prior to this change, platform monitoring metrics persist by default on 100 GiB AWS EBS volumes. Now, on a new cluster, platform monitoring metrics are not persisted by default. You can configure persistence after cluster installation.
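To re-enable persistence on a new cluster, add a volume claim template for Prometheus inside the cluster-monitoring-config configmap's config.yaml. A sketch; the storage class name is an assumption and should match one available on your cluster:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          # Assumption: an EBS-backed storage class on your cluster.
          storageClassName: gp3-csi
          resources:
            requests:
              storage: 100Gi
```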
Retention of metrics
For clusters created prior to this change, platform monitoring metrics are retained for 11 days by default, with a retention storage limit of 90 GiB. Now, on a new cluster, platform monitoring metrics are retained for 15 days with no retention limit by default.
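If you want a new cluster to match the previous defaults, you can set both time-based and size-based retention inside the configmap's config.yaml (a sketch):

```yaml
    prometheusK8s:
      retention: 11d        # time-based retention
      retentionSize: 90GiB  # size-based retention limit
```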
Topology spread
Clusters created prior to this change were configured to spread platform monitoring pods across availability zones. Now, on a new cluster, this is not configured by default.
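You can restore this behavior by adding topology spread constraints for the monitoring components inside the configmap's config.yaml. A sketch for Prometheus; the label selector is an assumption about the pods' labels:

```yaml
    prometheusK8s:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            # Assumption: the label applied to the Prometheus pods.
            app.kubernetes.io/name: prometheus
```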
Tolerations
Clusters created prior to this change were configured by default to tolerate the taint node-role.kubernetes.io/infra. Now, on a new cluster, this is not configured by default.
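To place monitoring components on dedicated nodes again, combine a node selector with a matching toleration inside the configmap's config.yaml. A sketch for Prometheus, assuming your infra nodes carry the node-role.kubernetes.io/infra label and taint:

```yaml
    prometheusK8s:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: node-role.kubernetes.io/infra
        operator: Exists
        effect: NoSchedule
```

The same keys apply to the other monitoring components, such as alertmanagerMain.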
Seven best practices
Here are some best practices for configuring the cluster monitoring stack:
- If you are deleting and recreating your existing cluster, then your new cluster will have different defaults. If you need to configure your new cluster with monitoring that matches your existing cluster, visit the hypershift-dataplane-metrics-forwarder Git repository to find the configuration.
- Ensure that your worker nodes are configured such that the cluster monitoring stack deployed on worker nodes can be available and healthy. If the monitoring stack is either unavailable or degraded, you're notified by OpenShift Service on AWS with hosted control planes.
- Configure external watchdog-based alerting for the monitoring stack. OpenShift monitoring already includes a watchdog alert for heartbeat monitoring. Alertmanager sends notifications to configured providers, enabling administrators to be alerted when the watchdog falls silent.
- Ensure that the cluster monitoring stack is available and not degraded before you upgrade your cluster. Your cluster upgrades may be cancelled if the monitoring stack is unhealthy.
- If you configure the cluster worker nodes across multiple availability zones, use topology spread to distribute the cluster monitoring stack components across zones, so that the stack can tolerate the failure of a single availability zone.
- If you need to isolate the monitoring stack components from the rest of the application workloads on the cluster, use pod placement techniques as required.
- Adjust metric retention and persistence volume sizes according to your needs for retaining the metrics.
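The watchdog-based alerting practice above can be sketched as an Alertmanager route that forwards the always-firing Watchdog alert to an external heartbeat service, which pages you if the pings stop. The endpoint URL is hypothetical; in OpenShift, the Alertmanager configuration lives in the alertmanager-main secret in the openshift-monitoring namespace:

```yaml
route:
  routes:
  - matchers:
    - alertname = Watchdog
    receiver: heartbeat
    # Re-send frequently so silence quickly signals a problem.
    repeat_interval: 5m
receivers:
- name: heartbeat
  webhook_configs:
  # Hypothetical external dead man's switch endpoint.
  - url: https://deadmans-switch.example.com/ping
```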
To learn more about monitoring in OpenShift, visit the OpenShift documentation. To get started, visit the OpenShift Service on AWS with hosted control planes product page.