OpenShift Node Tuning Operator

I recently assisted a client in deploying Elastic Cloud on Kubernetes (ECK) on Red Hat OpenShift 4.x. They had run into an issue where Elasticsearch would throw an error similar to:

Max virtual memory areas vm.max_map_count [65530] likely too low, increase to at least [262144]

According to the official documentation, Elasticsearch uses an mmapfs directory by default to store its indices. The default operating system limits on mmap counts are likely to be too low, which may result in out-of-memory exceptions. Usually, an administrator would simply raise the limit by running:

sysctl -w vm.max_map_count=262144
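Before changing anything, it is worth checking where a node currently stands. Below is a minimal sketch that only reads /proc and changes nothing; the 262144 threshold comes straight from the error message above:

```shell
#!/bin/sh
# Read the current mmap limit and compare it with the minimum
# Elasticsearch asks for (262144, per the error message above).
required=262144
current=$(cat /proc/sys/vm/max_map_count)
if [ "$current" -lt "$required" ]; then
    echo "vm.max_map_count=$current is below the required $required"
else
    echo "vm.max_map_count=$current is already sufficient"
fi
```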

However, OpenShift uses Red Hat CoreOS for its worker nodes and, because it is an automatically updating, minimal operating system for running containerized workloads, you shouldn't log on to worker nodes and make changes by hand. That approach doesn't scale, and it leaves worker nodes with configuration drift that the platform can't track. Instead, OpenShift provides an elegant, scalable way to achieve the same result via its Node Tuning Operator.

The default Tuned configuration already contains a profile for Elasticsearch. The tuned daemon on a given node watches for pods scheduled on that node that carry the label defined in the profile's match section. When one is found, it applies the sysctl settings defined in the profile's data section.

You can view the default configuration by logging into your OpenShift cluster and running:

bastion $ oc get Tuned/default -o yaml -n openshift-cluster-node-tuning-operator

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: default
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - name: "openshift-node-es"
    data: |
      [main]
      summary=Optimize systems running ES on OpenShift nodes
      include=openshift-node
      [sysctl]
      vm.max_map_count=262144
  recommend:
  - profile: "openshift-node-es"
    priority: 20
    match:
    - label: "tuned.openshift.io/elasticsearch"
      type: "pod"

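Once the operator has reconciled, you can also check which tuned profile each node ended up with. The Node Tuning Operator records this in Profile objects in the same namespace:

```shell
bastion $ oc get profile -n openshift-cluster-node-tuning-operator
```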

The trick is to ensure that the Elasticsearch operator tags its pods with the tuned.openshift.io/elasticsearch label. Below is an example of how to achieve this.

apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: elasticsearch-tst
spec:
  version: "7.2.0"
  setVmMaxMapCount: false
  nodes:
  - config:
      node.master: true
    nodeCount: 1
    podTemplate:
      metadata:
        labels:
          tuned.openshift.io/elasticsearch: ""

The tuned daemon will read the pod label and apply vm.max_map_count=262144 on whichever node is running the pod. This is useful because pods can be terminated and rescheduled on different nodes across the cluster. No more manually worrying about the sysctl configuration of the nodes running a particular workload.
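If you want to confirm the setting actually landed, a common OpenShift pattern is to find the node running one of the labelled pods and query the sysctl from the host namespace with oc debug (substitute your own node name; the value reported should be 262144):

```shell
bastion $ oc get pods -o wide
bastion $ oc debug node/<node-name> -- chroot /host sysctl vm.max_map_count
```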

Thanks to James Ryles for helping solve this problem.

Let me know if you run into any issues.

Last updated: October 31, 2023