Operators in Kubernetes often allow application developers to configure low-level aspects of their operands and secondary resources. Typically, such settings are made available on the custom resource and reconciled into the operand.
An example of this is the Grafana custom resource of the Grafana Operator. It exposes many configuration options that are reflected in the Grafana configuration file, but also allows you to configure properties of the Kubernetes resources in your Grafana installation. For example, you can add additional ports to your service, mount secrets into the Grafana pod, or expose additional environment variables.
These fields in the custom resource are reconciled and applied to the respective Kubernetes resource. This article outlines some problems with this approach, describes an alternative that is currently under development, and weighs the strengths and weaknesses of the two approaches.
The problem with exposing hand-picked configuration options
The issue with hand-picking properties of Kubernetes resources, exposing them on a custom resource, and then reconciling them is that it's hard to foresee what a user might want to modify. For the Grafana Operator, we often get requests to make additional fields of underlying resources configurable through the custom resource.
Additionally, Kubernetes resources change over time. Granted, that happens rather slowly, but it does happen. The promotion of Ingress to v1 is one example.
Another issue with this approach is ease of use. Someone who is already familiar with configuring, say, a route, now needs to learn how to do that through your custom resource. And often only a subset of the available options is exposed.
A better approach
How could the definition of custom resources be improved for both application developers and the people who release resources such as Grafana? In the upcoming version of the Grafana Operator, we have started exposing raw Kubernetes resources in the custom resource. To configure a deployment, for instance, you will have access to a Deployment object complete with the official spec and metadata. The same goes for all other resources that are managed by the Operator: the ServiceAccount, the Route or Ingress, and the Service.
You don't have to learn how to configure a resource through the Grafana Operator. Instead, you can focus on the Kubernetes resources you want to configure and do the configuration in the usual manner.
Difficulties
Sounds easy? Not quite. There are a few obstacles to overcome.
- Some Kubernetes resource definitions, such as Deployment, are huge and will bloat your custom resource definition (CRD).
- Our new way of exposing resources is not suitable for partial specification, which is what application developers usually want most.
- We don't yet have a merge strategy to combine Operator defaults and custom overrides.
We can tackle those issues. The following subsections cover each issue.
CRD bloat
To address CRD bloat, we're going to strip the descriptions. When using kubebuilder to generate the CRDs from the code, we can pass the following parameter, which cuts down the size of our CRDs by two thirds:
crd:maxDescLen=0
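To get a feel for why dropping descriptions shrinks a CRD so much, here is a small standard-library-only Go sketch. It is not what the generator runs internally. It only imitates the effect by deleting string-valued description keys from a schema and comparing sizes. The helper names (stripDescriptions, schemaSizes) and the sample schema are made up for this illustration and do not come from the Grafana CRD.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stripDescriptions recursively removes string-valued "description" keys from
// a decoded JSON schema. A property that merely happens to be named
// "description" is an object, not a string, so it is left alone.
func stripDescriptions(node any) {
	switch v := node.(type) {
	case map[string]any:
		for key, child := range v {
			if _, isText := child.(string); isText && key == "description" {
				delete(v, key)
				continue
			}
			stripDescriptions(child)
		}
	case []any:
		for _, child := range v {
			stripDescriptions(child)
		}
	}
}

// schemaSizes returns the compact JSON size of a schema before and after its
// descriptions are stripped.
func schemaSizes(schemaJSON string) (before, after int, err error) {
	var schema any
	if err = json.Unmarshal([]byte(schemaJSON), &schema); err != nil {
		return 0, 0, err
	}
	original, err := json.Marshal(schema)
	if err != nil {
		return 0, 0, err
	}
	stripDescriptions(schema)
	stripped, err := json.Marshal(schema)
	if err != nil {
		return 0, 0, err
	}
	return len(original), len(stripped), nil
}

// sampleSchema is a hypothetical fragment, not taken from a real CRD.
const sampleSchema = `{
  "type": "object",
  "description": "Hypothetical deployment override.",
  "properties": {
    "replicas": {
      "type": "integer",
      "description": "Number of desired pods."
    },
    "description": {
      "type": "string",
      "description": "A field that happens to be called description."
    }
  }
}`

func main() {
	before, after, err := schemaSizes(sampleSchema)
	if err != nil {
		panic(err)
	}
	fmt.Printf("schema size: %d bytes with descriptions, %d bytes without\n", before, after)
}
```

The exact saving depends on how verbose the embedded type documentation is. The two-thirds figure above is what we measured for our own CRDs, not something this toy schema reproduces.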
Partial specification
Let me spell out the problem we're trying to solve here. Resources such as Deployment come with optional and mandatory fields. That is also true for the deployment spec in our CRD. As soon as an application developer adds a non-empty spec to a Deployment, they are required to specify all the mandatory fields as well. This is not ideal. You might be interested in just overriding the replicas, without providing a full pod template.
The solution to this problem is not as simple as adding another parameter. We came across the solution while looking at Banzaicloud's operator-tools project. The idea is to provide your own definition of the spec that has all the same fields but no mandatory fields.
For example, the original deployment spec defines the pod template like this:
type DeploymentSpec struct {
    ...
    Template v1.PodTemplateSpec `json:"template"`
    ...
}
In our own definition, we define v1.PodTemplateSpec as a pointer and add the omitempty tag to prevent the serializer from adding an empty key:
type CustomDeploymentSpec struct {
    ...
    Template *v1.PodTemplateSpec `json:"template,omitempty"`
    ...
}
We also define our own Deployment type:
type CustomDeployment struct {
    ObjectMeta ObjectMeta           `json:"metadata,omitempty"`
    Spec       CustomDeploymentSpec `json:"spec,omitempty"`
}
This gives us a resource with the same structure as a Deployment, but all the top-level fields are optional.
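To see why both the pointer and the omitempty tag matter, here is a minimal sketch that uses only the Go standard library. PodTemplateSpec, StrictSpec, and PartialSpec are simplified stand-ins invented for this illustration. They are not the real Kubernetes or Grafana Operator types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PodTemplateSpec is a simplified stand-in for v1.PodTemplateSpec.
type PodTemplateSpec struct {
	Labels map[string]string `json:"labels,omitempty"`
}

// StrictSpec follows the upstream pattern: Template is a plain struct value
// without omitempty, so the key is always serialized.
type StrictSpec struct {
	Replicas *int32          `json:"replicas,omitempty"`
	Template PodTemplateSpec `json:"template"`
}

// PartialSpec follows the custom pattern: Template is a pointer tagged with
// omitempty, so a nil pointer produces no key at all.
type PartialSpec struct {
	Replicas *int32           `json:"replicas,omitempty"`
	Template *PodTemplateSpec `json:"template,omitempty"`
}

// encode marshals v to compact JSON and panics on failure, which cannot
// happen for the plain structs used here.
func encode(v any) string {
	out, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(out)
}

// strictJSON serializes a StrictSpec that only sets the replica count.
func strictJSON(replicas int32) string {
	return encode(StrictSpec{Replicas: &replicas})
}

// partialJSON serializes a PartialSpec that only sets the replica count.
func partialJSON(replicas int32) string {
	return encode(PartialSpec{Replicas: &replicas})
}

func main() {
	fmt.Println(strictJSON(3))  // {"replicas":3,"template":{}}
	fmt.Println(partialJSON(3)) // {"replicas":3}
}
```

The strict variant always emits an empty template object, while the partial variant leaves the key out entirely when the pointer is nil. That is what lets a user override the replicas alone.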
Merge strategy
All that's left to do now is create a merge strategy to merge the overridden, custom deployment with the existing one. Our policy prefers to keep the existing fields in the original resource unless they are the defaults, and we ignore empty fields in the overridden resource.
Thankfully, Kubernetes's own apimachinery library contains everything we need in its strategicpatch package. apimachinery deals with schemas and conversion. strategicpatch compares two JSON representations of objects and produces a patch that can be applied to the resource definition.
Again, the operator-tools package contains an implementation of a merge function using strategicpatch, and we use a slightly modified version of it in the Grafana Operator.
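The sketch below illustrates the general idea of such a merge using only the Go standard library. It is a deliberately simplified stand-in, not the strategicpatch-based function from operator-tools or the Grafana Operator. It merges nested JSON objects key by key, lets non-empty override values win, and replaces lists wholesale, whereas a strategic merge can also merge list items by key. The function names mergeMaps and mergeJSON are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergeMaps copies every non-null key from override into base. Nested objects
// are merged recursively so that sibling keys in base survive. Scalars and
// lists from override replace the value in base.
func mergeMaps(base, override map[string]any) {
	for key, value := range override {
		if value == nil {
			continue // empty fields in the override are ignored
		}
		overrideChild, overrideIsMap := value.(map[string]any)
		baseChild, baseIsMap := base[key].(map[string]any)
		if overrideIsMap && baseIsMap {
			mergeMaps(baseChild, overrideChild)
			continue
		}
		base[key] = value
	}
}

// mergeJSON applies overrideJSON on top of baseJSON and returns the merged
// object as compact JSON (encoding/json sorts map keys alphabetically).
func mergeJSON(baseJSON, overrideJSON string) (string, error) {
	var base, override map[string]any
	if err := json.Unmarshal([]byte(baseJSON), &base); err != nil {
		return "", err
	}
	if err := json.Unmarshal([]byte(overrideJSON), &override); err != nil {
		return "", err
	}
	if base == nil {
		base = map[string]any{} // a JSON null base behaves like an empty object
	}
	mergeMaps(base, override)
	out, err := json.Marshal(base)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Hypothetical Operator defaults and a user override that only sets replicas.
	defaults := `{"spec":{"replicas":1,"selector":{"matchLabels":{"app":"grafana-a"}}}}`
	override := `{"spec":{"replicas":3}}`

	merged, err := mergeJSON(defaults, override)
	if err != nil {
		panic(err)
	}
	fmt.Println(merged) // {"spec":{"replicas":3,"selector":{"matchLabels":{"app":"grafana-a"}}}}
}
```

Only the overridden replica count changes. The selector from the defaults is carried over untouched, which is the behavior a partial override relies on.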
Merging a custom resource field
Let's see this in action. The default Grafana deployment created by the Operator looks like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  name: grafana-a-deployment
  namespace: grafana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana-a
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    ...
We want to override the strategy to Recreate, so we provide the following deployment in the Grafana custom resource spec:
spec:
  deployment:
    spec:
      strategy:
        type: Recreate
We don't need to provide any of the fields that are mandatory for deployments, only the field we are interested in overriding: spec.strategy.type. This produces an updated deployment with a strategy set to Recreate.
Evaluating the strategy of exposing resources
Exposing raw Kubernetes resources is a powerful way for users to configure operands. Existing knowledge and documentation can be applied. Everything that can be configured on those resources is available through the Operator out of the box.
But disadvantages remain:
- The size of the CRD still increases considerably. If you need to configure a large number of resources, this approach might not be a good choice.
- The approach comes with a risk of misconfiguration. Application developers can override settings that the Operator or an operand depends on.
But we believe that, overall, the advantages of such a flexible configuration system usually outweigh the disadvantages.
Last updated: February 5, 2024