Kubernetes operator

The open source Operator Framework is a toolkit to manage Kubernetes-native applications. The framework and its features provide the ability to develop solutions to simplify some complexities, such as the process to install, configure, manage and package applications on Kubernetes and Red Hat OpenShift. It provides the ability to use a client to perform CRUD actions, that is, operations to create, read, update, and delete data on these platforms.

By using operators, it's possible not only to provide all expected resources but also to manage them dynamically, programmatically, and at execution time. To illustrate this idea, imagine if someone accidentally changed a configuration or removed a resource by mistake; in this case, the operator could fix it without any human intervention. We'll take a look at Operators and the Operator SDK in this article.

Note: As a prerequisite for this content, it's essential to follow the steps outlined in the Getting Started guide.

APIs

When following the Getting Started, one of the first steps is to run the command operator-sdk add api --api-version=cache.example.com/v1alpha1 --kind=Memcached. The purpose of this command is to generate Custom Resource (CR) and Custom Resource Definition (CRD) resources for the Memcached Kind. This command is creating the API with the group cache.example.com, and version v1alpha1  which uniquely identifies the new CRD of the Memcached Kind.

Consequently, by using the Operator SDK tool, we can create our APIs and objects that will represent our solutions on these platforms. The Getting Started tutorial adds only a single kind of resource; however, it could have as many Kinds as needed (1…N). Basically, the CRDs are a definition of our customized Objects, and the CRs are an instance of it.

Project

The Manager is responsible for managing Controllers and, then by the controllers, we can do operations on the cluster side. For a better understanding of how it works, see that in the example, one of the steps was to create a Docker image with the command $operator-sdk build user/image:tag and then replace the value REPLACE_IMAGE in the file operator.yaml file.  This file describes the project instance built by it. Note that, by running the command kubectl create -f deploy/operator.yaml we are creating a pod with this image.

Demonstrating the idea

Let's think about the classic scenario where the goal is to have an application and its database running on the platform with Kubernetes. Then, one object could represent the App, and another one could represent the DB. By having one CRD to describe the App and another one for the DB, we will not be hurting concepts such as encapsulation, the single responsibility principle, and cohesion. Damaging these concepts could cause unexpected side effects, such as difficulty in extending, reuse, or maintenance, just to mention a few.

In conclusion, the App CRD will have as its controller the DB CRD. Imagine, that a Deployment and Service are required for the application run so that the App's Controller will provide these resources in this example. Similarly, the DB's controller will have the business logic implementation of its objects.

In this way, for each CRD, one controller should be produced according to the design set by the controller-runtime.

Controller main functions

Reconcile()

The reconcile function is responsible for synchronizing the resources and their specifications according to the business logic implemented on them. In this way, it works like a loop, and it does not stop until all conditionals match its implementation. The following is pseudo-code with an example that clarifies it.

reconcile App {

   // Check if a Deployment for the app exists, if not create one
   // If has an error, then go to the beginning of the reconcile
   if err != nil {
       return reconcile.Result{}, err 
   } 
   
   // Check if a Service for the app exists, if not create one 
   // If has an error, then go to the beginning of the reconcile
   if err != nil {
       return reconcile.Result{}, err 
   }  

   // Looking for Database CR/CRD 
   // Check the Database Deployments Replicas size
   // If deployment.replicas size != cr.size, then update it
   // Then, go to the beginning of the reconcile
   if err != nil {
       return reconcile.Result{Requeue: true}, nil
   }  
   ...
   
   // If it is at the end of the loop, then:
   // All was done successfully and the reconcile can stop  
   return reconcile.Result{}, nil

}

The following are possible return options to restart the Reconcile:

  • With the error:

return reconcile.Result{}, err

  • Without an error:

return reconcile.Result{Requeue: true}, nil

  • Therefore, to stop the Reconcile, use:

return reconcile.Result{}, nil

Note: For more details, check the Reconcile and its Result implementations.

Watch()

The watches are responsible for ''watching" the objects and triggering the Reconcile. Also, the Operator SDK tool will generate a Watch function for each primary resource (CRD). Here is an example:

// Watch for changes to primary resource Memcached
err = c.Watch(&source.Kind{Type: &cachev1alpha1.Memcached{}}, &handler.EnqueueRequestForObject{})
if err != nil {
    return err
}

By following the Getting Started, a watch function for each secondary object managed by it will also be implemented, such as below.

// Watch for changes to secondary resource Pods and requeue the owner Memcached

err = c.Watch(&source.Kind{Type: &appsv1.Deployment{}}, &handler.EnqueueRequestForOwner{
    IsController: true,
    OwnerType:    &cachev1alpha1.Memcached{},
})
if err != nil {
    return err
}

err = c.Watch(&source.Kind{Type: &corev1.Service{}}, &handler.EnqueueRequestForOwner{
    IsController: true,
    OwnerType:    &cachev1alpha1.Memcached{},
})
if err != nil {
    return err
}

Also, the following code ensures the quantity of Memcached replicas running on the cluster.

// Ensure the deployment size is the same as the spec
size := memcached.Spec.Size
if *deployment.Spec.Replicas != size {
    deployment.Spec>.Replicas = &size
    err = r.client.Update(context.TODO(), deployment)
    if err != nil {
        reqLogger.Error(err, "Failed to update Deployment.", "Deployment.Namespace", deployment.Namespace, "Deployment.Name", deployment.Name)
         return reconcile.Result{}, err
    }
}

After that, you can check that the above code worked by doing the following steps.

  1. Scale the Memcached pod up or down.
  2. Check that the replicas will come back for the original size because of the above code.

Note: The above steps will only work if you were able to follow the guide and all finished successfully.

Last updated: March 29, 2023