Open Data Hub and Kubeflow installation customization

The main goal of Kubernetes is to reach the desired state: to deploy our pods, set up the network, and provide storage. This paradigm extends to Operators, which use custom resources to define the state. When the Operator picks up the custom resource, it will always try to get to the state defined by it. That means that if we modify a resource that is managed by the Operator, the Operator will quickly revert our change so that the resource matches the desired state again.

This can get confusing when we're trying to customize Open Data Hub (ODH) or Kubeflow installations, because the KFDef custom resource does not expose all of the potential configuration options.

In this guide, I'll walk through three options to modify the deployment: directly edit manifests in a fork, create repositories with overrides, and add overlays. I'll use an example of customizing the JupyterHub component via a provided ConfigMap.

How to customize the installation

You can customize the deployment in three different ways. All of them involve working with Git and the odh-manifests repository:

  • Approach 1: Fork odh-manifests and maintain changes there.
  • Approach 2: Create your own manifests repository to override the component resources.
  • Approach 3: Fork the odh-manifests repository and extend it with component overlays.

Each of the above will require you to have a Git repository with Kustomize manifests and Red Hat OpenShift resources. (You will reference these from the KFDef resource.) However, each approach requires a different level of effort to create and maintain.

I recommend the third approach for customizing installations. However, in this article, I will go through each option in detail.

Approach 1: Fork and modify the odh-manifests repository

To fork and modify the odh-manifests repository and maintain the changes there:

  1. Go to the odh-manifests repository and hit the fork button.
  2. Clone your fork and make changes to it.
  3. Push the changes to the forked repository, update the KFDef resource to point to the fork, and deploy.
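
As a rough sketch of these steps, the script below prints the repos stanza you would put in the KFDef resource once your fork exists. The fork owner my-org, the master ref, and the helper name kfdef_repos_stanza are assumptions for illustration, and the tarball URI layout is assumed to follow the upstream KFDef examples. The Git commands are left as comments because they need network access and a real fork.

```shell
#!/bin/sh
set -eu

# Hypothetical values: replace with your own GitHub account and Git ref.
FORK_OWNER="${FORK_OWNER:-my-org}"
FORK_REF="${FORK_REF:-master}"

# Steps 1 and 2 (run by hand; they need network access and a real fork):
#   git clone "https://github.com/${FORK_OWNER}/odh-manifests.git"
#   cd odh-manifests
#   git remote add upstream https://github.com/opendatahub-io/odh-manifests.git
#   # ...edit manifests...
#   git commit -am "Customize JupyterHub" && git push origin "${FORK_REF}"

# Step 3: print a repos stanza that points the KFDef at the fork.
# Arguments: $1 = fork owner, $2 = Git ref (branch or tag).
kfdef_repos_stanza() {
  cat <<EOF
  repos:
    - name: manifests
      uri: https://github.com/$1/odh-manifests/tarball/$2
EOF
}

kfdef_repos_stanza "$FORK_OWNER" "$FORK_REF"
```

Keeping the upstream remote configured in the clone is what makes later rebases, and the conflicts discussed below, visible early.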

It is quite simple to start with this approach. However, it might be hard to maintain if you need to sync back with the original odh-manifests repository. Making any change could result in a conflict on the Git level. That would require you to manually intervene during rebases. This is true even for simple things like adding an environment variable or changing its value.

On the other hand, it is easy to see the changes that the upstream repository is making, because you have the Git history and tracking.

Approach 2: Customize by overrides

To customize by overrides, start from an empty repository, and then:

  1. Create the directory structure that matches the one in the odh-manifests repository.
  2. Add the files you would like to customize.
  3. Add your repository to the KFDef resource as an additional repository reference. Then duplicate the component you are customizing. It is important to change the repository reference (repoRef) to the repository with your overrides.

The Operator uses the repositories and components in KFDef in the order they are defined. This means your files replace the files coming from odh-manifests.
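
To make the ordering concrete, here is a minimal sketch of the relevant KFDef fragment, written to a file by a small shell helper. It is a fragment, not a complete KFDef. The overrides repository URI, the repository name overrides, the component path jupyterhub/jupyterhub, and the helper name write_override_fragment are assumptions; check them against your own repositories.

```shell
#!/bin/sh
set -eu

# write_override_fragment FILE OVERRIDES_URI
# Writes a KFDef fragment in which the jupyterhub component appears twice:
# first from odh-manifests, then from the overrides repository.
write_override_fragment() {
  cat > "$1" <<EOF
spec:
  applications:
    # Original component from odh-manifests.
    - name: jupyterhub
      kustomizeConfig:
        repoRef:
          name: manifests
          path: jupyterhub/jupyterhub
    # Duplicate of the component; only the repoRef name changes.
    - name: jupyterhub
      kustomizeConfig:
        repoRef:
          name: overrides
          path: jupyterhub/jupyterhub
  repos:
    - name: manifests
      uri: https://github.com/opendatahub-io/odh-manifests/tarball/master
    # Hypothetical overrides repository, listed after odh-manifests.
    - name: overrides
      uri: $2
EOF
}

workdir="$(mktemp -d)"
write_override_fragment "$workdir/kfdef-fragment.yaml" \
  "https://github.com/my-org/odh-manifests-overrides/tarball/master"
cat "$workdir/kfdef-fragment.yaml"
```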

This approach is simple, and you do not have to worry about Git conflicts. On the other hand, it is harder to keep track of changes in the upstream odh-manifests repository that affect your overrides. For example, an upstream maintainer could rename a file you are overriding in odh-manifests. Then you'd end up in an inconsistent state where you no longer override the resource but only add your copy to the set of deployed manifests. It is harder to spot such changes in this approach because you do not have the Git history connecting the two repositories.
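
One way to catch this kind of drift is to check periodically that every file in your overrides repository still has a counterpart in an upstream checkout. The helper below is a sketch of that idea, not part of Open Data Hub; the function name find_orphaned_overrides and the demo file names are made up for illustration.

```shell
#!/bin/sh
set -eu

# find_orphaned_overrides OVERRIDES_DIR UPSTREAM_DIR COMPONENT_PATH
# Prints override files under COMPONENT_PATH that have no file at the
# same relative path in the upstream checkout.
find_orphaned_overrides() {
  ( cd "$1" && find "$3" -type f ) | while IFS= read -r rel; do
    if [ ! -e "$2/$rel" ]; then
      printf '%s\n' "$rel"
    fi
  done | sort
}

# Demo with throwaway directories and hypothetical file names.
demo="$(mktemp -d)"
mkdir -p "$demo/upstream/jupyterhub/jupyterhub/base" \
         "$demo/overrides/jupyterhub/jupyterhub/base"
: > "$demo/upstream/jupyterhub/jupyterhub/base/configmap.yaml"
: > "$demo/overrides/jupyterhub/jupyterhub/base/configmap.yaml"
: > "$demo/overrides/jupyterhub/jupyterhub/base/old-name.yaml"

find_orphaned_overrides "$demo/overrides" "$demo/upstream" jupyterhub/jupyterhub
# prints: jupyterhub/jupyterhub/base/old-name.yaml
```

Any path the helper prints is a file that no longer replaces anything upstream and is worth a manual look.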

For example, let's customize the JupyterHub component. To do this, we'll use the odh-manifests-overrides example repository, which provides a customization to a ConfigMap deployed as part of the JupyterHub component:

  1. Deploy the Open Data Hub Operator and upload the KFDef custom resource to the cluster (see Quick Installation guide):
$ oc apply -f https://raw.githubusercontent.com/vpavlin/odh-manifests-overrides/master/kfdef/kfctl_openshift_custom.yaml
  2. Wait for the Open Data Hub instance to start.
  3. Run the following command to obtain the customized ConfigMap:
$ oc describe cm jupyterhub-cfg

The content should match the ConfigMap available in the odh-manifests-overrides repository.

Approach 3: Customize with overlays

This last approach is a combination of the previous approaches:

  1. Fork the repository, but don't touch any manifests already existing in the repository.
  2. Add new overlays.
  3. Create new resources or remove and modify existing ones.

It's more complicated to work with overlays than to directly edit the manifests. However, this process is more convenient and consistent than creating the overrides. In the long term, the overlays approach provides more flexibility. With it, you can also create various overlays for different use cases or environments in the same code base.

This approach prevents the conflicts on rebases because it does not require you to change any existing manifests. The overlays should keep working across rebases and ODH versions. There are a few exceptions. One is if you're doing something complicated like renaming existing resources via patches. Another exception is if there are significant changes to the component upstream, like a major component version change that requires a complete change of resource type.
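
As an illustration, the sketch below scaffolds a new overlay inside a checkout of your fork without touching the existing base manifests. The overlay name custom, the component path jupyterhub/jupyterhub, the file names, and the ConfigMap key example_key are assumptions. The kustomization uses the bases and patchesStrategicMerge fields, which newer Kustomize releases deprecate in favor of resources and patches.

```shell
#!/bin/sh
set -eu

# scaffold_overlay REPO_DIR COMPONENT_PATH OVERLAY_NAME
# Creates REPO_DIR/COMPONENT_PATH/overlays/OVERLAY_NAME with three files.
scaffold_overlay() {
  overlay_dir="$1/$2/overlays/$3"
  mkdir -p "$overlay_dir"

  # The overlay builds on the untouched base and layers changes on top.
  cat > "$overlay_dir/kustomization.yaml" <<EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  - ../../base
patchesStrategicMerge:
  - jupyterhub-cfg-patch.yaml
resources:
  - additional-profiles-configmap.yaml
EOF

  # Patch for the existing ConfigMap (hypothetical key and value).
  cat > "$overlay_dir/jupyterhub-cfg-patch.yaml" <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: jupyterhub-cfg
data:
  example_key: "example-value"
EOF

  # Brand-new ConfigMap added by the overlay (placeholder data).
  cat > "$overlay_dir/additional-profiles-configmap.yaml" <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: jupyterhub-additional-singleuser-profiles
data:
  example_key: "example-value"
EOF
}

repo="$(mktemp -d)"   # stands in for a checkout of your fork
scaffold_overlay "$repo" jupyterhub/jupyterhub custom
find "$repo" -type f | sort
```

The overlay is then switched on in the KFDef resource by listing its name (custom here) under the overlays field of the component's kustomizeConfig.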

To customize the JupyterHub component, let's look at another example. This time we will use the odh-manifests-overlays repository, which provides a sample of the overlays approach. You can immediately see that this is a copy of the odh-manifests repository. It also shares its Git history.

This version has one additional commit, which adds an overlay to the JupyterHub component. This overlay modifies a JupyterHub ConfigMap. It also adds an additional JupyterHub singleuser profiles ConfigMap.

Let's test this out:

  1. Deploy the Open Data Hub Operator (see Quick Installation guide).
  2. Upload the customized KFDef resource to the cluster:
$ oc apply -f https://raw.githubusercontent.com/vpavlin/odh-manifests-overlays/master/kfdef/kfctl_openshift_custom.yaml
  3. The customized ConfigMap should appear in the cluster:
$ oc describe cm jupyterhub-cfg
  4. You can also view the added ConfigMap by running the following command:
$ oc describe cm jupyterhub-additional-singleuser-profiles
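
If you want to script this check, a small helper along the following lines could report whether both ConfigMaps exist in the current project. This is only a sketch: the function name check_configmaps is made up, and it assumes oc is installed and logged in to the cluster.

```shell
#!/bin/sh
set -eu

# check_configmaps NAME...
# Reports whether each ConfigMap exists in the current project.
# Returns non-zero if at least one is missing.
check_configmaps() {
  missing=0
  for cm in "$@"; do
    if oc get configmap "$cm" >/dev/null 2>&1; then
      printf 'found: %s\n' "$cm"
    else
      printf 'missing: %s\n' "$cm"
      missing=1
    fi
  done
  return "$missing"
}

# Usage against a live cluster (requires oc and an active login):
#   check_configmaps jupyterhub-cfg jupyterhub-additional-singleuser-profiles
```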

Kubeflow

At the beginning of this article, I promised you’d learn how to customize Kubeflow deployments as well. Yet, I haven't even mentioned Kubeflow. This is because everything described here for Open Data Hub also applies to Kubeflow manifests: Open Data Hub adopted the Kubeflow deployment tools, so both repositories follow the same structure and leverage the same patterns for component definition and customization. (See "Open Data Hub 0.6 brings component updates and Kubeflow architecture" for more details.)

Conclusion

In this article, I introduced three different ways to customize and maintain the changes to individual Open Data Hub deployments. These are to directly edit manifests in a fork, to create repositories with overrides, and to add overlays.

The Open Data Hub documentation only talks about one of those in detail: the overlays. We believe it is the most flexible, maintainable, and convenient of all three. This is also what we recommend for the internal deployment of the Open Data Hub at Red Hat. The Internal Data Hub team maintains a fork of the odh-manifests repository and uses overlays to customize the default configuration to fit the needs of this production deployment.

Last updated: August 1, 2023