At Red Hat, our engineers use Konflux-CI to orchestrate and run integration tests against their applications, services, and products. An integral part of testing these offerings involves installing them on a Red Hat OpenShift cluster.
What is Konflux?
Konflux is a continuous integration/continuous delivery (CI/CD) service that simplifies the adoption of the processes, technologies, and expertise that Red Hat uses to build, test, and release production software. Delivering a secure software supply chain is the primary mission of Konflux. It provides default pipeline definitions and automated security checks to generate Supply chain Levels for Software Artifacts (SLSA) Level 3 build images from application code across a variety of programming languages. Build images are composed into Snapshots and passed to integration test pipelines on their journey towards being released.
Challenges
With regard to testing, many of Red Hat's development teams require administrative access to a Red Hat OpenShift cluster to verify their Operator Lifecycle Manager (OLM) operators. Because cluster access is limited, engineers often get stuck when onboarding their tests. They also commonly lack the accounts or permissions necessary to host the platform themselves. Support tickets usually ensue, and onboarding turns into a multi-day ordeal. To resolve this common inefficiency, we looked for ways to provide OpenShift clusters on demand to the CI workflows used by our engineers and their teams. We wanted to offer this as a service that satisfies the majority of the testing requirements shared across teams. Ideally, the necessary cloud or infrastructure credentials would be provided by Konflux administrators and protected with guardrails.
What provides such a service? Enter the Cluster-as-a-Service Operator (CaaS).
CaaS Operator
The CaaS Operator allows us to:
- Grant our unprivileged users access to request instances of a cluster template using Kubernetes custom resources.
- Leverage Argo CD and Helm charts to manage and deploy the clusters.
- Configure quotas to restrict the number of active template instances globally and per namespace.
- Provide a self-service option for users to use our templates but bring their own credentials for provisioning cluster instances and infrastructure.
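To make the quota item above concrete, here is a minimal Python sketch of how a global limit and a per-namespace limit might be evaluated together. It is not the operator's implementation: the `Quota` class, its field names, and the `may_create` helper are hypothetical and exist only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Quota:
    """Hypothetical limits on active template instances; None means unlimited."""
    global_limit: Optional[int] = None
    per_namespace_limit: Optional[int] = None


def may_create(namespace: str, active_namespaces: Iterable[str], quota: Quota) -> bool:
    """Return True if one more instance in `namespace` stays within both limits.

    `active_namespaces` holds the namespace of every currently active instance.
    """
    counts = Counter(active_namespaces)
    total = sum(counts.values())
    # The global limit caps active instances across all namespaces.
    if quota.global_limit is not None and total >= quota.global_limit:
        return False
    # The per-namespace limit caps active instances for the requesting namespace.
    if (quota.per_namespace_limit is not None
            and counts[namespace] >= quota.per_namespace_limit):
        return False
    return True
```

With `Quota(global_limit=3, per_namespace_limit=2)`, a namespace that already holds two active instances would be refused a third even while the global limit still has room.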
We began developing our cluster template Helm charts with HyperShift in mind and deferred provisioning with Hive to our backlog. Our first chart, hypershift-aws-template, uses hooks to trigger and wait for Jobs that either create or destroy the ephemeral cluster. We use the HyperShift command-line interface (CLI) because it can provision the Amazon Web Services (AWS) infrastructure, Identity and Access Management (IAM), and cluster resources all from a single binary. The chart is quite generic, so feel free to try it out. Review the Git repository documentation for instructions about prerequisites and execution.
The chart is referenced from a ClusterTemplate in our cluster-as-a-service component. Browse our repo for examples showing how Red Hat configures the CaaS operator as well as Konflux generally. Our manifests are built with Kustomize and deployments are managed with Argo CD as provided by the Red Hat OpenShift GitOps operator.
Ephemeral clusters
In our infrastructure, we operate our HyperShift management clusters with a publicly routable API server to satisfy the requirements for communication between nodes and the control plane. Autoscaling is enabled on the HyperShift management cluster, so compute resources are added dynamically as more control planes for ephemeral clusters are provisioned. ROSA with hosted control planes (HCP) is a great option for this cluster, in development or production, since our template already requires access to an AWS account and it's recommended to keep control planes and worker nodes within the same cloud provider.
Konflux uses Tekton Pipelines for CI workloads. We developed a series of convenient StepActions our users can mix and match in their Pipelines/PipelineRuns as needed. They can be used to (among other things):
- Pick a version of OpenShift to provision.
- Provision an ephemeral HyperShift cluster.
- Retrieve the credentials for an ephemeral cluster.
Take a look at a sample cluster provisioning pipeline or a simple Operator deployment test to see how it all comes together for the user.
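As a rough illustration of the version-picking step in the list above, the following Python sketch selects the newest release that matches a requested prefix. How the actual StepAction chooses a version is not described here, so this is an assumption; `parse_version` and `pick_version` are hypothetical helpers, and the sketch only handles purely numeric versions such as `4.15.10` (pre-release suffixes would need extra parsing).

```python
from typing import Iterable, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted numeric version into integers so 4.15.10 sorts after 4.15.2."""
    return tuple(int(part) for part in version.split("."))


def pick_version(available: Iterable[str], prefix: str = "") -> Optional[str]:
    """Return the highest available version whose leading fields equal `prefix`.

    An empty prefix matches everything; None is returned when nothing matches.
    """
    wanted = parse_version(prefix) if prefix else ()
    candidates = [
        v for v in available
        if parse_version(v)[: len(wanted)] == wanted
    ]
    return max(candidates, key=parse_version, default=None)
```

Comparing integer tuples rather than strings avoids the common pitfall where `4.15.2` would sort above `4.15.10` lexicographically.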
One of these StepActions creates a ClusterTemplateInstance on the management cluster, which kick-starts the provisioning process. From there, the step waits for the cluster to become available. The entire process typically takes 10-20 minutes. Out-of-band deletion of the ClusterTemplateInstance triggers deletion of the cluster, but we've also configured automatic deletion of the ClusterTemplateInstance via the ClusterTemplate spec. This provides additional assurance that orphaned cluster resources won't accumulate and drive up our cloud expenses.
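The "create, then wait until available" behavior described above can be sketched as a polling loop with a deadline. This is a generic Python illustration rather than the StepAction's code: `wait_until_ready` is a hypothetical helper, the readiness check is passed in as a callable so no particular Kubernetes client is assumed, and the default timeout and poll interval are arbitrary values chosen to sit comfortably above the 10-20 minutes mentioned.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_seconds: float = 30 * 60,
    poll_seconds: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `is_ready` until it returns True or the timeout elapses.

    Returns True when the cluster became ready, False on timeout.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

In a real step, `is_ready` would wrap whatever status check the pipeline uses for the ClusterTemplateInstance. A monotonic clock is used so that wall-clock adjustments cannot shorten or extend the wait.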
Future plans
We’re just getting started with providing this service on top of Konflux. Our users already appreciate its low barrier to entry and the multiarch (amd64, arm64) support that HyperShift provides from the start. Additional enhancements to our cluster templates are already being planned, so follow our activity on GitHub for the latest updates.