
In the enterprise Kubernetes world, many developers use Kubernetes as a platform for designing, building, and deploying applications. The platform needs to be managed and provide a range of features, integrations, and checks and balances for developers. The role of the platform engineer is important in that regard. Policies are at the heart of what platform engineers must define and enforce in their platform. Two technologies that aim to make defining and enforcing policies easier are Gatekeeper and Gateway API.

In this article, we'll talk about how these two policy-focused technologies can work together to solve platform engineer and application developer concerns, with practical examples.

What are Gatekeeper policies?

Gatekeeper is a policy controller for Kubernetes. It lets users enforce rules on the content and format of Kubernetes resources to meet security, legal, or governance requirements. For example, a policy could prevent users from using a volume type of hostPath in their Pod spec. Policies are enforced through admission webhooks and auditing. The policy logic is written in the Rego policy language inside ConstraintTemplates, which are then instantiated as Constraints.
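To make the hostPath example concrete, here is a minimal Python sketch of the kind of check such a policy performs. It is not Gatekeeper code (real policies are written in Rego), and the function name and sample Pod are hypothetical, chosen only for illustration.

```python
def hostpath_violations(pod):
    """Return one message per volume in the Pod spec that uses hostPath."""
    volumes = pod.get("spec", {}).get("volumes", [])
    return [
        f"hostPath volume '{v.get('name', '<unnamed>')}' is not allowed"
        for v in volumes
        if "hostPath" in v
    ]


# Hypothetical Pod with one allowed volume and one hostPath volume.
pod = {
    "kind": "Pod",
    "metadata": {"name": "demo"},
    "spec": {
        "volumes": [
            {"name": "cache", "emptyDir": {}},
            {"name": "host-logs", "hostPath": {"path": "/var/log"}},
        ]
    },
}

for message in hostpath_violations(pod):
    print(message)
```

An admission webhook would reject (or warn about) the resource when the list of messages is non-empty.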

What are Gateway API policies?

Gateway API is a Kubernetes project focused on Layer 4 (transport layer) and Layer 7 (application layer) routing in Kubernetes. The project includes a concept of policies and policy attachment. A Gateway API policy lets you add new behavior to existing objects without changing the spec of those objects. For example, a TLSConnectionPolicy resource could target a Service and apply some TLS configuration to it. This can be seen as an alternative to adding annotations on the Service to get the same behavior: the feature is abstracted out to a separate resource (or MetaResource, as it is known in the Gateway API project).

In short, Gateway API policies affect the behavior of something that's configured via Kubernetes resources (typically Gateway API resources like Gateways and HTTPRoutes).

How do these policies work together?

Because the two policy types serve different purposes, they can be used completely independently. Used together, however, they can solve more complex problems that span both areas. The two examples below show how.

Example 1: Ensuring automatic configuration of TLS certificates for all listeners in a Gateway

There is a Gateway API policy from Kuadrant, called a TLSPolicy. This policy can be attached to a Gateway. When attached to a Gateway, the TLSPolicy controller will configure all HTTPS listeners in the Gateway with a certificate. It does this by leveraging cert-manager to get a signed certificate and place it in a Secret in the correct location. The TLSPolicy spec references a certificate issuer configuration in cert-manager to use when getting the certificate.

In this example, let's assume the organization where this Gateway is deployed requires a TLSPolicy for every Gateway so that certificates are managed automatically for developers. One solution is to create a TLSPolicy resource for each Gateway. However, what if the certificate configuration to use is not known ahead of time? The important thing is that the Gateway must have a TLSPolicy targeting it. The details of that TLSPolicy are not important in this context.

To ensure this requirement is met, a Gatekeeper policy can be used. A TLSPolicy targets a Gateway by specifying the Gateway name and namespace in spec.targetRef, like this:

apiVersion: kuadrant.io/v1alpha1
kind: TLSPolicy
metadata:
  name: prod-web
  namespace: multi-cluster-gateways
spec:
  targetRef:
    name: prod-web
    group: gateway.networking.k8s.io
    kind: Gateway

So, in order to check that a Gateway has a TLSPolicy targeting it, a reverse lookup is required when validating each Gateway. To do this, a Gatekeeper ConstraintTemplate can be used:

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: requirepolicytargetinggateway
spec:
  crd:
    spec:
      names:
        kind: RequirePolicyTargetingGateway
      validation:
        openAPIV3Schema:
          type: object
          properties:
            groupVersion:
              description: The groupVersion of a Policy to check the targetRef in e.g. kuadrant.io/v1alpha1
              type: string
            kind:
              description: The kind of a Policy to check the targetRef in. e.g. TLSPolicy
              type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequirepolicytargetinggateway

        # Collect every synced policy of the configured kind, across all namespaces.
        violation[{"msg": msg}] {
          policies := [o | o = data.inventory.namespace[_][input.parameters.groupVersion][input.parameters.kind][_]]
          msg := check_policies(policies)
        }

        # Produces a message only when no policy targets the Gateway under review.
        # Policies are matched on spec.targetRef.name only.
        check_policies(policies) = msg {
          gateway_name := input.review.object.metadata.name
          gateway_namespace := input.review.object.metadata.namespace

          targeting_policies := [o | o = policies[_]; o.spec.targetRef.name == gateway_name]
          count(targeting_policies) == 0
          msg := sprintf("No %v targeting Gateway %v/%v", [input.parameters.kind, gateway_namespace, gateway_name])
        }

The main logic of the template is in targets[].rego. The violation rule gathers all policies of the specified kind (e.g., TLSPolicy) and groupVersion, and passes them to check_policies. That function takes the name of the Gateway under review and checks whether at least one policy targets it. If no policy targets the Gateway, a violation message is returned.

The TLSPolicy constraint for the above ConstraintTemplate looks like this:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: RequirePolicyTargetingGateway
metadata:
  name: require-tlspolicy-targeting-gateway
spec:
  enforcementAction: warn
  match:
    kinds:
      - apiGroups: ["gateway.networking.k8s.io"]
        kinds: ["Gateway"]
  parameters:
    kind: TLSPolicy
    groupVersion: "kuadrant.io/v1alpha1"

There is one more piece of Gatekeeper configuration required. Gatekeeper keeps a data inventory, or cache, of resources it knows about in the Kubernetes cluster. This inventory won't have any resources in it unless Gatekeeper is configured to sync them. In this example, we don't need to sync Gateways, as those are given as context when being reviewed. However, the TLSPolicy resources do need to be synced. To do that, the following Config resource can be used:

apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: "gatekeeper-system"
spec:
  sync:
    syncOnly:
      - group: "kuadrant.io"
        version: "v1alpha1"
        kind: "TLSPolicy"

This ensures all TLSPolicy resources are readily available in data.inventory when executing any Rego.
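The reverse lookup and the role of the synced inventory can be sketched in plain Python. This is only an illustration of the logic, not how Gatekeeper evaluates it. The nested dictionary stands in for data.inventory.namespace, and the sample policy and Gateways (including the staging-web name) are hypothetical.

```python
def check_policies(inventory, gateway, group_version, kind):
    """Return a violation message if no policy of `kind` targets `gateway`, else None."""
    # Flatten inventory[namespace][groupVersion][kind][name] into a list of policies.
    policies = [
        policy
        for by_group_version in inventory.values()
        for policy in by_group_version.get(group_version, {}).get(kind, {}).values()
    ]
    name = gateway["metadata"]["name"]
    namespace = gateway["metadata"]["namespace"]
    # Like the Rego in the template, this matches on targetRef.name only.
    targeting = [p for p in policies if p["spec"]["targetRef"]["name"] == name]
    if not targeting:
        return f"No {kind} targeting Gateway {namespace}/{name}"
    return None


# Stand-in for the synced inventory: one TLSPolicy targeting the prod-web Gateway.
inventory = {
    "multi-cluster-gateways": {
        "kuadrant.io/v1alpha1": {
            "TLSPolicy": {
                "prod-web": {
                    "metadata": {"name": "prod-web", "namespace": "multi-cluster-gateways"},
                    "spec": {"targetRef": {"name": "prod-web", "kind": "Gateway"}},
                }
            }
        }
    }
}

covered = {"metadata": {"name": "prod-web", "namespace": "multi-cluster-gateways"}}
uncovered = {"metadata": {"name": "staging-web", "namespace": "infra"}}

print(check_policies(inventory, covered, "kuadrant.io/v1alpha1", "TLSPolicy"))
print(check_policies(inventory, uncovered, "kuadrant.io/v1alpha1", "TLSPolicy"))
```

If the inventory is empty because nothing was synced, the list of policies is empty and every Gateway is reported, which is why the Config resource matters.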

With the above resources all configured, a violation will be included in the status of the Constraint whenever a Gateway doesn't have a TLSPolicy targeting it. The violation message will look like this:

No TLSPolicy targeting Gateway infra/prod-web

You can see all constraints and violations by issuing this command:

kubectl get constraints

Example 2: Restricting a rate limit configuration to a maximum limit

There is another Gateway API policy from Kuadrant, called a RateLimitPolicy. This policy can be attached to a Gateway or HTTPRoute to provide rate limiting of requests. The configuration is expressed in terms of a limit, a duration, and a unit.

For example:

kind: RateLimitPolicy
apiVersion: kuadrant.io/v1beta2
metadata:
  name: petstore
spec:
  targetRef:
    kind: HTTPRoute
    name: petstore
  limits:
    getInventory:
      rates:
        - limit: 1000
          duration: 10
          unit: second

With this RateLimitPolicy, requests to any endpoint this policy is applied to will be limited to 1,000 requests every 10 seconds. If that limit is reached, an HTTP 429 response is returned until the 10-second window is up.
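The behavior described above resembles a fixed-window counter. The following Python sketch illustrates that idea under the assumption of a simple fixed window. It is not Kuadrant's implementation, and the class and method names are hypothetical.

```python
class FixedWindowLimiter:
    """Allow at most `limit` requests per window of `window_seconds`."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start = None
        self.count = 0

    def check(self, now):
        """Return the HTTP status for a request arriving at time `now` (in seconds)."""
        # Start a new window on the first request or once the current one has elapsed.
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count >= self.limit:
            return 429
        self.count += 1
        return 200


# Mirrors limit: 1000, duration: 10, unit: second from the policy above.
limiter = FixedWindowLimiter(limit=1000, window_seconds=10)
statuses = [limiter.check(now=0.0) for _ in range(1001)]
print(statuses.count(200), statuses.count(429))
print(limiter.check(now=10.0))
```

Requests beyond the limit are rejected with 429 until the window elapses, after which the counter resets.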

In a scenario where the Gateway is managed by one person (platform engineer), and an application is managed by another person (developer), it's reasonable to have upper limits set for all traffic (say, 500 requests per minute across all applications), particularly for protecting an infrastructure load balancer. The platform engineer can convey this information in an automated way to the developer via a Gatekeeper policy so that expectations are set when the developer is creating a RateLimitPolicy for their application. It would be confusing for the developer if they were allowed to set a rate limit of 1,000 requests per minute, but they see limits being hit after 500 requests per minute.

Here is an example of a Gatekeeper ConstraintTemplate that validates the rate limit being set by the developer, and returns a useful validation message if they try to set a value above a max amount. It also restricts the unit types that can be used.

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: ratelimitpolicymaxlimit
spec:
  crd:
    spec:
      names:
        kind: RateLimitPolicyMaxLimit
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sratelimitpolicymaxlimit

        # Rule 1: the duration/unit pair must be one of the allowed combinations.
        violation[{"msg": msg}] {
          rlp_name := input.review.object.metadata.name
          rate := input.review.object.spec.limits[_].rates[_]
          not valid_duration_unit(sprintf("%v", [rate.duration]), rate.unit)

          valid_combos := concat(", ", input.parameters.rates)
          msg := sprintf("RateLimitPolicy '%v': Invalid combination of duration/unit '%v/%v'. Must be one of %v", [rlp_name, rate.duration, rate.unit, valid_combos])
        }

        # Rule 2: for an allowed combination, the limit must not exceed the
        # max_limits entry at the same index as the matching rates entry.
        violation[{"msg": msg}] {
          rlp_name := input.review.object.metadata.name
          rate := input.review.object.spec.limits[_].rates[_]
          some k
          rate_parts := split(input.parameters.rates[k], "/")
          max_limit := input.parameters.max_limits[k]

          rate.duration == to_number(rate_parts[0])
          rate.unit == rate_parts[1]
          rate.limit > max_limit

          msg := sprintf("RateLimitPolicy '%v': limit of '%v' at duration/unit '%v/%v' must be <= '%v'", [rlp_name, rate.limit, rate.duration, rate.unit, max_limit])
        }

        # True when "<duration>/<unit>" appears in the rates parameter.
        valid_duration_unit(duration, unit) {
          rate_parts := split(input.parameters.rates[_], "/")
          [duration, unit] = [rate_parts[0], rate_parts[1]]
        }

The first violation block checks that the duration and unit combination is one of the allowed combinations, e.g., every 10 seconds or every 1 minute. The second violation block checks that the limit does not exceed the maximum allowed for that duration and unit combination, e.g., at most 10 every 10 seconds or at most 30 every 1 minute.

The Rego is split up like this so that a more specific violation message can be given depending on which check fails. If the duration and unit are not an allowed combination, the limit is irrelevant and only the combination is reported. If the combination is allowed, the limit is then checked. The actual rates and max limits to use are specified in the Constraint as parameters:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: RateLimitPolicyMaxLimit
metadata:
  name: rate-limit-policy-max-limit
spec:
  enforcementAction: deny
  match:
    kinds:
      - apiGroups: ["kuadrant.io"]
        kinds: ["RateLimitPolicy"]
  parameters:
    rates: ["10/second","60/second","300/second","1/minute","5/minute"]
    max_limits: [10,30,100,30,100]

When a RateLimitPolicy is being created or updated, a validation webhook will ensure the combination of duration and unit is in the rates list, and that the limit does not exceed the corresponding value in the max_limits list. Let's look at a couple of examples using these parameters. With the following rate limit configuration:

      rates:
        - limit: 10
          duration: 30
          unit: second

The validation message will be:

RateLimitPolicy 'mypolicy': Invalid combination of duration/unit '30/second'. Must be one of 10/second, 60/second, 300/second, 1/minute, 5/minute

And with limits set to the following:

      rates:
        - limit: 40
          duration: 1
          unit: minute

The validation message will be:

RateLimitPolicy 'mypolicy': limit of '40' at duration/unit '1/minute' must be <= '30'
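For readers less familiar with Rego, the two checks can be mirrored in a short Python sketch. This is an illustrative re-expression of the template's logic, not something Gatekeeper runs. The function name is hypothetical, and it assumes the entries in the rates parameter are unique.

```python
ALLOWED_RATES = ["10/second", "60/second", "300/second", "1/minute", "5/minute"]
MAX_LIMITS = [10, 30, 100, 30, 100]


def rate_violations(policy_name, rates, allowed_rates, max_limits):
    """Return validation messages for the rates of one RateLimitPolicy limit."""
    messages = []
    for rate in rates:
        combo = f"{rate['duration']}/{rate['unit']}"
        # Check 1: the duration/unit pair must be an allowed combination.
        if combo not in allowed_rates:
            messages.append(
                f"RateLimitPolicy '{policy_name}': Invalid combination of duration/unit "
                f"'{combo}'. Must be one of {', '.join(allowed_rates)}"
            )
            continue
        # Check 2: the limit must not exceed the max at the same list index.
        max_limit = max_limits[allowed_rates.index(combo)]
        if rate["limit"] > max_limit:
            messages.append(
                f"RateLimitPolicy '{policy_name}': limit of '{rate['limit']}' at duration/unit "
                f"'{combo}' must be <= '{max_limit}'"
            )
    return messages


bad_combo = [{"limit": 10, "duration": 30, "unit": "second"}]
over_limit = [{"limit": 40, "duration": 1, "unit": "minute"}]

for rates in (bad_combo, over_limit):
    for message in rate_violations("mypolicy", rates, ALLOWED_RATES, MAX_LIMITS):
        print(message)
```

The parallel rates and max_limits lists are paired by index, so reordering one without the other would silently change which maximum applies to which combination.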

By using the Gatekeeper constraint, the platform engineer can convey additional information and practical restrictions on what the developer can do. However, the developer still has full flexibility with the RateLimitPolicy resource to subdivide limits as they see fit between their apps and endpoints.

This example scenario is somewhat simplified to show the integration point between Gatekeeper Constraints and Gateway API policies. In a production environment there may be infrastructure rate limiting at play as well. The platform engineer may want to restrict the total of all limits defined to be less than some overall max value, or have some contention built in. In that scenario, the Rego would need to sum the values across all RateLimitPolicies attached to the same Gateway and compare that to the allowed max value.
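The summing approach could look roughly like the following Python sketch. It rests on several simplifying assumptions that are not from the original examples: only policies whose targetRef points directly at the Gateway are counted (policies on HTTPRoutes are ignored), all rates are normalized to requests per minute, and only the second and minute units are handled. The function names are hypothetical.

```python
# Assumption: only these two units are handled in this sketch.
UNIT_SECONDS = {"second": 1, "minute": 60}


def requests_per_minute(rate):
    """Normalize a limit/duration/unit rate to requests per minute."""
    window_seconds = rate["duration"] * UNIT_SECONDS[rate["unit"]]
    return rate["limit"] * 60 / window_seconds


def total_limit_violation(policies, gateway_name, max_per_minute):
    """Return a message if the summed limits targeting the Gateway exceed the max."""
    total = 0
    for policy in policies:
        ref = policy["spec"]["targetRef"]
        # Simplification: count only policies that target the Gateway directly.
        if ref["kind"] != "Gateway" or ref["name"] != gateway_name:
            continue
        for limit in policy["spec"]["limits"].values():
            for rate in limit["rates"]:
                total += requests_per_minute(rate)
    if total > max_per_minute:
        return (
            f"Total of {total:g} requests/minute across RateLimitPolicies "
            f"targeting Gateway '{gateway_name}' exceeds {max_per_minute}"
        )
    return None


policies = [
    {"spec": {"targetRef": {"kind": "Gateway", "name": "prod-web"},
              "limits": {"a": {"rates": [{"limit": 10, "duration": 10, "unit": "second"}]}}}},
    {"spec": {"targetRef": {"kind": "Gateway", "name": "prod-web"},
              "limits": {"b": {"rates": [{"limit": 30, "duration": 1, "unit": "minute"}]}}}},
    {"spec": {"targetRef": {"kind": "HTTPRoute", "name": "petstore"},
              "limits": {"c": {"rates": [{"limit": 100, "duration": 5, "unit": "minute"}]}}}},
]

print(total_limit_violation(policies, "prod-web", max_per_minute=500))
print(total_limit_violation(policies, "prod-web", max_per_minute=80))
```

In Gatekeeper, the equivalent Rego would need the RateLimitPolicy resources synced into the data inventory, since it has to look beyond the single object under review.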

Summary and conclusion

Gatekeeper policies and Gateway API policies solve distinct problems, despite both using the term "policy". However, the two technologies can be used together to solve more complex problems than either can solve on its own, and to surface the results to users in an intuitive way.

The first example focuses on auditing the current state of the system, allowing the engineer to add their own context around governance and security. That context is going to be different in each organization and needs that flexibility. Gatekeeper provides that flexibility, and works well with Gateway API resources.

The second example focuses on validation of resources beyond the validation that is bundled with a resource. This can highlight a potential new feature in a resource if it's something most users will need to do. If you own those resources, you can make the decision to add that extra feature or validation to the resource. In the case of Gateway API policies, there's an allowance for inherited policies with 'defaults and overrides' that could form part of the solution to our second example. However, when those options are not feasible, a Gatekeeper constraint can add that extra validation you need.

In the Kuadrant project, we built a proof of concept around API management features in which the above examples were included. You can see and try out that proof of concept by following the quick start guide.

Find out more

  • Check out the Gatekeeper docs for more info on Gatekeeper policies, constraints, and auditing.
  • Visit the Gateway API site, which covers everything about the Gateway API project from sig-network.

The Kuadrant project has a number of Gateway API policies and tooling for auth, rate limiting, security and traffic management. Check out the Kuadrant community page if you’d like to get in touch with the Kuadrant team.

Last updated: April 1, 2024