
Secure your Kubernetes deployments with eBPF

December 16, 2021
Sascha Grunert
Related topics:
Containers, Linux, Kubernetes, Security
Related products:
Red Hat OpenShift


    Numerous adaptations of the Linux kernel, notably seccomp, SELinux, and AppArmor, bolster its security through runtime checks on sensitive activities such as file access and system calls (syscalls). In particular, seccomp denies access to system calls that don't match predefined profiles of allowed calls. But creating seccomp profiles for Kubernetes workloads can be a major obstacle to deploying containerized applications: the profiles have to be maintained over the complete life cycle of the application, because changing the code might require changes to the seccomp rules as well.

    To overcome this burden, it would be absolutely stunning if developers could record seccomp profiles by running a test suite against the application and automatically deploy the results together with the application manifest. But how do you record seccomp profiles? The Security Profiles Operator in Kubernetes offers several ways to record activity. This article shows how to use the Operator to secure your applications and how the recorder based on the extended Berkeley Packet Filter (eBPF, or just BPF) does the job.

    What is the Security Profiles Operator?

    The Security Profiles Operator is a project sponsored by the Node Special Interest Group, which aims to make security easier on Kubernetes. Right now, the Operator offers Custom Resource Definitions (CRDs) that support seccomp, SELinux, and AppArmor.
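
    Once the Operator is installed (the installation steps follow below), a quick way to see which CRDs it ships is to filter the cluster's CRD list by the Operator's API group. This is just a convenience sketch, not a required step:

    # List the CRDs registered under the Operator's API group
    $ kubectl get crds | grep security-profiles-operator.x-k8s.io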

    The CRDs ship with many features, one of which records security profiles from running workloads. Several types of recorders are available by default:

    • The OCI hook, a syscall recorder for seccomp compatible with the Open Container Initiative specification.
    • The auditd log tracing recorder for SELinux and seccomp.
    • The new BPF-based recorder.

    This article focuses on the BPF recorder because it's one of the latest and most experimental additions to the Security Profiles Operator.

    Demo of the BPF recorder

    The following subsections show how easy it is to install the recorder, run sessions, and incorporate the results into a secure application. This example records system calls issued by the nginx web server.

    Install and configure the Operator

    First of all, we have to get the Operator up and running.

    cert-manager has to be installed before the Operator can run. I'm running my tests on Red Hat OpenShift 4.9, which does not ship cert-manager out of the box, but the installation is fairly straightforward:

    $ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.6.1/cert-manager.yaml
    

    When all the cert-manager pods are in a running state, deploy the Security Profiles Operator:

    $ kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/security-profiles-operator/master/deploy/operator.yaml
    

    Switch to the Operator's namespace to simplify further commands:

    $ kubectl config set-context --current --namespace=security-profiles-operator
    

    The BPF recording feature is disabled in the Operator configuration by default because the recorder has to run with high privileges and with access to the host PID namespace (hostPID). To enable the recorder, patch the configuration of the Operator daemon running on every node to set enableBpfRecorder to true:

    $ kubectl patch spod spod --type=merge -p '{"spec":{"enableBpfRecorder":true}}'
    securityprofilesoperatordaemon.security-profiles-operator.x-k8s.io/spod patched
    

    The Operator now rolls out new DaemonSet pods, which can take a bit of time depending on the cluster size. After the rollout finishes, every pod should be running the BPF recorder in one of its containers, as you can tell by checking their logs:

    $ kubectl logs ds/spod -c bpf-recorder
    Found 6 pods, using pod/spod-h7dpm
    I1115 12:02:45.991786  110307 main.go:182]  "msg"="Set logging verbosity to 0"
    I1115 12:02:45.991901  110307 deleg.go:130] setup "msg"="starting component: bpf-recorder"  "buildDate"="1980-01-01T00:00:00Z" "compiler"="gc" "gitCommit"="unknown" "gitTreeState"="clean" "goVersion"="go1.16.9" "libseccomp"="2.5.1" "platform"="linux/amd64" "version"="0.4.0-dev"
    I1115 12:02:45.991955  110307 bpfrecorder.go:105] bpf-recorder "msg"="Setting up caches with expiry of 1h0m0s"
    I1115 12:02:45.991973  110307 bpfrecorder.go:121] bpf-recorder "msg"="Starting log-enricher on node: ip-10-0-228-234.us-east-2.compute.internal"
    I1115 12:02:45.994232  110307 bpfrecorder.go:152] bpf-recorder "msg"="Connecting to metrics server"
    I1115 12:02:48.373469  110307 bpfrecorder.go:168] bpf-recorder "msg"="Got system mount namespace: 4026531840"
    I1115 12:02:48.373518  110307 bpfrecorder.go:170] bpf-recorder "msg"="Doing BPF load/unload self-test"
    I1115 12:02:48.373529  110307 bpfrecorder.go:336] bpf-recorder "msg"="Loading bpf module"
    I1115 12:02:48.373570  110307 bpfrecorder.go:403] bpf-recorder "msg"="Using system btf file"
    I1115 12:02:48.373770  110307 bpfrecorder.go:356] bpf-recorder "msg"="Loading bpf object from module"
    I1115 12:02:48.403766  110307 bpfrecorder.go:362] bpf-recorder "msg"="Getting bpf program sys_enter"
    I1115 12:02:48.403792  110307 bpfrecorder.go:368] bpf-recorder "msg"="Attaching bpf tracepoint"
    I1115 12:02:48.406205  110307 bpfrecorder.go:373] bpf-recorder "msg"="Getting syscalls map"
    I1115 12:02:48.406287  110307 bpfrecorder.go:379] bpf-recorder "msg"="Getting comms map"
    I1115 12:02:48.406862  110307 bpfrecorder.go:396] bpf-recorder "msg"="Module successfully loaded, watching for events"
    I1115 12:02:48.406908  110307 bpfrecorder.go:677] bpf-recorder "msg"="Unloading bpf module"
    I1115 12:02:48.411636  110307 bpfrecorder.go:176] bpf-recorder "msg"="Starting GRPC API server"
    

    The recorder does a system sanity check on startup to ensure everything works as expected. In our case, everything went well and we're ready to record.

    Recording your first profile

    The Security Profiles Operator ships with custom resources for its recordings. A recording is a dedicated object that refers to a label selector; this selector links the workload being recorded to the actual recording logic behind the scenes.

    As an example, define this recording:

    apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
    kind: ProfileRecording
    metadata:
      name: my-recording
    spec:
      kind: SeccompProfile
      recorder: bpf
      podSelector:
        matchLabels:
          app: nginx
    

    This example uses the name my-recording, which will appear in the resulting seccomp profile and can be used to identify the results. You also have to select a kind of SeccompProfile and a target recorder of bpf. The podSelector matches all workloads within the cluster containing the label app: nginx.

    By saving the recording in a file named recording.yml, you can finally create the resource:

    $ kubectl create -f recording.yml
    profilerecording.security-profiles-operator.x-k8s.io/my-recording created
    

    Now you can run the workload, the following nginx Deployment:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-deployment
    spec:
      selector:
        matchLabels:
          app: nginx
      replicas: 1
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
            - name: nginx
              image: nginxinc/nginx-unprivileged:1.21.4
    

    Save the Deployment to a file named deployment.yml and create it in the usual manner:

    $ kubectl create -f deployment.yml
    deployment.apps/my-deployment created
    

    If the pod is in a running state, the Operator daemon indicates in its logs that it started recording the workload:

    $ export NODE=$(kubectl get pod -l app=nginx -o jsonpath="{.items[0].spec.nodeName}")
    $ export POD=$(kubectl get pods -l name=spod --field-selector spec.nodeName="$NODE" --no-headers -o custom-columns=:metadata.name)
    $ kubectl logs $POD -c bpf-recorder
    …
    I1115 12:12:30.029216   66106 bpfrecorder.go:654] bpf-recorder "msg"="Found container ID in cluster"  "containerID"="c2e10af47011f6a61cd7e92073db2711796f174af35b34486967588ef7f95fbc" "containerName"="nginx"
    I1115 12:12:30.029264   66106 bpfrecorder.go:539] bpf-recorder "msg"="Saving PID for profile"  "mntns"=4026533352 "pid"=74384 "profile"="my-recording-nginx-0-1636978341"
    I1115 12:12:30.029428   66106 bpfrecorder.go:512] bpf-recorder "msg"="Using short path via tracked mount namespace"  "mntns"=4026533352 "pid"=74403 "profile"="my-recording-nginx-0-1636978341"
    I1115 12:12:30.029575   66106 bpfrecorder.go:512] bpf-recorder "msg"="Using short path via tracked mount namespace"  "mntns"=4026533352 "pid"=74402 "profile"="my-recording-nginx-0-1636978341"
    …
    

    Now it is time to run a test suite against our application. This will ensure that all necessary code paths have been executed and all system calls are part of the produced profile. How do you test a web server? By making a URL request against it and verifying the response:

    $ kubectl port-forward $(kubectl get pod -l app=nginx --no-headers -o custom-columns=:metadata.name) 8080 &
    Forwarding from 127.0.0.1:8080 -> 8080
    Forwarding from [::1]:8080 -> 8080
    $ curl localhost:8080
    Handling connection for 8080
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    …
    

    Stop the recording by removing the workload after the tests finish:

    $ kubectl delete -f deployment.yml
    

    The seccomp profile is now available as a custom resource. Due to Operator magic, it has been synchronized to every node within the cluster:

    $ kubectl get sp my-recording-nginx-0 -o yaml
    apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
    kind: SeccompProfile
    metadata:
      creationTimestamp: "2021-11-15T12:07:38Z"
      finalizers:
      - ip-10-0-179-0.us-east-2.compute.internal-delete
      - ip-10-0-228-234.us-east-2.compute.internal-delete
      - ip-10-0-174-86.us-east-2.compute.internal-delete
      - ip-10-0-151-235.us-east-2.compute.internal-delete
      - ip-10-0-164-140.us-east-2.compute.internal-delete
      - ip-10-0-252-238.us-east-2.compute.internal-delete
      generation: 1
      name: my-recording-nginx-0
      namespace: security-profiles-operator
      resourceVersion: "53283"
      uid: e3538006-44c0-42c4-baa6-ededfdc60293
    spec:
      defaultAction: SCMP_ACT_ERRNO
      syscalls:
      - action: SCMP_ACT_ALLOW
        names:
        - accept4
        - access
        - arch_prctl
        …
        - writev
    status:
      conditions:
      - lastTransitionTime: "2021-11-15T12:07:42Z"
        reason: Available
        status: "True"
        type: Ready
      localhostProfile: operator/security-profiles-operator/my-recording-nginx-0.json
      status: Installed
    

    To use the recorded profile with seccomp, add it to the securityContext of the container:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-deployment
    spec:
      selector:
        matchLabels:
          app: nginx-seccomp
      replicas: 3
      template:
        metadata:
          labels:
            app: nginx-seccomp
        spec:
          containers:
            - name: nginx
              image: nginxinc/nginx-unprivileged:1.21.4
              securityContext:
                seccompProfile:
                  type: Localhost
                  localhostProfile: operator/security-profiles-operator/my-recording-nginx-0.json
    

    Wow, that was quick: we recorded a custom seccomp profile in just a few minutes and can use it immediately!
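
    As a final sanity check (a sketch; the file name deployment-seccomp.yml is just an assumption, and the commands mirror the smoke test from the recording session), you can roll out the confined Deployment and confirm that nginx still answers requests while running under the recorded profile:

    # Create the seccomp-confined Deployment and make sure the pods come up
    $ kubectl create -f deployment-seccomp.yml
    $ kubectl get pods -l app=nginx-seccomp

    # Repeat the earlier smoke test against one of the confined pods
    $ kubectl port-forward $(kubectl get pod -l app=nginx-seccomp --no-headers -o custom-columns=:metadata.name | head -n1) 8080 &
    $ curl localhost:8080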

    How does the BPF recorder work?

    This section helps you understand the capabilities and limitations of the BPF recorder.

    A highly portable BPF application

    The core of the BPF-based seccomp recorder is, as the name indicates, a small BPF application. For security reasons, the BPF program is loaded into the Linux kernel only if a recording has started. The program gets automatically unloaded if the recording stops and no other recording is started inside the cluster.
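
    If you want to observe this lifecycle yourself, one way (a quick sketch that reuses the $POD variable from the demo above) is to watch the recorder logs for the "Loading bpf module" and "Unloading bpf module" messages shown earlier; they bracket each recording session:

    # Follow the recorder logs and keep only the load/unload messages
    $ kubectl logs -f "$POD" -c bpf-recorder | grep -E "Loading bpf module|Unloading bpf module"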

    To increase the portability of the Operator, we wrote a "compile once - run everywhere" (CO-RE) program using libbpf. The build result is embedded into the Operator and can be loaded directly from there. It supports amd64 and arm64 architectures.

    CO-RE relies on the kernel's BPF Type Format (BTF) information, from which the program's vmlinux.h header is generated. To support older kernel versions that do not expose their own BTF, the build process of the Operator creates custom generated BTF files by using the btfhub project from Aqua Security. This project allows us to support more than 500 kernels that are either too old or not configured to expose a BTF file. A custom continuous integration (CI) test ensures that the generated files stay up to date whenever the content of the BPF application changes.

    Control flow

    The basic control flow of the BPF recording mechanism involves the following components:

    • An independent webhook
    • Profile Recorder
    • BPF Recorder
    • BPF Program

    The webhook adds a profile recording annotation to the workload if the label selector matches. The other three components interact as shown in Figure 1.

    Figure 1. Each component of BPF recording communicates with the following components.

    The Profile Recorder sends data to the BPF Recorder via a gRPC UNIX domain socket. Data is sent when one or more recording annotations have been found on the target workload. The BPF Recorder then loads the BPF Program, if that has not already been done; the BPF Program may already be loaded if multiple recordings are running in parallel.

    The BPF Program attaches to the sys_enter tracepoint, which is called for every process on the system before the invocation of any system call. This tracepoint allows the BPF Program to record every system call for every process ID (PID) in the kernel. If a PID that has not been seen before enters the tracepoint, the BPF Program pushes an event into a predefined ring buffer, which gets analyzed by the BPF Recorder in its event processing routine.
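
    The tracepoint in question corresponds to the kernel's raw_syscalls:sys_enter tracepoint. If you have root access to a cluster node, you can confirm it exists and, while a recording is running, see that a tracepoint program is loaded (a sketch; bpftool may need to be installed separately):

    # tracefs is usually mounted under /sys/kernel/debug/tracing
    $ grep '^raw_syscalls:sys_enter$' /sys/kernel/debug/tracing/available_events
    raw_syscalls:sys_enter

    # List loaded BPF programs and look for the recorder's tracepoint program
    $ bpftool prog show | grep -i tracepoint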

    A time-critical action then takes place inside the event processor: Every new PID has to be analyzed by finding its possible container ID via its Control group (cgroup) path (which can be found in /proc/$PID/cgroup). If the container ID (consisting of 64 hexadecimal digits) has been found, the routine then tries to find that container within the cluster. Only when the container is inside the cluster and the corresponding profile recording annotations match does the event processor start tracking the profile.
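
    To illustrate that lookup manually (this is not code from the Operator, just a sketch you can run on a node), the 64-character hexadecimal container ID can be extracted from a process's cgroup file like this:

    # Replace $PID with the process ID of a containerized process on the node
    $ grep -oE '[0-9a-f]{64}' "/proc/$PID/cgroup" | head -n1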

    There is also a fast path that avoids retrieving all containers in the cluster for every new PID and reduces file system access during the recording. The mount namespace usually does not change within a container, so the program can use the mount namespace ID obtained by the BPF Program as an identifier to fast-track PIDs in containers that were found earlier on. This optimization results in log messages like:

    "msg"="Using short path via tracked mount namespace"  "mntns"=4026533352 "pid"=74403 "profile"="my-recording-nginx-0-1636978341"
    

    If the workload gets deleted or stops running, the Profile Recorder tries to collect the system calls from the BPF Recorder via gRPC and unloads the BPF Program if no other recordings are running.

    The Profile Recorder then receives a unique list of system calls for all recorded PIDs within the container. Those system calls are reconciled into a new SeccompProfile resource afterward. The name of the new profile is prefixed with the recording name (my-recording in our example) and suffixed with the container name (nginx) as well as its replica (0) if it's coming from a ReplicaSet. In our case, this naming convention results in a recording called my-recording-nginx-0.

    There are some other implementation details not covered in this explanation: For example, the internal hash maps have to be cleaned up at certain points in time, and we use internal caches for the container IDs retrieved from the cgroup. The process has a limitation: It can't track very short-lived containers because it needs some time to look up the initial PID and correlate it to the profile annotation.

    Conclusion

    I hope this article has brought you closer to the world of seccomp profile creation and has shown how you can utilize eBPF within the Security Profiles Operator to simplify the workflow. Feel free to give the Operator a try and post a comment on this article if you have any questions.

    Last updated: September 20, 2023

