Confidential containers (CoCo) bring confidential computing to cloud-native environments by enabling Kubernetes pods to run inside a Trusted Execution Environment (TEE). By standardizing confidential computing at the pod level, CoCo simplifies its adoption, allowing Kubernetes users to deploy confidential workloads using familiar workflows and tools—without requiring deep expertise in confidential computing technologies.
In the standard CoCo threat model, the cluster administrator is untrusted. As a result, any attempt to access a pod—whether through oc exec, oc logs, or similar commands—is also untrusted.
This raises a common question from administrators and developers: how can they debug a CoCo workload if they cannot directly access it?
The challenge is twofold. On the one hand, diagnosing issues in a container is crucial, especially when an error occurs only in production and cannot be reproduced in a test environment. On the other hand, enabling debugging risks exposing sensitive data to the very entities we aim to exclude from the trust model, undermining the core principles of confidential computing.
In this article, we will explore the implications of using confidential containers from a provider admin and developer perspective and propose approaches to debugging a CoCo environment.
OpenShift confidential containers for admins
Red Hat OpenShift sandboxed containers, built on Kata Containers, were designed with the capability to run confidential containers. The CNCF confidential containers project is the foundation for the Red Hat OpenShift CoCo solution.
For the sake of brevity, we won’t explain the whole CoCo architecture. Please refer to the various articles in our blog series to understand more about CoCo and the logic behind it.
Protecting data in use is based on two main concepts:
- The use of Trusted Execution Environments (TEEs): A TEE is an isolated environment with enhanced security (e.g., runtime memory encryption, integrity protection) provided by confidential computing-capable hardware. The memory of a process running inside a TEE is encrypted by the hardware, so nothing outside the TEE can read its data. The foundation of the OpenShift CoCo solution is a special virtual machine (VM), called a confidential virtual machine (CVM), that executes inside the TEE. The CVM runs the pod.
- The use of the attestation process: Attestation ensures that a TEE and software (operating system, container image) are actually secure and untampered with. This step is extremely important, as it ensures that we can trust a confidential environment.
In the CoCo solution, the Trustee project provides the attestation capability. It is responsible for performing the attestation operations and delivering secrets (such as a decryption key) after successful attestation. Further, the CVM contains an embedded kata-agent policy engine that allows or denies specific operations for the pod running inside it.
In OpenShift, you can easily deploy CoCo through two operators:
- OpenShift confidential containers: This feature, added to the OpenShift sandboxed containers operator, is responsible for deploying the building blocks required to deploy pods inside the TEE.
- Confidential compute attestation operator: This operator is responsible for deploying and managing the Trustee service in an OpenShift cluster that is responsible for providing remote attestation services to verify the trustworthiness of the TEEs.
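As a rough illustration, once the operators are installed, enabling the CoCo runtime typically comes down to creating a KataConfig resource with peer pods enabled. The exact fields depend on your OpenShift sandboxed containers version and cloud provider, so treat the following as a minimal sketch rather than a complete configuration:
apiVersion: kataconfiguration.openshift.io/v1
kind: KataConfig
metadata:
  name: example-kataconfig
spec:
  # Peer pods run each pod inside a separate (confidential) VM in the cloud
  enablePeerPods: true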
Let’s look at the implications of CoCo for the platform administrators:
The cluster admin is not allowed to exec a program into the pod (commonly done via oc exec). By default, a cluster admin manages the containers and the cluster platform itself and is therefore able to access the pods, look at their logs, start and stop them, and so on. Figure 1 illustrates this.
Figure 1: A cluster admin is not capable of logging into the pod. Note that CoCo (and confidential computing in general) does not aim to prevent the admin from starting or stopping pods, but rather from reading the data within them. An attacker could still stop a service, but they would not be able to see what is running inside the pod.
If the infrastructure admin tries to dump the pod memory from the host, they will get only an encrypted blob or a zeroed file, depending on the hardware (Figure 2).
Figure 2: The infrastructure admin is not capable of dumping memory content from the CoCo pod, either physically or virtually. While this attack looks trivial from the physical point of view (remove the memory modules and inspect them with specialized tools), it is also surprisingly easy to perform virtually for someone with privileged access, such as a provider administrator: because a provider admin can access all running processes, a simple memory dump with debug tools is enough to grab what is running in memory.
The infrastructure admin will not be able to log into the pod's confidential virtual machine (commonly done via secure shell (SSH) or via serial console access), as any sort of access to the underlying CVM is disabled. This is shown in Figure 3.
Figure 3: An infrastructure admin is not capable of SSH-ing into the pod.
From a security perspective, this prevents any attacker from accessing our confidential workload.
However, if we think about the practical usage, such an approach seems to pose some limitations to debugging.
Debug confidential containers in production
In a perfect world, a production workload would never fail or require inspection and debugging. Test coverage would cover all possible cases, user interactions, and software environments, and a confidential container would never need to be inspected.
Unfortunately, this scenario is very far from reality. Testing may not cover some corner cases, or a multitude of factors (sometimes even a random combination of them) may create issues for any workload, even in a production environment, regardless of whether it uses CoCo. Therefore, the workload owner needs a way to inspect what is happening with the actual, live data in the pod.
Providing the workload owner with the access needed to debug poses a unique challenge in the context of CoCo. We want the workload owner to be able to debug a confidential container (i.e., by allowing exec or access to the pod logs) without allowing anyone else access to the running pod or the data used inside it.
How can we achieve this? We'll describe a few approaches you can use for debugging workloads in a CoCo environment without providing access to the cluster or infrastructure administrator.
Option 1: Deploy a new pod in a relaxed environment
This is the most trivial case. By taking advantage of the embedded Kata Agent Policy component present inside the CVM, it is possible to create a new confidential container with a more relaxed policy that, for example, allows observing a container's logs while still disallowing exec. The main advantage of this approach is that the relaxed policy still limits the access an admin has inside the pod.
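For instance, a relaxed policy could keep exec blocked while turning log access back on. The fragment below is only a sketch of the relevant default rules; the full policy format is shown later in this article:
# Allow reading container output (needed for logs) ...
default ReadStreamRequest := true
# ... but keep exec into the container blocked
default ExecProcessRequest := false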
Alternatively, taking advantage of the fact that CoCo is an extension of Kata, it is always possible to create a new non-CoCo pod (i.e., a regular Kata pod) and run it in the same or very similar environment to see what happens. But of course this has security implications, and it can hardly be used in production environments.
The main advantage of both these approaches is that they don’t need any change in the pod logic. The disadvantage is that they require a new pod to be created in the same (or similar) environment, so this is not helpful in environments where the bug cannot be reproduced.
Option 2: Expose logs/traces to an external server
This approach requires adding a sidecar to the pod, so the pod manifest needs to be modified. The idea is to run a logging agent in a sidecar container so that, if the workload fails, the sidecar collects the (redacted) logs and traces into a file and sends them to an external server.
It is important to note that this approach does not circumvent the CoCo threat model. While access from the outside world into the pod is prohibited, nothing prevents the pod from connecting with the outside.
If the sidecar is injected automatically in the environment, you will also need to create a custom kata-agent policy that allows such mutations. We'll look at a sample kata-agent policy later.
The main advantage of this approach is that there is no policy relaxation, so the cluster admin still has no access to the pod. The trade-off is the additional work needed to add the sidecar and to ensure that the kata-agent policy is correct.
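As an illustration only, a pod spec for this approach might look like the following sketch, where the application image, the log-forwarder image, and the REMOTE_ENDPOINT variable are hypothetical placeholders for whatever logging agent and external collector you use:
spec:
  runtimeClassName: kata-remote
  containers:
  - name: app
    image: quay.io/example/my-app:latest          # hypothetical application image
    volumeMounts:
    - name: logs
      mountPath: /var/log/app                     # the app writes redacted logs/traces here
  - name: log-forwarder
    image: quay.io/example/log-forwarder:latest   # hypothetical logging agent sidecar
    env:
    - name: REMOTE_ENDPOINT                       # external collector reachable from the pod
      value: "https://logs.example.com"
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
      readOnly: true
  volumes:
  - name: logs
    emptyDir: {}
Remember that both images would also need to appear in the allowed_images section of the kata-agent policy described later.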
Option 3: Use sidecar with debugging tools
In some cases, just logging stack traces is not enough. Such debugging relies on the level of detail the workload provides and, in some cases, on whether detailed logging is even enabled, as it is common practice to have multiple logging levels. Since logging also affects performance, in a production environment it is usually reduced to a minimum and provides only a partial picture of the application state.
Therefore, another important step in debugging is live inspection of the workload: monitoring the full process stack and memory/CPU usage.
In this scenario, we distinguish between two actors: the untrusted cluster administrator and the trusted developer (workload owner) who wants to debug the application. While the former should not be able to access the pod, the latter could be considered trusted in the sense that they should be able to inspect the workload to understand what is happening and fix it.
One possible solution for this scenario would be to implement a sidecar running a pre-configured SSH server with an embedded public key. This would allow only people with access to the private key to SSH into the pod. This approach also needs a policy to enable specific debug sidecar images with the relevant tools.
Another important point is how to provide the SSH public key to the sidecar container. This can be implemented by embedding it in the sidecar image. In this way, the SSH key is part of the container image, and no key can be added or removed.
The advantage of this SSH sidecar approach is that the developer now has full access to the workload and can use preferred tools to inspect and understand what went wrong. The main disadvantage is that it needs a careful architecture and permission model around the data, because otherwise it could lead to data leakage.
Here is an example of a Containerfile for the SSH server:
FROM quay.io/fedora/fedora:40
RUN dnf install -y less wget curl openssh-server
RUN mkdir /var/run/sshd
# The host public key fingerprint will be used to verify the connection
RUN ssh-keygen -t ed25519 -f /etc/ssh/ssh_host_ed25519_key -P ""
RUN sed -ri 's/^PermitRootLogin\s+.*/PermitRootLogin yes/' /etc/ssh/sshd_config \
&& sed -ri 's/UsePAM yes/#UsePAM yes/g' /etc/ssh/sshd_config \
&& sed -ri 's/#HostKey \/etc\/ssh\/ssh_host_ed25519_key/HostKey \/etc\/ssh\/ssh_host_ed25519_key/g' /etc/ssh/sshd_config \
&& sed -ri 's/^StrictModes\s+.*/StrictModes no/' /etc/ssh/sshd_config \
&& echo "PasswordAuthentication no" >> /etc/ssh/sshd_config \
&& echo "GatewayPorts yes" >> /etc/ssh/sshd_config
# Generate the key using 'ssh-keygen -t ed25519 -f debug-ssh -P "" -C ""'
COPY debug-ssh.pub /root/.ssh/authorized_keys
EXPOSE 22
CMD ["/usr/sbin/sshd", "-De"]
When you build your own SSH server container image, remember to note the SSH host key fingerprint that is printed during the build. You can use it later to verify the connection.
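For example, assuming the Containerfile above is saved in the current directory, you could build the image and print the generated host key fingerprint along these lines:
podman build -t ssh-server -f Containerfile .
# Print the fingerprint of the host key that was generated during the build
podman run --rm ssh-server ssh-keygen -lf /etc/ssh/ssh_host_ed25519_key.pub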
An example SSH server container image is available here.
The SSH host fingerprint is: SHA256:PwPXQLwarrwxW7tOVT6tAhEkpo/Nae2F+mH5oWqM6sE
The SSH private key for logging in to this image is as follows:
-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAAAMwAAAAtzc2gtZW
QyNTUxOQAAACCKa0dSxLjUzpJXkh8WO4aNcOL6CXFrhGLuCgYJT9xU5wAAAIjuh5ds7oeX
bAAAAAtzc2gtZWQyNTUxOQAAACCKa0dSxLjUzpJXkh8WO4aNcOL6CXFrhGLuCgYJT9xU5w
AAAEDgU61Cg/hRYswwQ2oc12bUY9ApvLKMrD47g3jFypbZG4prR1LEuNTOkleSHxY7ho1w
4voJcWuEYu4KBglP3FTnAAAAAAECAwQF
-----END OPENSSH PRIVATE KEY-----
Kata-agent policy
To prevent malicious sidecars from attaching to the workload, we need to use the kata-agent policy mechanism to allow only specific sidecar images. Additionally, for SSH sidecars, any modification of the embedded SSH keys via volume mounts must be disabled to prevent the keys from being overridden. The following is an example of such a kata-agent policy using the example SSH container image and a Fedora image:
algorithm = "sha384"
version = "0.1.0"
[data]
"policy.rego" = '''
package agent_policy
import future.keywords.in
import future.keywords.if
import future.keywords.every
default AddARPNeighborsRequest := true
default AddSwapRequest := true
default CloseStdinRequest := true
default CreateSandboxRequest := true
default DestroySandboxRequest := true
default GetMetricsRequest := true
default GetOOMEventRequest := true
default GuestDetailsRequest := true
default ListInterfacesRequest := true
default ListRoutesRequest := true
default MemHotplugByProbeRequest := true
default OnlineCPUMemRequest := true
default PauseContainerRequest := true
default PullImageRequest := true
default RemoveContainerRequest := true
default RemoveStaleVirtiofsShareMountsRequest := true
default ReseedRandomDevRequest := true
default ResumeContainerRequest := true
default SetGuestDateTimeRequest := true
default SetPolicyRequest := true
default SignalProcessRequest := true
default StartContainerRequest := true
default StartTracingRequest := true
default StatsContainerRequest := true
default StopTracingRequest := true
default TtyWinResizeRequest := true
default UpdateContainerRequest := true
default UpdateEphemeralMountsRequest := true
default UpdateInterfaceRequest := true
default UpdateRoutesRequest := true
default WaitProcessRequest := true
default WriteStreamRequest := true
default CopyFileRequest := false
default ReadStreamRequest := false
default ExecProcessRequest := false
default CreateContainerRequest := false
CopyFileRequest if {
not exists_disabled_path
}
exists_disabled_path {
some disabled_path in policy_data.disabled_paths
contains(input.path, disabled_path)
}
CreateContainerRequest if {
every storage in input.storages {
some allowed_image in policy_data.allowed_images
storage.source == allowed_image
}
}
policy_data := {
"disabled_paths": [
"ssh",
"authorized_keys",
"sshd_config"
],
"allowed_images": [
"pause",
"quay.io/confidential-devhub/ssh-server@sha256:3f6cf765ff47a8b180272f1040ab713e08332980834423129fbce80269cf7529",
"quay.io/fedora/fedora:41",
]
}
'''
For your environment, you must change the allowed_images list to use your own images. The example policy above shows two ways to specify an allowed image: by digest or by tag.
Let's now go through a complete example.
Assume the pod is named coco-app. Using the following command, you can create a debug pod that is the same as the original pod but with an additional sidecar running the SSH server:
oc debug --image=quay.io/confidential-devhub/ssh-server@sha256:3f6cf765ff47a8b180272f1040ab713e08332980834423129fbce80269cf7529 --share-processes=true --copy-to=debug-coco-app coco-app
Or, you can add the SSH container as a sidecar in the primary pod manifest itself.
Here is a simple hello-world example. The cc_init_data annotation is used to specify the kata-agent policy; its value is the base64-encoded policy:
---
apiVersion: v1
kind: Pod
metadata:
name: hello-world
labels:
app: app
annotations:
io.katacontainers.config.runtime.cc_init_data: YWxnb3JpdGhtID0gInNoYTM4NCIKdmVyc2lvbiA9ICIwLjEuMCIKCltkYXRhXQoicG9saWN5LnJlZ28iID0gJycnCnBhY2thZ2UgYWdlbnRfcG9saWN5CgppbXBvcnQgZnV0dXJlLmtleXdvcmRzLmluCmltcG9ydCBmdXR1cmUua2V5d29yZHMuaWYKaW1wb3J0IGZ1dHVyZS5rZXl3b3Jkcy5ldmVyeQoKZGVmYXVsdCBBZGRBUlBOZWlnaGJvcnNSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBBZGRTd2FwUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgQ2xvc2VTdGRpblJlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IENyZWF0ZVNhbmRib3hSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBEZXN0cm95U2FuZGJveFJlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IEdldE1ldHJpY3NSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBHZXRPT01FdmVudFJlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IEd1ZXN0RGV0YWlsc1JlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IExpc3RJbnRlcmZhY2VzUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgTGlzdFJvdXRlc1JlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IE1lbUhvdHBsdWdCeVByb2JlUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgT25saW5lQ1BVTWVtUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgUGF1c2VDb250YWluZXJSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBQdWxsSW1hZ2VSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBSZWFkU3RyZWFtUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgUmVtb3ZlQ29udGFpbmVyUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgUmVtb3ZlU3RhbGVWaXJ0aW9mc1NoYXJlTW91bnRzUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgUmVzZWVkUmFuZG9tRGV2UmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgUmVzdW1lQ29udGFpbmVyUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgU2V0R3Vlc3REYXRlVGltZVJlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IFNldFBvbGljeVJlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IFNpZ25hbFByb2Nlc3NSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBTdGFydENvbnRhaW5lclJlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IFN0YXJ0VHJhY2luZ1JlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IFN0YXRzQ29udGFpbmVyUmVxdWVzdCA6PSB0cnVlCmRlZmF1bHQgU3RvcFRyYWNpbmdSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBUdHlXaW5SZXNpemVSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBVcGRhdGVDb250YWluZXJSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBVcGRhdGVFcGhlbWVyYWxNb3VudHNSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBVcGRhdGVJbnRlcmZhY2VSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBVcGRhdGVSb3V0ZXNSZXF1ZXN0IDo9IHRydWUKZGVmYXVsdCBXYWl0UHJvY2Vzc1JlcXVlc3QgOj0gdHJ1ZQpkZWZhdWx0IFdyaXRlU3RyZWFtUmVxdWVzdCA6PSB0cnVlCgpkZWZhdWx0IENvcHlGaWxlUmVxdWVzdCA6PSBmYWxzZQpkZWZhdWx0IFJlYWRTdHJlYW1SZXF1ZXN0IDo9IGZhbHNlCmRlZmF1bHQgRXhlY1Byb2Nlc3NSZXF1ZXN0IDo9IGZhbHNlCmRlZmF1bHQgQ3JlYXRlQ29udGFpbmVyUmVxdWVzdCA6PSBmYWxzZQoKQ29weUZpbGVSZXF1ZXN0IGlmIHsKICAgICBzb21lIGRpc2FibGVkX3BhdGggaW4gcG9saWN5X2RhdGEuZGlzYWJsZWRfcGF0aHMKICAgICBub3QgY29udGFpbnMoaW5wdXQucGF0aCwgZGlzYWJsZWRfcGF0aCkKfQoKQ3JlYXRlQ29udGFpbmVyUmVxdWVzdCBpZiB7CglldmVyeSBzdG9yYWdlIGluIGlucHV0LnN0b3JhZ2VzIHsKICAgICAgICBzb21lIGFsbG93ZWRfaW1hZ2UgaW4gcG9saWN5X2RhdGEuYWxsb3dlZF9pbWFnZXMKICAgICAgICBzdG9yYWdlLnNvdXJjZSA9PSBhbGxvd2VkX2ltYWdlCiAgICB9Cn0KCgpwb2xpY3lfZGF0YSA6PSB7CiAgICAgICAgImRpc2FibGVkX3BhdGhzIjogWwogICAgICAgICAgICAgICAic3NoIiwKICAgICAgICAgICAgICAgImF1dGhvcml6ZWRfa2V5cyIsCiAgICAgICAgICAgICAgICJzc2hkX2NvbmZpZyIKICAgICAgICBdLAoKCiAgICAgICAgImFsbG93ZWRfaW1hZ2VzIjogWwogICAgICAgICAgICAgICAgInBhdXNlIiwKCQkicXVheS5pby9jb25maWRlbnRpYWwtZGV2aHViL3NzaC1zZXJ2ZXJAc2hhMjU2OjNmNmNmNzY1ZmY0N2E4YjE4MDI3MmYxMDQwYWI3MTNlMDgzMzI5ODA4MzQ0MjMxMjlmYmNlODAyNjljZjc1MjkiLAogICAgICAgICAgICAgICAgInF1YXkuaW8vZmVkb3JhL2ZlZG9yYTo0MSIsCiAgICAgICAgXQp9CgonJycK
spec:
runtimeClassName: kata-remote
shareProcessNamespace: true
containers:
- name: ssh
image: "quay.io/confidential-devhub/ssh-server@sha256:3f6cf765ff47a8b180272f1040ab713e08332980834423129fbce80269cf7529"
- name: app
image: "quay.io/fedora/fedora:41"
command:
- sleep
- "36000"
Expose the SSH port to access it from an external client. You can use either a LoadBalancer or a NodePort service, depending on your setup. You might also need to adjust firewall settings to allow the SSH connection:
oc expose pod hello-world --port=22 --target-port=22 --type=LoadBalancer
You can then SSH to the external IP assigned to the service using the private key, confirming that the host key fingerprint matches the one noted previously for the sample SSH image.
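For example, assuming the key pair was generated as debug-ssh and the service received the external IP 203.0.113.10 (a placeholder), the connection would look something like this:
# Find the external IP assigned to the service
oc get svc hello-world
# Connect with the private key; compare the displayed host key fingerprint
# with the one noted when the SSH server image was built
ssh -i debug-ssh root@203.0.113.10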
Now, if the pod manifest is maliciously mutated to modify the SSH configuration—for example, by adding another key via a volumeMount—the policy will block it, and you'll see a message similar to the following when describing the pod:
failed to create shim task: "CopyFileRequest is blocked by policy: "
In this example, we provide the policy via the init_data annotation. The contents of this annotation are measured as part of the remote attestation process to ensure that the expected policy is in place for the pod. Another option is to embed the policy in the VM image.
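If you write your own policy, one way to produce the annotation value is to base64-encode the policy file, for example (assuming the policy shown earlier is saved as policy.toml):
# Encode the kata-agent policy for use in the cc_init_data annotation
base64 -w0 policy.toml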
Option 4: Use a debug CVM image with embedded SSH public key
Sometimes it also helps to debug the CVM that runs the workload. There are several reasons for this, such as attaching a debugger, seeing journal logs, or inspecting the pod namespace using nsenter.
You can create a debug CVM image that contains the relevant tools and an SSH server, and embed an SSH public key to enable remote SSH access to the CVM.
Usually, external SSH access to VMs in the cloud will be disabled as a security measure. However, you can spin up a basic SSH client CoCo pod in the same cluster, get your SSH private keys into this CoCo pod, and connect to the debug CVM from this pod.
Note that the connection originates from a CoCo pod. This protects the SSH private key from a malicious actor, guaranteeing that only the owner of the SSH client pod is able to log into the debug CVM.
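As a sketch, the SSH client pod itself can be a very simple CoCo pod; the image and the way the private key is delivered into it (ideally through a secret released by Trustee after attestation) are up to you:
apiVersion: v1
kind: Pod
metadata:
  name: ssh-client
spec:
  runtimeClassName: kata-remote          # the client itself runs as a CoCo pod
  containers:
  - name: client
    image: quay.io/fedora/fedora:41      # base image; install an SSH client inside (e.g., dnf install openssh-clients)
    command: ["sleep", "infinity"]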
Once a CVM debug image has been built, the only change required for OpenShift sandboxed containers 1.8.0 or earlier is to update the image reference in the OpenShift sandboxed containers operator ConfigMap.
For example, in Azure, the field is called AZURE_IMAGE_ID and should reference the Azure gallery where the image is uploaded.
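As a sketch, assuming the operator's peer-pods ConfigMap is named peer-pods-cm (the name used by current OpenShift sandboxed containers releases) in the openshift-sandboxed-containers-operator namespace, the update could look like this:
# Point the peer-pods ConfigMap at the debug CVM image (Azure example)
oc patch configmap peer-pods-cm \
  -n openshift-sandboxed-containers-operator \
  --type merge \
  -p '{"data":{"AZURE_IMAGE_ID":"<your-azure-gallery-image-id>"}}'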
Once the field is updated, just restart the DaemonSet to make sure it uses the new image:
oc set env ds/peerpodconfig-ctrl-caa-daemon -n openshift-sandboxed-containers-operator REBOOT="$(date)"
After the DaemonSet is updated, all new CoCo deployments will run using this new image. In a production environment, be very careful to reset the image to the previous value immediately after the CoCo debug application is deployed.
Starting from OpenShift sandboxed containers 1.9.0, the image used by a pod can also be selected via annotations. This means there is no need to change the ConfigMap or update the DaemonSet—just add the image under the pod YAML annotation io.katacontainers.config.hypervisor.image, and the cloud-api-adaptor will use that specific image for that single deployment.
For example:
apiVersion: v1
kind: Pod
metadata:
name: <your-name>
labels:
app: <your-app>
annotations:
io.katacontainers.config.hypervisor.image: "image path (aws ami id, azure image gallery, etc)"
[...]
For more information on how to select a specific CVM image and all the available settings in OpenShift sandboxed containers, refer to the official documentation.
Summary
In this article, we looked at the practical implications of using confidential containers, how they protect the workload from a possible malicious admin, and how workloads can be debugged in practice without compromising the CoCo threat model.
We explored the advantages and disadvantages of four different solutions that can be used to securely debug workloads deployed as confidential containers. Such an understanding is vital for developers and administrators because it answers a common question in this field and facilitates the adoption of this new technology to improve the security posture of your workloads.