SELinux Multi-Category Security (MCS) breaks GitLab runner's multi-container job architecture by assigning different labels to containers that need to share volumes. This article describes this problem, as well as GNOME's current workaround with fixed MCS labels, and the path toward ephemeral microVM isolation using Cloud Hypervisor, Firecracker, and libkrun.
The GitLab runners of GNOME (a well-known Red Hat community project) use Podman as the container runtime with SELinux in enforcing mode on Fedora. The GitLab runner Docker/Podman executor spawns multiple containers per job: a helper container that clones the repository and handles artifacts, and a build container that runs the actual CI script. Both containers need to share a /builds volume, and this is where SELinux's Multi-Category Security (MCS) becomes a problem.
The MCS problem
An SELinux label has four fields: user:role:type:level. For containers, the interesting part is the level, also called the MCS field. A level looks like s0:c123,c456, where s0 is the sensitivity (always s0 in targeted policy) and c123,c456 are the categories. A level can carry any subset of the 1,024 available categories (c0 through c1023); container runtimes such as Podman assign each container a pair.
MCS access is based on dominance. A subject's label dominates an object's label if the subject's categories are a superset of (or equal to) the object's categories, as shown in the following table.
| Subject | Object | Access | Why |
|---|---|---|---|
| s0:c100,c200 | s0:c100,c200 | Yes | Exact match |
| s0:c100,c200 | s0:c100 | Yes | Subject's categories are a superset |
| s0:c100,c200 | s0:c100,c300 | No | Subject lacks c300 |
| s0:c0.c1023 | s0:c100,c200 | Yes | Full range dominates everything |
| s0 | s0:c100,c200 | No | No categories can't dominate any |
| s0 | s0 | Yes | Both have no categories |
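The dominance rule in the table can be sketched as a small Bash helper. This is a minimal illustration, not how the kernel implements the check: the function names expand_categories and mcs_dominates are made up for this example, and the sensitivity (s0) is ignored on the assumption that it is always s0, as in targeted policy.

```shell
#!/bin/bash
# Sketch only: MCS dominance for levels such as "s0:c100,c200" or "s0:c0.c1023".
# expand_categories and mcs_dominates are hypothetical helper names.

# Print one category number per line for a level string.
expand_categories() {
    local level="$1" cats part lo hi
    # A level without a colon (plain "s0") carries no categories.
    [[ "$level" == *:* ]] || return 0
    cats="${level#*:}"
    local IFS=','
    for part in $cats; do
        if [[ "$part" == *.* ]]; then
            # "c0.c1023" is an inclusive range of categories.
            lo="${part%%.*}"
            hi="${part##*.}"
            seq "${lo#c}" "${hi#c}"
        else
            echo "${part#c}"
        fi
    done
}

# Return 0 if the subject level dominates the object level, 1 otherwise.
mcs_dominates() {
    local subject object cat
    subject="$(expand_categories "$1")"
    object="$(expand_categories "$2")"
    for cat in $object; do
        # Every category on the object must also be present on the subject.
        grep -qxF -- "$cat" <<< "$subject" || return 1
    done
    return 0
}

# Example: the third row of the table (subject lacks c300).
if mcs_dominates "s0:c100,c200" "s0:c100,c300"; then
    echo "access allowed"
else
    echo "access denied"
fi
```

Run against the rows of the table, the helper should agree with the Access column; the example at the bottom covers the row where the subject lacks c300.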
This applies to the runners as follows:
- Container A runs as container_t:s0:c100,c100: it can only access objects labeled s0:c100,c100 (or s0:c100, or s0).
- Container B runs as container_t:s0:c200,c200: it can only access objects labeled s0:c200,c200 (or s0:c200, or s0).
- Container A cannot access Container B's files: c100,c100 doesn't dominate c200,c200.
- Overlay layers labeled s0 (no categories): accessible by all containers since every category set dominates the empty set.
- Podman at container_runtime_t:s0-s0:c0.c1023: the full range means it dominates every possible category combination, so it can manage all containers.
Range syntax (s0-s0:c0.c1023) is used for processes that need to operate across multiple levels. It means "my low clearance is s0 and my high clearance is s0:c0.c1023." The process can read objects at any level within that range and create objects at any level within it. This is why Podman needs the full range: it creates containers with different MCS labels and needs to access all of them.
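To make the low-high notation concrete, here is a small sketch that splits a range into its two levels. The function names mcs_range_low and mcs_range_high are invented for this example, and the helpers expect only the level field (for instance s0-s0:c0.c1023), not a full user:role:type:level context.

```shell
#!/bin/bash
# Sketch only: split a level range such as "s0-s0:c0.c1023" into low and high.
# mcs_range_low and mcs_range_high are hypothetical helper names.

mcs_range_low() {
    # Everything before the first "-"; a single level is returned unchanged.
    echo "${1%%-*}"
}

mcs_range_high() {
    # Everything after the first "-"; a single level is both low and high.
    if [[ "$1" == *-* ]]; then
        echo "${1#*-}"
    else
        echo "$1"
    fi
}

range="s0-s0:c0.c1023"
echo "low clearance:  $(mcs_range_low "$range")"
echo "high clearance: $(mcs_range_high "$range")"
```

For a container process, which carries a single level rather than a range, both helpers return the same string.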
When Podman starts a container, it picks a random pair of categories (e.g., s0:c512,c768) within its allowed range and assigns that as the container's process label. Files created by the container inherit that label. Another container gets a different random pair (e.g., s0:c33,c901). Since neither c512,c768 nor c33,c901 is a superset of the other, SELinux denies cross-container file access. This is the isolation mechanism, and it is also the root cause of the problem with GitLab runner's multi-container, per-job architecture.
The helper container gets one random MCS pair, writes the cloned repo to /builds labeled with that pair, and the build container gets a different pair. The build container cannot read or write those files. The :Z volume flag (exclusive relabel) relabels the volume to the mounting container's category, but that only helps the first container. The second container still has a different label.
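A quick way to spot this situation is to compare the level field of two contexts, for example the process labels Podman reports for the helper and build containers. The sketch below is an illustration under simplifying assumptions: context_level and same_mcs_level are hypothetical names, the labels are made-up values in the usual format, and the comparison is a plain string match rather than a full dominance check (sufficient when both sides carry a random pair).

```shell
#!/bin/bash
# Sketch only: extract the level (MCS field) from a full SELinux context
# and compare two contexts. Function names are hypothetical.

context_level() {
    # A context is user:role:type:level. The level may itself contain a
    # colon, so strip exactly the first three fields.
    local ctx="$1"
    ctx="${ctx#*:}"   # drop user
    ctx="${ctx#*:}"   # drop role
    ctx="${ctx#*:}"   # drop type
    echo "$ctx"
}

same_mcs_level() {
    # String equality of the two levels; not a full dominance check.
    [[ "$(context_level "$1")" == "$(context_level "$2")" ]]
}

# Made-up labels for a helper and a build container.
helper="system_u:system_r:container_t:s0:c512,c768"
build="system_u:system_r:container_t:s0:c33,c901"

if same_mcs_level "$helper" "$build"; then
    echo "levels match: the build container can use the helper's files"
else
    echo "levels differ: expect 'Permission denied' on the shared volume"
fi
```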
The test script
I wrote the following script that demonstrates the problem with standard containers (crun) and microVMs (libkrun). This script creates two containers per test: a helper that writes a file to a shared /builds volume and a build container that tries to read it, simulating the GitLab runner workflow.
#!/bin/bash
# Description: SELinux MCS Diagnostic (crun vs krun)
if [ "$(getenforce)" != "Enforcing" ]; then
echo "WARNING: SELinux is not in Enforcing mode. This test requires Enforcing mode."
exit 1
fi
TEST_BASE="/tmp/gitlab-runner-mcs-test"
CRUN_DIR="$TEST_BASE/crun-builds"
KRUN_DIR="$TEST_BASE/krun-builds"
# Cleanup from previous runs
rm -rf "$TEST_BASE"
mkdir -p "$CRUN_DIR" "$KRUN_DIR"
echo "======================================================="
echo " TEST 1: Standard Container Isolation (crun)"
echo "======================================================="
# 1. CREATE Helper
podman create --name crun-helper -v "$CRUN_DIR:/builds:Z" fedora bash -c "
echo '[crun] -> Helper Process Context (Inside):'
cat /proc/self/attr/current
echo 'crun-data' > /builds/artifact.txt
echo '[crun] -> File Label INSIDE Helper:'
ls -Z /builds/artifact.txt
" > /dev/null
echo "[crun] Starting Helper Container (applying :Z relabel)..."
HELPER_HOST_LABEL_CRUN=$(podman inspect -f '{{.ProcessLabel}}' crun-helper)
echo "[crun] -> HOST METADATA: Podman assigned process label: $HELPER_HOST_LABEL_CRUN"
podman start -a crun-helper
echo ""
echo "[crun] -> File Label ON HOST (Notice the specific MCS category):"
ls -Z "$CRUN_DIR/artifact.txt"
# 2. CREATE Build Container (The Victim)
podman create --name crun-build -v "$CRUN_DIR:/builds" fedora bash -c "
echo ' [Build-Internal] Process Context:'
cat /proc/self/attr/current 2>/dev/null
echo ' [Build-Internal] Executing ls -laZ /builds :'
ls -laZ /builds 2>&1 | sed 's/^/ /'
echo ' [Build-Internal] Executing cat /builds/artifact.txt :'
cat /builds/artifact.txt 2>&1 | sed 's/^/ /'
" > /dev/null
echo ""
echo "[crun] Starting Build Container to inspect shared volume..."
BUILD_HOST_LABEL_CRUN=$(podman inspect -f '{{.ProcessLabel}}' crun-build)
echo "[crun] -> HOST METADATA: Podman assigned process label: $BUILD_HOST_LABEL_CRUN"
podman start -a crun-build
podman rm -f crun-helper crun-build > /dev/null
echo ""
echo "======================================================="
echo " TEST 2: MicroVM Isolation (libkrun / virtio-fs)"
echo "======================================================="
# --- Write the execution scripts to the host to avoid parsing errors ---
cat << 'EOF' > "$TEST_BASE/krun_helper.sh"
#!/bin/bash
echo '[krun] -> Helper Process Context (Inside VM):'
cat /proc/self/attr/current 2>/dev/null || echo ' (SELinux disabled/unavailable in guest kernel)'
echo 'krun-data' > /builds/artifact.txt
echo '[krun] -> File Label INSIDE Helper VM (Blindspot):'
ls -laZ /builds/artifact.txt 2>&1 | sed 's/^/ /'
EOF
cat << 'EOF' > "$TEST_BASE/krun_build.sh"
#!/bin/bash
echo ' [Build-Internal] Process Context (Inside VM):'
cat /proc/self/attr/current 2>/dev/null || echo ' (SELinux disabled/unavailable in guest kernel)'
echo ' [Build-Internal] Executing ls -laZ /builds :'
ls -laZ /builds 2>&1 | sed 's/^/ /'
echo ' [Build-Internal] Executing cat /builds/artifact.txt :'
cat /builds/artifact.txt 2>&1 | sed 's/^/ /'
EOF
chmod +x "$TEST_BASE/krun_helper.sh" "$TEST_BASE/krun_build.sh"
# ---------------------------------------------------------------------
# 1. CREATE Helper MicroVM
podman create --name krun-helper --runtime krun --memory=1024m \
-v "$KRUN_DIR:/builds:Z" \
-v "$TEST_BASE/krun_helper.sh:/script.sh:ro,Z" \
fedora /script.sh > /dev/null
echo "[krun] Starting Helper MicroVM (applying :Z relabel)..."
HELPER_HOST_LABEL_KRUN=$(podman inspect -f '{{.ProcessLabel}}' krun-helper)
echo "[krun] -> HOST METADATA: Podman assigned process label: $HELPER_HOST_LABEL_KRUN"
podman start -a krun-helper
echo ""
echo "[krun] -> File Label ON HOST (Podman applied the helper's MCS category via :Z):"
ls -Z "$KRUN_DIR/artifact.txt"
# 2. CREATE Build MicroVM (The Victim)
podman create --name krun-build --runtime krun --memory=1024m \
-v "$KRUN_DIR:/builds" \
-v "$TEST_BASE/krun_build.sh:/script.sh:ro,Z" \
fedora /script.sh > /dev/null
echo ""
echo "[krun] Starting Build MicroVM to inspect shared volume..."
BUILD_HOST_LABEL_KRUN=$(podman inspect -f '{{.ProcessLabel}}' krun-build)
echo "[krun] -> HOST METADATA: Podman assigned process label: $BUILD_HOST_LABEL_KRUN"
echo " *** THE virtiofsd DAEMON ON THE HOST IS TRAPPED IN THIS CONTEXT ***"
podman start -a krun-build
# Cleanup
podman rm -f krun-helper krun-build > /dev/null
echo ""
echo "======================================================="
echo " Test Complete."
Test 1 (crun) creates a helper container that mounts the builds directory with :Z (exclusive relabel) and writes artifact.txt. Podman assigns it a random MCS label; in this run, it was s0:c20,c540. The file on disk inherits that label. Then a second container (the build container) mounts the same path without :Z and gets a different random label (s0:c46,c331). Since c46,c331 does not dominate c20,c540, SELinux denies the build container access to the file.
Test 2 (krun) runs the same scenario with --runtime krun, which boots each container inside a lightweight microVM via libkrun. The helper VM gets container_kvm_t:s0:c823,c999, and the build VM gets container_kvm_t:s0:c309,c405: same MCS mismatch, same denial. The type changes from container_t to container_kvm_t, but the MCS mechanism is identical. On the host side, the virtiofsd daemon that serves the volume into the VM via virtio-fs runs under the MCS label Podman assigned to the VM. The build VM's virtiofsd is therefore confined to s0:c309,c405 and cannot access files labeled s0:c823,c999.
An interesting detail: inside the libkrun VMs, cat /proc/self/attr/current returns kernel. SELinux is not available in the guest. The VM thinks it has no mandatory access control, but the host-side virtiofsd is still fully subject to MCS enforcement. This is a blindspot worth noting.
The output from a run on Fedora with SELinux enforcing and Podman 5.8.2 follows:
=======================================================
TEST 1: Standard Container Isolation (crun)
=======================================================
[crun] Starting Helper Container (applying :Z relabel)...
[crun] -> HOST METADATA: Podman assigned process label: system_u:system_r:container_t:s0:c20,c540
[crun] -> Helper Process Context (Inside):
system_u:system_r:container_t:s0:c20,c540 [crun] -> File Label INSIDE Helper:
system_u:object_r:container_file_t:s0:c20,c540 /builds/artifact.txt
[crun] -> File Label ON HOST (Notice the specific MCS category):
system_u:object_r:container_file_t:s0:c20,c540 /tmp/gitlab-runner-mcs-test/crun-builds/artifact.txt
[crun] Starting Build Container to inspect shared volume...
[crun] -> HOST METADATA: Podman assigned process label: system_u:system_r:container_t:s0:c46,c331
*** COMPARE THE cXXX,cYYY ABOVE TO THE FILE LABEL. THIS MISMATCH CAUSES THE DENIAL ***
[Build-Internal] Process Context:
system_u:system_r:container_t:s0:c46,c331 [Build-Internal] Executing ls -laZ /builds :
ls: cannot open directory '/builds': Permission denied
[Build-Internal] Executing cat /builds/artifact.txt :
cat: /builds/artifact.txt: Permission denied
=======================================================
TEST 2: MicroVM Isolation (libkrun / virtio-fs) FIXED
=======================================================
[krun] Starting Helper MicroVM (applying :Z relabel)...
[krun] -> HOST METADATA: Podman assigned process label: system_u:system_r:container_kvm_t:s0:c823,c999
[krun] -> Helper Process Context (Inside VM):
kernel [krun] -> File Label INSIDE Helper VM (Blindspot):
-rw-r--r--. 1 root root system_u:object_r:container_file_t:s0:c823,c999 10 May 2 2026 /builds/artifact.txt
[krun] -> File Label ON HOST (Podman applied the helper's MCS category via :Z):
system_u:object_r:container_file_t:s0:c823,c999 /tmp/gitlab-runner-mcs-test/krun-builds/artifact.txt
[krun] Starting Build MicroVM to inspect shared volume...
[krun] -> HOST METADATA: Podman assigned process label: system_u:system_r:container_kvm_t:s0:c309,c405
*** THE virtiofsd DAEMON ON THE HOST IS TRAPPED IN THIS CONTEXT ***
[Build-Internal] Process Context (Inside VM):
kernel [Build-Internal] Executing ls -laZ /builds :
ls: /builds: Permission denied
ls: cannot open directory '/builds': Permission denied
[Build-Internal] Executing cat /builds/artifact.txt :
cat: /builds/artifact.txt: Permission denied
=======================================================
Test Complete.
GitLab's official suggestion falls short
GitLab's documentation on configuring SELinux MCS suggests applying the same MCS label to all containers launched by a runner:
[[runners]]
[runners.docker]
security_opt = ["label=level:s0:c1000,c1000"]
This works. All containers get the same category pair, so the helper and build containers can share files. But it collapses MCS isolation between all concurrent jobs on that runner. With concurrent = 4, four simultaneous jobs run as s0:c1000,c1000 and can read each other's /builds content (cloned source code, build artifacts, and cached dependencies). On a shared or multi-tenant runner, this is a security regression, trading MCS isolation for functionality.
For runners with concurrent = 1 or dedicated single-tenant runners, this is an acceptable tradeoff. But it does not generalize to shared infrastructure where multiple untrusted projects run side by side.
The GNOME workaround
An Ansible role manages GNOME's runners. It sets SELinux to enforcing mode, installs rootless Podman running as a dedicated podman system user with linger enabled, and deploys custom SELinux policy modules. The Podman service runs under SELinuxContext=system_u:system_r:container_runtime_t:s0-s0:c0.c1023 via a systemd override. The full MCS range (s0-s0:c0.c1023) gives the container runtime the ability to spawn containers at any MCS level and relabel volumes accordingly, as explained in the dominance rules.
There are four custom SELinux .te modules compiled and loaded on every runner host:
- pydocuum: Allows the image cleanup daemon to talk to the Podman socket
- podman: Grants user_namespace create and /dev/null mapping
- flatpak: Permits the filesystem mounts needed by the flatpak builds
- gnome_runner: Covers binfmt_misc access, device nodes, and other permissions GNOME OS builds require
For the MCS problem specifically, the runner config.toml, rendered from a Jinja2 template via per-host Ansible variables, sets a fixed MCS label per runner type. Here's a representative snippet from one of the runner hosts:
[[runners]]
name = "a15948139c78"
executor = "docker"
[runners.docker]
image = "quay.io/fedora/fedora:latest"
privileged = false
security_opt = ["label=level:s0:c100,c100"]
devices = ["/dev/kvm", "/dev/udmabuf"]
cap_add = ["SYS_PTRACE", "SYS_CHROOT"]
[[runners]]
name = "a15948139c78-flatpak"
executor = "docker"
[runners.docker]
image = "quay.io/gnome_infrastructure/gnome-runtime-images:gnome-master"
privileged = false
security_opt = ["seccomp:/home/podman/gitlab-runner/flatpak.seccomp.json", "label=level:s0:c200,c200"]
cap_drop = ["all"]
This is the same approach GitLab's documentation suggests, with one refinement: we use different fixed categories per runner type, c100,c100 for untagged runners and c200,c200 for flatpak runners. Thus, flatpak builds and regular builds remain MCS-isolated from each other, even though builds of the same type share a category.
This is a pragmatic compromise, not an ideal solution. All concurrent jobs on the same runner type share the same MCS category. With concurrent = 4 on our Hetzner runners, four simultaneous untagged jobs can read each other's /builds content. For GNOME's use case (a community CI infrastructure where the runners are shared by GNOME project maintainers), this is an acceptable tradeoff. The alternative, leaving MCS labels random, would break every single job. But it is precisely this tradeoff that motivates exploring per-job VM isolation via microVMs.
Explore libkrun
libkrun is a lightweight virtual machine monitor (VMM) that integrates with Podman via --runtime krun, running each container inside a microVM with a lightweight kernel. The appeal is strong because per-container VM isolation would give each job its own kernel and address space, making the MCS cross-container problem irrelevant inside the VM.
I tested libkrun on a Fedora system and hit an immediate blocker: Fatal glibc error: rseq registration failed. Introduced in Linux kernel 4.18, the restartable sequences (rseq) syscall is required by glibc >= 2.35. libkrun uses a custom minimal kernel that does not expose rseq support. Since the guest images (Fedora in our case) ship a modern glibc that expects rseq to be available, the process aborts at startup before any user code runs.
The libkrun kernel is compiled into the library, and users cannot modify or replace it. This is not a configuration issue but a fundamental limitation of the current libkrun release. A related RFE has been opened upstream asking libkrun to support booting custom kernels and to accept additional customizations via krun_vm.json. The idea is to potentially build our own kernel based on the latest Fedora kernel plus a set of required kernel modules, and have libkrun bootstrap a microVM for every CI job with minimal to no changes to GNOME's current CI infrastructure.
Even if we resolved the rseq issue (by booting a customized kernel as we just described), the MCS challenge would still be there, as the test script demonstrates in Test 2. On the host side, Podman assigns MCS labels to the virtiofsd process that serves the volume into the VM via virtio-fs. Different VMs get different host-side MCS labels, meaning the same :Z relabel / cross-container access denial applies. The mechanism changes from overlay mounts to virtio-fs, but the SELinux enforcement is identical. The virtiofsd for the build VM runs at container_kvm_t:s0:c309,c405 and cannot access files labeled s0:c823,c999 by the helper VM's virtiofsd.
Firecracker and the custom executor path
Firecracker is another microVM technology (the one behind AWS Lambda and Fly.io) that could provide strong per-job isolation. However, there is no native GitLab runner executor for Firecracker. The only integration path is the custom executor, which requires implementing prepare, run, and cleanup scripts from scratch.
The CUSTOM_ENV_CI_JOB_IMAGE variable exposes the job image, but everything else is on the operator: pulling the OCI image, extracting a rootfs, booting a Firecracker VM with the right kernel and network configuration, injecting the build script, mounting or copying the cloned repository into the VM, collecting artifacts and cache after the job finishes, and tearing the VM down.
GitLab provides an LXD-based example that shows the pattern: prepare creates a container and installs dependencies, run pipes the job script into it, cleanup destroys it. But adapting that to microVMs adds the complexity of VM lifecycle management, kernel and rootfs preparation, networking, and storage. This is a significant engineering effort, essentially rebuilding the entire Docker executor workflow from scratch.
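The prepare, run, cleanup pattern can be sketched as a single dispatcher script. This is only a skeleton under stated assumptions: vm_boot, vm_exec, and vm_destroy are hypothetical placeholders that print what a real microVM integration would have to do, and custom_executor is a name chosen for this example. The CUSTOM_ENV_ prefix and the two arguments passed to the run stage (script path and stage name) follow my reading of GitLab's custom executor documentation.

```shell
#!/bin/bash
# Sketch only: one entry point for the custom executor's prepare/run/cleanup stages.
# vm_boot, vm_exec, and vm_destroy are hypothetical placeholders that only print
# the action a real Firecracker (or other microVM) integration would perform.

vm_boot()    { echo "boot VM $1 from image $2"; }
vm_exec()    { echo "run $2 (stage $3) inside VM $1"; }
vm_destroy() { echo "destroy VM $1"; }

custom_executor() {
    local stage="$1"
    shift
    # The runner exposes job variables to the scripts with a CUSTOM_ENV_ prefix.
    local vm="job-${CUSTOM_ENV_CI_JOB_ID:-unknown}"
    case "$stage" in
        prepare) vm_boot "$vm" "${CUSTOM_ENV_CI_JOB_IMAGE:-}" ;;
        run)     vm_exec "$vm" "$1" "$2" ;;  # $1 = script path, $2 = stage name
        cleanup) vm_destroy "$vm" ;;
        *)       echo "unknown stage: $stage" >&2; return 1 ;;
    esac
}

# Dispatch only when arguments are given, so the file can also be sourced.
if [[ $# -gt 0 ]]; then
    custom_executor "$@"
fi
```

Assuming the runner's prepare_exec, run_exec, and cleanup_exec settings all point at this script and the stage name is passed as the first argument (for example through the corresponding *_args settings), each placeholder is where the real work described above would go: image extraction and VM boot in prepare, script injection in run, and teardown in cleanup.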
What comes next
MCS is a core SELinux feature. Type enforcement (TE) already confines processes by type: container_t can only access container_file_t, not user_home_t or httpd_sys_content_t. But TE alone cannot distinguish one container_t process from another. MCS adds that layer by assigning each container a unique category pair, so the kernel enforces isolation even between processes that share the same type. Container A at s0:c100,c100 and Container B at s0:c200,c200 are both container_t, but MCS ensures they cannot touch each other's files. The conflict with GitLab runner's multi-container, per-job architecture arises because two containers that need to share a volume are given different categories by default. The workarounds we deploy today, including the fixed MCS labels on GNOME's runners, trade that inter-container isolation for functionality.
The most promising direction I've found so far is the combination of Cloud Hypervisor and the fleeting-plugin-fleetingd plug-in. Cloud Hypervisor, a project started at Intel and built on the rust-vmm crates, is essentially a more capable sibling of Firecracker. It supports CPU and memory hotplugging, VFIO device passthrough, and virtio-fs. These features are often necessary for complex CI tasks like building large binaries or running UI tests, and Firecracker's minimalist design deliberately omits them.
fleeting-plugin-fleetingd is a community plug-in for GitLab's instance executor (the modern evolution of the custom executor) that automates the full VM lifecycle: downloading cloud images, creating copy-on-write disks, launching Cloud Hypervisor VMs with direct kernel boot, provisioning them via cloud-init, and tearing them down after each build. Each job gets a fresh disposable VM, which is exactly the per-job isolation model we need. The plug-in already handles networking via TAP interfaces and nftables SNAT and supports customization of the VM image through cloud-init commands, making preinstallation of Podman or other build tools straightforward.
Beyond that, I'll also keep evaluating libkrun (promising Red Hat technology), Firecracker with a hand-rolled custom executor, and QEMU's microvm machine type. The common denominator across all of these technologies, with the exception of the fleeting-plugin-fleetingd path, is that none of them have an existing GitLab runner integration. Regardless of which microVM technology we settle on, the path forward involves either building a workflow from scratch using the custom executor and its prepare, run, cleanup hooks or leveraging the fleeting plug-in ecosystem that GitLab has been building around the Instance and Docker Autoscaler executors.
The CVE-2026-31431 bug
The urgency of per-job VM isolation comes from CVE-2026-31431 (Copy Fail), a nine-year-old logic bug in the kernel's algif_aead cryptographic module disclosed at the end of April. This flaw allows an unprivileged local user to write four controlled bytes into the page cache of any readable file, which is enough to patch a setuid binary like /usr/bin/su and escalate to root. Unlike Dirty COW or Dirty Pipe, Copy Fail requires no race condition. The exploit is deterministic, leaves no trace on disk, and can break out of container isolation. In a shared-runner CI environment, any project that can execute arbitrary code in a job already has the access the exploit needs.
Separately, Claude Mythos was an Anthropic model trained for cybersecurity research that escaped its sandbox during a red-team exercise in April. This demonstrated that AI-assisted vulnerability discovery and exploitation is no longer theoretical. Models can now autonomously find and chain bugs that would take human researchers weeks to exploit. The combination of a reliable, public kernel LPE and AI-augmented offensive tooling makes the case for ephemeral microVMs compelling. When every CI job boots a fresh, disposable VM with its own kernel, a vulnerability like Copy Fail becomes a local-root inside a throwaway guest destroyed seconds later, not a stepping stone to the host or adjacent jobs.
Final thoughts
Ultimately, the clash between SELinux MCS and the GitLab runner's multi-container model reveals a fundamental tension between security and functionality in containerized CI environments. While pragmatic workarounds, such as GNOME's use of fixed MCS labels, allow jobs to complete, they compromise the necessary isolation between concurrent builds on shared infrastructure. The only way to maintain the strong inter-job security SELinux MCS was designed to provide is to move beyond conventional containers.
MicroVM technologies like Cloud Hypervisor, especially when paired with the fleeting-plugin-fleetingd for full lifecycle automation, represent the future of secure CI. By provisioning every job with a fresh, disposable virtual machine, we not only solve the volume sharing conflict but also neutralize threats like CVE-2026-31431. The disposable VM ships the gitlab-runner binary, which pulls the latest sources, restores the cache, and uploads artifacts. The main difference is that a single microVM handles the full CI job lifecycle, with no need to spawn intermediate, isolated containers for preparatory CI steps. This approach confines any potential local privilege escalation to a temporary, isolated guest, ensuring that strong per-job isolation becomes the default security standard for all shared GitLab runner infrastructure.