
Introducing a *Super* Privileged Container Concept

November 6, 2014
Daniel Walsh
Related topics:
Containers, Security
Related products:
Red Hat OpenShift Container Platform


    Letting the containers out of containment

    I have written a lot about *Containing the Containers*, e.g. *Are Docker containers really secure?* and *Bringing new security features to Docker*. However, what if you want to ship a container that needs to have access to the host system or other containers? Well, let's talk about removing all the security! Safely?


    Packaging Model

    I envision a world where lots of software gets shipped in image format. In other words, the application brings all of the content needed to do its job with it, including the shared libraries and specific versions of python, ruby, glibc ... There are two big benefits to this. First: the application always has the same runtime environment, meaning packages can be installed on the host without affecting the application. Second: the application can be installed without breaking any other applications or the host.

    Enter container hosts, like Project Atomic, which keep the OS minimal and ship all of the software as containers. In the abstract, this makes perfect sense. However, it means that if you want to install debugging tools, monitoring tools, management tools, etc., you should also ship them as container images.

    The first thing you usually (but not always) have to do to get this to work is turn off or turn down the security.

    docker run --privileged ...

    The --privileged option turns off almost all of the security used to confine one container from others and from the host.

    You can get more fine-grained control than this by using --cap-add and --cap-drop to modify the Linux capabilities given to a container (see Capabilities, a short intro and Capability-based security for an overview). You can also modify the SELinux type that a container will run with using the --security-opt label:type:TYPE_t option.
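
    For example, here is a minimal sketch of both knobs; the image, command, capability, and SELinux type choices are illustrative, not from the original setup:

    # Drop all capabilities, then grant back only the one the workload needs:
    sudo docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE fedora httpd

    # Run the container with a specific SELinux type:
    sudo docker run --security-opt label:type:svirt_apache_t fedora httpd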

    Super Privileged Container (SPC)

    A proposal I have been knocking around for a while now is the idea of a Super Privileged Container (SPC).

    I define an SPC as a container that runs with security turned off (--privileged), turns off one or more of the namespaces, or "volume mounts in" parts of the host OS into the container. This means it is exposed to more of the Host OS. In the most privileged version, the SPC will use ONLY the MOUNT (newns) namespace. It should be able to run without the PID, NET, IPC, or UTS namespaces, as well as any future namespaces.

    I think it would still need to use the MNT namespace in order to bring its own userspace, but you could bring parts of the OS, or all of "/", into the container using volume mounts.

    The current docker CLI can do almost all of this if you include my --ipc=host patch to disable the IPC namespace, which looks like it will get merged soon. The only namespace we cannot currently disable is the PID namespace.
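
    Putting those pieces together, the most privileged SPC launch would look something like this sketch (assuming the --ipc=host patch has been merged; the image name is illustrative):

    # Keep only the MNT namespace; expose the host's network, IPC, and "/":
    sudo docker run -ti --privileged --net=host --ipc=host -v /:/host rhel7 /bin/sh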

    Let's look at a few use cases for running SPCs.

    Examples

    Libvirt in a container

    We want to be able to run Virtual Machines on a Project Atomic system, but we don't want to install all of the code required to run libvirt and qemu into the host OS. The libvirt application needs a lot of access to the host. Libvirt needs to be able to store its images on the host system, and in certain cases its images are stored as device images. Libvirt also needs to communicate with the host's systemd using dbus to set up cgroups. And, finally, it also needs to be able to use SELinux to set up different labels for sVirt. Brent Baude and Scott Collier wrote a blog on how they were able to get libvirtd to run within a docker container.

    This is the command they used to start their container.

    sudo docker run --rm --privileged --net=host -ti -e 'container=docker' -v /proc/modules:/proc/modules -v /var/lib/libvirt/:/var/lib/libvirt/ -v /sys/fs/cgroup:/sys/fs/cgroup:rw libvirtd

    They needed to use --net=host in order to allow libvirt to manage the network on the host to set up its virtual machines. They also needed to expose the VMs from the host via /var/lib/libvirt. Finally, they wanted to allow libvirt to manage the cgroup file system to put its VMs under cgroup control.

    One thing they missed is that they did not mount /sys/fs/selinux into the container. This would tell libselinux within the container that SELinux was enabled, and libvirtd would then be able to launch its virtual machines with sVirt separation.

    In order to get it to work with the host's /dev directory, I would have volume mounted /dev into the container, e.g. -v /dev:/dev. I would also have allowed libvirt to communicate with systemd using dbus by adding -v /run:/run.
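
    Combining their command with the mounts suggested above, the full invocation might look like the following sketch (untested; it simply puts the pieces discussed here together):

    sudo docker run --rm --privileged --net=host -ti -e 'container=docker' -v /proc/modules:/proc/modules -v /var/lib/libvirt/:/var/lib/libvirt/ -v /sys/fs/cgroup:/sys/fs/cgroup:rw -v /sys/fs/selinux:/sys/fs/selinux -v /dev:/dev -v /run:/run libvirtd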

    While this may seem like just an exercise in "how many turtles can I stack?", there are real, potential benefits to be gained from using docker. For example, a project like libvirtd brings with it lots of user space tools that we don't necessarily want to add to the Atomic host.

    However, the big, unsolved downside to running libvirt within a container is that if the admin shuts down the container, it will also kill all of the VMs within the container. If we could eliminate the PID namespace, we could potentially fix this problem.

    A container that needs to load Kernel Modules

    Several packages want to ship custom kernel modules that are not included in the Host OS. Currently, they ship these modules in an RPM package and then load them when the application starts. There is no reason that you could not do this within a privileged container, as long as the custom kernel module works with the current kernel. If your application could run as non-privileged, other than loading the kernel module, it would probably be best to ship the container as two different images, or to run the same image with different commands. For example:

    sudo docker run --rm  --privileged foobar /sbin/modprobe PATHTO/foobar-kmod

    sudo docker run -d foobar

    A host management application like Cockpit

    Cockpit manages a Host OS and needs access to pretty much the entire system. I have been playing around with different ways you might build an SPC for managing the host.

    One idea I experimented with was mounting the host's "/" onto the container's "/". Imagine if you could execute:

    sudo docker run -v /:/ rhel7 sh

    Then you would bind mount your userspace application onto /opt/apps/myapp; the app would have to get its shared libraries and content from subdirectories of /opt/apps/myapp.

    Sadly, I believe this will not work, or would be so fragile that it might cause more problems than it is worth.

    It does not seem that gcc/glibc support a mechanism for having their shared libraries in one location while other applications have shared libraries in other directories. /etc/ld.so.cache causes too many problems.

    I believe applications that want to manage the host file system will have to know they are running in a container, or at least realize that "/" is mounted in a sub-directory.

    However, you could run a container like the following to expose the Host to the container.

    sudo docker run -ti --privileged -d --net=host -e sysimage=/host -v /:/host -v /dev:/dev -v /run:/run rhel7-cockpit

    Mounting Volumes

    Cockpit would need to be coded to check whether the $sysimage environment variable is set; if it is, it can prepend $sysimage to all paths involving the host. Another option would be to standardize on a path, for example: all SPCs mount the host image on /host or /sysimage.

    Then Cockpit could check whether the environment variable container=docker or container_uuid=ID was set and prefix /sysimage (or /host, not that I am biased toward one of the options :) ) onto all of its content.
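
    As a sketch of what that prefixing could look like in practice (the helper name below is hypothetical, not part of Cockpit):

    # Hypothetical helper: resolve a host path from inside an SPC.
    host_path() {
        echo "${sysimage:-}$1"
    }

    # Reads the host's fstab when $sysimage is set, the container's otherwise:
    cat "$(host_path /etc/fstab)"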

    The example above mounts the Host's /dev onto the container's /dev, which allows Cockpit to manage the Host's devices. Processes on the Host would also be able to use these devices.

    I would also mount the Host's /run on the container's /run, which allows processes within the container to communicate with any service that puts a FIFO file or socket into /run. Specifically, /run/dbus/system_bus_socket would allow the Cockpit instance running inside the container to use dbus to communicate with all of the dbus services, including systemd.

    We might also want to mount /sys on /sys. This would allow processes within the container to manage kernel file systems like SELinux or cgroups.

    Eliminate namespaces

    --net=host eliminates the NET and UTS namespaces. This allows processes within the container to see and use the Host's network.

    I have a github pull request patch that is about to be merged, which will support --ipc=host. This allows the Cockpit instance to share IPC with the Host system, if that is required. Lots of large projects, e.g. databases, rely on IPC to communicate and run a lot faster if they can use shared memory and semaphores.

    The only thing we don't have yet is --pid=host, which would allow Cockpit to see the Host's /proc as /proc. I have been talking about this with the upstream docker project, and the only thing that is difficult to add is the ability to kill all processes within the container. We could do this by freezing the processes within the container (docker pause) and then sending all of them SIGKILL.
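
    A rough sketch of that kill sequence with the current CLI and the freezer cgroup (the container name is illustrative, and the cgroup path varies by host configuration):

    # Freeze the container's processes so none can fork while we signal them:
    sudo docker pause mycontainer
    # Queue a SIGKILL for every frozen task (freezer cgroup path may differ):
    for pid in $(cat /sys/fs/cgroup/freezer/docker/$(docker inspect --format '{{.Id}}' mycontainer)/tasks); do
        sudo kill -9 "$pid"
    done
    # Thaw the processes so the queued SIGKILLs are delivered:
    sudo docker unpause mycontainer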

    The nice thing about this is you are still in the docker framework, i.e. docker ps would be able to show your container running.

    CoreOS has a neat shell script hack, toolbox.

    Toolbox uses systemd-nspawn and a docker image. They pack their applications (gdb and strace) in a docker image. Toolbox uses docker to pull the image, create a container from it, and then untar the exported container onto disk:

    docker pull "${TOOLBOX_DOCKER_IMAGE}:${TOOLBOX_DOCKER_TAG}"
    docker run --name=${machinename} "${TOOLBOX_DOCKER_IMAGE}:${TOOLBOX_DOCKER_TAG}" /bin/true
    docker export ${machinename} | sudo tar -x -C "${machinepath}" -f -

    It then executes systemd-nspawn to set up the mount namespace, binding the host's "/" onto /media/root:

    sudo systemd-nspawn -D "${machinepath}" --share-system --bind=/:/media/root --bind=/usr:/media/root/usr --user="${TOOLBOX_USER}" "$@"

    The advantage of their method is that they don't have separate PID namespaces, meaning ps -el will show all processes on the system. As mentioned above, we need to get this functionality into docker.

    The toolbox solution from CoreOS does NOT get listed in docker ps commands and is not treated the same as other docker images/containers.

    Execute a command in the host namespace

    Say you want to execute useradd, but you want to make sure that it happens in the Host namespace so that SELinux labels are created, the auditing goes to the Host, and, most importantly, you change the Host's /etc/passwd and /etc/shadow.
    sudo docker run -ti --privileged -d --net=host -e sysimage=/host -v /:/host -v /dev:/dev -v /run:/run rhel7 /bin/sh
    sudo nsenter --mount=$sysimage/proc/1/ns/mnt -- /sbin/adduser testuser

    Note: This requires that nsenter be inside of your image. Since the host's / is mounted on /host, the host's /proc is available under $sysimage/proc, and its mount namespace file is at $sysimage/proc/1/ns/mnt.

    You could execute many shell commands using this method.
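
    For instance, the same pattern could run other host binaries (these particular commands are illustrative):

    sudo nsenter --mount=$sysimage/proc/1/ns/mnt -- /usr/bin/yum install -y strace
    sudo nsenter --mount=$sysimage/proc/1/ns/mnt -- /usr/bin/systemctl restart sshd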

    Conclusion

    Containers can be used to run Host and container management tools. Having the ability to volume mount into the container and turn off namespaces makes this possible.

    Last updated: February 22, 2024
