Many developers would like to run their existing applications in a container with restricted capabilities to improve security. However, it may not be clear which capabilities an application uses, because its code relies on libraries or other code developed elsewhere. The developer could run the application in an unrestricted container that allows all syscalls and capabilities, which avoids hard-to-diagnose failures caused by the application's use of forbidden capabilities or syscalls. Of course, this eliminates the enhanced security of restricted containers. At Red Hat, we have developed a SystemTap script (container_check.stp) to provide information about the capabilities that an application uses. Read the SystemTap Beginners Guide for information on how to set up SystemTap.
Below is an example of the container_check.stp script monitoring a sudo command and the child processes it creates to run the strace and ping commands. The SystemTap "-c" option sets up the SystemTap instrumentation, runs the command that follows the option, and shuts down the instrumentation once the command is complete. The expected output of the ping and strace commands is printed first, followed by the output of the script. If the script warns about skipped probes, increase the number of active kretprobes allowed by passing a larger value than the 100 used in the "-DKRETACTIVE=100" option on the command line.
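For readers who want to drive the script from automation, the invocation described above could be wrapped roughly as follows. This is a minimal sketch: the helper names `build_check_command` and `run_check` are hypothetical, and it assumes container_check.stp is executable and sits in the current directory, as in the example. The only options it uses are the "-c" and "-DKRETACTIVE" options mentioned in this article.

```python
import subprocess


def build_check_command(command: str, kretactive: int = 100,
                        script: str = "./container_check.stp") -> list:
    """Return the argv list for running `command` under container_check.stp.

    `script` defaults to the path used in the article; adjust it if the
    script lives elsewhere (an assumption of this sketch).
    """
    if kretactive <= 0:
        raise ValueError("kretactive must be a positive integer")
    # -c takes the whole command line as ONE argument, so the command is
    # passed as a single list element instead of being split into words.
    return [script, f"-DKRETACTIVE={kretactive}", "-c", command]


def run_check(command: str, kretactive: int = 100) -> subprocess.CompletedProcess:
    """Run the check and capture its output as text (hypothetical helper)."""
    return subprocess.run(build_check_command(command, kretactive),
                          capture_output=True, text=True, check=False)
```

If the script reports skipped probes, calling the helper again with a larger `kretactive` value mirrors the manual fix of raising the number in "-DKRETACTIVE".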
The container_check.stp script lists the capabilities used by each executable. The first section of the script output for this example shows that ping uses the setuid and net_raw capabilities and that sudo uses the setgid, setuid, and audit_write capabilities. The next section of the script output provides more detail on the specific system calls that use those capabilities for each executable. Thus, for this example to run in a container, the setuid, setgid, net_raw, and audit_write capabilities would be required.
$ ./container_check.stp -DKRETACTIVE=100 -c "sudo strace -c -f ping -c 1 people.redhat.com"
starting container_check.stp. monitoring 20146
PING people02.pubmisc.prod.ext.phx2.redhat.com (10.5.19.28) 56(84) bytes of data.
64 bytes from people02.pubmisc.prod.ext.phx2.redhat.com (10.5.19.28): icmp_seq=1 ttl=57 time=46.3 ms

--- people02.pubmisc.prod.ext.phx2.redhat.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 46.370/46.370/46.370/0.000 ms
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 30.90    0.000623          69         9         2 socket
 13.69    0.000276          14        20         1 open
  7.84    0.000158           7        22           mprotect
  7.14    0.000144           5        31           mmap
  5.41    0.000109           5        24           close
  4.37    0.000088           4        20           fstat
  4.07    0.000082           4        20           read
  3.08    0.000062          12         5         2 connect
  3.03    0.000061          31         2           sendto
  2.48    0.000050           8         6           write
  2.18    0.000044          44         1           sendmmsg
  1.93    0.000039           6         7           setsockopt
  1.84    0.000037           7         5           poll
  1.84    0.000037          12         3           munmap
  1.44    0.000029           6         5           ioctl
  1.24    0.000025           4         7           capget
  0.99    0.000020          20         1           recvmsg
  0.94    0.000019           6         3           recvfrom
  0.74    0.000015           5         3           rt_sigaction
  0.74    0.000015           5         3           capset
  0.55    0.000011           6         2         2 access
  0.50    0.000010          10         1           setuid
  0.50    0.000010           5         2           prctl
  0.45    0.000009           3         3           brk
  0.35    0.000007           4         2           getuid
  0.30    0.000006           6         1           setitimer
  0.30    0.000006           6         1           getsockname
  0.30    0.000006           6         1           getsockopt
  0.25    0.000005           5         1           rt_sigprocmask
  0.25    0.000005           5         1           geteuid
  0.20    0.000004           4         1           getpid
  0.20    0.000004           4         1           arch_prctl
  0.00    0.000000           0         1           execve
------ ----------- ----------- --------- --------- ----------------
100.00    0.002016                   215         7 total

capabilities used by executables
executable: prob capability

ping: cap_setuid
ping: cap_net_raw
sudo: cap_setgid
sudo: cap_setuid
sudo: cap_audit_write

capabilities used by syscalls
executable, syscall ( capability ) : count
ping, socket ( cap_net_raw ) : 2
ping, setuid ( cap_setuid ) : 1
sudo, setresuid ( cap_setuid ) : 11
sudo, setresgid ( cap_setgid ) : 10
sudo, setgroups ( cap_setgid ) : 5
sudo, setgid ( cap_setgid ) : 1
sudo, setuid ( cap_setuid ) : 1
sudo, sendto ( cap_audit_write ) : 5

forbidden syscalls
executable, syscall: count

failed syscalls
executable, syscall = errno: count
ping, connect = ENOENT: 2
ping, socket = EACCES: 2
ping, access = ENOENT: 2
ping, open = ENOENT: 1
stapio, execve = ENOENT: 5
stapio, rt_sigreturn = EINTR: 1
strace, wait4 = ECHILD: 1
strace, access = ENOENT: 1
sudo, read = EAGAIN: 1
sudo, ioctl = ENOTTY: 2
sudo, recvmsg = EAGAIN: 3
sudo, open = ENOENT: 83
sudo, stat = ENOENT: 7
sudo, access = ENOENT: 4
sudo, fstat = EBADF: 1
sudo, connect = ENOENT: 13
sudo, poll = : 1
sudo, rt_sigreturn = EINTR: 1
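The step from this report to a container configuration can be sketched in code. The following is an illustration only, not part of container_check.stp: it assumes the report layout shown above (a "capabilities used by executables" header followed by "executable: cap_name" rows), and it assumes the target runtime accepts Docker- or Podman-style "--cap-drop" and "--cap-add" options, which this article does not cover. The function names `parse_capabilities` and `cap_add_flags` are hypothetical.

```python
import re

# Assumed layout, taken from the sample report: a section header, a
# column-title line, then "<executable>: cap_<name>" rows that run until
# the "capabilities used by syscalls" header.
SECTION_START = "capabilities used by executables"
SECTION_END = "capabilities used by syscalls"
ROW = re.compile(r"^\s*(?P<exe>[^:\s]+):\s+(?P<cap>cap_\w+)\s*$")


def parse_capabilities(report: str) -> dict:
    """Map each executable to the set of capabilities the report lists for it."""
    caps = {}
    in_section = False
    for line in report.splitlines():
        stripped = line.strip()
        if stripped == SECTION_START:
            in_section = True
            continue
        if not in_section:
            continue
        if stripped == SECTION_END:
            break
        match = ROW.match(line)
        if match:  # the column-title line and blank lines do not match
            caps.setdefault(match["exe"], set()).add(match["cap"])
    return caps


def cap_add_flags(caps: dict) -> list:
    """Turn the union of all capabilities into drop-all / add-back flags.

    Assumes a runtime that takes names without the "cap_" prefix, in upper
    case, as Docker and Podman do.
    """
    needed = sorted(set().union(*caps.values()))
    return ["--cap-drop=ALL"] + [
        f"--cap-add={cap[len('cap_'):].upper()}" for cap in needed
    ]
```

For the report above, `cap_add_flags(parse_capabilities(report))` yields `--cap-drop=ALL` followed by `--cap-add=AUDIT_WRITE`, `--cap-add=NET_RAW`, `--cap-add=SETGID`, and `--cap-add=SETUID`, matching the four capabilities identified in the text. Taking the union across executables is a deliberately coarse choice: it grants every process in the container the capabilities that any one of them needed during the monitored run.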
You can also monitor processes that are already running by using the "-x PID" option and stopping the instrumentation with Ctrl-C when the data collection is done. Below is an example monitoring Wireshark, showing the dumpcap executable using the setgid, setuid, and net_raw capabilities:
$ pgrep wireshark
19015
$ ./container_check.stp -DKRETACTIVE=200 -x 19015
starting container_check.stp. monitoring 19015
^C
capabilities used by executables
executable: prob capability

dumpcap: cap_setgid
dumpcap: cap_setuid
dumpcap: cap_net_raw

capabilities used by syscalls
executable, syscall ( capability ) : count
dumpcap, setresgid ( cap_setgid ) : 1
dumpcap, setresuid ( cap_setuid ) : 1
dumpcap, socket ( cap_net_raw ) : 1

forbidden syscalls
executable, syscall: count

failed syscalls
executable, syscall = errno: count
dumpcap, select = : 1
dumpcap, rt_sigreturn = EINTR: 1
dumpcap, setsockopt = EBUSY: 1
dumpcap, stat = ENOENT: 1
dumpcap, access = ENOENT: 2
dumpcap, ioctl = EOPNOTSUPP: 2
dumpcap, recvfrom = EAGAIN: 1
wireshark, recvmsg = EAGAIN: 2840
wireshark, ioctl = EINVAL: 2
wireshark, open = ENOENT: 31
wireshark, stat = ENOENT: 57

Last updated: March 20, 2023