Dirty Tricks: Launching a helper process under memory and latency constraints (pthread_create and vfork)

You need to launch a helper process, and while Linux’s fork is copy-on-write (COW), the page tables still need to be duplicated, and for a large virtual address space that could result in running out of memory and performance degradation. There are a wide array of solutions available to use, but one of them, namely vfork is mostly avoided due to a few difficult issues. First is that vfork pauses the parent thread while the child executes and eventually calls an exec family function, this is a huge latency problem for applications. Secondly is that there are a great many number of considerations to take into account when using vfork in a threaded application, and missing any one of those considerations can lead to serious problems.
It should be possible for posix_spawn to safely do all of this work via POSIX_SPAWN_USEVFORK, but often there is quite a lot of “work” that needs to be done just before the helper calls an exec family function, and that has lead to ever increasingly complex versions of posix_spawn like posix_spawn_file_actions_addclose, posix_spawn_file_actions_adddup2, posix_spawn_file_actions_destroy, posix_spawnattr_destroy, posix_spawnattr_getsigdefault, posix_spawnattr_getflags, posix_spawnattr_getpgroup, posix_spawnattr_getschedparam, posix_spawnattr_getschedpolicy, and posix_spawnattr_getsigmask. It might be simpler if the GNU C Library documented a small subset of functions you can safely call, which is in fact what the preceding functions are modelling. If you happen to select a set of operations that can’t be supported by posix_spawn with vfork then the implementation falls back to fork and you don’t know why. Therefore it is hard to use posix_spawn robustly.
How do you overcome the limits of vfork without having to resort to the complexity and security considerations of IPC between a helper daemon that starts processes for you? Use pthread_create and vfork together to give you the semantics you want. Use of pthread_create and vfork gives you all the benefit of vfork, the shared page tables, the fast execution, coupled with the pauseless execution you need. Only the additional thread is paused when you vfork, the rest of the threads in the process continue executing.
This is certainly a dirty trick, but as far as the author is concerned the example code takes into account or warns the reader of all possible considerations to doing this safely. Likewise all actions that do not impact parent state are valid to execute after the vfork and before the exec family function call, and I argue that all C libraries should document such functions.
Cheers,
Carlos.
/* Copyright (c) 2014 Red Hat Inc. Written by Carlos O'Donell <codonell@redhat.com> Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ /* Example: How to use vfork safely from a multi-threaded application. This example is intended to show the safe usage of vfork by a multi-threaded application. The example does not use any advanced features like clone without CLONE_VFORK to avoid parent suspension. The example can also be rewritten slightly to be used in a non-multithreaded environment and it still remains safe since the latter is just a degenerate case of the former with one main thread. The example is only valid on Linux with the GNU C Library as the core runtime. Other runtimes may require other actions to call vfork safely from a multi-threaded application. The inline comments in the code will explain each of the steps taken and why. Justification for some steps is rather complicated so please read it twice before asking questions. Any questions should go to libc-help@sourceware.org where the GNU C Library community can assist with interpretations of this code. */ #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/types.h> #include <sys/wait.h> #include <signal.h> #include <pthread.h> #include <errno.h> #include <string.h> /* The helper thread executes this application. */ const char *filename = "/bin/ls"; char *const new_argv[2] = { "/bin/ls", NULL }; char *const new_envp[1] = { NULL }; int status; void * run_thread (void *arg) { int i, ret; pid_t child, waited; struct sigaction newsa, oldsa; /* Block all signals in the parent before calling vfork. This is for the safety of the child which inherits signal dispositions and handlers. The child, running in the parent's stack, may be delivered a signal. For example on Linux a killpg call delivering a signal to a process group may deliver the signal to the vfork-ing child and you want to avoid this. The easy way to do this is via: sigemptyset, sigaction, and then undo this when you return to the parent. To be completely correct the child should set all non-SIG_IGN signals to SIG_DFL and then restore the original signal mask, thus allowing the vforking child to receive signals that were actually intended for it, but without executing any handlers the parent had setup that could corrupt state. When using glibc and Linux these functions i.e. sigemtpyset, sigaction, etc. are safe to use after vfork. */ sigset_t signal_mask, old_signal_mask, empty_mask; sigfillset (&signal_mask); /* One might think we need to block SIGCANCEL (cancellation handling signal) and SIGSETXID (set*id handling signal). These signals are a hidden part of the implementation, and if delivered to the child would corrupt the parent state. The SIGSETXID signal is only sent to threads that the implementation knows about and the child of vfork is not known as a thread and thus safe from having a set*id handler run. This is a distinct issue from the one below regarding calling set*id functions. The SIGCANCEL signal is only sent in response to a pthread_cancel call, and since the child has no pthread_t it will not receive that signal by any ordinary means. Thus it would be undefined for anything to send SIGSETXID or SIGCANCEL to the child thread. If you suspect something like this is happening you might try adding this code: #define SIGCANCEL __SIGRTMIN #define SIGSETXID (__SIGRTMIN + 1) sigaddset (&signal_mask, SIGCANCEL); sigaddset (&signal_mask, SIGSETXID); This will prevent cancellation and set*id signals from being acted upon. Please report this problem to libc-alpha@sourceware.org if you encounter it since the child running either handler for those signals is an implementation defect. */ pthread_sigmask (SIG_BLOCK, &signal_mask, &old_signal_mask); /* WARNING: Do not call setuid(2) any other set*id(2) functions from other threads while vfork-ing. This could allow privilege escalation attacks. It is often assumed that vfork(2) stops the entire process but on many OS's it just suspends the thread which called vfork(2). Calling a setuid(2) function from another thread while vfork-ing could result in two threads with different UIDs or GIDs sharing the same memory space. As a concrete example a thread might be running as root, vfork a helper, and then proceed to setuid to an unprivileged user to run some untrusted code. In this case the root privilege thread shares the same address space as the unprivileged threads. One of the unprivileged threads could then remap parts of the address space to get root privileged thread, which has not yet exec'd, to execute arbitrary code. Therefore you need to be careful about calling set*id() functions while vfork-ing. You avoid this problem by coordinating your credential transitions to happen after you know your vfork() is complete i.e. the parent is resumed telling you the child has completed exec-ing. If you can't coordinate the use of set*id() functions, then the only option left is to use the posix_spawn* interfaces which serialize set*id() transitions in glibc (Sourceware BZ #14750 and BZ #14749 must be fixed in your version of glibc for this to work properly). */ child = vfork (); if (child == 0) { /* In the child. */ /* We reset all signal dispositions that aren't SIG_IGN to SIG_DFL. This is done because the child may have a legitimate need to receive a signal and the default actions should be taken for those signals. Those default actions will not corrupt state in the parent. */ newsa.sa_handler = SIG_DFL; if (sigemptyset (&empty_mask) != 0) _exit (1); newsa.sa_mask = empty_mask; newsa.sa_flags = 0; newsa.sa_restorer = 0; for (i = 0; i < NSIG; i++) { ret = sigaction (i, NULL, &oldsa); /* If the signal doesn't exist it returns an error and we skip it. */ if (ret == 0 && oldsa.sa_handler != SIG_IGN && oldsa.sa_handler != SIG_DFL) { ret = sigaction (i, &newsa, NULL); /* POSIX says: It is unspecified whether an attempt to set the action for a signal that cannot be caught or ignored to SIG_DFL is ignored or causes an error to be returned with errno set to [EINVAL]. Ignore errors if it's EINVAL since those are likely signals we can't change. */ if (ret != 0 && errno != EINVAL) _exit (2); } } /* Restore the old signal mask that we inherited from the parent. */ pthread_sigmask (SIG_SETMASK, &old_signal_mask, NULL); /* At this point you carry out anything else you need to do before exec like changing directory etc. Signals are enabled in the child and will do their default actions, and the parent's handlers do not run. The caller has ensured not to call set*id functions. The only remaining general restriction is not to corrupt the parent's state by calling complex functions. The safe functions should be documented by glibc but aren't, please reach out to libc-alpha@sourceware.org to discuss. */ /* ... */ /* The last thing we do is execute the helper. */ ret = execve (filename, new_argv, new_envp); /* Always call _exit in the event of a failure with exec functions. */ _exit (3); } if (child == -1) { /* Restore the signal masks in the parent as quickly as possible to reduce signal handling latency. */ pthread_sigmask (SIG_SETMASK, &old_signal_mask, NULL); perror ("vfork"); exit (EXIT_FAILURE); } else { /* In the parent. At this point the child has either succeeded at the exec or _exit function call. The parent, this thread, which would have been suspended is resumed. */ /* Restore the signal masks in the parent as quickly as possible to reduce signal handling latency. */ pthread_sigmask (SIG_SETMASK, &old_signal_mask, NULL); /* Wait for the child to exit and then pass back the exit code. */ waited = waitpid (child, &status, 0); if (waited == (pid_t) -1) { perror ("wait"); exit (EXIT_FAILURE); } if (WIFEXITED(status)) { printf("Helper: Exited, status=%dn", WEXITSTATUS(status)); } else if (WIFSIGNALED(status)) { printf("Helper: Killed by signal %dn", WTERMSIG(status)); } return NULL; } } int main (void) { int ret; pthread_t thread; /* The application creates a thread from which to run other processes. The thread will immediately attempt to execute the helper process. On Linux the vfork system call suspends only the calling thread, not the entire process. Therefore it is still useful to use vfork over fork for performance, particularly as the process gets larger and larger the cost of fork gets more expensive as page table (not memory, since it's all copy-on-write) size grows. */ ret = pthread_create (&thread, NULL, run_thread, NULL); if (ret != 0) { fprintf (stderr, "pthread_create: %sn", strerror (ret)); exit (EXIT_FAILURE); } /* Do some other work while the helper launches the application, waits for it, and sets the global status. */ /* ... */ /* Lastly, wait for the helper thread to terminate. */ ret = pthread_join (thread, NULL); if (ret != 0) { fprintf (stderr, "pthread_join: %sn", strerror (ret)); exit (EXIT_FAILURE); } exit (EXIT_SUCCESS); }
Join the Red Hat Developer Program (it’s free) and get access to related cheat sheets, books, and product downloads.
Take advantage of your Red Hat Developers membership and download RHEL today at no cost.