glibc

While the POSIX standards specified by IEEE form the basis of compatibility between various operating systems and the portability of application code, sometimes unforeseen use cases can exercise an implementation in surprising ways and make us think about whether the interface itself could benefit from a more thorough specification.

As a member of Red Hat's Platform Tools team, I recently had the chance to witness and participate in the glibc developer community's encounter with one such situation. As we worked on triaging and fixing what at first glance seemed to be a regression in the implementation of pthread_atfork(), it soon became apparent that the interface might benefit from a more thorough treatment in its specification than it currently receives.

pthread_atfork(): What it does and why it does it

pthread_atfork() is used by applications to set up fork handlers—that is, functions that are called before and after processing a call to fork(). It is possible to register multiple sets of handlers, one for each call to pthread_atfork(). Later, when fork() is called, the runtime first executes the prepare handlers in the reverse order of registration, then processes the fork itself. After forking, the runtime executes the parent and child handlers in the corresponding processes, this time in the order of registration.
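The ordering described above can be observed with a small program. The following is a minimal sketch, not taken from glibc or the standard: the handler names, the trace buffer, and fork_order_demo() are all hypothetical, and the process is assumed to be single-threaded so that the child can safely inspect its copy of the trace.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical trace buffer: records the order in which handlers run. */
static char trace[64];

static void note(const char *tag)
{
    assert(strlen(trace) + strlen(tag) < sizeof trace);
    strcat(trace, tag);
}

/* Two sets of handlers, "a" registered first and "b" second. */
static void prepare_a(void) { note("prepA "); }
static void parent_a(void)  { note("parA "); }
static void child_a(void)   { note("chA "); }
static void prepare_b(void) { note("prepB "); }
static void parent_b(void)  { note("parB "); }
static void child_b(void)   { note("chB "); }

/* Returns 0 if both parent and child saw the expected handler order,
   1 if the order differed, -1 on a system call failure.
   Call it only once: each call would register the handlers again. */
int fork_order_demo(void)
{
    trace[0] = '\0';
    if (pthread_atfork(prepare_a, parent_a, child_a) != 0)
        return -1;
    if (pthread_atfork(prepare_b, parent_b, child_b) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: prepare handlers in reverse order, child handlers in
           registration order. Report the result via the exit status. */
        _exit(strcmp(trace, "prepB prepA chA chB ") == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return 1;

    /* Parent: prepare handlers in reverse order, parent handlers in
       registration order. */
    return strcmp(trace, "prepB prepA parA parB ") == 0 ? 0 : 1;
}
```

Calling fork_order_demo() once from main() should return 0 on an implementation that follows the ordering rules above.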

Here's how the standard defines pthread_atfork():


int
pthread_atfork (void (*prepare) (void),
                void (*parent) (void),
                void (*child) (void));

According to the standard, the rationale behind providing this facility appears to be to tackle shortcomings in the semantics of fork() itself. The standard offers the example of fork() being called in one thread of a multi-threaded process while another thread is performing some operation and at the same time holding a lock that it expects to release once finished. fork() only duplicates the calling thread in the child. Any other threads cease to exist in the child process. Therefore, in the child, the mutex remains locked, with no thread left to unlock it. pthread_atfork() was intended as a solution to this kind of problem.

To quote the rationale from the POSIX standard:

The pthread_atfork() function was intended to provide multi-threaded libraries with a means to protect themselves from innocent application programs that call fork(), and to provide multi-threaded application programs with a standard mechanism for protecting themselves from fork() calls in a library routine or the application itself.

The expected usage was that the prepare handler would acquire all mutex locks and the other two fork handlers would release them.
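The expected usage quoted above might look roughly like the sketch below. It is an illustration only: state_lock, shared_counter, worker(), and fork_with_locked_state() are hypothetical names, and a real library would usually protect several locks, acquired in a fixed order. Note that the child handler calls pthread_mutex_unlock(), which is exactly what the rationale describes, even though that function is not on the async-signal-safe list.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library state protected by a single mutex. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter;
static int child_unlock_rc = -1;   /* set by the child handler */

/* prepare: take the lock so no other thread holds it across fork(). */
static void prepare(void) { pthread_mutex_lock(&state_lock); }

/* parent: release the lock taken by the prepare handler. */
static void parent(void) { pthread_mutex_unlock(&state_lock); }

/* child: release the lock in the child's copy of the mutex. */
static void child(void) { child_unlock_rc = pthread_mutex_unlock(&state_lock); }

/* A second thread that repeatedly takes and releases the lock. */
static void *worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&state_lock);
        shared_counter++;
        pthread_mutex_unlock(&state_lock);
    }
    return NULL;
}

/* Returns 0 if the child handler ran and released the lock,
   1 if it did not, -1 on a system call failure. Call only once. */
int fork_with_locked_state(void)
{
    pthread_t tid;
    if (pthread_atfork(prepare, parent, child) != 0)
        return -1;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    /* fork() blocks in prepare() until the worker is outside its
       critical section, so the child never inherits a lock that is
       held by a thread that no longer exists. */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(child_unlock_rc == 0 ? 0 : 1);

    int status;
    pthread_join(tid, NULL);
    assert(shared_counter == 1000);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

Without the handlers, a fork() that happened to land while worker() held state_lock would leave the child with a mutex that nothing could ever unlock.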

A glibc bug report

As I mentioned, sometimes interfaces are used in ways that weren't foreseen by the specification (or perhaps the implementation). In May 2019, Jeremy Drake reported a hang in glibc-2.28 during the execution of a pthread_atfork() handler when trying to use OpenVPN with a Gnuk smartcard. It was an excellent bug report, in which Jeremy debugged the issue all the way, eventually identifying its root cause.

One of the software components involved (opensc) had registered a fork handler that dlclose()'d a dynamically loadable module (pcsc-lite) in the child handler at fork time. Meanwhile, the module itself had registered its own set of fork handlers. Now, dlclose()'ing a module means that any fork handlers registered by it should not be executed after the dlclose and should therefore implicitly be deregistered. However, calling dlclose() during the execution of a fork handler means that while one handler is running, another (that has either already been executed or is scheduled to be) needs to be removed from the list and the execution schedule. In other words, the list is modified while it's being walked and executed by the runtime. Depending on how the handler list is implemented/accessed, this can lead to a deadlock. The glibc implementation had been exhibiting the deadlock since release 2.28.

On the one hand, the standard already mentions that calling any non-async-signal-safe function between fork() and a call to one of the exec family of functions leads to undefined behavior. This is what happened in this particular case, so technically, it may be argued that this particular deadlock is not a bug. On the other hand, this had been working prior to release 2.28 and, as per the report, at least one application had made use of it.

What had changed?

Upstream glibc releases 2.27 and earlier were immune to this deadlock because of a linked-list based implementation of the fork handler list that used various synchronization primitives: a memory barrier and polling during handler execution, and locks during handler list modification where changes to the list were finalized via atomic operations. In glibc 2.28, a new array-based fork handler implementation was added that, in my opinion, was simpler, easier to reason about, and easier to maintain. In the new implementation, the handler list may only be modified or walked after obtaining a lock. This is what led to the deadlock: fork() took a lock on the fork handler list during handler execution, and one of the handlers called dlclose(), which tried to take the same lock in order to de-register a different fork handler that corresponded to the module being dlclose()'d.
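To make the locking problem concrete, here is a toy model of a handler list that is walked while a lock is held. It is not glibc code: every name in it (toy_register, toy_unregister, toy_run_locked, closer, victim) is hypothetical, and toy_unregister() uses pthread_mutex_trylock() so that the sketch reports EBUSY at the point where a blocking lock would hang forever.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_HANDLERS 8
typedef void (*handler_fn)(void);

/* Toy handler list guarded by one non-recursive lock. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static handler_fn handlers[MAX_HANDLERS];
static size_t handler_count;
static int last_unregister_rc;

static void toy_register(handler_fn fn)
{
    pthread_mutex_lock(&list_lock);
    assert(handler_count < MAX_HANDLERS);
    handlers[handler_count++] = fn;
    pthread_mutex_unlock(&list_lock);
}

/* Stand-in for the deregistration that dlclose() triggers.
   Returns 0 on success, or EBUSY if the list lock is already held,
   which is where a blocking pthread_mutex_lock() would deadlock. */
static int toy_unregister(handler_fn fn)
{
    int rc = pthread_mutex_trylock(&list_lock);
    if (rc != 0)
        return rc;
    size_t kept = 0;
    for (size_t i = 0; i < handler_count; i++)
        if (handlers[i] != fn)
            handlers[kept++] = handlers[i];
    handler_count = kept;
    pthread_mutex_unlock(&list_lock);
    return 0;
}

/* Stand-in for fork() running handlers with the list lock held. */
static void toy_run_locked(void)
{
    pthread_mutex_lock(&list_lock);
    for (size_t i = 0; i < handler_count; i++)
        handlers[i]();
    pthread_mutex_unlock(&list_lock);
}

/* victim: a handler owned by the module being closed. */
static void victim(void) { }

/* closer: a handler that tries to deregister victim while running. */
static void closer(void) { last_unregister_rc = toy_unregister(victim); }

/* Returns the result of the nested deregistration attempt. */
int toy_deadlock_demo(void)
{
    toy_register(closer);
    toy_register(victim);
    toy_run_locked();
    return last_unregister_rc;
}
```

In this model toy_deadlock_demo() returns EBUSY: the handler asks for the same lock that the walker is holding, in the same thread.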

An underspecified interface?

While calling dlclose() in a child handler qualifies as leading to undefined behavior, there are other cases that don't necessarily do so. For example, calling dlclose() in a prepare or parent handler isn't forbidden by the standard, but it would lead to the same kind of deadlock. On top of that, it's also not explicitly forbidden to call pthread_atfork() from a fork handler. However, doing so means registering a new handler during handler execution, which leads to another deadlock. In fact, it appears that at least one of Red Hat's customers ran into this as well.

FreeBSD libc, an entirely separate implementation, also runs into deadlocks under these circumstances. Glancing at the code, it appears that this is because FreeBSD libc quite reasonably also obtains a read-lock on the fork handler list during handler execution, and a write-lock during registration/deregistration.

Given that two implementations run into the same issue, there is a case to be made that the standard should address this class of use cases and clarify what the expected behavior should be when the execution of a handler causes registration or deregistration of another handler.

The fix

The Red Hat Bugzilla report eventually landed on my plate, and with some reading and a lot of advice from senior engineers on the glibc engineering team here, I began working on a patch. I chose to keep the dynamic array for its clean design, simply releasing the lock just before executing each handler. The idea is that we shouldn't hold implementation locks while executing an external callback. After a few iterations of testing and refining, I posted a patch upstream to the glibc development mailing list. Adhemerval Zanella, a prolific glibc developer, replied to my email almost immediately with a link to a patch he had been working on that I had overlooked. The test case Adhemerval had included in his patch exposed a hole in my own fix that I was then able to plug. I reworked my patch and included his test, and after another round of patch review from Adhemerval, the patch was ready to commit in time for release 2.36. We backported the fix to 2.34 and 2.35 upstream as well as in Red Hat Enterprise Linux releases 8 and 9.
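The idea of not holding an implementation lock across an external callback can be sketched with a toy handler list. This is not the glibc patch: all names are hypothetical, and tagging each entry with an increasing id so the walk can resume after the list changes is an assumption of this sketch, chosen only to keep the toy correct when entries are removed mid-walk.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_HANDLERS 8
typedef void (*handler_fn)(void);

/* Each entry carries an increasing id so a walk can find "the next
   entry after the one I last ran" even if the array was compacted. */
struct entry { handler_fn fn; unsigned long id; };

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry entries[MAX_HANDLERS];
static size_t entry_count;
static unsigned long next_id = 1;
static int victim_runs;

static void toy_register(handler_fn fn)
{
    pthread_mutex_lock(&list_lock);
    assert(entry_count < MAX_HANDLERS);
    entries[entry_count].fn = fn;
    entries[entry_count].id = next_id++;
    entry_count++;
    pthread_mutex_unlock(&list_lock);
}

/* Removes every entry for fn, preserving the order of the rest. */
static void toy_unregister(handler_fn fn)
{
    pthread_mutex_lock(&list_lock);
    size_t kept = 0;
    for (size_t i = 0; i < entry_count; i++)
        if (entries[i].fn != fn)
            entries[kept++] = entries[i];
    entry_count = kept;
    pthread_mutex_unlock(&list_lock);
}

/* Walks the list in registration order, holding the lock only while
   choosing the next handler and never while running it. */
static void toy_run_unlocked(void)
{
    unsigned long last_id = 0;
    for (;;) {
        handler_fn fn = NULL;
        pthread_mutex_lock(&list_lock);
        for (size_t i = 0; i < entry_count; i++)
            if (entries[i].id > last_id) {
                fn = entries[i].fn;
                last_id = entries[i].id;
                break;
            }
        pthread_mutex_unlock(&list_lock);
        if (fn == NULL)
            break;
        fn();   /* the callback may now register or unregister handlers */
    }
}

static void victim(void) { victim_runs++; }
static void closer(void) { toy_unregister(victim); }

/* Returns how many times victim ran after closer deregistered it. */
int toy_fix_demo(void)
{
    toy_register(closer);
    toy_register(victim);
    toy_run_unlocked();
    return victim_runs;
}
```

Here toy_fix_demo() returns 0: closer runs first, removes victim without blocking, and the walk finishes without calling the removed handler. The sketch still leaves a window between choosing a handler and calling it, during which another thread could remove it.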

What's next?

Now that the deadlock is gone, there still remain a few open issues. First, there is a race condition where dlclose() may race with handler execution during fork(): just after the runtime chooses the next handler to be executed and releases the lock to begin executing the handler, the handler itself may be deregistered and unmapped by a dlclose(), leading to a segmentation fault. Next, when it comes to the specification itself, it sounds reasonable to file an issue with the Austin Group asking for clarification regarding calling dlclose() and pthread_atfork() from a prepare or parent handler. Another open task is to improve the glibc documentation of pthread_atfork() and bring it in line with the current implementation.

I hope to get around to these as time and priority permit, or perhaps someone else will take them up. The upstream glibc developer community is a helpful and kind one, and we are always happy to welcome new contributors. In this case, the open docs issue is relatively beginner-friendly territory should someone want to get their feet wet.

(Thank you to Adhemerval Zanella, Florian Weimer, Carlos O'Donell, and Siddhesh Poyarekar for review and support.)