C/C++ library upgrades and opaque data types in process shared memory
C/C++ libraries expect to be able to change the internal implementation details of opaque data types from release to release, since such a change has no external ABI consequences. If an opaque data type is placed in process-shared memory (when allowed by the standard) and shared with multiple processes, every process must use exactly the same version of the library, or the processes could fail in unexpected ways during library upgrades. The placement of opaque data types in process-shared memory is never allowed unless otherwise stated by the library documentation. For the GNU C Library (glibc) you may place pthread_mutex_t, pthread_cond_t, and sem_t in process-shared memory as allowed by POSIX. Failures using these types occur because a process started more recently may have loaded a newer version of the library, and that version may have a different understanding of the internal details of the type. The problem has always been one for the developer to solve, but without help it is intractable enough to make it difficult to use opaque data types robustly in process-shared memory.
This article covers what opaque data types are, why you would use them, how library upgrades create the problem, and what an application developer can do about it.
Opaque data types in process shared memory
What is an opaque type and when would you use it? An example of an opaque type in the GNU C Library (glibc) is pthread_mutex_t and you would use it for mutual exclusion between threads.
    pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_lock (&mtx);
    ...
    pthread_mutex_unlock (&mtx);
    pthread_mutex_destroy (&mtx);
    ...
An opaque type is any type whose internal details are not made visible to the application using the type. The external details like size and alignment (external ABI) are always visible to the application, but with an opaque type, the internal details are not. The application knows nothing about the internal details of pthread_mutex_t. Opaque data types are not the same as opaque pointers. The type of the opaque data is known and complete, while the type of the opaque pointer is incomplete. A good example of the difference is the FILE* returned by fopen(). The FILE* type is not required to be complete and therefore conforming and portable C programs must treat it as if it were incomplete i.e. an opaque pointer (even if in glibc the type is complete and known for historical reasons). The opaque pointer design pattern is also known as Pimpl (pointer-to-implementation).
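The difference between the two kinds of type can be sketched in a few lines of C. This is only an illustration: struct widget, widget_create(), and struct job_queue are hypothetical names invented for the example, and only pthread_mutex_t comes from the text above.

```c
#include <pthread.h>
#include <stddef.h>

/* Opaque pointer: only a forward declaration is visible, so the type is
   incomplete.  "struct widget" and widget_create() are hypothetical.  */
struct widget;
struct widget *widget_create (void);

/* Opaque data type: the type is complete, so the application can declare
   it, embed it in its own structures, and choose where it lives.  */
struct job_queue
{
  pthread_mutex_t lock;   /* Size and alignment known, contents private.  */
  int pending;
};

/* The external ABI of the opaque data type is visible to the compiler.  */
size_t
mutex_size (void)
{
  return sizeof (pthread_mutex_t);
}

size_t
mutex_alignment (void)
{
  return _Alignof (pthread_mutex_t);
}

/* By contrast, sizeof (struct widget) would not compile here because the
   type is incomplete.  */
```

Because the application controls where a complete type is stored, it can place it in shared memory, which is something an opaque pointer design would not allow without help from the library.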
Why would you use an opaque type? To hide the details of the implementation from the application, and to gain the freedom to change those details in the future for the purposes of improved performance, reduced memory usage, or any number of possible implementation-dependent reasons.
What cost is there in using an opaque type? The opaque data type is an abstraction that gives future developers the freedom to change the implementation, at the cost of fewer optimizations within the application itself. For example, if the application could inline operations on the type then it might see a performance improvement, but that would make the internals visible to the application and the type would no longer be opaque.
Why would you place an opaque type in process-shared memory? For the purposes of very fast inter-process communication (IPC). Not all data types are allowed to be in process-shared memory, so please consult your library documentation for details. The GNU C Library allows several important POSIX thread structures to be created with a special attribute that allows these structures to be shared by multiple processes. The intent is that a shared memory segment can be used to contain both data and synchronization primitives. For example, you could create a shared ring buffer in memory to allow for efficient exchange between threads and processes, with concurrency controlled by a pthread_mutex_t that exists in the same shared memory.
    #include <fcntl.h>
    #include <pthread.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int shmfd;
    void *data;
    pthread_mutex_t *mtx;
    pthread_mutexattr_t mtx_attr;

    pthread_mutexattr_init (&mtx_attr);
    /* Require the mutex to be process shared. */
    pthread_mutexattr_setpshared (&mtx_attr, PTHREAD_PROCESS_SHARED);
    shmfd = shm_open ("/posixsharedmemory", O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
    ftruncate (shmfd, 4096);
    data = mmap (NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, shmfd, 0);
    /* Place the mutex in shared memory. Only the process that creates the
       segment should initialize it. */
    mtx = (pthread_mutex_t *) data;
    pthread_mutex_init (mtx, &mtx_attr);
    pthread_mutex_lock (mtx);
    ...
    pthread_mutex_unlock (mtx);
    ...
The C/C++ library upgrade problem
What does this have to do with C/C++ library upgrades? During a package upgrade, you could have two or more processes accessing a process shared memory segment with two or more different versions of a library whose accessors may disagree about the internal ABI and semantics of the opaque data type!
Is this a real problem? It is. https://bugzilla.redhat.com/show_bug.cgi?id=1394862
How do you protect against this problem? The traditional response from library authors has always been that all processes accessing the same shared memory segment must use exactly the same version of every library that accesses an opaque type in the segment. This is not a trivial thing to ensure. Start with the naive solution, which is to integrate your application directly with the package manager to query the version of all the components currently installed on the system. This is a race-prone operation, since once the dynamic loader has loaded a shared object from disk the package manager may unlink() it and install a new version, and another version, and another. There is usually no direct link between what is loaded in the process memory and what the package manager views as the currently installed package (we’re talking about traditional Unix package managers). Therefore, you cannot query the system to determine what has been loaded by your application. In that case, you have little recourse but to do one of two things:
- Never place opaque data types in process shared memory and use more primitive concurrency APIs, e.g. use C11/C++11 atomics to synchronize access to non-opaque data types instead of pthread_mutex_t.
- Allow only forked children to access the process shared memory.
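As a rough illustration of the first option, the sketch below guards a plain long with a C11 atomic_flag spinlock instead of a pthread_mutex_t. The names struct shared_slot, slot_init(), and slot_add() are hypothetical. The layout of the structure is defined by the application and the compiler ABI, not by the C library's internal implementation, but a spinlock like this gives up blocking waits and robustness, so treat it as a starting point only.

```c
#include <stdatomic.h>

/* Application-defined layout intended to live in a shared mapping.
   All names in this block are hypothetical.  */
struct shared_slot
{
  atomic_flag busy;   /* Set while some thread or process holds the slot.  */
  long value;         /* Plain data guarded by BUSY.  */
};

/* Called once by whichever process creates the segment.  */
void
slot_init (struct shared_slot *s)
{
  atomic_flag_clear (&s->busy);
  s->value = 0;
}

/* Add DELTA to the guarded value and return the new value.  */
long
slot_add (struct shared_slot *s, long delta)
{
  /* Spin until the flag was previously clear; acquire ordering makes the
     previous holder's writes visible to us.  */
  while (atomic_flag_test_and_set_explicit (&s->busy, memory_order_acquire))
    ;

  s->value += delta;
  long result = s->value;

  /* Release ordering publishes our write to the next holder.  */
  atomic_flag_clear_explicit (&s->busy, memory_order_release);
  return result;
}
```

atomic_flag is used here because the C standard guarantees it is lock-free, which matters when the object is shared between processes rather than only between threads.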
The last bullet requires a bit of explanation. The trick is that forked children always share the same set of loaded libraries as their parent, with the caveat that you must restrict all calls to dlopen() to process startup, before forking.
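A minimal sketch of the forked-children approach, assuming Linux and an anonymous shared mapping created before fork(): because the mapping has no name, only the parent and its children can reach the mutex, and they all run the library code the parent loaded. The names struct shared_counter and run_forked_workers() are hypothetical.

```c
#define _DEFAULT_SOURCE   /* For MAP_ANONYMOUS under strict -std=c11.  */
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct shared_counter
{
  pthread_mutex_t mtx;
  long value;
};

/* Fork NWORKERS children that each add ITERATIONS to a shared counter.
   Returns the final counter value, or -1 on setup failure.  */
long
run_forked_workers (int nworkers, int iterations)
{
  struct shared_counter *sc = mmap (NULL, sizeof *sc, PROT_READ | PROT_WRITE,
                                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (sc == MAP_FAILED)
    return -1;

  pthread_mutexattr_t attr;
  if (pthread_mutexattr_init (&attr) != 0
      || pthread_mutexattr_setpshared (&attr, PTHREAD_PROCESS_SHARED) != 0
      || pthread_mutex_init (&sc->mtx, &attr) != 0)
    {
      munmap (sc, sizeof *sc);
      return -1;
    }
  pthread_mutexattr_destroy (&attr);
  sc->value = 0;

  int started = 0;
  for (int i = 0; i < nworkers; i++)
    {
      pid_t pid = fork ();
      if (pid < 0)
        break;
      if (pid == 0)
        {
          /* Child: same loaded libraries as the parent.  */
          for (int j = 0; j < iterations; j++)
            {
              pthread_mutex_lock (&sc->mtx);
              sc->value++;
              pthread_mutex_unlock (&sc->mtx);
            }
          _exit (0);
        }
      started++;
    }

  /* Reap every child that was started.  */
  while (started > 0 && wait (NULL) > 0)
    started--;

  long result = sc->value;
  pthread_mutex_destroy (&sc->mtx);
  munmap (sc, sizeof *sc);
  return result;
}
```

Calling run_forked_workers (4, 1000) should return 4000 when every fork() succeeds. If a child called exec() or dlopen() after the fork, the guarantee described above would no longer hold.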
What is the best practice for solving this kind of problem? Avoiding the problem is the best practice, either through the suggestions above or by not supporting live package upgrades, e.g. entire microservices could be brought down, upgraded atomically, and brought back up again. The next best practice is versioned data structures. The problem with this is that version data for each structure increases data-cache pressure, while version comparisons in each accessor increase instruction-cache pressure, and both reduce performance. Lastly, library authors could switch to a pointer-to-implementation design and require the use of accessors to read and write opaque pointers to process shared memory. This hides the version checks from the user and limits them to process shared memory, but the accessors must ultimately still return a “wrong version” failure for the application to handle gracefully. None of these solutions are immediately applicable to problems seen in the field. For libraries with existing structures, adding version fields would be an ABI break because it increases the size of the structure.
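To make the versioned data structure idea concrete, here is a minimal sketch of an application-defined ring buffer whose accessors check a layout version before touching the data. Every name here (struct versioned_ring, RING_LAYOUT_VERSION, ring_push(), and so on) is hypothetical, and the caller is assumed to provide its own locking. The extra field and the extra comparison in each accessor are exactly the cache costs described above.

```c
#include <stdint.h>

#define RING_LAYOUT_VERSION 2u   /* Hypothetical: bumped on layout change.  */
#define RING_SLOTS 8u

enum ring_status { RING_OK, RING_WRONG_VERSION, RING_FULL, RING_EMPTY };

struct versioned_ring
{
  uint32_t layout_version;   /* Written once by the creator.  */
  uint32_t head;             /* Index of the next slot to read.  */
  uint32_t tail;             /* Index of the next slot to write.  */
  uint32_t slots[RING_SLOTS];
};

void
ring_init (struct versioned_ring *r)
{
  r->layout_version = RING_LAYOUT_VERSION;
  r->head = 0;
  r->tail = 0;
}

/* Every accessor starts with the version check.  */
enum ring_status
ring_push (struct versioned_ring *r, uint32_t value)
{
  if (r->layout_version != RING_LAYOUT_VERSION)
    return RING_WRONG_VERSION;
  if (r->tail - r->head == RING_SLOTS)
    return RING_FULL;
  r->slots[r->tail % RING_SLOTS] = value;
  r->tail++;
  return RING_OK;
}

enum ring_status
ring_pop (struct versioned_ring *r, uint32_t *out)
{
  if (r->layout_version != RING_LAYOUT_VERSION)
    return RING_WRONG_VERSION;
  if (r->head == r->tail)
    return RING_EMPTY;
  *out = r->slots[r->head % RING_SLOTS];
  r->head++;
  return RING_OK;
}
```

A process built against a different layout would see RING_WRONG_VERSION from every accessor and would have to decide how to recover, for example by restarting against the matching code.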
Why doesn’t glibc just fix this? It could be said that this is a glibc problem, or a library problem, and that the application's request for process-shared memory support via pthread_mutexattr_setpshared (&mtx_attr, PTHREAD_PROCESS_SHARED); should be sufficient to request a backward/forward compatible type that never changes across library versions. The difficulty is that to honor this the library would have to keep forwards and backwards compatibility for the type regardless of the performance consequences. This is contrary to the design goal of using process shared memory for very fast IPC. If by using process shared memory you have to settle for a slow and naive forwards/backwards compatible pthread_mutex_t, then what is the point of using process shared memory? No, the library must be able to change the opaque type at will to improve performance and other characteristics of the type. We must find another way to ensure the opaque types are compatible.
Is there a backwards-compatible solution? One solution that might retrofit nicely into existing applications is to have library authors provide an interface to identify the state of the internal ABI of all the opaque types.
    /* Returns a hash that identifies the internal ABI of all the library structures. */
    abi_hash_t gnu_get_libc_internal_abi (void);
Application authors could publish this number in the process shared memory segment, and attaching processes could compare it against their own. If the values differ, the processes could use a slower inter-process synchronization method, e.g. file locking, to negotiate process restarts until all processes have the same agreed-upon version of the required libraries. This kind of single-value versioning for the entire library is analogous to the package manager version, but it is accessible from within the process (which avoids the version check race) and only changes when one of the internal ABIs changes. If needed, finer-grained internal ABI hashes could be provided.
    /* Returns a hash that identifies the internal ABI of type TYPE from the library. */
    abi_hash_t gnu_get_libc_type_internal_abi (abi_type_t type);
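The publish-and-compare step could look roughly like the sketch below. It is built on assumptions: gnu_get_libc_internal_abi() is only a proposal and does not exist in glibc today, so the caller passes in its own hash value, abi_hash_t is assumed to be a 64-bit integer, zero is assumed never to be a valid hash, and struct segment_header and segment_attach() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: the proposed hash fits in 64 bits and is never zero.  */
typedef uint64_t abi_hash_t;

/* Hypothetical header at the start of the shared segment.  A freshly
   created segment is zero-filled, so 0 means "nothing published yet".  */
struct segment_header
{
  _Atomic abi_hash_t published_hash;
};

enum attach_result { ATTACH_OK, ATTACH_ABI_MISMATCH };

/* MY_HASH stands in for what the proposed gnu_get_libc_internal_abi ()
   would return in the calling process.  */
enum attach_result
segment_attach (struct segment_header *hdr, abi_hash_t my_hash)
{
  abi_hash_t expected = 0;

  /* The first process to attach publishes its hash.  */
  if (atomic_compare_exchange_strong (&hdr->published_hash, &expected,
                                      my_hash))
    return ATTACH_OK;

  /* Otherwise EXPECTED now holds the hash that was already published.  */
  return expected == my_hash ? ATTACH_OK : ATTACH_ABI_MISMATCH;
}
```

On ATTACH_ABI_MISMATCH the process must not touch any opaque type in the segment. It would fall back to the slower negotiation path, such as file locking, until every process agrees on one library version.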
Because the hash is equivalent to a kind of package version, it is not immediately portable across distributions, e.g. containers sharing memory, or distributed shared memory systems made up of distinct distributions. Cross-distribution process shared memory would require coordinating exactly what each hash means (and coordinating patch backports for changes that alter the internal ABI). Upstream projects would clearly have a hash value for each official release, which would ease coordination for accessing opaque types in process-shared memory across distribution boundaries.
It is possible to place opaque data types in process-shared memory, but you must be aware of C/C++ library upgrade issues and design your application accordingly. Future C/C++ libraries should provide identifiers to represent internal ABIs to allow application authors the ability to detect ABI incompatibilities for opaque data types and react accordingly. Serialization of opaque types, if allowed (not allowed by POSIX), is an analogous problem, and may become more prevalent if non-volatile memory becomes popular.
Special thanks to Florian Weimer, DJ Delorie, Martin Sebor, Grant Grundler, Helge Deller, Mathieu Desnoyers, Roger Bins, and Richard Hipp for useful real-world feedback on the problem.