Fall 2017 GNU Toolchain Update

The GNU Toolchain is a collection of programming tools produced by the GNU Project. The tools are often packaged together due to their common use for developing software applications, operating systems, and low-level software for embedded systems.

This post is part of a regular series covering the latest changes and improvements in the components that make up this Toolchain. Apart from the announcement of new releases, however, the features described here are at the bleeding edge of software development in the tools. This means it may be a while before they make it into production releases, and they might not be fully functional yet. But anyone who is interested in experimenting with them can build their own copy of the Toolchain and then try them out.


Adding buffer overflow detection to string functions

This article describes the steps required to add buffer overflow protection to string functions. As a real-world example, we use the strlcpy function, which is implemented in the libbsd library on some GNU/Linux systems.

This kind of buffer overflow protection uses a GNU Compiler Collection (GCC) feature for array size tracking (“source fortification”), accessed through the __builtin_object_size GCC built-in function. In general, these checks are added in a size-checking wrapper function around the original (wrapped) function, which is strlcpy in our example.
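
A minimal sketch of such a wrapper follows. To keep it self-contained, it defines a hypothetical `my_strlcpy` with the usual BSD strlcpy semantics instead of linking against libbsd, and the names `checked_strlcpy` and `checked_strlcpy_impl` are illustrative, not the actual fortified interface of libbsd or glibc. The wrapper applies the usual fortification rule: if the compiler knows the size of the destination object (`__builtin_object_size` returns something other than `(size_t) -1`) and the caller passes a larger `size`, the program aborts instead of overflowing the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for libbsd's strlcpy: copies at most size - 1
   bytes, always NUL-terminates when size > 0, returns strlen(src). */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
  size_t len = strlen(src);

  assert(dst != NULL || size == 0);
  if (size > 0) {
    size_t n = len < size - 1 ? len : size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
  }
  return len;
}

/* Size-checking wrapper: dst_objsize is what __builtin_object_size
   reported for the destination, or (size_t) -1 if it is unknown. */
static inline size_t checked_strlcpy_impl(char *dst, const char *src,
                                          size_t size, size_t dst_objsize)
{
  if (dst_objsize != (size_t) -1 && size > dst_objsize) {
    fputs("*** buffer overflow detected ***\n", stderr);
    abort();
  }
  return my_strlcpy(dst, src, size);
}

/* The macro evaluates __builtin_object_size at the call site, where the
   compiler can still see the destination object. Mode 1 (closest
   enclosing subobject) is the mode _FORTIFY_SOURCE=2 uses. The builtin
   never evaluates side effects in its argument, so dst is evaluated once. */
#define checked_strlcpy(dst, src, size) \
  checked_strlcpy_impl((dst), (src), (size), __builtin_object_size((dst), 1))
```

With optimization enabled, a call such as `checked_strlcpy(buf, src, 16)` on an 8-byte array `buf` aborts at run time. When the compiler cannot determine the destination size (for example, a pointer passed in from another translation unit), the check is skipped and the call behaves like the unchecked function.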


Practical micro-benchmarking with ‘ltrace’ and ‘sched’

Recently I was asked to look into an issue related to QEMU’s I/O subsystem performance: specifically, I was looking for differences in performance between glibc’s malloc and other malloc implementations. After a good deal of benchmarking, I was unable to see a clear difference between our malloc (glibc) and others, not because the implementations were similar, but because there was too much noise in the system; the benchmarks were being polluted by other activity, not just in QEMU but elsewhere in the operating system. I really just wanted to see how much time malloc was using, but it was a small signal in a large, noisy system.

To get the results I needed, I had to isolate malloc’s activity so I could measure it more accurately. In this post, I’ll document how I extracted the malloc workload and how I isolated that workload from other system activity so it can be measured more effectively.
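
The post’s own tooling is not reproduced here. As a minimal sketch of the same idea, the C code below pins the process to a single CPU with `sched_setaffinity` and times the replay of a recorded list of allocation sizes; the helper names, and the assumption that the sizes would be extracted from an `ltrace` log of malloc calls, are illustrative only. Pinning alone does not keep other tasks off that CPU; reserving the CPU as well (for example with the kernel’s `isolcpus` boot parameter) is a separate step.

```c
#define _GNU_SOURCE /* for cpu_set_t, sched_setaffinity, sched_getcpu */
#include <assert.h>
#include <sched.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Pin the calling thread to one CPU so the scheduler does not migrate
   it mid-measurement. Returns 0 on success, -1 on failure. */
static int pin_to_cpu(int cpu)
{
  cpu_set_t set;

  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  return sched_setaffinity(0, sizeof set, &set);
}

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_ns(void)
{
  struct timespec ts;

  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
}

/* Replay a recorded list of allocation sizes (allocate all, then free
   all) and return the elapsed time in nanoseconds. slots must have room
   for n pointers; a NULL from malloc is later passed to free, a no-op. */
static uint64_t replay_malloc_trace(const size_t *sizes, size_t n,
                                    void **slots)
{
  assert(slots != NULL || n == 0);

  uint64_t start = now_ns();
  for (size_t i = 0; i < n; i++)
    slots[i] = malloc(sizes[i]);
  for (size_t i = 0; i < n; i++)
    free(slots[i]);
  return now_ns() - start;
}
```

In practice, such a replay would be run many times on the pinned CPU and the distribution of timings compared between malloc implementations, rather than trusting a single run.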


Upgrading the GNU C Library within Red Hat Enterprise Linux

Occasionally, an application needs a newer GNU C Library than the one the system provides in order to run. For example, some versions of the Google Chrome browser started to warn users on Red Hat Enterprise Linux 7 that future versions of Chrome would not support their operating system. The Chromium source code contained a version check, flagging all versions of the GNU C Library (glibc) older than 2.19 as obsolete. This check has since been relaxed to 2.17 (the version in Red Hat Enterprise Linux 7), but it is still worth discussing what we can do to support application binaries on Red Hat Enterprise Linux that require a newer glibc version to run.
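
For illustration only (this is not Chromium’s code), a program can query the glibc version it is actually running against with `gnu_get_libc_version` from `<gnu/libc-version.h>`, and compare it with the `__GLIBC__` and `__GLIBC_MINOR__` macros, which describe the headers it was compiled against. The two can differ, which matters for binaries built on one system and run on another. The function names below are hypothetical.

```c
#include <assert.h>
#include <gnu/libc-version.h>
#include <stdio.h>

/* Returns 1 if the glibc loaded at run time is at least major.minor. */
static int runtime_glibc_at_least(int major, int minor)
{
  int have_major = 0, have_minor = 0;

  /* gnu_get_libc_version() returns a string such as "2.17". */
  if (sscanf(gnu_get_libc_version(), "%d.%d", &have_major, &have_minor) != 2)
    return 0;
  return have_major > major || (have_major == major && have_minor >= minor);
}

/* Same comparison, but against the headers used at build time. */
static int buildtime_glibc_at_least(int major, int minor)
{
  return __GLIBC__ > major
         || (__GLIBC__ == major && __GLIBC_MINOR__ >= minor);
}
```

A check that treats anything older than 2.19 as obsolete would correspond to `runtime_glibc_at_least(2, 19)` returning 0.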

Distribution-specific binaries

Before discussing the feasibility of glibc upgrades, it is worth noting that there is a disconnect between how GNU/Linux distributions build the applications they ship as part of the distribution, and how independent software vendors (ISVs) build their application binaries.


Dirty Tricks: Launching a helper process under memory and latency constraints (pthread_create and vfork)

Suppose you need to launch a helper process. Although Linux’s fork is copy-on-write (COW), the page tables still need to be duplicated, and for a large virtual address space this can exhaust memory and degrade performance. There is a wide array of solutions available, but one of them, vfork, is mostly avoided because of a few difficult issues. First, vfork suspends the parent thread while the child executes until the child calls an exec family function (or _exit), which is a huge latency problem for applications. Second, there are a great many considerations to take into account when using vfork in a threaded application, and missing any one of them can lead to serious problems.
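
A minimal single-threaded sketch of the basic vfork pattern: the child may only call an exec function or `_exit`, and the parent stays suspended until it does. The helper name `vfork_run` is hypothetical, and the sketch deliberately leaves out the additional precautions a multithreaded program would need.

```c
#define _DEFAULT_SOURCE /* exposes vfork() in <unistd.h> under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run path with argv via vfork + execv; return its exit status or -1. */
static int vfork_run(const char *path, char *const argv[])
{
  int status;
  pid_t pid = vfork();

  if (pid == -1)
    return -1;
  if (pid == 0) {
    /* Child: runs on the parent's memory and stack, so it must not
       return, modify variables, or call exit(); only exec or _exit. */
    execv(path, argv);
    _exit(127); /* reached only if execv failed */
  }
  /* Parent: resumes once the child has called execv or _exit. */
  while (waitpid(pid, &status, 0) == -1)
    if (errno != EINTR)
      return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```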

It should be possible for posix_spawn to do all of this work safely via POSIX_SPAWN_USEVFORK, but often quite a lot of “work” needs to be done just before the helper calls an exec family function, and that has led to an ever-growing set of posix_spawn interfaces such as posix_spawn_file_actions_addclose, posix_spawn_file_actions_adddup2, posix_spawn_file_actions_destroy, posix_spawnattr_destroy, posix_spawnattr_getsigdefault, posix_spawnattr_getflags, posix_spawnattr_getpgroup, posix_spawnattr_getschedparam, posix_spawnattr_getschedpolicy, and posix_spawnattr_getsigmask. It might be simpler if the GNU C Library documented a small subset of functions you can safely call after vfork, which is in fact what the preceding functions are modelling. If you happen to select a set of operations that posix_spawn cannot support with vfork, the implementation falls back to fork and you don’t know why. Therefore, it is hard to use posix_spawn robustly.
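
As a rough sketch using standard POSIX interfaces (the wrapper name `spawn_quiet` is hypothetical), the pre-exec work of redirecting the child’s standard output is expressed as a file action rather than as code running in a vforked child. The sketch does not request `POSIX_SPAWN_USEVFORK`; how the child process is created is left to the implementation.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run argv[0] (looked up in PATH) with stdout redirected to /dev/null and
   return its exit status, or -1 if it could not be started or waited for. */
static int spawn_quiet(char *const argv[])
{
  posix_spawn_file_actions_t actions;
  pid_t pid;
  int status, rc;

  if (posix_spawn_file_actions_init(&actions) != 0)
    return -1;
  /* Work done in the child just before exec, described as a file
     action: open /dev/null as the child's standard output. */
  rc = posix_spawn_file_actions_addopen(&actions, STDOUT_FILENO, "/dev/null",
                                        O_WRONLY, 0);
  if (rc == 0)
    rc = posix_spawnp(&pid, argv[0], &actions, NULL, argv, environ);
  posix_spawn_file_actions_destroy(&actions);
  if (rc != 0)
    return -1;

  while (waitpid(pid, &status, 0) == -1)
    if (errno != EINTR)
      return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```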


Recent improvements to concurrent code in glibc

In this post, I will give examples of recent improvements to concurrent code in glibc, the GNU C library, in the upstream community project. In other words, this is code that can be executed by multiple threads at the same time and has to coordinate accesses to shared data using synchronization. While some of these improvements are user-visible, many of them are not, but they can serve as examples of how concurrent code in other code bases can be improved.

One of the user-visible improvements is a new implementation of Pthreads semaphores that I contributed. It places fewer requirements on when a program can destroy a semaphore. Previously, programs had to wait for all calls to sem_wait or sem_post to return before they were allowed to call sem_destroy; now, under certain conditions, a thread that returned from sem_wait can call sem_destroy immediately, even though the matching sem_post call has woken this thread but not returned yet. This works if, for example, the semaphore is effectively a reference counter for itself; specifically, the program must still ensure that there are no other concurrent, in-flight sem_wait calls or sem_post calls that are yet to increment the semaphore. The new semaphore implementation is portable code because it is based on C11 atomic operations (see below), and it replaces several architecture-specific implementations.
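
A minimal sketch of the usage pattern described above, assuming glibc with the new semaphore implementation (this early destruction is a property of that implementation, not something the sketch can verify): the waiting thread calls `sem_destroy` as soon as `sem_wait` returns, even though the producer’s `sem_post` call may still be running. The struct and function names are hypothetical. Out of caution, the sketch still joins the producer before freeing the memory, since the paragraph only describes calling sem_destroy early.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

/* Hypothetical hand-off object: "ready" signals that value is set. */
struct handoff {
  sem_t ready;
  int value;
};

static void *producer(void *arg)
{
  struct handoff *h = arg;

  h->value = 42;
  sem_post(&h->ready); /* the waiter may destroy the semaphore before this returns */
  return NULL;         /* h is not touched after sem_post */
}

/* Wait for the producer, then destroy the semaphore right away.
   Returns the produced value, or -1 on setup failure. */
static int receive_and_destroy(void)
{
  struct handoff *h = malloc(sizeof *h);
  pthread_t thread;
  int value;

  if (h == NULL)
    return -1;
  if (sem_init(&h->ready, 0, 0) != 0) {
    free(h);
    return -1;
  }
  if (pthread_create(&thread, NULL, producer, h) != 0) {
    sem_destroy(&h->ready);
    free(h);
    return -1;
  }
  while (sem_wait(&h->ready) == -1 && errno == EINTR)
    ; /* retry if interrupted by a signal */
  value = h->value;
  sem_destroy(&h->ready);     /* producer's sem_post may not have returned yet */
  pthread_join(thread, NULL); /* free the memory only after the producer exits */
  free(h);
  return value;
}
```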


Malloc systemtap probes: an example

One piece of feedback I got on my blog post on Understanding malloc behavior using Systemtap userspace probes was that I should have included an example script to explain how this works. Well, better late than never, so here’s an example script. This script prints some diagnostic information during a program run and also logs some information to print a summary at the end. I’ll go through the script a few related probes at a time.

# 131072 bytes (128 KiB) is glibc's default mmap threshold
global sbrk, waits, arenalist, mmap_threshold = 131072, heaplist
