Several months ago, I took over the maintenance of the flex package in Fedora and decided to kick the tires by rebasing the package in Fedora Rawhide. I downloaded and hashed the latest tarball at the time, flex-2.6.4, tweaked the spec file, and fired up a local build. Unfortunately, it failed with a SIGSEGV at build time:
./stage1flex -o stage1scan.c ./scan.l
make[2]: *** [Makefile:1695: stage1scan.c] Segmentation fault (core dumped)
Some debugging with gdb led me to conclude that the segmentation fault occurred when flex, during initialization, wrote to a block of memory returned by the reallocarray function. In this article, I'll describe the issue in more detail and explain the changes made to address it.
Here is a simplified snippet of my gdb session:
(gdb) bt
#0 check_mul_overflow_size_t (right=1, left=2048, left@entry=0)
#1 __GI___libc_reallocarray (optr=0x0, nmemb=2048, elem_size=1)
#2 allocate_array at misc.c:147
#3 flexinit at main.c:974
#4 flex_main at main.c:168
#5 __libc_start_main
(gdb) fin
Run till exit from #0 check_mul_overflow_size_t
__GI___libc_reallocarray
33 return realloc (optr, bytes);
(gdb) fin
Run till exit from #0 __GI___libc_reallocarray
in allocate_array
147 mem = reallocarray(NULL, (size_t) size, element_size);
Value returned is $1 = (void *) 0x5555557c6420
(gdb) fin
Run till exit from #0 allocate_array
in flexinit
974 action_array = allocate_character_array (action_size);
Value returned is $2 = (void *) 0x557c6420
(gdb) n
975 defs1_offset = prolog_offset = action_offset = action_index = 0;
(gdb) n
976 action_array[0] = '\0';
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
I didn't notice anything off here right up to the point where the segfault occurred, but maybe you already did. All I saw was that the returned pointer was non-NULL on line 974, but writing to it on line 976 resulted in a segfault. It began to look like a malloc bug.
On a whim, I built the same tarball outside of the Fedora build system. This time, the usual ./configure && make command line didn't segfault at build time, so the difference apparently lay in the build options used by rpmbuild. Some trial and error led me to the cause: -pie, the linker flag that produces a position-independent executable (PIE). Building with -pie caused the segmentation fault.
Armed with this "reproducer" and advice from my colleagues at Red Hat, I set about running git bisect on the flex sources. At that point, HEAD of the upstream master branch built cleanly even with -pie, so it was just a matter of finding the commit that fixed the build. That commit turned out to be the fix for the following issue reported against flex upstream:
#241: "implicit declaration of function reallocarray is invalid in C99"
In other words, the flex sources didn't define _GNU_SOURCE, so the compiler saw no declaration of the reallocarray function. In such cases, the compiler creates an implicit function declaration with the default return type (int) and generates code accordingly. On 64-bit Intel machines (x86-64), int is only 32 bits wide, while pointers are 64 bits wide. Looking back at the gdb session, it became clear to me that the pointer was being truncated to its low 32 bits:
147 mem = reallocarray(NULL, (size_t) size, element_size);
Value returned is $1 = (void *) 0x5555557c6420
(gdb) fin
Run till exit from #0 allocate_array
in flexinit
974 action_array = allocate_character_array (action_size);
Value returned is $2 = (void *) 0x557c6420
The truncation only causes a crash in position-independent executables because, in a PIE, the heap is typically mapped to a part of the address space where addresses are larger than INT_MAX, which exposes the flex bug described above. In a non-PIE build, the heap usually sits at low addresses that fit in an int, so the truncated value happens to equal the original pointer. GCC warns about implicit function declarations via the -Wimplicit-function-declaration option. It appears that a fairly recent proposal to enable this warning in Fedora builds was eventually shelved. Had it been enabled and treated as an error, the flex build would still have failed, but earlier and at a point where the problem was clear.
At this point, getting flex to build successfully was a simple matter of backporting the corresponding flex patch, which defines _GNU_SOURCE and thereby exposes the reallocarray prototype to the compiler.
But we didn't stop there. One of my colleagues, Florian Weimer, a regular contributor to glibc, thought that all this could have been avoided if glibc had exposed reallocarray via the more general _DEFAULT_SOURCE feature test macro. The change has now been committed to glibc upstream and is available starting with glibc 2.29.
With this change, we hope to avoid similar situations in other components in Fedora and the glibc user community. glibc now provides the reallocarray function prototype unless the user explicitly requires stricter conformance to a given standard.
Last updated: April 17, 2019