Flexible array members (FAM) are an extension of C89 that was standardized in C99. This article discusses how flexible array members offer convenience and improve performance, and how compiler implementations can introduce complications.
FAM makes it possible to declare a struct with a dynamic size while keeping a flat memory layout. This is a textbook example:
struct fam {
int size;
double data[ ];
};
The data array starts empty and will be loaded later, perhaps many times. Presumably, the programmer uses the size member to hold the current number of elements and updates that variable with each change to the data size.
Flexible array members vs. pointer implementation
Flexible array members allow faster allocation, better locality, and solid code generation. The feature is an alternative to a more traditional declaration of data as a pointer:
struct fam {
int size;
double *data;
};
With the pointer implementation, accessing an array element requires an extra load, and initializing the structure on the heap requires two allocations: one for the object and one for its data member. The result is memory fragmented between the object and the area pointed at by data.
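As a rough illustration of the difference, the sketch below builds the same vector both ways. The type and function names (ptr_vec, fam_vec, ptr_vec_new, ptr_vec_free, fam_vec_new) are made up for this example, and n is assumed to be positive.

```c
#include <assert.h>
#include <stdlib.h>

/* Pointer-based layout: header and elements live in separate blocks. */
struct ptr_vec {
    int size;
    double *data;
};

/* FAM layout: the elements follow the header in the same block. */
struct fam_vec {
    int size;
    double data[];
};

/* Two allocations; returns NULL if either one fails. */
struct ptr_vec *ptr_vec_new(int n)
{
    assert(n > 0);
    struct ptr_vec *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    v->data = calloc((size_t)n, sizeof *v->data);
    if (v->data == NULL) {
        free(v);
        return NULL;
    }
    v->size = n;
    return v;
}

/* Two frees to match the two allocations. */
void ptr_vec_free(struct ptr_vec *v)
{
    if (v != NULL) {
        free(v->data);
        free(v);
    }
}

/* One zeroed allocation covers header and elements; free() releases both. */
struct fam_vec *fam_vec_new(int n)
{
    assert(n > 0);
    struct fam_vec *v = calloc(1, sizeof *v + (size_t)n * sizeof v->data[0]);
    if (v == NULL)
        return NULL;
    v->size = n;
    return v;
}
```

Reading v->data[i] through a ptr_vec first loads the data pointer and then the element, whereas with a fam_vec the element address is a fixed offset from v.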
Standard flexible array member behavior
The C99 standard, section 6.7.2.1, paragraph 16, defines flexible array members. A struct with a flexible array member behaves in interesting ways.
It is legal to access any index of fam::data, provided enough memory has been allocated:
struct fam *f = malloc(sizeof(struct fam) + sizeof(double[n]));
f->size = n;
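Because the header and the elements share one block, changing the element count can be a single realloc call. The following is a minimal sketch of that idea; fam_resize is a hypothetical helper introduced here, not part of the standard or of the original example.

```c
#include <assert.h>
#include <stdlib.h>

struct fam {
    int size;
    double data[];
};

/*
 * Hypothetical helper: resize f (or create it when f is NULL) so that it
 * holds n elements. Returns the possibly moved object, or NULL on failure,
 * in which case the original object is left untouched.
 */
struct fam *fam_resize(struct fam *f, int n)
{
    assert(n >= 0);
    /* Read the old count before realloc, which may invalidate f. */
    int old = (f != NULL) ? f->size : 0;
    struct fam *g = realloc(f, sizeof *g + (size_t)n * sizeof(double));
    if (g == NULL)
        return NULL;
    /* Newly added elements are uninitialized; zero them. */
    for (int i = old; i < n; i++)
        g->data[i] = 0.0;
    g->size = n;
    return g;
}
```

Callers should keep using the returned pointer, since realloc is free to move the whole object.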
The sizeof operator behaves as if the FAM had zero elements but accounts for the padding required to position it correctly. For instance, sizeof(struct {char c; float d[];}) is unlikely to be equal to sizeof(char) because of the padding required to correctly position d.
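One way to see this on a given platform is to compare sizeof with offsetof. The sketch below is illustrative only: struct small mirrors the anonymous struct from the sentence above, and the three function names are invented for this example. The actual values depend on the target's alignment rules, so none are stated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct small {
    char c;
    float d[];
};

/* Size of the header alone: the FAM contributes no elements to sizeof. */
size_t small_header_size(void)
{
    return sizeof(struct small);
}

/* Byte offset of d, which includes any padding inserted after c. */
size_t small_data_offset(void)
{
    return offsetof(struct small, d);
}

/* Bytes to request from malloc for a struct small holding n floats. */
size_t small_alloc_size(size_t n)
{
    /* Guard against overflow in the size computation. */
    assert(n <= (SIZE_MAX - sizeof(struct small)) / sizeof(float));
    return sizeof(struct small) + n * sizeof(float);
}
```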
The assignment operator does not copy the flexible array member, which probably explains why the feature is not part of the C++ standard.
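In practice, this means a full copy has to be made by hand. A minimal sketch, assuming the size member accurately records the element count; fam_clone is a hypothetical helper name:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct fam {
    int size;
    double data[];
};

/*
 * Hypothetical helper: duplicate src, including its size elements.
 * A plain "*dst = *src" would copy the header only.
 */
struct fam *fam_clone(const struct fam *src)
{
    assert(src != NULL && src->size >= 0);
    size_t bytes = sizeof *src + (size_t)src->size * sizeof(double);
    struct fam *dst = malloc(bytes);
    if (dst != NULL)
        memcpy(dst, src, bytes);   /* header and elements in one copy */
    return dst;
}
```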
This would be the end of this post if there were no nonconformant compiler extensions.
Nonconforming compiler extensions
GCC and Clang support flexible array members in C89 and C++ only as extensions. The extensions also accept an alternate syntax, sometimes called the struct hack:
struct fam_extension {
int size;
double data[0];
};
Alternatively, you can specify:
struct fam_extension {
int size;
double data[1];
};
As it turns out, this syntax is extended to any array size because of prior art, such as the sockaddr structure shown in the FreeBSD Developers' Handbook, section 7.5.1.1.2:
struct sockaddr {
unsigned char sa_len; /* total length */
sa_family_t sa_family; /* address family */
char sa_data[14]; /* actually longer; address value */
};
Note that using an array size different from 0 for the FAM makes the allocation idiom more complex because one needs to subtract the size of the FAM:
struct sockaddr *f = malloc(sizeof(struct sockaddr) + sizeof(char[n]) - sizeof(char[14]));
The GCC and Clang extensions give a defined meaning to what would otherwise be undefined behavior: an out-of-bounds access on such a trailing array. The program performs a regular memory access, as if the array had been declared with the size that was actually allocated.
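For the sized-array allocation idiom, one alternative to subtracting the declared array size is to start from the offset of the trailing array. The sketch below assumes a hypothetical legacy_addr structure standing in for a sockaddr-like layout; the helper names are also invented. The two forms agree only when the structure has no trailing padding, which holds here because every member has character type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a legacy, sockaddr-like layout. */
struct legacy_addr {
    unsigned char len;
    unsigned char family;
    char data[14];          /* actually longer */
};

/* Bytes needed for an object whose data array holds n chars. */
size_t legacy_addr_bytes(size_t n)
{
    assert(n <= SIZE_MAX - offsetof(struct legacy_addr, data));
    size_t bytes = offsetof(struct legacy_addr, data) + n;
    /* Never allocate less than the declared structure. */
    return bytes < sizeof(struct legacy_addr) ? sizeof(struct legacy_addr)
                                              : bytes;
}

/* Zeroed allocation sized for n chars of trailing data. */
struct legacy_addr *legacy_addr_new(size_t n)
{
    return calloc(1, legacy_addr_bytes(n));
}
```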
Limitations of sized arrays
The ability to consider sized arrays as FAM impacts the accuracy of some kinds of code analysis. Consider, for instance, the -fsanitize=bounds option, which instruments array accesses to detect when they are out of bounds. Without any context information, it cannot add a check to the following access:
struct fam {
int size;
double data[];
};
int foo(struct fam *f) { return f->data[8]; }
But if we declare the array as double data[1], there is still no instrumentation. The compiler detects a FAM based on the extension definition and performs no check. Even worse, if we declare the array as double data[4], trunk GCC performs no check (honoring legacy code, as illustrated in the previous section), while Clang adds a bounds check.
We observe the same behavior for the __builtin_object_size builtin. This builtin computes the allocated memory reachable from a pointer. When asked for __builtin_object_size(f->data, 1), both GCC and Clang return -1 (indicating a failure to compute that size) for all the declarations of data we have explored so far. This policy is conservative and removes some of the security offered by _FORTIFY_SOURCE, which relies heavily on the accuracy of __builtin_object_size.
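The query can be reproduced with a small test file. This is a sketch for GCC or Clang only, since both __builtin_object_size and the noinline attribute are compiler-specific. The struct and function names are hypothetical, and the value returned for the sized-array case may differ with the compiler, its version, and its flags, so no output is claimed here.

```c
#include <assert.h>
#include <stddef.h>

struct fam_c99 {
    int size;
    double data[];      /* standard flexible array member */
};

struct fam_four {
    int size;
    double data[4];     /* sized trailing array */
};

/* noinline keeps the compiler from seeing the caller's object. */
__attribute__((noinline))
size_t c99_data_size(struct fam_c99 *f)
{
    assert(f != NULL);
    return __builtin_object_size(f->data, 1);
}

__attribute__((noinline))
size_t four_data_size(struct fam_four *f)
{
    assert(f != NULL);
    /* (size_t)-1 means the compiler treats data as a FAM of unknown size. */
    return __builtin_object_size(f->data, 1);
}
```

Printing both results as size_t values shows whether the compiler in use treats the sized array as a flexible array member.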
Motivation for stricter standard conformance
A codebase that strictly conforms to the C99 standard (at least for FAM) would benefit from a compiler strictly following the standard definition of flexible array members. That goal motivates an effort currently led within the Linux kernel community, as demonstrated by this patch. The documentation update favors C99 FAM in place of zero-length arrays.
To take advantage of this development, a compiler option was developed for GCC and Clang to give the programmer control over which array declarations are treated as flexible array members. The option is -fstrict-flex-arrays=n, where:
- 0 reflects the current situation described earlier.
- 1 considers only [0], [1], and [ ] as a FAM.
- 2 considers only [0] and [ ] as a FAM.
Compiling code with a -fstrict-flex-arrays value greater than 0 unlocks some extra security while breaking (some) backward compatibility, which is why n=0 remains the default.
Compiler convergence on C-language flexible array members
Flexible array members are an interesting C99 feature that found its way, through compiler extensions, into C89 and C++. These extensions and legacy code led to suboptimal code checks in the compiler, which the -fstrict-flex-arrays= option can now control.