In this prior post we mentioned several new optimization improvements in GCC for Red Hat Enterprise Linux 7. It's time to dig a little deeper. In this post we will focus on partial inlining/function outlining which are part of the Inter-Procedural Analysis (IPA) framework.
Function inlining is a well known technique to improve application performance by expanding the body of a called function into one or more of its call site(s). Function inlining decreases function call overhead, may improve icache behaviour, expose previously hidden redundancies, etc. However, the increase in total code size may be detrimental and, as a result, heuristics which drive inlining are very sensitive to code growth. Function outlining/partial inlining are variants of function inlining to allow for inlining with less code growth.
Function outlining partitions a function into hot (often executed) and cold (rarely executed) code. Cold code is put into separate functions that get called as necessary by the main function. The result is the main function becomes smaller and thus may be inlined more aggressively into call sites.
Partial inlining is closely related to function outlining. Rather than inlining the entire body into a call site, the compiler may choose to inline part of the called function (typically a hot fragment at the start of the function with an early return).
So as a simple example, consider the following code:
extern int something_external (void); int test(int a) { if (a < 100) return 1; something_external (); something_external (); something_external (); something_external (); something_external (); something_external (); something_external (); something_external (); something_external (); something_external (); something_external (); something_external (); something_external (); return 0; }
There is a clear "early return" in the test() function. Function outlining will gather all the code after the early return test into a distinct function, resulting in something like this:
int test(int a) { if (a < 100) return 1; return test.part.0 (); } int test.part.0 () { something_external (); something_external (); something_external (); something_external (); something_external (); something_external (); something_external (); something_external (); something_external (); something_external (); something_external (); something_external (); something_external (); return 0; }
"test" has now become significantly smaller and may be a candidate for more aggressive inlining. Furthermore, the compiler may choose to only partially inline test() into its call sites by inlining the early exit test at sites where the parameter is a known constant and thus the result of the test is known at compile time.
With the IPA framework, GCC can automatically analyze and perform these optimizations on an entire compilation unit (a single source file) when optimization is enabled. However, modern software design emphasizes modularity, separate compilation, etc. which limit the usefulness of these techniques.
Link-time optimization (LTO) is a further enhancement to allow these kinds of optimizations to occur across entire applications or DSOs (dynamic shared objects, or shared libraries). In essence, LTO defers actual compilation until the link phase when the entire application/DSO is fed to the compiler as a single compilation unit. Thus in an LTO-optimized compile/link, the entire program is subject to all the IPA based optimizations, including inlining, outlining and partial inlining.
Future posts will discuss recursive inlining, function multi-versioning, function specialization and/or other techniques GCC uses to improve the performance of the code it generates.
Last updated: April 5, 2018