Red Hat Enterprise Linux 7 GCC Optimizations - partial inlining in depth

October 29, 2014


In a prior post, we mentioned several new optimization improvements in GCC for Red Hat Enterprise Linux 7. It's time to dig a little deeper. In this post we will focus on partial inlining and function outlining, both of which are part of the Inter-Procedural Analysis (IPA) framework.

Function inlining is a well-known technique that improves application performance by expanding the body of a called function into one or more of its call sites. Function inlining decreases function call overhead, may improve instruction cache (icache) behaviour, and may expose previously hidden redundancies. However, the increase in total code size may be detrimental and, as a result, the heuristics that drive inlining are very sensitive to code growth. Function outlining and partial inlining are variants of function inlining that allow inlining with less code growth.

Function outlining partitions a function into hot (often executed) and cold (rarely executed) code. Cold code is put into separate functions that are called as necessary by the main function. As a result, the main function becomes smaller and thus may be inlined more aggressively into its call sites.

Partial inlining is closely related to function outlining. Rather than inlining the entire body into a call site, the compiler may choose to inline part of the called function (typically a hot fragment at the start of the function with an early return).

As a simple example, consider the following code:

extern int something_external (void);

int test(int a)
{
   if (a < 100)  return 1;

   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();

   return 0;
}

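There is a clear "early return" in the test() function. Function outlining will gather all the code after the early return test into a distinct function, resulting in something like the following. The listing is pseudo-C: test.part.0 is a compiler-generated symbol name, not an identifier you could write in C source.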

int
test(int a)
{
   if (a < 100)  return 1;

   return test.part.0 ();
}

int
test.part.0 ()
{
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   something_external ();
   return 0;
}

test() has now become significantly smaller and may be a candidate for more aggressive inlining. Furthermore, the compiler may choose to only partially inline test() into its call sites, inlining just the early-exit check. This is especially attractive at sites where the parameter is a known constant and thus the result of the check is known at compile time.
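To make the effect on a call site concrete, here is a hand-written sketch of roughly what a partially inlined caller could look like. It is not actual compiler output: the names test_part_0, caller, and call_count are hypothetical, something_external() is given a stand-in body so the sketch is self-contained, and the thirteen calls are collapsed into a loop for brevity.

```c
/* Hand-written sketch, not GCC output. All names below other than
   something_external are hypothetical. */

static int call_count;   /* counts cold-path work, for demonstration only */

/* Stand-in body for the external function from the example above. */
static int something_external(void)
{
    return ++call_count;
}

/* The cold part, kept out of line. GCC would name it test.part.0. */
__attribute__((noinline))
static int test_part_0(void)
{
    int i;

    for (i = 0; i < 13; i++)
        something_external();
    return 0;
}

/* A caller after partial inlining: only the early-return check from
   test() has been copied in; the cold part is still a real call. */
int caller(int a)
{
    int r;

    if (a < 100)
        r = 1;                /* hot path: no call at all */
    else
        r = test_part_0();    /* cold path: call the outlined part */

    return r + 1;
}
```

The hot path through caller() pays only for a comparison, while the code growth at the call site is a few instructions rather than the thirteen calls in the cold part.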

With the IPA framework, GCC can automatically analyze and perform these optimizations on an entire compilation unit (a single source file) when optimization is enabled. However, modern software design emphasizes modularity and separate compilation, which limit the usefulness of these techniques because the compiler sees only one compilation unit at a time.
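If you want to experiment within a single compilation unit, a small file along the following lines may be a useful starting point. This is a sketch under assumptions: the file name experiment.c, the counter, and the two callers are hypothetical, and something_external() is defined locally (marked noinline so it stays an opaque call) only to keep the file self-contained. The commands in the trailing comment use -fpartial-inlining, which the GCC manual lists as enabled at -O2, and what I believe is the dump option for the function-splitting pass; whether GCC actually splits test() depends on the GCC version and its heuristics.

```c
/* experiment.c: hypothetical file name for this sketch. */

static int calls;   /* hypothetical counter so the cold path is observable */

/* Stand-in for the external function; noinline keeps it an opaque call. */
__attribute__((noinline))
int something_external(void)
{
    return ++calls;
}

int test(int a)
{
    if (a < 100)
        return 1;            /* cheap early return (hot path) */

    something_external();    /* cold path */
    something_external();
    something_external();
    something_external();
    return 0;
}

/* The argument is a known constant, so the early-return check can be
   resolved at compile time once it is inlined. */
int caller_constant(void)
{
    return test(5);
}

/* The argument is unknown at compile time. */
int caller_unknown(int a)
{
    return test(a);
}

/*
 * Commands one might try (output varies by GCC version):
 *
 *   gcc -O2 -fdump-tree-fnsplit -c experiment.c    # dump of the split pass
 *   gcc -O2 -S experiment.c                        # look for test.part.0
 *   gcc -O2 -fno-partial-inlining -S experiment.c  # baseline for comparison
 */
```

Comparing the assembly with and without -fno-partial-inlining is a simple way to see whether the heuristics judged the split worthwhile for a given function.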

Link-time optimization (LTO) is a further enhancement to allow these kinds of optimizations to occur across entire applications or DSOs (dynamic shared objects, or shared libraries). In essence, LTO defers actual compilation until the link phase when the entire application/DSO is fed to the compiler as a single compilation unit. Thus in an LTO-optimized compile/link, the entire program is subject to all the IPA based optimizations, including inlining, outlining and partial inlining.
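As a minimal sketch of the separate-compilation problem that LTO addresses, consider two source files, shown here in one listing. The file names, function names, and the main.o object mentioned in the build comment are all hypothetical. Compiled separately without LTO, the compiler working on caller.c sees only a declaration of limit() and cannot inline any part of it; with -flto on both the compile and link steps, the body is available at link time.

```c
/* ---- limit.c (hypothetical file) ---- */

static int slow_calls;   /* counts how often the slow path runs */

int slow_path_count(void)
{
    return slow_calls;
}

int limit(int a)
{
    if (a < 100)
        return 1;        /* early return (hot path) */

    slow_calls++;        /* stands in for the expensive cold path */
    return 0;
}

/* ---- caller.c (hypothetical file) ---- */

extern int limit(int a);   /* only the declaration is visible here */

int sum_limits(const int *v, int n)
{
    int i, total = 0;

    for (i = 0; i < n; i++)
        total += limit(v[i]);   /* inlinable only if the body is visible */
    return total;
}

/*
 * Build sketch (main.o is a hypothetical object providing main()):
 *
 *   gcc -O2 -flto -c limit.c caller.c
 *   gcc -O2 -flto limit.o caller.o main.o -o app
 */
```

Note that -flto is passed at both steps: the compile step stores the intermediate representation in the object files, and the link step performs the whole-program optimization.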

Future posts will discuss recursive inlining, function multi-versioning, function specialization and/or other techniques GCC uses to improve the performance of the code it generates.

Last updated: April 5, 2018
