Profile-guided optimization (PGO) is a now-common compiler technique for improving generated code. In PGO (sometimes pronounced "pogo"), a developer runs a first version of the binary to collect a profile, through instrumentation or sampling, then uses that profile to guide recompilation.
Profile-guided optimization helps the compiler make better decisions, for instance, concerning inlining or block ordering. In some cases, it also means compiling with obsolete profile information. For reasons that I will explain, this feature can benefit large projects, but it puts the burden on the compiler implementation to detect and handle inconsistencies between the profile and the current source.
This article focuses on how the Clang compiler implements PGO, and specifically, how it instruments binaries. We will look at what happens when Clang instruments source code during the compilation step to collect profile information during execution. Then, I'll introduce a real-world bug that demonstrates the pitfalls of the current approach to PGO.
Note: To learn more about PGO for Clang, see the Clang Compiler User's Manual.
Instrumenting code in Clang
In Clang, the -fprofile-instr-generate flag instructs the compiler to instrument code at the source instruction level, and the -fprofile-generate flag instructs the compiler to instrument code at the LLVM intermediate representation (IR) level. Both approaches share a design philosophy, with some differences in granularity. Our topic is -fprofile-instr-generate, and the way that it interacts with source code changes between profiling and recompilation.
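To build some intuition first, here is a hand-written C++ sketch of what the behavior of instrumented code amounts to: a counter per region of the function, dumped when the program exits. The names and layout are made up for illustration; in reality, the compiler emits the counters itself and the profiling runtime serializes them to a .profraw file:

#include <cstdint>
#include <cstdio>

// Illustrative stand-ins for the per-region counters that
// -fprofile-instr-generate would emit for foo(): one counter for the
// function entry, one for the "then" region of the branch.
static uint64_t counters[2];

int foo(int x) {
    ++counters[0];        // function entry executed
    if (x > 0) {
        ++counters[1];    // "then" region executed
        return 1;
    }
    return 0;
}

int main() {
    foo(3); foo(-1); foo(7);
    // The real profiling runtime serializes its counters to a .profraw
    // file at exit; here we simply print them.
    std::printf("entry: %llu, then: %llu\n",
                (unsigned long long)counters[0],
                (unsigned long long)counters[1]);
    return 0;
}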
Consider the following scenario:
- Compile a code sample (C0) with -fprofile-instr-generate.
- Run it to collect profile information (P0).
- Edit the C0 sample and turn it into a new version, C1.
- Compile C1 using the original P0 profile information.
How Clang deals with code modification
The scenario of using somewhat obsolete profile information might seem odd, because we usually compile, profile, and recompile. The profiling step can be quite time-consuming, however. For big projects, it is tempting to provide downloadable profile information based on a source snapshot; developers can then recompile the code using that profile without the pain of collecting a new one every time. (The dotnet runtime takes this approach.)
Furthermore, for projects with a high commit rate, it could be infeasible to provide profile information for each commit. As a result, slight changes to the code might not be reflected in the profile used for recompilation. So, how does Clang cope with that?
The trivial answer of "compare checksums for the whole file" is not satisfying, because a slight change would invalidate the whole compilation unit. The actual mechanism relies on the same idea, applied per function: compute a checksum over each function's abstract syntax tree (AST), based on the tree structure. That way, changing one function doesn't invalidate the profile information collected for the other functions. Of course, this approach has limitations. Removing a call site changes the number of times the callee is called, and thus its hotness, without changing the callee's checksum. But at least it prevents having profile information that points to code that no longer exists, and vice versa.
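To make the idea concrete, here is a toy model of such a per-function structural hash. Everything in it is made up for illustration; Clang's real checksum folds AST node kinds into an MD5, whereas this sketch uses a simple FNV-1a fold to stay self-contained:

#include <cstdint>
#include <cstdio>
#include <vector>

// Simplified AST node: only the structural "kind" matters for the hash,
// mirroring how the checksum captures tree shape rather than identifiers.
enum class Kind : uint8_t { IfStmt, WhileStmt, CallExpr, ReturnStmt };

struct Node {
    Kind kind;
    std::vector<Node> children;
};

// Fold each node kind into a running 64-bit FNV-1a hash, visiting
// children in order so that the tree structure is captured.
static void hashNode(const Node &n, uint64_t &h) {
    h ^= static_cast<uint8_t>(n.kind);
    h *= 0x100000001b3ULL;              // FNV-1a prime
    for (const Node &c : n.children)
        hashNode(c, h);
}

static uint64_t hashFunctionBody(const Node &body) {
    uint64_t h = 0xcbf29ce484222325ULL; // FNV-1a offset basis
    hashNode(body, h);
    return h;
}

int main() {
    // if (bar) { if (bar) { } }  versus  one extra level of nesting:
    Node v1{Kind::IfStmt, {{Kind::IfStmt, {}}}};
    Node v2{Kind::IfStmt, {{Kind::IfStmt, {{Kind::IfStmt, {}}}}}};
    std::printf("v1: %016llx\nv2: %016llx\n",
                (unsigned long long)hashFunctionBody(v1),
                (unsigned long long)hashFunctionBody(v2));
    return 0;
}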
Currently, if such outdated profile information is used, the Clang compiler ignores it and prints a warning:
> echo 'int main() { return 0; }' > a.c && clang -fprofile-instr-generate a.c && LLVM_PROFILE_FILE=a.profraw ./a.out
> llvm-profdata merge -output=a.profdata a.profraw
> printf '#include <stdio.h>\nint main() { if(1) puts("hello"); return 0; }' > a.c && clang -fprofile-instr-use=a.profdata a.c
warning: profile data may be out of date: of 1 function, 1 has mismatched data that will be ignored [-Wprofile-instr-out-of-date]
1 warning generated.
When the improbable happens
Recently, I was tasked with debugging a Clang segmentation fault (segfault) reported in Red Hat Bugzilla as bug 1827282. After digging in, I ended up with two functions having the same checksum:
extern int bar;

// first version
void foo() {
  if (bar) { }
  if (bar) { }
  if (bar) {
    if (bar) { }
  }
}

// second version
void foo() {
  if (bar) { }
  if (bar) { }
  if (bar) {
    if (bar) {
      if (bar) { }
    }
  }
}
That's a strange outcome because the checksum algorithm used in Clang relies on MD5, so the chance of having a conflict should be very low. Did the improbable happen?
It turns out that it didn't. The conflict was due to a slight bug in the way the hashing was finalized, and we fixed it with a patch (D79961). Basically, when computing the hash, a uint64_t buffer is filled with AST node types. Once it's full, it is converted to an array of bytes and sent to the hashing routine. In the final step, however, the uint64_t was sent directly to the routine and implicitly converted to a uint8_t, thus potentially ignoring the trailing nodes of the AST. Along with the fix, the patch adds a test case that trivially verifies that a small change to a function is reflected in its hash value.
The patch works, but it changes the hash of most existing functions, namely every function whose final buffer held more than one node. That is an important side effect, because changing the hash invalidates most of the existing cached profile information. Fortunately, the change doesn't impact the typical "compile, profile, recompile" scenario, but it could be an issue for large build systems that pre-compute profile data for clients to download as part of the build process.
Conclusion
Clang and GCC both support using obsolete profile information to guide the compilation process; if a function body changes, the outdated information for that function is ignored. This feature can be beneficial for large projects, where gathering profile information is costly, but it puts an extra burden on the compiler implementation to detect and handle inconsistencies, which in turn increases the likelihood of compiler bugs.
Last updated: June 30, 2020