In previous posts, "Stack Clash Mitigation in GCC — Background" and "Stack Clash mitigation in GCC: Why -fstack-check is not the answer," I covered the basics of how stack clash attacks are structured and why GCC's existing -fstack-check mechanism is insufficient for protection.
So, what should we do? Clearly, we want something similar to -fstack-check but without its fundamental problems. Enter a new option: -fstack-clash-protection.
The key principles for code generation to prevent a stack clash attack are:
- No single allocation can be greater than a page. The compiler must translate large requests into a series of page- or smaller-sized requests.
- As pages are allocated, emit instructions to probe them. (Let's call these explicit probes.)
- A series of sub-page allocations without intervening probes cannot allocate more than a page in total.
A naive implementation around these principles could be highly inefficient, but this option provides the basis for building a secure, high-performance implementation.
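To make the first two principles concrete, here is a minimal Python model of the naive scheme. It is an illustration only, not GCC's implementation: the function name `naive_probe_plan`, the 4096-byte page size, and the tuple-based step format are all assumptions made for this sketch.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch; real targets vary


def naive_probe_plan(request, page_size=PAGE_SIZE):
    """Split one stack request into page-or-smaller allocations.

    Returns a list of ("alloc", nbytes) and ("probe", total_allocated)
    tuples. Every allocation is at most one page and is immediately
    followed by an explicit probe of the newly allocated memory.
    """
    if request < 0:
        raise ValueError("request must be non-negative")
    steps = []
    allocated = 0
    while allocated < request:
        chunk = min(page_size, request - allocated)
        allocated += chunk
        steps.append(("alloc", chunk))
        steps.append(("probe", allocated))
    return steps


if __name__ == "__main__":
    for step in naive_probe_plan(10000):
        print(step)
```

In this model, a 10,000-byte request becomes three allocations (4096, 4096, and 1808 bytes), each followed by its own probe, which hints at why probing every allocation unconditionally can be costly.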
Implicit probes to improve performance
A stack access that occurs naturally in the code is an implicit probe. An implicit probe implies no additional cost; therefore, using an implicit probe rather than an explicit probe is advantageous. For example, an implicit probe might occur due to the behavior of the target architecture, the requirements of the application binary interface (ABI), or by analysis of existing memory references in the program.
For example, the call instruction on many processors pushes the return address onto the stack. Thus, the call instruction would fault if the stack pointer were in the stack guard. This is an implicit probe at *sp. Some application binary interfaces require that *sp always contains a back-chain pointer (the pointer to the next outer stack frame). Thus, every stack allocation is required to atomically update *sp. Again, this is an implicit probe of *sp.
We can also analyze the generated code. For example, on one target the caller allocates space for the callee to save registers. Thus, in the callee, a register save to *(sp + 48) is an implicit probe at *(sp + 48). On other targets, the callee often pushes pairs of registers onto the stack at function entry. Those pushes are implicit probes at *sp.
It turns out that taking advantage of the implicit probes noted above can dramatically decrease the amount of explicit probing. If we look at glibc as an example on x86 and PPC, we see that fewer than 2% of the functions in glibc require explicit probing in their prologues. For example, if a function allocates less than a page of stack space on these architectures, then no explicit probe is necessary.
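The effect of implicit probes on the third principle can be modeled with a small bookkeeping sketch in Python. This is a simplified model, not GCC's actual analysis: the `ProbeTracker` class, its method names, and the 4096-byte page size are assumptions for illustration, and the model treats any probe as covering everything allocated so far.

```python
class ProbeTracker:
    """Track how many bytes have been allocated since the last probe."""

    def __init__(self, page_size=4096):  # assumed page size
        self.page_size = page_size
        self.unprobed = 0          # bytes allocated since the last probe
        self.explicit_probes = 0   # explicit probes the model had to emit

    def allocate(self, nbytes):
        """Record a sub-page allocation, probing first if required."""
        if not 0 <= nbytes <= self.page_size:
            raise ValueError("this model only handles sub-page allocations")
        if self.unprobed + nbytes > self.page_size:
            # More than a page would be unprobed: emit an explicit probe.
            self.explicit_probes += 1
            self.unprobed = 0
        self.unprobed += nbytes

    def implicit_probe(self):
        """Record a naturally occurring stack access, such as a call
        pushing its return address or a register save."""
        self.unprobed = 0


if __name__ == "__main__":
    without_implicit = ProbeTracker()
    for _ in range(3):
        without_implicit.allocate(2048)
    print(without_implicit.explicit_probes)

    with_implicit = ProbeTracker()
    for _ in range(3):
        with_implicit.allocate(2048)
        with_implicit.implicit_probe()
    print(with_implicit.explicit_probes)
```

Under these assumptions, three 2048-byte allocations with no intervening stack accesses force one explicit probe, while the same allocations interleaved with implicit probes need none.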
Current status
Red Hat's engineers implemented -fstack-clash-protection for all Red Hat Enterprise Linux (RHEL) targets starting with RHEL 7.5. RHEL 7.5 enables -fstack-clash-protection for glibc only. Starting with RHEL 8, the entire distribution is compiled with -fstack-clash-protection, and annobin/annocheck are used to verify that the distribution was compiled with the proper flags.
Fedora 27 and later enable -fstack-clash-protection by default for all packages using the standard default compilation options (note that there is no -fstack-clash-protection support for 32-bit ARM targets).
GCC 8 includes -fstack-clash-protection support for the Intel, IBM Power, IBM Z series, and ARM's aarch64 targets.
LLVM 11 will include stack clash protections for Intel 64 and AMD64 written by Serge Guelton.
Testing
Red Hat's engineers have written a variety of tests to verify the analysis of static and dynamic stack utilization. They have also written regression tests for all bugs reported against -fstack-clash-protection since its introduction to GCC. These tests run as a part of the standard regression testing process for GCC. Most of the tests are portable enough to be usable on other targets if one were to implement stack clash mitigation on a currently unsupported target.
Red Hat's engineers also implemented a scanner that can examine relocatable objects, executables, and dynamic shared objects. The scanner looks for violations of the key principles listed above within an instruction window and notifies the developer of suspicious code. Red Hat has used the scanner to scan key libraries and objects (with hand verification of all the sequences reported as potentially vulnerable by the scanner). This practice proved particularly useful in verifying that Fedora 27 was consistently using -fstack-clash-protection and in validating improvements to the aarch64 implementation made by ARM's engineers.