This article shows you how to create a scanning tool that can search for specific sequences of instructions inside binary files. Such searches are commonly required to verify that a compiled executable meets certain criteria, usually related to security. For example, Intel's Control-Flow Enforcement Technology (CET) extension mandates that all functions start with an ENDBR instruction. Verifying this requires a tool designed to search for instructions inside the binary.
Simple searches with command-line tools
Simple searches do not need special tools. Instead, you can do the job with a combination of command-line tools, possibly inside a script. For example:
objdump -d FOO | grep -i endbr
This searches the FOO program for any variation of the endbr instruction.
Although searches like this are easy to construct, they are often insufficient. For example, the preceding command sequence doesn't verify that every function entry point starts with an ENDBR instruction. It just checks to see whether there are any occurrences of the instruction in FOO.
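To make the gap concrete, here is a minimal sketch of the stricter check: it walks the text that objdump -d prints and counts the functions whose first instruction is not an ENDBR. The function names are hypothetical, and the code assumes the usual objdump layout, where a function label looks like "0000000000001139 <main>:" and a tab precedes each mnemonic. It also assumes the lines have already been split and stripped of trailing newlines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if LINE looks like an objdump function label,
   e.g. "0000000000001139 <main>:".  (Assumed layout.)  */
static bool
is_function_label (const char *line)
{
  size_t len = strlen (line);

  return len >= 3
    && strchr (line, '<') != NULL
    && line[len - 2] == '>'
    && line[len - 1] == ':';
}

/* Return true if LINE is an instruction line whose mnemonic starts
   with "endbr".  Assumes a tab separates the opcode bytes from the
   mnemonic, as in typical objdump -d output.  */
static bool
is_endbr_line (const char *line)
{
  return strstr (line, "\tendbr") != NULL;
}

/* Count the function labels in LINES whose first following
   non-empty line is not an ENDBR instruction.  */
size_t
count_functions_missing_endbr (const char *const *lines, size_t n_lines)
{
  size_t missing = 0;

  for (size_t i = 0; i < n_lines; i++)
    {
      if (! is_function_label (lines[i]))
        continue;

      /* Skip any blank lines between the label and its first instruction.  */
      size_t j = i + 1;
      while (j < n_lines && lines[j][0] == '\0')
        j++;

      if (j >= n_lines || ! is_endbr_line (lines[j]))
        missing++;
    }

  return missing;
}
```

Even this small check needs state that carries from one line to the next, which is awkward to express in a grep pipeline.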
Building a custom scanner
To search for complex instructions or complex sequences of instructions, you will need a custom scanner. To this end, I'll show you how to modify the stack clash scanner that can be found in a branch of the annobin repository. Retrieve the scanner with:
git clone -b stack-clash-scanner-branch git://sourceware.org/git/annobin.git
This scanner uses the annocheck framework with a custom module. The scanner can check binary files, directories, and even RPMs.
Note: The annocheck code is distributed under the GNU General Public License v3 and is free for anyone to use and modify, but Red Hat does not provide any support for it.
How the stack clash scanner works
The basic stack clash scanner in the package looks for an AND instruction that takes the stack pointer and a large constant value as operands, for instance:
and $0xfffffffffffff000,%rsp
The point of this search is to find any stack pointer adjustments that are larger than a page size and that might therefore cause the stack to grow beyond a safe limit.
The scanner works by disassembling the instructions in the binary and then searching the result for lines with the necessary features. The function is_affected_insn() in annocheck/stack-clash.c contains the necessary logic:
  /* Only AND instructions are affected.  */
  if (strncmp (disas.buffer, "and", 3) != 0)
    return false;

  /* Must involve the stack pointer register.  */
  if (strstr (disas.buffer, "%rsp") == NULL)
    return false;

  /* The instruction has to have a "large" immediate value.
     For now we take "large" to be 4K or more.  */
  const char * const_start;
  if ((const_start = strstr (disas.buffer, "$0xff")) == NULL)
    return false;

  /* 4k = 4 * 1024 = 0x1000 */
  return strstr (const_start, "000") != NULL;
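To experiment with these checks outside annocheck, the same logic can be wrapped in a function that takes the disassembled text as a plain string. This is only a sketch: the name looks_like_stack_clash_insn() is hypothetical, and the parameter stands in for the scanner's disas.buffer. As the string tests imply, the checks match AT&T-syntax text such as "and $0xfffffffffffff000,%rsp".

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical standalone version of the checks shown above.
   INSN is one line of AT&T-syntax disassembly, starting at the
   mnemonic.  */
bool
looks_like_stack_clash_insn (const char *insn)
{
  /* Only AND instructions are affected.  */
  if (strncmp (insn, "and", 3) != 0)
    return false;

  /* Must involve the stack pointer register.  */
  if (strstr (insn, "%rsp") == NULL)
    return false;

  /* The immediate must begin with 0xff, i.e. look like a mask.  */
  const char *const_start = strstr (insn, "$0xff");
  if (const_start == NULL)
    return false;

  /* Three consecutive zero digits mean at least 4K alignment.  */
  return strstr (const_start, "000") != NULL;
}
```

Because the test is purely textual, it is only a heuristic: any text after the "$0xff" that happens to contain "000" will also match, so a stricter scanner might parse the immediate as a number instead.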
Modifying the scanner
You can modify the annocheck/stack-clash.c code to build a scanner that searches for other instruction sequences. The repository even includes a couple of other scanners adapted from the core code.
For example, annocheck/jcc-scan.c contains code to locate conditional jump instructions that could be fused with previous instructions. These instructions were the subject of a potential security vulnerability a while back, which is why the scanner was created.
The annocheck/retpoline.c file contains code that searches for a PAUSE instruction followed by an LFENCE instruction, which is an indicator of a compilation that supports the retpoline security feature.
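Matching a sequence rather than a single instruction means remembering what came before. The sketch below illustrates the idea for the PAUSE and LFENCE pair. It is not the code from annocheck/retpoline.c: the function name is hypothetical, and it assumes the mnemonics have already been extracted from the disassembly into an array of lowercase strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if MNEMONICS contains a "pause" that is immediately
   followed by an "lfence".  */
bool
has_pause_then_lfence (const char *const *mnemonics, size_t count)
{
  bool prev_was_pause = false;

  for (size_t i = 0; i < count; i++)
    {
      if (prev_was_pause && strcmp (mnemonics[i], "lfence") == 0)
        return true;

      /* Remember whether this instruction could start the pair.  */
      prev_was_pause = strcmp (mnemonics[i], "pause") == 0;
    }

  return false;
}
```

The single flag is enough for a two-instruction pattern. Longer sequences would need a small state machine or a sliding window over the most recent mnemonics.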
These examples all look for x86_64 instructions, but that is not a hard requirement. The scanners are linked against the opcodes library (part of the GNU Project binutils), which provides a disassembler for the host's architecture. The binutils package is usually installed on systems running Linux. So if the scanner is built on an AArch64 box, for example, it will have an AArch64 disassembler.
It is also possible to link against a custom-built version of the opcodes library that has been configured to disassemble for another architecture. Thus, you could create a scanner that runs on an x86_64 box but examines PowerPC binaries.
Building the custom scanner
The scanner sources include a configure script which, when run, should populate a build directory with the necessary makefiles. Running make should then build all three scanners I've mentioned. The scanners have the following dependencies:
- The libbfd and libopcodes libraries, which are required by the binutils package.
- The libelf library, which is provided by the elfutils-libelf package.
- The libdw library, which is provided by the elfutils-libs package.
- The libiberty library, which is provided by the binutils-devel package.
Advanced scanning
Sometimes the exact instructions you are searching for are unknown. Instead, you have to locate an effect or specific behavior. To handle these cases, you can extend the scanner to simulate the target binary file instead of just disassembling it. This process, of course, is much more complex.
The scanner sources include two examples of this kind of advanced scanning, although neither is built by default. You can use the file annocheck/makefile.rop to build them, although you'll have to edit it to provide some necessary information. These scanners use the headers found in the binutils sources as well as the simulator code that is part of the GNU Debugger (GDB) project.
The advanced scanners in the sources both have the same job: examining binaries to see whether they are vulnerable to exploits via a return-oriented programming (ROP) attack. One scanner examines AArch64 binaries and the other examines x86_64 binaries. Multiple instruction sequences are vulnerable to this kind of attack, so the scanners simulate the execution of instructions and look for characteristics that are of use to an attacker. Since the attacker can, in theory, start execution at any point in the binary, the scanners have to run lots of simulations, looking for any possible vulnerable entry point.
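The "try every entry point" loop can be illustrated with a deliberately crude sketch. It is not how the ROP scanners in the repository work: instead of simulating instructions, it only checks whether an x86_64 RET opcode byte (0xc3) appears within a fixed window after each start offset. The function name and the window parameter are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define X86_RET_OPCODE 0xc3

/* Count the start offsets in CODE from which a RET opcode byte
   occurs within the next WINDOW bytes (the start byte included).
   A byte-level stand-in for simulation, not a real gadget finder.  */
size_t
count_candidate_entry_points (const uint8_t *code, size_t len, size_t window)
{
  size_t candidates = 0;

  /* Treat every byte offset as a possible entry point.  */
  for (size_t start = 0; start < len; start++)
    {
      size_t remaining = len - start;
      size_t span = remaining < window ? remaining : window;

      for (size_t i = 0; i < span; i++)
        if (code[start + i] == X86_RET_OPCODE)
          {
            candidates++;
            break;
          }
    }

  return candidates;
}
```

A real scanner would replace the inner loop with a simulation run from each offset, which is why the number of simulations grows with the size of the binary.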
Conclusion
Looking for instruction sequences in executable files is possible with today's tools. Although command-line tools will suffice for simple scans, a dedicated program is the best solution for complex needs. This article has shown you how to build a custom scanner that you can use for both common and advanced scanning scenarios.
Last updated: October 6, 2022