Build your own tool to search for code sequences in binary files

This article shows you how to create a scanning tool that searches binary files for specific instruction sequences. Such searches are commonly needed to verify that a compiled executable meets certain criteria, usually related to security. For example, Intel's Control-flow Enforcement Technology (CET) requires the target of every indirect call or jump to be an ENDBR instruction, so functions in a CET-enabled binary are expected to start with one. Verifying this requires a tool designed to search for instructions inside the binary.

Simple searches with command-line tools

Simple searches do not need special tools. Instead, you can do the job with a combination of command-line tools, possibly inside a script. For example:

objdump -d FOO | grep -i endbr

This searches the FOO program for any variation of the endbr instruction.

Although searches like this are easy to construct, they are often insufficient. For example, the preceding command sequence doesn't verify that every function entry point starts with an ENDBR instruction. It just checks to see whether there are any occurrences of the instruction in FOO.
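As a rough illustration of what the stricter check involves, the C sketch below walks an objdump -d style listing and counts the symbols whose first instruction is not an ENDBR. The names count_missing_endbr() and line_contains() are made up for this example and are not part of annocheck. The sketch also assumes that every line ending in ">:" introduces a function, which is not strictly true of real listings (PLT stubs and other non-function symbols would be counted too).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if NEEDLE occurs within the first LEN bytes of LINE.
   (Hypothetical helper; strstr cannot be used because LINE is not
   NUL-terminated at the end of the line.)  */
static bool
line_contains (const char *line, size_t len, const char *needle)
{
  size_t n = strlen (needle);

  for (size_t i = 0; i + n <= len; i++)
    if (strncmp (line + i, needle, n) == 0)
      return true;
  return false;
}

/* Count the symbols in an objdump -d style LISTING whose first
   instruction line does not mention "endbr".  */
size_t
count_missing_endbr (const char *listing)
{
  size_t missing = 0;
  bool expect_first_insn = false;

  while (*listing != '\0')
    {
      /* Length of the current line, excluding the newline.  */
      size_t len = strcspn (listing, "\n");

      if (len >= 2 && strncmp (listing + len - 2, ">:", 2) == 0)
        {
          /* A symbol header such as "0000000000001139 <main>:".  */
          expect_first_insn = true;
        }
      else if (expect_first_insn && len > 0)
        {
          /* The first non-empty line after a symbol header.  */
          if (!line_contains (listing, len, "endbr"))
            missing++;
          expect_first_insn = false;
        }

      /* Step over the line and its newline, if any.  */
      listing += len;
      if (*listing == '\n')
        listing++;
    }

  return missing;
}
```

Passing the text produced by objdump -d FOO to count_missing_endbr() yields the number of suspect entry points. Even this small check needs state that carries from one line to the next, which is awkward to express in a grep pipeline.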

Building a custom scanner

To search for complex instructions or complex sequences of instructions, you will need a custom scanner. To this end, I'll show you how to modify the stack clash scanner that can be found in a branch of the annobin repository. Retrieve the scanner with:

git clone -b stack-clash-scanner-branch git://sourceware.org/git/annobin.git

This scanner uses the annocheck framework with a custom module. The scanner can check binary files, directories, and even RPMs.

Note: The annocheck code is distributed under the GNU General Public License v3 and is free for anyone to use and modify, but Red Hat does not provide any support for it.

How the stack clash scanner works

The basic stack clash scanner in the package looks for an AND instruction that takes the stack pointer and a large constant value as operands. In the AT&T syntax that the scanner's checks expect, such an instruction looks like this:

and $0xfffffffffffff000,%rsp

The point of this search is to find any stack pointer adjustments that are larger than a page size and that might therefore cause the stack to grow beyond a safe limit.

The scanner works by disassembling the instructions in the binary and then searching the result for lines with the necessary features. The function is_affected_insn() in annocheck/stack-clash.c contains the necessary logic:

/* Only AND instructions are affected. */
if (strncmp (disas.buffer, "and", 3) != 0)
  return false;

/* Must involve the stack pointer register. */
if (strstr (disas.buffer, "%rsp") == NULL)
  return false;

/* The instruction has to have a "large" immediate value.
   For now we take "large" to be 4K or more. */
const char * const_start;
if ((const_start = strstr (disas.buffer, "$0xff")) == NULL)
  return false;

/* 4k = 4 * 1024 = 0x1000 */
return strstr (const_start, "000") != NULL;
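To experiment with these tests outside annocheck, the same checks can be wrapped in a standalone function that takes one line of disassembly as a string. This is only a sketch: the name looks_like_large_rsp_and() is hypothetical, and the input is assumed to be mnemonic-first AT&T-syntax text, standing in for the contents of disas.buffer.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical standalone variant of the checks shown above.  INSN is
   one disassembled instruction in AT&T syntax, mnemonic first.  */
bool
looks_like_large_rsp_and (const char *insn)
{
  /* Only AND instructions are of interest.  */
  if (strncmp (insn, "and", 3) != 0)
    return false;

  /* The stack pointer must be one of the operands.  */
  if (strstr (insn, "%rsp") == NULL)
    return false;

  /* The immediate must start with 0xff, as an alignment mask does.  */
  const char *const_start = strstr (insn, "$0xff");
  if (const_start == NULL)
    return false;

  /* Three consecutive zero digits after that point suggest a mask
     that clears at least the low 12 bits (4K alignment).  */
  return strstr (const_start, "000") != NULL;
}
```

Because the test is purely textual, it accepts any mnemonic that begins with "and" and any three zeros that appear after the "$0xff" prefix. That is good enough for a first pass, but worth remembering when adapting the pattern to other instructions.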

Modifying the scanner

You can modify the annocheck/stack-clash.c code to build a scanner that searches for other instruction sequences. The repository even includes a couple of other scanners adapted from the core code.

For example, annocheck/jcc-scan.c contains code to locate conditional jump instructions that could be fused with previous instructions. These instructions were the subject of a potential security vulnerability a while back, which is why the scanner was created.

The annocheck/retpoline.c file contains code that searches for a PAUSE instruction followed by an LFENCE instruction, which is an indicator of a compilation that supports the retpoline security feature.
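Matching a sequence rather than a single instruction means remembering what the previous instruction was. The sketch below shows one way to do that for the PAUSE and LFENCE pair. It is not the code from annocheck/retpoline.c: the function name has_pause_lfence_pair() is invented for this example, and the instructions are assumed to arrive as an array of mnemonic-first strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if any "pause" entry in INSNS is immediately followed
   by an "lfence" entry.  INSNS holds COUNT disassembled instructions,
   one string per instruction, mnemonic first.  */
bool
has_pause_lfence_pair (const char *const *insns, size_t count)
{
  bool prev_was_pause = false;

  for (size_t i = 0; i < count; i++)
    {
      if (prev_was_pause && strncmp (insns[i], "lfence", 6) == 0)
        return true;

      /* Remember whether this instruction was a PAUSE for the next
         iteration.  */
      prev_was_pause = strncmp (insns[i], "pause", 5) == 0;
    }

  return false;
}
```

The single flag is a two-state machine. Longer sequences can be handled the same way by replacing the flag with a counter that records how much of the pattern has been matched so far.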

These examples all look for x86_64 instructions, but that is not a hard requirement. The scanners are linked against the opcodes library (part of the GNU Project binutils), which provides a disassembler for the host's architecture. The binutils package is usually installed on systems running Linux. So if the scanner is built on an AArch64 box, for example, it will have an AArch64 disassembler.

It is also possible to link against a custom-built version of the opcodes library that has been configured to disassemble for another architecture. Thus, you could create a scanner that runs on an x86_64 box but examines PowerPC binaries.

Building the custom scanner

The scanner sources include a configure script which, when run, should populate a build directory with the necessary makefiles. Then just running make should build all three scanners I've mentioned. The scanners have the following dependencies:

The libbfd and libopcodes libraries, which are provided by the binutils package.
The libelf library, which is provided by the elfutils-libelf package.
The libdw library, which is provided by the elfutils-libs package.
The libiberty library, which is provided by the binutils-devel package.

Advanced scanning

Sometimes the exact instructions you are searching for are unknown. Instead, you have to locate an effect or specific behavior. To handle these cases, you can extend the scanner to simulate the target binary file instead of just disassembling it. This process, of course, is much more complex.

The scanner sources include two examples of this kind of advanced scanning, although neither is built by default. You can use the file annocheck/makefile.rop to build them, although you'll have to edit it to provide some necessary information. These scanners use the headers found in the binutils sources as well as the simulator code that is part of the GNU Debugger (GDB) project.

The advanced scanners in the sources both have the same job: examining binaries to see whether they are vulnerable to a return-oriented programming (ROP) attack. One scanner examines AArch64 binaries and the other examines x86_64 binaries. Many different instruction sequences can be exploited in this kind of attack, so the scanners simulate the execution of instructions and look for characteristics that are useful to an attacker. Since the attacker can, in theory, start execution at any point in the binary, the scanners have to run many simulations, looking for every possible vulnerable entry point.
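The "try every entry point" structure can be sketched without a simulator. In the toy example below, every byte offset in a code buffer is treated as a potential entry point, and a deliberately crude stand-in replaces the simulation: it only asks whether an x86_64 RET opcode byte (0xc3) lies within a few bytes of the entry. Both function names are hypothetical, and this is not how the scanners in the annobin sources work; they simulate instructions rather than inspect raw bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a simulation: report whether a RET opcode byte
   (0xc3) occurs within MAX_LEN bytes starting at offset ENTRY.  A real
   scanner would decode and simulate instructions instead.  */
static int
toy_reaches_ret (const uint8_t *code, size_t size, size_t entry,
                 size_t max_len)
{
  for (size_t i = entry; i < size && i - entry < max_len; i++)
    if (code[i] == 0xc3)
      return 1;
  return 0;
}

/* Treat every byte offset in CODE as a possible entry point and count
   the ones for which the toy check succeeds.  */
size_t
count_candidate_entries (const uint8_t *code, size_t size, size_t max_len)
{
  size_t hits = 0;

  for (size_t entry = 0; entry < size; entry++)
    hits += toy_reaches_ret (code, size, entry, max_len);

  return hits;
}
```

Even with this trivial check, the work grows with the size of the binary multiplied by the length of each probe, which gives a feel for why full simulation from every entry point is expensive.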

Conclusion

Looking for instruction sequences in executable files is possible with today's tools. Although command-line tools will suffice for simple scans, a dedicated program is the best solution for complex needs. This article has shown you how to build a custom scanner that you can use for both common and advanced scanning scenarios.

Last updated: October 6, 2022
