
As a part of my work at Red Hat, I verify the accuracy of the debugging information that maps between the executable binary generated by compilers and the original source written by the developer. Additionally, I look for complications in the debugging information that stem from compiler optimizations. It is possible to manually inspect the binary and review the debugging information for discrepancies. However, for significant applications, it would take too much time to manually review the megabytes of data. It's also too easy to overlook one drop of erroneous information hidden in the vast sea of correctly generated debugging information.

To automate this type of analysis, you need a tool that analyzes both the debugging information and the binary executable. Dyninst, available in Fedora and Red Hat Enterprise Linux, provides a suite of dynamic and static analysis and instrumentation tools that you can use for this purpose. A previous Red Hat Developer article, Using the SystemTap Dyninst runtime environment, discussed Dyninst's dynamic instrumentation. This article demonstrates how to write a simplified static analyzer in Dyninst.

Analyzing function parameters

In the past, SystemTap "guru mode" has been used to create temporary security fixes known as security band-aids. A number of these temporary fixes work by changing the value of a function parameter when the function is first entered. For example, in the following cve-2012-0056.stp script, setting the number of bytes to write (the $count parameter) to zero effectively disables mem_write:

probe kernel.function("mem_write@fs/proc/base.c").call {
    $count = 0
}

For this type of fix to work, the parameter being modified needs to be used by later machine instructions in the function. SystemTap gets details about $count and other parameters from the debugging information. The debugging information generated by the compiler describes each parameter's type, and the location lists describe where the parameter is stored (for example, in a processor register or in memory). However, there is a crucial piece of information that the debugging information might not include: whether the parameter's value is actually read by any instructions. With compiler optimizations such as constant propagation, it is possible that a variable passed to a function is never used, and a SystemTap guru mode Common Vulnerabilities and Exposures (CVE) fix isn't possible in those cases.

For each processor, there is an application binary interface (ABI) that describes how parameters are passed to functions. The System V Application Binary Interface AMD64 Architecture Processor Supplement describes the ABI for x86_64. To simplify this example, we'll assume the binary code is just simple x86_64 integer code running on Linux where there are up to six parameters passed in registers. This assumption will fail when floating point values or structures are passed. If we know how many parameters are being passed to a function, we can infer which registers are being used. The following list shows which processor register contains each function parameter:

parameter1 %rdi
parameter2 %rsi
parameter3 %rdx
parameter4 %rcx
parameter5 %r8
parameter6 %r9
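
The table above can be turned into a small lookup. The following is a minimal sketch under the same simplifying assumption (integer or pointer parameters only); the helper name int_arg_registers is hypothetical and is not part of Dyninst or of the analyzer built later in this article:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Integer/pointer argument registers, in System V AMD64 parameter order.
static const std::vector<std::string> kIntArgRegs = {
    "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"};

// Hypothetical helper: returns the registers that hold the first num_params
// integer parameters. Parameters beyond the sixth are passed on the stack,
// so they never appear in the result.
std::vector<std::string> int_arg_registers(int num_params) {
    const int max_regs = static_cast<int>(kIntArgRegs.size());
    const int in_regs = std::max(0, std::min(num_params, max_regs));
    return std::vector<std::string>(kIntArgRegs.begin(),
                                    kIntArgRegs.begin() + in_regs);
}
```

For a two-parameter function, int_arg_registers(2) yields %rdi and %rsi; for an eight-parameter function, only the first six parameters map to registers.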

Liveness analysis

The other essential part of the analysis is computing which register values on entry to a function are used by the binary code. This is known as liveness analysis. Dyninst provides a liveness analyzer as part of the Dataflow API. The liveness analyzer takes a function, determines the possible paths taken through the code, and computes at each instruction in the function the set of registers that hold values that a later instruction could use. If there are no possible uses of the value currently held by the register by later instructions, the value is considered dead. The liveness analyzer produces results for the entire function, and then those results can be queried to determine whether a specific register at a particular instruction is live. For this example, we are just going to analyze the registers holding the parameters on entry to the function.
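
To make the idea concrete, here is a toy sketch of liveness for straight-line code only. It is not how Dyninst implements its analyzer: a real liveness analysis also has to handle branches and loops, typically by iterating over the control-flow graph until the results stop changing. The ToyInsn type and the live_on_entry function are hypothetical names used for illustration:

```cpp
#include <set>
#include <string>
#include <vector>

// Toy instruction: the registers it reads (uses) and the registers it writes (defs).
struct ToyInsn {
    std::set<std::string> uses;
    std::set<std::string> defs;
};

// Walks the instructions backward and returns the registers live on entry.
// 'live' starts as the set of registers assumed live after the last instruction.
std::set<std::string> live_on_entry(const std::vector<ToyInsn> &code,
                                    std::set<std::string> live) {
    for (auto it = code.rbegin(); it != code.rend(); ++it) {
        // A write kills the old value: it is dead before this instruction...
        for (const auto &reg : it->defs) live.erase(reg);
        // ...unless the same instruction also reads it.
        for (const auto &reg : it->uses) live.insert(reg);
    }
    return live;
}
```

As a usage example, a function that only returns its second argument could be modeled as two toy instructions: a move that reads rsi and writes rax, followed by a return that reads rax. Running live_on_entry over that sequence with an empty starting set reports rsi as live on entry and rdi as dead, which is exactly the kind of result the analyzer in this article looks for.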

Building the static analyzer

Now it is time to look at the code snippets used to create the analyzer. First, we need to include the Dyninst header files for the classes being used and to state the namespaces being used to make the code a bit more compact:

#include <dyninst/Symtab.h>
#include <dyninst/Function.h>
#include <dyninst/liveness.h>

using namespace Dyninst;
using namespace SymtabAPI;
using namespace ParseAPI;
using namespace std;

The code does a simple check to make sure that the program has exactly two arguments: argv[0] (the analyzer program) and argv[1] (the binary being analyzed). The file is opened, then the debugging information and binary are read in:

	if (argc != 2) exit(-1);

	// Parse the object file
	Symtab *obj = NULL;
	bool err = Symtab::openFile(obj, argv[1]);

	if( err == false) exit(-1);

	// Create a new binary code object from the filename argument
	SymtabCodeSource *sts = new SymtabCodeSource(argv[1]);
	if(!sts) return -1;

	CodeObject *co = new CodeObject(sts);
	if(!co) return -1;

At this point, we run through each of the functions in the binary with the following for-loop:

	// Iterate through each of the functions.
	for(auto f: co->funcs()) {
		// ... body of analysis goes here
	}

Inside the for-loop, we need to set up and run the liveness analysis on the function:

		// Perform the liveness analysis on function.
		LivenessAnalyzer la(f->obj()->cs()->getAddressWidth());
		la.analyze(f);

The liveness analysis generates data structures that can be queried for information about a specific location in the code. In our analysis, there is just one location that we are interested in: the entry to the function. This is the first instruction of the first basic block. A basic block is a group of instructions that are executed in sequence, so only the last instruction can change the program control flow with a jump, call, or return:

		// Get the first instruction of the first basic block (function entry).
		Block *bb = *f->blocks().begin();
		Address curAddr = bb->start();
		Instruction curInsn = bb->getInsn(curAddr);

		// Construct a liveness query location for the function entry.
		InsnLoc i(bb,  curAddr, curInsn);
		Location loc(f, i);
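
The basic-block definition above can be illustrated without Dyninst. The following is a simplified sketch, assuming each instruction is already tagged with whether it transfers control and whether some jump lands on it; the FlowInsn type and block_starts function are hypothetical names, and Dyninst's own parser does considerably more work to discover this information from raw machine code:

```cpp
#include <cstddef>
#include <vector>

// Toy instruction flags for illustration only.
struct FlowInsn {
    bool ends_block;  // jump, call, or return
    bool is_target;   // some jump lands on this instruction
};

// Returns the index of the first instruction of each basic block.
// A block starts at the first instruction, at any jump target, and
// immediately after any instruction that transfers control.
std::vector<std::size_t> block_starts(const std::vector<FlowInsn> &code) {
    std::vector<std::size_t> starts;
    for (std::size_t i = 0; i < code.size(); ++i) {
        const bool leader =
            (i == 0) || code[i].is_target || code[i - 1].ends_block;
        if (leader) starts.push_back(i);
    }
    return starts;
}
```

The first entry of the result is always index 0 for non-empty code, which corresponds to the function-entry block that the analyzer queries.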

Now we need to look up the information about parameters associated with the function. The findFuncByEntryOffset method locates the symbolic debugging information for the function. It is possible that there is no symbol information for a function. If there is symbol information, the getParams method obtains the list of parameters to the function. In this case, the code is just counting the number of parameters:

		// Get the formal parameters and count them.
		SymtabAPI :: Function *func_sym;
		bool found = obj->findFuncByEntryOffset(func_sym, curAddr);
		if (!found) continue; // Missing symbols move on to next function
		vector <localVar *> parms;
		func_sym->getParams(parms);
		int num_parms = parms.size();

One last step is to create a table that converts the parameter number into the Dyninst name of the register storing the parameter. This is implemented with an array of MachRegister entries. The arg_register array at the beginning of the program holds the Dyninst register names used to pass arguments to functions. This has been written for x86_64, but it could be adapted to run on the PowerPC or AArch64 architectures that Dyninst also supports by changing the register names to match the argument registers for those architectures:

const MachRegister arg_register[] = {x86_64::rdi, x86_64::rsi, x86_64::rdx, x86_64::rcx, x86_64::r8, x86_64::r9};

Analyzing the function

We now have all the information we need to analyze the function. The analyzer prints the function's address, which makes it easier to find the function in disassembled code, followed by the function name. A loop then queries the liveness of the register for each parameter, stopping at whichever is smaller: the number of parameters or num_reg_args, the number of entries in arg_register (computed at the start of main in the complete program). The result of each query is placed in the bool used. If used is false, the parameter (arg number) is printed:

		// Output the results for the function
		cout  << hex << curAddr << " " << f->name() << ": ";
		for (int i=0; i<min(num_parms, num_reg_args); ++i) {
			// Print the arg number if the associated register is unused.
			bool used;
			la.query(loc, LivenessAnalyzer::Before, arg_register[i], used);
			if (!used) 
				cout << "arg" << i+1 << " ";
		}
		cout << endl;
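
The selection logic in that loop can be tried out without Dyninst. The following is a minimal sketch in which a plain set of register names stands in for the LivenessAnalyzer query results; the function name unused_args and its parameters are hypothetical and exist only for illustration:

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the analyzer's inner loop. arg_regs lists the
// argument registers in parameter order; live_at_entry is the set of
// registers that a liveness analysis reported as live at the function entry.
// Returns the 1-based numbers of the register-passed parameters that are unused.
std::vector<int> unused_args(int num_parms,
                             const std::vector<std::string> &arg_regs,
                             const std::set<std::string> &live_at_entry) {
    // Only parameters that fit in registers can be checked this way.
    const int limit = std::min(num_parms, static_cast<int>(arg_regs.size()));
    std::vector<int> unused;
    for (int i = 0; i < limit; ++i) {
        if (live_at_entry.count(arg_regs[i]) == 0)
            unused.push_back(i + 1);
    }
    return unused;
}
```

With the six x86_64 argument registers as arg_regs, a two-parameter function whose entry liveness set contains only rsi gives a result of just parameter 1. A function with eight parameters is only checked for its first six, because the remaining two are not passed in registers.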

Here are the code snippets combined into the complete program:

// Example DataFlowAPI program; notes which arguments on x86_64 functions are unused
//
// William E. Cohen (wcohen at redhat dot com)
//

#include <algorithm>
#include <cstdlib>
#include <iostream>

#include <dyninst/Symtab.h>
#include <dyninst/Function.h>
#include <dyninst/liveness.h>

using namespace Dyninst;
using namespace SymtabAPI;
using namespace ParseAPI;
using namespace std;

// This could be extended to other architectures by replacing arg_register[] entries with appropriate registers
// Based on Figure 3.4: Register Usage of
// https://web.archive.org/web/20160801075146/http://www.x86-64.org/documentation/abi.pdf
const MachRegister arg_register[] = {x86_64::rdi, x86_64::rsi, x86_64::rdx, x86_64::rcx, x86_64::r8, x86_64::r9};

int main(int argc, char **argv){
	int num_reg_args = sizeof(arg_register)/sizeof(arg_register[0]);

	if (argc != 2) exit(-1);

	// Parse the object file
	Symtab *obj = NULL;
	bool err = Symtab::openFile(obj, argv[1]);

	if( err == false) exit(-1);

	// Create a new binary code object from the filename argument
	SymtabCodeSource *sts = new SymtabCodeSource(argv[1]);
	if(!sts) return -1;

	CodeObject *co = new CodeObject(sts);
	if(!co) return -1;

	// Iterate through each of the functions.
	for(auto f: co->funcs()) {
		// Perform the liveness analysis on function.
		LivenessAnalyzer la(f->obj()->cs()->getAddressWidth());
		la.analyze(f);

		// Get the first instruction of the first basic block (function entry).
		Block *bb = *f->blocks().begin();
		Address curAddr = bb->start();
		Instruction curInsn = bb->getInsn(curAddr);

		// Construct a liveness query location for the function entry.
		InsnLoc i(bb,  curAddr, curInsn);
		Location loc(f, i);

		// Get the formal parameters and count them.
		SymtabAPI :: Function *func_sym;
		bool found = obj->findFuncByEntryOffset(func_sym, curAddr);
		if (!found) continue; // Missing symbols move on to next function
		vector <localVar *> parms;
		func_sym->getParams(parms);
		int num_parms = parms.size();

		// Output the results for the function
		cout  << hex << curAddr << " " << f->name() << ": ";
		for (int i=0; i<min(num_parms, num_reg_args); ++i) {
			// Print the arg number if the associated register is unused.
			bool used;
			la.query(loc, LivenessAnalyzer::Before, arg_register[i], used);
			if (!used) 
				cout << "arg" << i+1 << " ";
		}
		cout << endl;
	}
	return (0);
}

To build the analyzer, compile the code and link it with the various Dyninst libraries using the following command line:

g++ -O2 -g -std=c++17  -L /usr/lib64/dyninst \
-l parseAPI -l symtabAPI -l instructionAPI \
-l tbb -l common  unused_arg.C   -o unused_arg

Analyzing a binary for unused function arguments

We want to test that the analyzer works as expected. The following example is a test program with a main function that calls a couple of other functions: foo and bar. The functions are marked with __attribute__ ((noinline)) to ensure that the compiler doesn't attempt to inline these simple functions, because the unused_arg analyzer isn't going to work for inlined functions:

#include <stdio.h>

int __attribute__ ((noinline))
foo(int a, int b)
{
    return 0;
}

int __attribute__ ((noinline))
bar(int a, int b)
{
    return b;
}

int main(int argc, char *argv[])
{
    printf("foo(1,2) = %d\n", foo(1,2));
    printf("bar(1,2) = %d\n", bar(1,2));
    return 0;
}

We build the test program with:

gcc -O2 -g test_unused.c   -o test_unused

Inspecting the preceding code, we would expect that neither parameter 1 nor parameter 2 of foo is used. Function bar actually uses its second argument, so only parameter 1 is unused. Finally, function main doesn't use either of its parameters (arguments). The following is the unused_arg analysis of the test_unused program, showing the expected results along with the additional internal functions in the binary:

$ ./unused_arg ./test_unused
401000 _init: 
401030 printf: 
401040 main: arg1 arg2 
401090 _start: 
4010c0 _dl_relocate_static_pie: 
4010d0 deregister_tm_clones: 
401100 register_tm_clones: 
401140 __do_global_dtors_aux: 
401170 frame_dummy: 
401180 foo: arg1 arg2 
401190 bar: arg1 
4011a0 __libc_csu_init: 
401210 __libc_csu_fini: 
401218 _fini:

Summary

This article showed how you can use Dyninst to better understand the characteristics of programs. Dyninst provides very powerful tools that analyze both debugging information and binaries. Take a look at the Dyninst examples for additional static analysis examples involving control-flow graphs (CFGs) and for dynamic instrumentation examples covering code coverage. For additional information about the tool, check out the upstream Dyninst website.
