"Fuzzing" an application is a great way to find bugs that may be missed by other testing methods. Fuzzers test programs by generating random string inputs and feeding them into an application. Any program that accepts arbitrary inputs from its users is a good candidate for fuzzing. This includes compilers, interpreters, web applications, JSON or YAML parsers, and many more types of programs.
libFuzzer is a library to assist with the fuzzing of applications and libraries. It is integrated into the Clang C compiler and can be enabled for your application with the addition of a compile flag and by adding a fuzzing target to your code. libFuzzer has been used successfully to find bugs in many programs, and in this article, I will show how you can integrate libFuzzer into your own applications.
To get started with libFuzzer on Red Hat Enterprise Linux (RHEL) 7, you need to install the llvm-toolset-6.0 package, part of the LLVM Toolset software collection. LLVM Toolset includes Clang. To install LLVM Toolset, you must first enable a few additional repositories:
$ sudo subscription-manager repos --enable rhel-7-server-optional-rpms \
    --enable rhel-server-rhscl-7-rpms \
    --enable rhel-7-server-devtools-rpms
(See How to enable sudo on RHEL if sudo isn't set up on your system.)
Next, install llvm-toolset-6.0:
$ sudo yum install llvm-toolset-6.0
Since LLVM Toolset is delivered as a Red Hat Software Collection (RHSCL), you need to use scl enable to launch a new shell with the llvm-toolset-6.0 collection added to your path.
$ scl enable llvm-toolset-6.0 bash
Alternatively, you could permanently add the llvm-toolset-6.0 collection to your profile. For more information, see the article How to install Clang/LLVM 6 and GCC 8 on Red Hat Enterprise Linux 7.
Let's start fuzzing
We'll begin by fuzzing a simple C function that returns the first capital letter in a word:
#include <stddef.h>  /* NULL, size_t */
#include <stdint.h>  /* uint8_t */
char get_first_cap(const char *in, int size) {
  const char *first_cap = NULL;
  if (size == 0)
    return ' ';
  for ( ; *in != 0; in++) {
    if (*in >= 'A' && *in <= 'Z') {
      first_cap = in;
      break;
    }
  }
  return *first_cap;
}

int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {  /* fuzz target */
  get_first_cap((const char *)Data, Size);
  return 0;
}
In this C file, we have the function we want to test (get_first_cap) along with a target function (LLVMFuzzerTestOneInput) that the fuzzer will call to pass its input to the function.
Now we can compile this function using clang to create a fuzzable binary:
$ clang -g -fsanitize=fuzzer first-cap.c -o fuzz-first-cap
With the -fsanitize=fuzzer flag, clang instruments our code so that libFuzzer can track code coverage, and it automatically links our program against the fuzzer library, which includes its own main function. We now have an executable, fuzz-first-cap, that we can use to fuzz the get_first_cap function.
If we run our fuzz-first-cap program with no arguments, libFuzzer will generate random inputs to test our program. We can also provide a corpus of valid sample inputs, which libFuzzer uses as starting points, so that it can be smarter about the kinds of inputs it generates.
$ mkdir corpus
$ echo "Apple" > corpus/Apple.txt
$ echo "aPple" > corpus/aPple.txt
$ echo "apPle" > corpus/apPle.txt
Now if we run our program with this corpus, we will see that libFuzzer identifies a problem right away (we use the -seed=1 option only to get reproducible output; it is optional):
$ ./fuzz-first-cap -seed=1 corpus
INFO: Seed: 1
INFO: Loaded 1 modules (8 inline 8-bit counters): 8 [0x670fa0, 0x670fa8),
INFO: Loaded 1 PC tables (8 PCs): 8 [0x45fd48,0x45fdc8),
INFO: 3 files found in corpus
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: seed corpus: files: 3 min: 6b max: 6b total: 18b rss: 33Mb
#4 INITED cov: 5 ft: 7 corp: 2/12b exec/s: 0 rss: 34Mb
#6 REDUCE cov: 5 ft: 7 corp: 2/10b exec/s: 0 rss: 34Mb L: 4/6 MS: 2 ChangeBinInt-EraseBytes-
UndefinedBehaviorSanitizer:DEADLYSIGNAL
==15554==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x00000045473a bp 0x7ffc2eacb4d0 sp 0x7ffc2eacb4a0 T15554)
==15554==The signal is caused by a READ memory access.
==15554==Hint: address points to the zero page.
    #0 0x454739 in get_first_cap /first-cap.c:13:11
    #1 0x4547b6 in LLVMFuzzerTestOneInput /first-cap.c:17:3
    #2 0x415b99 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (fuzz-first-cap+0x415b99)
    #3 0x418954 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*) (fuzz-first-cap+0x418954)
    #4 0x41a1f7 in fuzzer::Fuzzer::MutateAndTestOne() (fuzz-first-cap+0x41a1f7)
    #5 0x41a9af in fuzzer::Fuzzer::Loop(std::vector<std::string, fuzzer::fuzzer_allocator<std::string> > const&) (fuzz-first-cap+0x41a9af)
    #6 0x410193 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (fuzz-first-cap+0x410193)
    #7 0x406562 in main (fuzz-first-cap+0x406562)
    #8 0x7fcb5a8a53d4 in __libc_start_main (/lib64/libc.so.6+0x223d4)
    #9 0x4065aa in _start (fuzz-first-cap+0x4065aa)
UndefinedBehaviorSanitizer can not provide additional info.
==15554==ABORTING
MS: 4 ShuffleBytes-ShuffleBytes-ChangeBit-EraseBytes-; base unit: 7469c22975699536c6c6d00767e773b5429fefc6
0x65,0x0,
e\x00
artifact_prefix='./'; Test unit written to ./crash-36282fac116d9fd6b37cc425310e1a8510f08a53
Base64: ZQA=
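The most relevant parts of the output are the stack trace, which shows that a segmentation fault occurred in get_first_cap, and the information about the input that caused the crash, which comes at the very end. In this case, we crashed on a 2-byte input with no capital letters: e followed by a null byte (\x00). libFuzzer also saved this input to the file crash-36282fac116d9fd6b37cc425310e1a8510f08a53 in the current directory.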
libFuzzer has also added a new file to our corpus directory:
$ cat corpus/7469c22975699536c6c6d00767e773b5429fefc6
apP
This is an interesting input that libFuzzer generated while fuzzing our program. libFuzzer adds every input it finds interesting (for example, one that reaches new code) to the corpus directory.
So let's fix this bug and then try again:
#include <stddef.h>  /* NULL, size_t */
#include <stdint.h>  /* uint8_t */

char get_first_cap(const char *in, int size) {
  const char *first_cap = NULL;
  if (size == 0)
    return ' ';
  for ( ; *in != 0; in++) {
    if (*in >= 'A' && *in <= 'Z') {
      first_cap = in;
      break;
    }
  }
  if (first_cap)  /* only dereference if a capital letter was found */
    return *first_cap;
  else
    return ' ';
}

int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {  /* fuzz target */
  get_first_cap((const char *)Data, Size);
  return 0;
}
Then recompile:
$ clang -g -fsanitize=fuzzer first-cap.c -o fuzz-first-cap
And run:
$ ./fuzz-first-cap -seed=1 corpus
This time, libFuzzer did not find any issues after running for around 30 seconds. By default, libFuzzer runs until it finds a bug (or until you interrupt it), but you can limit the number of runs with the -runs=N flag, or the running time in seconds with -max_total_time=N.
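To keep this bug from coming back, one option is a small regression check that replays the exact input libFuzzer saved when it crashed (the two bytes 0x65 0x00, shown as Base64 ZQA=) against the fixed function. This is a minimal sketch, assuming the fixed get_first_cap is copied into the same file; the helper name crash_input_is_handled is hypothetical and not part of libFuzzer.

```c
#include <stddef.h>  /* NULL */

/* Fixed version of get_first_cap, copied from the example above. */
char get_first_cap(const char *in, int size) {
  const char *first_cap = NULL;
  if (size == 0)
    return ' ';
  for ( ; *in != 0; in++) {
    if (*in >= 'A' && *in <= 'Z') {
      first_cap = in;
      break;
    }
  }
  if (first_cap)
    return *first_cap;
  else
    return ' ';
}

/* Hypothetical helper: replays the crashing input found by libFuzzer
   ('e' followed by a null byte) and returns nonzero if the fixed
   function handles it by returning ' ' instead of crashing. */
int crash_input_is_handled(void) {
  const char crash_input[] = { 'e', '\0' };
  return get_first_cap(crash_input, (int)sizeof(crash_input)) == ' ';
}
```

Keeping crashing inputs as regression cases like this lets an ordinary test suite catch the bug even when the fuzzer is not running.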
So far, we have used the fuzzer to detect segmentation faults, but you can also pair it with one of the clang sanitizers to check for other kinds of errors. For example, we can take our program and compile it with AddressSanitizer enabled:
$ clang -g -fsanitize=fuzzer,address first-cap.c -o fuzz-first-cap
Now when we run the program, we see a new error:
==15569==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000f1 at pc 0x000000558507 bp 0x7fff78272c30 sp 0x7fff78272c28
READ of size 1 at 0x6020000000f1 thread T0
    #0 0x558506 in get_first_cap /first-cap.c:8:11
    #1 0x558766 in LLVMFuzzerTestOneInput /first-cap.c:21:3
    #2 0x42cea9 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) (fuzz-first-cap+0x42cea9)
    #3 0x42fc64 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*) (fuzz-first-cap+0x42fc64)
    #4 0x4317df in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::vector<std::string, fuzzer::fuzzer_allocator<std::string> > const&) (fuzz-first-cap+0x4317df)
    #5 0x431b72 in fuzzer::Fuzzer::Loop(std::vector<std::string, fuzzer::fuzzer_allocator<std::string> > const&) (fuzz-first-cap+0x431b72)
    #6 0x4274a3 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) (fuzz-first-cap+0x4274a3)
    #7 0x41d852 in main (fuzz-first-cap+0x41d852)
    #8 0x7f88abaca3d4 in __libc_start_main (/lib64/libc.so.6+0x223d4)
    #9 0x41d8bb in _start (fuzz-first-cap+0x41d8bb)
...
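Here the fuzzer has triggered a heap buffer overflow, which was caught by AddressSanitizer. libFuzzer does not null-terminate the inputs it passes to LLVMFuzzerTestOneInput, so when an input contained no null byte, the loop in get_first_cap read past the end of the buffer. This exposes a second bug in our program: get_first_cap ignores its size argument and assumes the input is null-terminated.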
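One way to fix this second bug is to bound the loop by the size argument instead of relying on a null terminator. The sketch below is one possible fix, not the article's: it changes the size parameter to size_t (an assumption, to match the fuzz target's Size) and keeps the earlier behavior of stopping at a null byte and returning ' ' when no capital letter is found.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint8_t */

/* Returns the first capital letter among the first `size` bytes of `in`,
   stopping early at a null byte; returns ' ' if there is none.
   The i < size check comes first, so in[size] is never read. */
char get_first_cap(const char *in, size_t size) {
  for (size_t i = 0; i < size && in[i] != '\0'; i++) {
    if (in[i] >= 'A' && in[i] <= 'Z')
      return in[i];
  }
  return ' ';
}

/* Fuzz target: passes libFuzzer's (not null-terminated) buffer through. */
int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  get_first_cap((const char *)Data, Size);
  return 0;
}
```

Rebuilt with -fsanitize=fuzzer,address, this version should no longer read outside the buffer libFuzzer provides, because every access is checked against size first.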
Besides AddressSanitizer, you can also use libFuzzer with LLVM's UndefinedBehaviorSanitizer (UBSan).
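As an illustration of the kind of bug UBSan can report, consider the hypothetical parse_digits function below, which is not part of the article's example. A long run of digits overflows the signed int accumulator, which is undefined behavior in C. This sketch assumes a build along the lines of clang -g -fsanitize=fuzzer,undefined -fno-sanitize-recover=all first-cap.c, where -fno-sanitize-recover=all makes UBSan reports abort the program so that libFuzzer treats them as crashes instead of printing a warning and continuing.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint8_t */

/* Hypothetical example: converts the leading ASCII digits of `in`
   (at most `size` bytes) to an int. For long digit strings the
   multiply/add below overflows a signed int, which UBSan reports. */
int parse_digits(const char *in, size_t size) {
  int value = 0;
  for (size_t i = 0; i < size && in[i] >= '0' && in[i] <= '9'; i++)
    value = value * 10 + (in[i] - '0');  /* may overflow: undefined behavior */
  return value;
}

/* Fuzz target: lets libFuzzer search for inputs that trigger the overflow. */
int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  parse_digits((const char *)Data, Size);
  return 0;
}
```

Short inputs such as "123" are handled correctly; only inputs with enough digits to exceed INT_MAX trigger the report.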
There is a lot more you can do with libFuzzer than this simple introduction shows. For more information, see the libFuzzer documentation.