Fuzzing YARA for fun and no profit

10 May 2020

I’ve always been interested in fuzzing YARA to see if anything interesting would come of it. I didn’t manage to crash YARA with the methodology outlined in this post, which targets the PE module, so it’d be great to hear recommendations on how my process could be improved. We’ll be using the excellent american fuzzy lop (a.k.a. “AFL”) as our fuzzer. If we were to find a parsing bug in YARA, it could possibly lead to code execution if a victim runs our specially crafted executable through it. Fuzzing is a common method for finding vulnerabilities in software, in particular memory management vulnerabilities. It involves executing the target binary with various inputs generated by the fuzzer; the goal is to get it to crash.

YARA is a handy tool in the world of malware research. It enables researchers to classify files based on specific parameters such as a sequence of bytes, or format-specific attributes such as function imports in the Import Address Table (“IAT”). The aim of YARA rules is to identify and classify files such as malware samples. It was developed by Victor Alvarez of VirusTotal (“VT”), specifically to identify families of malware. YARA allows signature-based malware classification, similar to AV products.

An example of a simple YARA rule that matches any file containing both the ASCII string "YARA example" and the byte sequence 0xDE, 0xEA, 0xAD can be observed below.

rule Exemplar
{
    meta:
        description = "A simple example of a YARA rule"
        author = "LloydLabs"

    strings:
        $ = "YARA example"
        $ = {DE EA AD}

    condition:
        all of them
}

When developing YARA rules, I highly recommend installing the YARA extension for Visual Studio Code, which can be found here. The rule has three sections: a meta section, which contains any sort of metadata; a strings section, which contains the patterns to match; and the condition section, which defines the conditions under which the rule matches.
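To try the rule above for yourself, a throwaway test file containing both patterns is enough. The snippet below is only a sketch: the file name exemplar.yar is my assumption for wherever you saved the rule, and the yara step is skipped if the tool or the rule file is missing.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)

# 12 ASCII bytes followed by DE EA AD (written as octal escapes).
printf 'YARA example\336\352\255' > "$workdir/sample.bin"

# Dump the bytes so the hex pattern is visible at the end.
od -An -tx1 "$workdir/sample.bin"

# Assumption: the rule was saved as exemplar.yar in the current directory.
if command -v yara >/dev/null 2>&1 && [ -f exemplar.yar ]; then
    yara exemplar.yar "$workdir/sample.bin"
fi
```

When a rule matches, yara prints the rule name followed by the path of the matching file.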

I wanted to target the PE module within YARA, which provides functionality to parse PE-specific fields. An example of this is checking a flag in the DllCharacteristics field of the OptionalHeader within the PE structure, while also checking whether the PE is a DLL. The module makes easy work of this. Below, we can observe how this would be written within the condition section:

import "pe"

rule Exemplar
{
    condition:
        pe.is_dll() and pe.dll_characteristics & pe.DYNAMIC_BASE
}

In YARA 4.0, multiple new additions were made to the already extremely handy module. I wanted to target the following fields and functions, and make sure the rule would exercise all of them when it was evaluated:

pe.pdb_path
pe.exports_index(..)
pe.export_details(..)
pe.dll_name
pe.export_timestamp

The rule language which YARA is based on is parsed using GNU Bison, an extremely mature parser generator which has been actively developed since the 1980s (not that this is any excuse). I thought time would be wasted targeting this aspect of YARA, and that the fuzzing effort would be more successful when targeting the PE parser, which the YARA developers implement themselves. All of the functionality of YARA is contained within libyara; the command-line version of YARA is simply a front end to this library. The code for the PE module can be seen here. Below is an example from it, which is responsible for parsing the PDB path:

if (yr_le32toh(cv_hdr->dwSignature) == CVINFO_PDB20_CVSIGNATURE)
{
  PCV_INFO_PDB20 pdb20 = (PCV_INFO_PDB20) cv_hdr;

  if (struct_fits_in_pe(pe, pdb20, CV_INFO_PDB20))
    pdb_path = (char*) (pdb20->PdbFileName);
}
else if (yr_le32toh(cv_hdr->dwSignature) == CVINFO_PDB70_CVSIGNATURE)
{
  PCV_INFO_PDB70 pdb70 = (PCV_INFO_PDB70) cv_hdr;

  if (struct_fits_in_pe(pe, pdb70, CV_INFO_PDB70))
    pdb_path = (char*) (pdb70->PdbFileName);
}

If we want to test all of these features, we need to design a YARA rule which hits all of the code paths which result in these new features being tested. Below, we can see the route that we want to take.

Some of the functions accept different types of arguments (which can be seen in the documentation for the module), e.g. pe.exports_index supports a string (e.g. pe.exports_index("DllRegisterServer")) as well as an ordinal (e.g. pe.exports_index(1337)). We can write a single rule that hits all of these code paths by simply using or between the different checks. The rule I came up with when fuzzing YARA was:

import "pe"

rule Fuzzawuzza
{
    condition:
        pe.pdb_path == "FUZZ" or
        pe.dll_name == "FUZZ" or
        pe.imports(/kernel32.dll/i, /(Read|Write)ProcessMemory/) == 2 or
        pe.exports_index(/^[email protected]@/) or
        pe.exports_index(72) or
        pe.exports_index("CPlApplet") or
        pe.export_details[0].name == "FUZZ" or
        pe.export_timestamp == 1337
}

We’ll then go ahead and save this rule as test_rule.yar for use further down the line. The objective of fuzzing in this instance is to crash YARA, and my fuzzer of choice is AFL, which was developed at Google.

To do this we’ll feed AFL a legitimate PE binary, which it will then mutate. First of all, as YARA is open source, we can instrument the binary at compile time. afl-gcc is a drop-in wrapper around GCC (AFL also offers an LLVM-based mode, afl-clang-fast) which injects instrumentation into the program as it is compiled. This way, the fuzzer can observe which code paths each input exercises and favour the inputs that reach new ones. An example of this in the context of YARA is the initial check that the file has the MZ header: AFL would work out that an invalid header leads to fewer code paths and hence less coverage of the program as a whole, and this would be reflected in the mutations the fuzzer makes in future. We could also fuzz without the source code, but having it makes fuzzing a lot faster, as AFL can more quickly find the relevant routes through the code to target as it mutates the input file.
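To make the MZ example concrete, here is a hand-rolled single mutation. This is only an illustration of what "mutating the input" means, not AFL's actual mutation strategy, and the ten-byte seed is a stand-in rather than a real PE file.

```shell
seed=$(mktemp)
mutant=$(mktemp)

# Stand-in for a real PE seed: the "MZ" magic followed by some filler bytes.
printf 'MZ\220\000filler' > "$seed"

# Copy the seed, then overwrite the byte at offset 0 with 0x00 without truncating the file.
cp "$seed" "$mutant"
printf '\000' | dd of="$mutant" bs=1 seek=0 conv=notrunc 2>/dev/null

# First two bytes of each file, as hex.
od -An -tx1 -N2 "$seed"     # 4d 5a
od -An -tx1 -N2 "$mutant"   # 00 5a
```

A parser that bails out on the first check would treat the mutant as "not a PE" and exercise very little code, which is exactly the kind of dead end the coverage feedback steers the fuzzer away from.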

First of all, we need a server. With the help of David Cannings, we managed to get a 16-core Google Cloud instance with 64GB of RAM. Fuzzing in the cloud isn’t always the most cost-efficient way to do it, however this was simply for a week. The distribution I’ll be using throughout this is Ubuntu 18.04.4 LTS (love it or hate it 😉). Next, we need to install AFL:

sudo apt install build-essential automake libtool make gcc pkg-config libssl-dev # this was a new box, we need this for make, etc.
wget http://lcamtuf.coredump.cx/afl/releases/afl-latest.tgz
tar -xvf afl-latest.tgz
cd afl-*/ # the archive unpacks into a versioned directory, e.g. afl-2.52b
make
sudo make install

Next, let’s go and grab the latest YARA release from GitHub (in this case, 4.0.0) and install it. We’ll build it with afl-gcc, which will instrument it ready to be fuzzed by afl-fuzz.

# Pull down YARA
wget https://github.com/VirusTotal/yara/archive/v4.0.0.tar.gz
tar -xvf v4.0.0.tar.gz
cd yara-4.0.0

# Set our default compiler in the current env to afl-gcc
export CC=afl-gcc

# Build and install YARA
./bootstrap.sh
./configure
make
sudo make install

# For some reason, libyara isn't found, so we need to add /usr/local/lib to the dynamic linker's search path
echo "/usr/local/lib" | sudo tee -a /etc/ld.so.conf
sudo ldconfig

We now have YARA set up on our machine:

$ yara -v
4.0.0

AFL documents some performance tips here, which I applied to the current instance in order to maximise efficiency when fuzzing. Any side effects of these tweaks don’t really matter, as this instance is used solely for fuzzing. Looking at the AFL documentation, the following command line arguments are given as a boilerplate:

./afl-fuzz -i testcase_dir -o findings_dir /path/to/program @@ [..params..]

OK, so in our case we need a PE file in our input (-i) directory, and /path/to/program is simply yara. The @@ denotes where the input file goes: AFL replaces it with the path to the file it is currently mutating. We’ll just take the classic calc.exe from Windows as the seed to base the mutations on.
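The @@ substitution is simple enough to mimic in a few lines. The function below is a hypothetical helper of my own (expand_input is not part of AFL), and the .cur_input path is, as far as I’m aware, what AFL 2.x uses for its scratch input file inside the output directory.

```shell
# expand_input INPUT_PATH ARG...
# Prints the command line with every "@@" argument replaced by INPUT_PATH.
expand_input() {
    input_path=$1
    shift
    out=""
    for arg in "$@"; do
        if [ "$arg" = "@@" ]; then
            arg=$input_path
        fi
        out="$out${out:+ }$arg"
    done
    printf '%s\n' "$out"
}

expand_input yara_out/.cur_input yara test_rule.yar @@
# prints: yara test_rule.yar yara_out/.cur_input
```

So on every execution the target sees an ordinary file path, and AFL simply rewrites the contents of that file between runs.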

YARA takes the following arguments when it wants to scan a file:

yara [rule] [file_path]

Our rule, as mentioned above, has already been written and saved as test_rule.yar. So, putting this together we get:

mkdir yara_in # Input directory
cp calc.exe yara_in/ # Seed the input directory with our PE file
afl-fuzz -i yara_in -o yara_out yara test_rule.yar @@

Now, AFL has started, and we’ve got this screen:

What? 203.9 executions per second seems a bit slow for a 16-core machine. Let’s go check htop, and see if all of the cores are being used:

OK, not at all. It’s only using one core, which is strange. I thought at first AFL would utilise all of the resources on the system unless told otherwise, but looking at the documentation it says:

Every instance of afl-fuzz takes up roughly one core. This means that on multi-core systems, parallelization is necessary to fully utilize the hardware. For tips on how to fuzz a common target on multiple cores or multiple networked machines, please refer to Tips for parallel fuzzing.

I came across this tool named afl-launch on GitHub here, which allows us to easily launch multiple fuzzers in parallel. Since AFL uses about one core per instance, we’ll want to spin up 16 instances of it. It requires Go, so let’s set it up:

sudo apt install golang-go
go get -u github.com/bnagy/afl-launch

Now that we’ve set up afl-launch for our user, we need to execute it. Instead of the output directory used when running a single instance of AFL, the -o directory is now called a sync directory, and its subdirectories belong to the individual AFL instances running in parallel.

afl-launch -n 16 -i yara_in -o yara_out yara test_rule.yar @@
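For reference, doing the same thing by hand means starting one afl-fuzz with -M (the master) and the rest with -S (secondaries), all sharing the same sync directory. The sketch below only prints the commands rather than running them; the function name and the fuzzerNN instance names are my own placeholders, and afl-launch picks its own naming scheme.

```shell
# print_parallel_cmds N INPUT_DIR SYNC_DIR TARGET [ARGS...]
# Prints one afl-fuzz command per instance: the first as master (-M), the rest as secondaries (-S).
print_parallel_cmds() {
    n=$1
    in_dir=$2
    sync_dir=$3
    shift 3
    i=0
    while [ "$i" -lt "$n" ]; do
        if [ "$i" -eq 0 ]; then role=-M; else role=-S; fi
        printf 'afl-fuzz -i %s -o %s %s fuzzer%02d %s\n' "$in_dir" "$sync_dir" "$role" "$i" "$*"
        i=$((i + 1))
    done
}

print_parallel_cmds 16 yara_in yara_out yara test_rule.yar @@
```

Each printed command would then need its own terminal, screen or tmux session, which is the tedium that afl-launch takes care of.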

Finally, we’re using all of the cores at our disposal:

A single instance of AFL shows us the status screen mentioned above, but we don’t get this when fuzzing in parallel. Luckily, afl-whatsup exists. Running this tool and pointing it at our sync directory will show the status of all of the fuzzers. We’ll execute it through watch, which by default re-runs the command every 2 seconds, giving us something of a live status update.

watch afl-whatsup yara_out
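If you only care about one number, such as the total execution count across all instances, you can also add it up yourself. Each instance keeps a fuzzer_stats file of "key : value" lines in its subdirectory of the sync directory. The helper below (sum_stat is my own name, not an AFL tool) is a rough sketch that sums one key across all of them.

```shell
# sum_stat KEY SYNC_DIR
# Sums the numeric value of KEY over every <SYNC_DIR>/<instance>/fuzzer_stats file.
sum_stat() {
    key=$1
    sync_dir=$2
    cat "$sync_dir"/*/fuzzer_stats 2>/dev/null |
        awk -F ':' -v key="$key" '
            { name = $1; gsub(/[ \t]+/, "", name) }   # strip the padding around the key
            name == key { total += $2 }
            END { printf "%.0f\n", total }            # avoid scientific notation for big totals
        '
}

sum_stat execs_done yara_out
sum_stat unique_crashes yara_out
```

If the sync directory doesn’t exist yet, both calls simply print 0.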

If you want to pause the fuzzing process across all of your instances, I’d recommend using afl-pause from afl-trivia by Ben Nagy. He’s developed a bunch of awesome scripts which can help you control your AFL instances when they’re running in parallel. To pause the fuzzing process, all you need to do is run his pause script: afl-pause <sync_directory>.

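During the fuzzing process, as mentioned, AFL will mutate our input file, guided by the most promising routes through the program it has found so far. The process, as detailed in their README.md, goes along the lines of:

1. Load the user-supplied initial test cases into the queue.
2. Take the next input file from the queue.
3. Attempt to trim the test case to the smallest size that doesn’t alter the measured behaviour of the program.
4. Repeatedly mutate the file using a variety of traditional fuzzing strategies.
5. If any of the generated mutations results in a new state transition recorded by the instrumentation, add the mutated output as a new entry in the queue.
6. Go to 2.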

Unfortunately, after 1.2 billion executions of YARA, we failed to crash it. So, kudos to the YARA development team for all of their hard work over the years maintaining such a staple of a tool! I hope this wasn’t too boring and gave you a small introduction to the world of fuzzing, and things you may come across when setting up your fuzzing environment.
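If you repeat this and want to double-check for crashes yourself, AFL saves crashing inputs into a crashes subdirectory for each instance, with file names beginning with id: (at least in AFL 2.x, as far as I’m aware). The helper below is a sketch of mine, not an AFL tool, and just lists whatever is there.

```shell
# list_crashes SYNC_DIR
# Prints every saved crash input under SYNC_DIR, one path per line (nothing if there are none).
list_crashes() {
    find "$1" -path '*/crashes/id:*' -type f 2>/dev/null | sort
}

list_crashes yara_out
```

Any file it prints can be replayed against the target directly, e.g. yara test_rule.yar followed by the path to the crash file, to confirm that the crash reproduces.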

Future Work

To demonstrate fuzzing techniques at a later stage, I am going to work on a project named Damn Vulnerable File Parser: a very vulnerable (hence the name) file parser written in C, to demonstrate with ease how programs can be fuzzed until they crash. We could also target older versions of YARA, which are likely to still be in use by organisations, and fuzz to find crashes which were never patched in those versions.

I’m new to using AFL, and fuzzing YARA-like projects in general - if there’s anything that I could’ve changed in my approach in fuzzing YARA please let me know! I’m contactable on Twitter or at [email protected]. I’d be happy to take on any recommendations!
