Frontend Tutorial
This page shows how to build fuzzer frontends that can be executed individually, or as part of an ensemble, on DeepState test harnesses.
DeepState supports writing frontends: standalone executors that wrap around fuzzers to help provision and execute fuzzing tests. This saves you from manually setting up an environment, compiling and instrumenting the harness (if necessary), and performing any post-test decoding or extraction. Frontends are also integral to DeepState's ensemble mode, which fuzzes a single test with a diverse ensemble of fuzzers for maximum performance.
Once built, a frontend can be used like this:
$ deepstate-my-fuzzer --compile_test my_harness.cpp --out_test_name my_deep_test
$ deepstate-my-fuzzer --seeds in --output_test_dir out ./my_deep_test
Writing a frontend for your favorite fuzzer is easy, since it only requires extending the base frontend.DeepStateFrontend API. In fact, each of the current DeepState frontends is only around 200 lines of code.
Let's implement an example frontend for Google's honggfuzz fuzzer. honggfuzz is an incredibly powerful fuzzer that utilizes a feedback-driven strategy on top of initial corpora to maximize code coverage (see more here). Our final frontend can be found here.
To start, let's create and open a file called honggfuzz.py in the frontend directory:
$ touch bin/deepstate/frontend/honggfuzz.py
$ vim bin/deepstate/frontend/honggfuzz.py
Let's start by building out the executable's entry point in main:
def main():
    # Instantiates our fuzzer object, which inherits from `DeepStateFrontend`.
    fuzzer = Honggfuzz()

    # Parses user arguments. DeepState provides default support for a set of arguments for
    # the CLI, but the developer can extend that by defining their own `parse_args` call.
    fuzzer.parse_args()

    # Performs any specified sanity checks (pre_exec), spawns the fuzzer, and runs
    # any post-processing (post_exec).
    fuzzer.run()
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
Yes, it's that simple! Of course, we still need to provide definitions for the Honggfuzz subclass that we instantiated, but this is all our entry point needs for a basic level of functionality.
Let's take a look at the definitions we need to make the executor work in practice:
#!/usr/bin/env python3
import argparse
import logging
import os

from deepstate.frontend import DeepStateFrontend, FrontendError

L = logging.getLogger(__name__)


class Honggfuzz(DeepStateFrontend):
    FUZZER = "honggfuzz"
    COMPILER = "hfuzz-clang++"

    @classmethod
    def parse_args(cls):
        pass  # TODO

    def compile(self):
        pass  # TODO

    def pre_exec(self):
        pass  # TODO

    @property
    def cmd(self):
        pass  # TODO

    @property
    def stats(self):
        pass  # TODO

    def post_exec(self):
        pass  # TODO
At a base level, our DeepStateFrontend object already provides basic functionality for spawning a fuzzer process and maintaining seed synchronization, all through run(). However, it is up to the developer to implement the rest of the fuzzer's functionality.
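The division of labor between run() and the hooks can be pictured with a toy base class. This is a hypothetical sketch, not DeepState's implementation: it assumes run() calls pre_exec, spawns the fuzzer, and then calls post_exec, and it uses an argv() method as a stand-in for however the real base class builds the command line.

```python
import subprocess
import sys
from typing import List


class ToyFrontend:
    """Hypothetical stand-in for a frontend base class (not DeepState's code).

    run() calls the hooks in a fixed order; subclasses override only the hooks.
    """

    def pre_exec(self) -> None:
        """Sanity checks before the fuzzer starts (no-op by default)."""

    def argv(self) -> List[str]:
        """Full fuzzer command line; stands in for the real cmd handling."""
        raise NotImplementedError

    def post_exec(self) -> None:
        """Post-processing after the fuzzer exits (no-op by default)."""

    def run(self) -> int:
        self.pre_exec()
        returncode = subprocess.call(self.argv())
        self.post_exec()
        return returncode


class EchoFrontend(ToyFrontend):
    """Example subclass that 'fuzzes' by running a trivial Python command."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def pre_exec(self) -> None:
        self.calls.append("pre_exec")

    def argv(self) -> List[str]:
        return [sys.executable, "-c", "print('fuzzing...')"]

    def post_exec(self) -> None:
        self.calls.append("post_exec")
```

Calling EchoFrontend().run() spawns the child process and returns its exit code, with pre_exec and post_exec recorded on either side. The point of the pattern is that a new frontend never reimplements process management; it only fills in the hooks.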
When defining a new frontend, you must specify the executable name of the fuzzer, as well as the compiler needed to instrument binaries (if the fuzzer uses one). Once we instantiate a new fuzzer object, the base frontend object automatically locates these executables from $PATH (or from your own custom environment variable):
class Honggfuzz(DeepStateFrontend):
    FUZZER = "honggfuzz"
    COMPILER = "hfuzz-clang++"
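As a rough idea of how such a lookup can work, here is a sketch built on the standard library's shutil.which. The environment variable name passed in is a hypothetical placeholder, and the real base class may resolve paths differently.

```python
import os
import shutil
from typing import Optional


def find_executable(name: str, env_var: Optional[str] = None) -> Optional[str]:
    """Locate `name`, preferring the directory named by a (hypothetical) env var.

    Returns the full path to the executable, or None if it cannot be found.
    """
    if env_var and os.environ.get(env_var):
        # Search only the directory given by the environment variable.
        found = shutil.which(name, path=os.environ[env_var])
        if found:
            return found
    # Fall back to the normal $PATH search.
    return shutil.which(name)
```

For example, find_executable("honggfuzz", "HONGGFUZZ_HOME") would check $HONGGFUZZ_HOME first and then $PATH. Returning None instead of raising lets the caller decide whether a missing binary is fatal (a compiler is only needed when compiling tests).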
We also define a parse_args class method. DeepStateFrontend already defines several flags that most fuzzers use (e.g. -i for the input seeds directory), and these can be readily extended in our frontend subclass.
For Honggfuzz, let's introduce some fuzzer-specific flags:
    @classmethod
    def parse_args(cls):
        parser = argparse.ArgumentParser(description="Use Honggfuzz as a backend for DeepState")

        # Execution options
        parser.add_argument("--dictionary", type=str, help="Optional fuzzer dictionary for honggfuzz.")
        parser.add_argument("--iterations", type=int, help="Number of iterations to fuzz for.")
        parser.add_argument("--keep_output", action="store_true", help="Output fuzzing feedback during execution.")
        parser.add_argument("--clear_env", action="store_true", help="Clear envvars before execution.")
        parser.add_argument("--save_all", action="store_true", help="Save all test-cases prepended with timestamps.")
        parser.add_argument("--sanitizers", action="store_true", help="Enable sanitizers when fuzzing.")

        # Instrumentation options
        parser.add_argument("--no_inst", action="store_true", help="Black-box fuzzing with honggfuzz without compile-time instrumentation.")
        parser.add_argument("--persistent", action="store_true", help="Set persistent mode when fuzzing.")

        # Hardware-related features for branch counting/coverage, etc.
        parser.add_argument("--keep_aslr", action="store_true", help="Don't disable ASLR randomization during execution.")
        parser.add_argument("--perf_instr", action="store_true", help="Allow PERF_COUNT_HW_INSTRUCTIONS.")
        parser.add_argument("--perf_branch", action="store_true", help="Allow PERF_COUNT_HW_BRANCH_INSTRUCTIONS.")

        # Misc. options
        parser.add_argument("--post_stats", action="store_true", help="Output post-fuzzing stats.")

        # Store our parser on the class and defer to the base implementation.
        cls.parser = parser
        return super(Honggfuzz, cls).parse_args()
Once parsed, arguments are stored in self._ARGS.
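To see what ends up in self._ARGS, here is a small standalone sketch using plain argparse with a few of the flags above. The -i/--input_seeds definition is a local stand-in for the flag the base class provides, so its exact spelling here is an assumption.

```python
import argparse

parser = argparse.ArgumentParser(description="Use Honggfuzz as a backend for DeepState")
# Stand-in for the base class's seed directory flag (assumed spelling).
parser.add_argument("-i", "--input_seeds", type=str, help="Input seeds directory.")
# Typed option: argparse converts the string value to int.
parser.add_argument("--iterations", type=int, help="Number of iterations to fuzz for.")
# Boolean switches: False unless the flag is present on the command line.
parser.add_argument("--persistent", action="store_true", help="Set persistent mode when fuzzing.")
parser.add_argument("--no_inst", action="store_true", help="Black-box fuzzing without instrumentation.")

# Parse an explicit argv instead of sys.argv so the example is reproducible.
args = parser.parse_args(["-i", "in", "--iterations", "1000", "--persistent"])
```

Here args.input_seeds is "in", args.iterations is the integer 1000, args.persistent is True, and args.no_inst is False. This is why store_true is the right action for switches like --no_inst: the later checks can simply test the attribute for truthiness.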
We can now implement pre_exec, which checks our parsed arguments and performs any environment checks that are necessary before the fuzzer actually executes.
    def pre_exec(self):
        # The base class performs its own internal checks first.
        super().pre_exec()

        args = self._ARGS

        # Seed checks are only enforced when fuzzing with instrumentation.
        if not args.no_inst:
            if not args.input_seeds:
                raise FrontendError("No -i/--input_seeds provided.")

            if not os.path.exists(args.input_seeds):
                os.mkdir(args.input_seeds)
                raise FrontendError("Seed path doesn't exist. Creating empty seed directory and exiting.")

            if not os.listdir(args.input_seeds):
                raise FrontendError(f"No seeds present in directory {args.input_seeds}.")
For fuzzers that rely on OS-level features (e.g. perf counters or core dump patterns), pre_exec is also where these sanity checks belong.
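As an example, a core dump check might look like the following sketch. The rule it enforces (core_pattern must not pipe dumps to an external handler) is the one AFL applies; whether honggfuzz or another fuzzer needs the same check is an assumption to verify for your fuzzer.

```python
from pathlib import Path

# Standard Linux location of the kernel's core dump pattern.
CORE_PATTERN = Path("/proc/sys/kernel/core_pattern")


def core_pattern_is_piped(path: Path = CORE_PATTERN) -> bool:
    """Return True if core dumps are piped to an external program.

    A pattern starting with "|" hands each crash to a helper such as
    systemd-coredump or apport, which can delay crash reporting to the fuzzer.
    """
    try:
        pattern = path.read_text().strip()
    except OSError:
        # Not on Linux, or /proc is unavailable: nothing we can check.
        return False
    return pattern.startswith("|")
```

Inside pre_exec, this would be used as `if core_pattern_is_piped(): raise FrontendError(...)`. Taking the path as a parameter keeps the check testable without touching the real /proc.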
For fuzzers that can't work with black-box (uninstrumented) binaries, we can implement a compile method to support test compilation.
    def compile(self):
        args = self._ARGS

        # Prefer the honggfuzz-specific DeepState static library if it is installed.
        lib_path = "/usr/local/lib/libdeepstate_hfuzz.a"
        L.debug(f"Static library path: {lib_path}")

        if not os.path.isfile(lib_path):
            flags = ["-ldeepstate"]
        else:
            flags = ["-ldeepstate_hfuzz"]

        # Append any user-supplied compiler flags, split on whitespace.
        if args.compiler_args:
            flags += args.compiler_args.split()

        compiler_args = ["-std=c++11", args.compile_test] + flags + \
                        ["-o", args.out_test_name + ".hfuzz"]
        super().compile(compiler_args)
We construct our compiler arguments and pass them to the base DeepStateFrontend class, which executes a compiler process and generates an instrumented binary.
With this method in place, we can now compile tests like so:
$ deepstate-my-fuzzer --compile_test MySimpleTest.cpp
$ deepstate-my-fuzzer --compile_test MyComplexTest.cpp --compiler_args="-lmylib -lsomelib"
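The argument-building logic can also be pulled out into a pure function, which makes it easy to test. In this sketch the function name is hypothetical, and it uses shlex.split instead of a plain whitespace split, so that a quoted flag such as -DNAME="a b" stays a single argument.

```python
import shlex
from typing import List, Optional


def build_compile_args(test_src: str, out_name: str, extra: Optional[str],
                       have_hfuzz_lib: bool) -> List[str]:
    """Assemble compiler arguments the same way compile() does above."""
    # Link the honggfuzz-specific DeepState library when it is available.
    flags = ["-ldeepstate_hfuzz" if have_hfuzz_lib else "-ldeepstate"]
    if extra:
        # shlex follows shell quoting rules and ignores repeated spaces.
        flags += shlex.split(extra)
    return ["-std=c++11", test_src] + flags + ["-o", out_name + ".hfuzz"]
```

For the second command above, build_compile_args("MyComplexTest.cpp", "my_deep_test", "-lmylib -lsomelib", False) yields the source file, -ldeepstate, -lmylib and -lsomelib, followed by -o my_deep_test.hfuzz.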
After parsing and checking our arguments, we now need to map parsed argument values to the actual fuzzer command-line flags. We do this in the cmd property method, which produces a dictionary that our runner method uses to build a viable command for spawning the fuzzer.
    @property
    def cmd(self):
        args = self._ARGS

        # Core options: seed corpus, workspace (output) directory and per-input timeout.
        cmd_dict = {
            "--input": args.input_seeds,
            "--workspace": args.output_test_dir,
            "--timeout": str(args.timeout),
        }

        # Optional flags; a value of None marks a flag that takes no argument.
        if args.dictionary:
            cmd_dict["--dict"] = args.dictionary
        if args.iterations:
            cmd_dict["--iterations"] = str(args.iterations)

        if args.persistent:
            cmd_dict["--persistent"] = None
        if args.no_inst:
            cmd_dict["--noinst"] = None
        if args.keep_output:
            cmd_dict["--keep_output"] = None
        if args.sanitizers:
            cmd_dict["--sanitizers"] = None
        if args.clear_env:
            cmd_dict["--clear_env"] = None
        if args.save_all:
            cmd_dict["--save_all"] = None
        if args.keep_aslr:
            cmd_dict["--linux_keep_aslr"] = None

        # TODO: autodetect hardware features
        if args.perf_instr:
            cmd_dict["--linux_perf_instr"] = None
        if args.perf_branch:
            cmd_dict["--linux_perf_branch"] = None

        # Everything after "--" is the target command line: the harness binary
        # plus the DeepState flags it always receives (dicts keep insertion order).
        cmd_dict.update({
            "--": args.binary,
            "--input_test_file": "___FILE___",
            "--abort_on_fail": None,
            "--no_fork": None
        })

        if args.which_test:
            cmd_dict["--input_which_test"] = args.which_test

        return cmd_dict
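The runner is what turns this dictionary into a command line. Below is a hypothetical sketch of one way to do that; the function name and the "None means a bare flag" convention are inferred from the code above rather than taken from DeepState's confirmed API.

```python
from typing import Dict, List, Optional


def flatten_cmd(fuzzer: str, cmd: Dict[str, Optional[str]]) -> List[str]:
    """Turn a flag -> value mapping into an argv list for subprocess.

    A value of None marks a bare flag. Dicts keep insertion order (Python 3.7+),
    so entries added after the "--" key end up on the target's command line.
    """
    argv = [fuzzer]
    for flag, value in cmd.items():
        argv.append(flag)
        if value is not None:
            argv.append(value)
    return argv


cmd = {
    "--input": "in",
    "--workspace": "out",
    "--timeout": "10",
    "--persistent": None,
    "--": "./my_deep_test",
    "--input_test_file": "___FILE___",
    "--abort_on_fail": None,
    "--no_fork": None,
}
argv = flatten_cmd("honggfuzz", cmd)
```

The resulting list starts with honggfuzz and its own options, then "--", the harness binary, and the DeepState flags, and can be handed directly to subprocess. One limitation of a dictionary is that each flag can appear only once.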
During each fuzzer run, we always pass a specific set of DeepState arguments to the instrumented harness binary: ./bin --input_test_file ___FILE___ --abort_on_fail --no_fork, where ___FILE___ is the placeholder that the fuzzer recognizes when parsing the target's ARGV and replaces with the path of each generated input file (the placeholder differs between fuzzers; many use @@).
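To make the placeholder mechanism concrete, here is a simplified sketch of what a fuzzer does with it before each execution. It only handles the exact-token case; real fuzzers such as honggfuzz may also substitute the placeholder inside a larger argument. The input path used below is made up.

```python
from typing import List


def substitute_input(argv: List[str], placeholder: str, input_path: str) -> List[str]:
    """Return a copy of argv with every exact placeholder token replaced."""
    return [input_path if arg == placeholder else arg for arg in argv]


target = ["./my_deep_test", "--input_test_file", "___FILE___", "--abort_on_fail", "--no_fork"]
run_argv = substitute_input(target, "___FILE___", "/tmp/input-0001")
```

The same function works for AFL-style harnesses by passing "@@" as the placeholder, which is why a frontend only has to know which symbol its fuzzer expects.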
With the above methods defined, we now have a fuzzer that operates with minimal functionality. However, our API lets us go a step further and implement other fuzzing workflow functionality, including seed synchronization, fuzzer stats parsing and reporting, and post-processing!
- stats: property method defining the structure of fuzzer-produced runtime statistics
This property method returns fuzzer-related stats as a dict. Often, this is done by parsing a stats file produced by the fuzzer:
    @property
    def stats(self):
        """
        Retrieves and parses the stats file produced by Honggfuzz
        """
        args = self._ARGS
        out_dir = os.path.abspath(args.output_test_dir)
        report_f = "HONGGFUZZ.REPORT.TXT"
        stat_file = os.path.join(out_dir, report_f)
        with open(stat_file, "r") as sf:
            lines = sf.readlines()

        stats = {
            "mutationsPerRun": None,
            "externalCmd": None,
            "fuzzStdin": None,
            "timeout": None,
            "ignoreAddr": None,
            "ASLimit": None,
            "RSSLimit": None,
            "DATALimit": None,
            "wordlistFile": None,
            "fuzzTarget": None,
            "ORIG_FNAME": None,
            "FUZZ_FNAME": None,
            "PID": None,
            "SIGNAL": None,
            "FAULT ADDRESS": None,
            "INSTRUCTION": None,
            "STACK HASH": None,
        }

        # strip first 4 and last 5 lines to make a parseable file
        lines = lines[4:-5]
        for line in lines:
            for k in stats.keys():
                if k in line:
                    # split on the first colon only, so values containing colons stay intact
                    stats[k] = line.split(":", 1)[1].strip()

        # add crash metrics: every file in the workspace other than the report
        crashes = len([name for name in os.listdir(out_dir) if name != report_f])
        stats.update({
            "CRASHES": crashes
        })
        return stats
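The loop above matches keys by substring, so a short key such as PID could match inside an unrelated line. A slightly stricter alternative, sketched below, splits each line on its first colon and compares the stripped key exactly. The function name is hypothetical, and the sample lines are made up for illustration rather than copied from a real honggfuzz report.

```python
from typing import Dict, Iterable, Optional


def parse_report(lines: Iterable[str], keys: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each known key to its value from 'key: value' lines (None if absent)."""
    stats: Dict[str, Optional[str]] = {k: None for k in keys}
    for line in lines:
        key, sep, value = line.partition(":")
        key = key.strip()
        # Only accept lines that contain a colon and an exactly matching key.
        if sep and key in stats:
            stats[key] = value.strip()
    return stats


sample = [
    "mutationsPerRun : 6\n",
    "PID: 1234\n",
    "FAULT ADDRESS: 0x0\n",
    "STACK HASH: 0000abcd\n",
]
report = parse_report(sample, ["mutationsPerRun", "PID", "FAULT ADDRESS", "STACK HASH", "SIGNAL"])
```

Keys missing from the input stay None, which matches the behavior of the stats dict above.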
- post_exec: implements any post-processing functionality (e.g. testcase decoding or de-duplication)
Since honggfuzz already implements a good amount of post-processing functionality, including crash de-duplication, minimization, and decoding, we will instead demonstrate how this works with a fuzzer like Eclipser, which requires manual testcase decoding:
    def post_exec(self):
        """
        Decode testcases and crashes after fuzzing.
        """
        out = self._ARGS.output_test_dir
        decoded = os.path.join(out, "decoded")

        L.info("Performing post-processing decoding on testcases and crashes")
        # Eclipser is a .NET application, so it is launched through `dotnet`.
        subprocess.call(["dotnet", self.fuzzer, "decode", "-i", os.path.join(out, "testcase"), "-o", decoded])
        subprocess.call(["dotnet", self.fuzzer, "decode", "-i", os.path.join(out, "crash"), "-o", decoded])

        # Copy the decoded inputs into the output directory, then clean up.
        for f in glob.glob(os.path.join(decoded, "decoded_files", "*")):
            shutil.copy(f, out)
        shutil.rmtree(decoded)
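For fuzzers that don't de-duplicate their own output, the simplest form of de-duplication is by file content. The sketch below (function name hypothetical) removes byte-identical files from a directory. It cannot tell apart distinct inputs that trigger the same bug; that is what stack-hash based de-duplication, like the STACK HASH field in the honggfuzz report, is for.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def dedupe_by_content(directory: str) -> List[Path]:
    """Delete byte-identical files in `directory`, keeping the first of each group.

    Files are visited in sorted order so the result is deterministic.
    Returns the paths that were removed.
    """
    seen: Dict[str, Path] = {}
    removed: List[Path] = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            path.unlink()
            removed.append(path)
        else:
            seen[digest] = path
    return removed
```

A post_exec hook could call this on the crash directory after decoding, before reporting how many unique crashes were found.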