This directory contains the implementation of the algorithm to automatically learning search heuristics for dynamic symbolic execution, called ParaDySE (Parametric Dynamic Symbolic Execution). This approach has been implemented on top of CREST at first. (See ParaDySE for our previous work.) And we extended it to be used for KLEE. For technical details, please see our paper below.
- Enhancing Dynamic Symbolic Execution by Automatically Learning Search Heuristics
The script for automatically generating a search heuristic is run on a benchmark built with gcov and llvm. Here, we present the steps to be taken to generate a search heuristic for gcal-4.1
First, each benchmark should be built with gcov and llvm, respectively. For detailed information about the build script we provide, refer benchmarks/README.txt.
# note that installation directory cannot be customized
# since it's hard-coded in paradyse script
cd ~/dd-klee/benchmarks
BASE=./ GCOV_CORE=20 BENCHMARKS=gcal ./build_benchmarks.sh
Next, you can run paradyse with following command.
cd ~/dd-klee/paradyse/
python paradyse.py pgm_config/gcal.json 300 20 72000
Each argument of the last command means:
- pgm_config/gcal.json : a json file to describe the benchmark.
- 300 : the number of parameters to evaluate in Find Phase
- 20 : the number of cpu cores to use in parallel
- 72000: the total testing budget (second)
If the script successfully ends, you can see the following command:
#############################################
Successfully Generate a Search Heuristic!!!!!
#############################################
Then, you can find the generated search heuristic (i.e. parameter values) as follows:
cd ~/dd-klee/experiments/1gcal__all__logs
vim best.w # (Automatically Generated Search Heuristic)
You can run gcal-4.1 with the generated heuristic as follows:
cd ~/dd-klee/experiments/1gcal__all__logs
cp best.w ~/dd-klee/benchmarks/gcal-4.1/obj-llvm/src/best.w
cd ~/dd-klee/paradyse
python run_search_heuristic.py pgm_config/gcal.json best.w 10 10 1000
Each argument of the last command means:
- pgm_config/gcal.json : a json file to describe the benchmark.
- best.w : a search heuristic (e.g., dfs, random-path, nurs:covnew, ... )
- 10: the number of total trials
- 1000: the total testing budget (second)
We implemented ParameterizedSearcher
which inherits the class Seacher
in KLEE. The ParameterizedSearcher
uses various features and pretrained weights to decide which state to execute. KLEE takes command line arguments to set search strategy. You can use our strategy as follow.
klee --search=param --weight=<path_to_weight> [options] <input bytecode> <program arguments>...
A pretrained weight should be provided as a text file, 26 real numbers separated by new line(\n
), since we use 26 features to differentiate each state.
We highly recommend to use ParameterizedSearcher
with BatchingSearcher
with enough batch instructions(e.g. 10000), since it may be very slow to extract features per one instruction for large program.
For example,
klee --search=param --weight=<path_to_weight> --use-batching-search --batch-instructions=10000 ...
Extracting feature is also implemented with OOP paradigm for extensibility. Features are categorized by three types, instruction features, symbolic memory features, and constraint features. For detail, please explore following header files and corresponding source files.
- klee/lib/Core/Feature.h
- klee/lib/Core/InstructionFeature.h
- klee/lib/Core/SymMemoryFeature.h
- klee/lib/Core/ConstraintFeature.h
- klee/lib/Core/FeatureHandler.h
Currently 26 boolean features are used, and they're calculated by 13 C++ classes who mark top 10% and bottom 10% of states based on statistics measured per each state.
We also provide a way to explore statistics related to features. This is currently experimental, and will be evolved for further research.
First, for klee-stats, following command shows table with FETotal
column, meaning the number of feature extractions done.
klee-stats --print-fstats <klee_result_dir>
Second, for klee
, the option --write-feature-map-after
extracts raw statistics of states with csv format. The column is for the statistics and the row is for states. Files are generated per every provided number of feature extractions.
klee --search=param --weight=<path_to_weight> --write-feature-map-after=100 ...
# results are saved in <klee_result_dir>
fmap000001.csv fmap000002.csv fmap000003.csv fmap000004.csv fmap000005.csv fmap000006.csv ...