Before running the code, you will need to build the JavaScript engine by running the following commands (Docker required):
cd engines/v8
docker build -t v8-build .
docker run -it -v .:/v8 v8-build /v8/build.sh
sudo chown -R $USER:$USER . # to change ownership of the files created by docker back to the user
Additionally, to run the code you will first need a working installation of Python 3.11 on your machine, and you will then need to install Poetry. We use Poetry to initialise our Python dependencies and virtual environment. To set up Poetry, run the following in the root of the project:
poetry shell
poetry install
As mentioned in the Report, JEFRL uses a three-stage training process: pre-training of the AST Transformer, fine-tuning of the AST Transformer, and finally running the fuzzer. In addition, there is a pre-processing step that prepares the data required for all of these stages.
For our training we made use of DIE-corpus, which includes seed inputs from various JavaScript engine test suites as well as PoCs of past vulnerabilities. Start by cloning DIE-corpus into the folder corpus as DIE:
git clone [email protected]:sslab-gatech/DIE-corpus.git corpus/DIE
After this we can run our pre-processing stage using the following command:
python3 src/preprocessing.py
This will save all the necessary files to the directory data.
Pre-training requires a machine with a powerful GPU; we used an NVIDIA A30 to pre-train and fine-tune our transformer. To run pre-training, use the following command:
python3 src/pretraining.py
This will create a new folder in out/pretraining, named with the timestamp at which you started training. The name of this folder is required for the next stage of training.
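Since both pre-training and fine-tuning run on the GPU, it is worth confirming that a CUDA device is visible before starting a long run. Assuming the training code is built on PyTorch (an assumption on our part, not stated above), a quick check is:

```python
import torch

# Sanity check that a CUDA device is visible before starting a long training run.
# This assumes the training code is built on PyTorch; adjust if another framework is used.
if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device found; pre-training would be extremely slow on CPU.")
```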
We can run the fine-tuning step by using the following command:
python3 src/finetuning.py --pretraining-path <path-to-pretraining-out-folder> --pretraing-step <step-of-the-pretraining-to-use>
This will create a new folder in out/finetuning, named with the timestamp at which you started fine-tuning. The name of this folder is required for the next stage.
For this step we no longer require a powerful GPU as our transformer is now frozen. We can run the fuzzer using the following command:
python3 src/main.py --finetuning-path <path-to-finetuning-out-folder> --finetuning-step <step-of-the-finetuning-to-use>
The final model, along with the run results for every stage, is stored in out/rl-training.
- Implement a way to run a JavaScript program through an engine from Python and receive outputs such as status, coverage, etc.; a minimal sketch of this step is given after this list (Target: Late January)
- Design and implement a way to take a JavaScript Abstract Syntax Tree (AST) along with a mutation as input and return a mutated AST (Target: Mid-Late February)
- Implement an Environment which can take an action from the RL agent, modify the current JavaScript AST based on this action, run the program through the engine, and return the reward along with the new state (Target: Mid-Late March)
- Implement a DQN agent which takes a JavaScript AST as input and then outputs the desired action to be taken (Target: Early April)
- Make mutations smarter by taking into account the variables/functions in scope (Target: Mid April)
- Add more unit tests for each module due to errors arising during execution (Target: Late April)
- Come up with a comprehensive evaluation metric for the fuzzers to allow for easy comparison, e.g. coverage in a certain time period of execution or number of crashes (Target: Early May)
- Implement a transformer model using a technique similar to Montage to tokenize/embed the AST directly instead of converting it to code (Target: Late May)
- Execute fuzzer on several versions of JavaScript engines and use the evaluation metric to tune reward function and hyperparameters (Target: Early June)
- Compare results of RL agent with a random agent and other fuzzers in terms of bugs found in older versions and coverage gained per time period (Target: Early June)
- Write a report (Target: Mid June)
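For reference, the first milestone above (running a JavaScript program through an engine from Python) can be sketched roughly as follows. The path to the built d8 shell and the plain subprocess invocation are assumptions for illustration; collecting coverage additionally requires an instrumented engine build and is not shown here.

```python
import subprocess
from dataclasses import dataclass

# Path to the built JavaScript shell. This is an assumption for illustration;
# point it at wherever build.sh places the binary inside engines/v8.
D8_PATH = "engines/v8/out/d8"


@dataclass
class ExecutionResult:
    status: int       # process return code; a negative value indicates a signal (e.g. a crash)
    stdout: str
    stderr: str
    timed_out: bool


def run_js(js_path: str, timeout: float = 5.0) -> ExecutionResult:
    """Run a JavaScript file through the engine and collect its outcome."""
    try:
        proc = subprocess.run(
            [D8_PATH, js_path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return ExecutionResult(proc.returncode, proc.stdout, proc.stderr, False)
    except subprocess.TimeoutExpired:
        # Hangs are treated as their own outcome rather than as a crash.
        return ExecutionResult(-1, "", "", True)


if __name__ == "__main__":
    # Hypothetical seed file, used purely as an example.
    result = run_js("corpus/DIE/example.js")
    print(result.status, result.timed_out)
```

An RL environment of the kind described in the later milestones would then wrap a runner like this one, using the observed status (and, with an instrumented build, coverage) to compute the reward for each mutation.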