This is the repository for the paper "Automated Classification of Overfitting Patches with Statically Extracted Code Features" (doi:10.1109/tse.2021.3071750).
title = {Automated Classification of Overfitting Patches with Statically Extracted Code Features},
author = {He Ye and Jian Gu and Matias Martinez and Thomas Durieux and Martin Monperrus},
journal = {IEEE Transactions on Software Engineering},
year = {2021},
doi = {10.1109/tse.2021.3071750},
├── Experiment: CSV feature data and the script for reproducing our experiment
├── Features: ODS code features
│   ├── Code: ODS code description features in JSON format
│   ├── Patterns: ODS repair pattern features in JSON format
│   └── Context: ODS context features in JSON format
├── Source: source program files that can be given as input to Coming to generate ODS features
├── Tests: EvoSuite tests generated for Bugs.jar and Bears, used to label the correctness of RepairThemAll patches
└── RawRepairThemAllPatches: raw patches from the RepairThemAll experiment
We have integrated ODS feature extraction into the open-source tool Coming. To extract code features, give Coming a pair of source and target files from the Source folder and run it in feature mode to obtain the ODS features.
We use the default XGBoost parameters (i.e., learning_rate set to 0.3 and max_depth set to 6) and tune only gamma, which we set to 0.5. All parameters can be found in our notebooks.
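As a minimal sketch of this configuration, assuming the scikit-learn-style `XGBClassifier` wrapper from the `xgboost` package (the authors' notebooks may configure XGBoost differently), the stated hyperparameters map onto the classifier as follows. The helper names `ods_xgb_params` and `build_classifier` are hypothetical.

```python
def ods_xgb_params():
    """Hyperparameters stated above (hypothetical helper name)."""
    return {
        "learning_rate": 0.3,  # XGBoost default (also called eta)
        "max_depth": 6,        # XGBoost default
        "gamma": 0.5,          # the only value changed from its default of 0
    }


def build_classifier():
    """Create an XGBClassifier with the parameters above.

    xgboost is imported lazily so that the parameter helper can be used
    without the library installed.
    """
    from xgboost import XGBClassifier

    return XGBClassifier(**ods_xgb_params())
```

Calling `build_classifier().fit(X, y)` on the feature matrix and labels would then train a model with these settings.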
To build Coming, run the following from the root of the Coming project:
mvn install -DskipTests
Then run the following command on the demo samples shipped with the Coming project. It generates a CSV file called test.csv and the code features in JSON format in the output path.
java -classpath ./target/coming-0-SNAPSHOT-jar-with-dependencies.jar fr.inria.coming.main.ComingMain -input files -mode features -location ./src/main/resources/pairsD4j -output ./out
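To run the same extraction on your own folder of pairs from a script, the command can be wrapped as below. This is a sketch: the jar path, main class and flags are copied from the command above, the helper names are hypothetical, and it assumes (as stated above) that the JSON feature files end up somewhere under the output path.

```python
import subprocess
from pathlib import Path

# Copied from the command above; assumes Coming was built with Maven.
DEFAULT_JAR = "./target/coming-0-SNAPSHOT-jar-with-dependencies.jar"
MAIN_CLASS = "fr.inria.coming.main.ComingMain"


def coming_feature_command(location, output, jar=DEFAULT_JAR):
    """Build the argument list for Coming's feature mode (hypothetical helper)."""
    return [
        "java", "-classpath", str(jar), MAIN_CLASS,
        "-input", "files",
        "-mode", "features",
        "-location", str(location),
        "-output", str(output),
    ]


def extract_features(location, output, jar=DEFAULT_JAR):
    """Run Coming and return the JSON files found under `output`."""
    # check=True raises CalledProcessError if Coming exits with an error.
    subprocess.run(coming_feature_command(location, output, jar), check=True)
    return sorted(Path(output).rglob("*.json"))
```

Passing an argument list instead of a single shell string avoids quoting problems when paths contain spaces.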
Note that Coming requires the input source and target files to follow a specific folder structure:
├── <diff_folder>
│   └── <modif_file>
│       ├── <diff_folder>_<modif_file>
│       └── <diff_folder>_<modif_file>
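Before running Coming on your own data, it can help to check the input folder against this structure. The sketch below is a hypothetical helper, not part of Coming. It assumes that each `<modif_file>` folder holds exactly two files (the source and the target version) whose names start with `<diff_folder>_<modif_file>`, and it deliberately does not assume any particular file-name suffix.

```python
from pathlib import Path


def check_coming_layout(location):
    """Return a list of problems in a Coming input folder (empty if OK).

    Assumed layout:
      <location>/<diff_folder>/<modif_file>/<diff_folder>_<modif_file>...
    with exactly two matching files (source and target) per <modif_file>.
    """
    problems = []
    root = Path(location)
    for diff_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        modif_dirs = sorted(p for p in diff_dir.iterdir() if p.is_dir())
        if not modif_dirs:
            problems.append(f"{diff_dir}: no <modif_file> folder")
        for modif_dir in modif_dirs:
            prefix = f"{diff_dir.name}_{modif_dir.name}"
            # Count only regular files whose name carries the expected prefix.
            matching = [p for p in modif_dir.iterdir()
                        if p.is_file() and p.name.startswith(prefix)]
            if len(matching) != 2:
                problems.append(
                    f"{modif_dir}: expected 2 files named {prefix}*, "
                    f"found {len(matching)}"
                )
    return problems
```

An empty result means every `<modif_file>` folder contains a source/target pair with the expected name prefix.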
Once test.csv is ready, predict it with the following code. The prediction results are written to prediction.csv.
You may also need to install these dependencies:
python3 -m pip install xgboost
python3 -m pip install scikit-learn
python3 -m pip install imbalanced-learn
python3 -m pip install matplotlib
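For readers who want to script the prediction step outside the notebooks, the following is a minimal sketch of reading test.csv, predicting each row and writing prediction.csv. It is not the authors' code. The header of test.csv is not documented here, so the identifier column name (`id`) and the ignored label column (`label`) are hypothetical defaults to adjust, and every remaining column is assumed to be a numeric feature. `model` can be any fitted object with a scikit-learn-style `predict()` method, for example an XGBoost classifier trained in the notebooks.

```python
import csv


def predict_csv(model, in_path="test.csv", out_path="prediction.csv",
                id_column="id", ignore_columns=("label",)):
    """Predict each row of a feature CSV and write (id, prediction) pairs.

    id_column and ignore_columns are hypothetical; match them to the real
    header. All other columns are treated as numeric features.
    """
    with open(in_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError(f"{in_path} has no data rows")
    # Keep header order so the feature vector matches the training layout.
    feature_names = [c for c in rows[0]
                     if c != id_column and c not in ignore_columns]
    X = [[float(row[c]) for c in feature_names] for row in rows]
    predictions = model.predict(X)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([id_column, "prediction"])
        for row, pred in zip(rows, predictions):
            writer.writerow([row[id_column], int(pred)])
    return feature_names
```

The feature columns must appear in the same order as during training, which is why the sketch preserves the CSV header order.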