Running the tool on other Java Dataset #5

Open
faysalhossain2007 opened this issue Sep 1, 2022 · 3 comments

Comments

faysalhossain2007 commented Sep 1, 2022

I am trying to test your model on our dataset. I was able to generate the training_bpe.txt and training_tokenize_sard.txt files, but I am running into issues when generating the identifier.tokens and identifier.txt files.

Could you please share the script for generating those files?

Also, some patches in my data add a new if-block with an else-block (multiple statements). In that case, how should I represent the patch in the input ground-truth data?

Thanks for your help!

nashid commented Sep 27, 2022

@faysalhossain2007 were you able to run the model for training from scratch?

@faysalhossain2007
Author

@nashid I used their trained model to test on their dataset.

@jiang719
Collaborator

@faysalhossain2007 please check data/data/prepare_testing_data.py, which is the script to prepare test input for new data.

If your ground truth inserts a new statement, you could use the line after the insertion as the original buggy line, and expect the model to output patches that consist of the new statement followed by the buggy line. However, CURE's performance on such insertion bugs is not as good, since such cases are very rare in our training data. We are building other tools to address this limitation.
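
For illustration, here is a minimal Python sketch of that workaround. It is not part of CURE: the function name `insertion_to_replacement`, its arguments, and the choice to flatten the expected patch into a single space-joined line are all assumptions made for this example, so adjust the output to whatever format prepare_testing_data.py actually expects.

```python
from typing import List, Tuple


def insertion_to_replacement(
    fixed_lines: List[str], insert_start: int, insert_end: int
) -> Tuple[str, str]:
    """Recast an insertion-only fix as a single-line replacement.

    Hypothetical helper (not from the CURE repository).
    fixed_lines[insert_start:insert_end] are the inserted statements;
    fixed_lines[insert_end] is the first unchanged line after them,
    which is treated as the "buggy" line.
    """
    if not (0 <= insert_start < insert_end < len(fixed_lines)):
        raise ValueError("the insertion must be followed by an existing line")

    anchor = fixed_lines[insert_end]
    inserted = fixed_lines[insert_start:insert_end]

    buggy_line = anchor.strip()
    # Assumption: the expected patch is flattened to one line.
    expected_patch = " ".join(line.strip() for line in inserted + [anchor])
    return buggy_line, expected_patch


# Example: the fix inserts an if/else block before the return statement.
fixed = [
    "int len(String s) {",
    "    if (s == null) {",
    "        return 0;",
    "    } else {",
    "        log(s);",
    "    }",
    "    return s.length();",
    "}",
]
buggy, patch = insertion_to_replacement(fixed, 1, 6)
print(buggy)
print(patch)
```

In this example the line `return s.length();` is reported as the buggy line, and the expected patch is the inserted if/else block followed by that same line.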


3 participants