Different results between eval mode and test mode #26
Hi, could you share the command you ran for this experiment?
The command is as follows:
Is the highest eval score the same as the test score?
The checkpoint I chose is the one with the highest eval score during training.
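For context, a minimal sketch of how the highest-scoring checkpoint can be located when the HuggingFace Trainer is run with `load_best_model_at_end=True` and a `metric_for_best_model`; the output directory name here is only an assumption, not the repository's actual path:

```python
import json
from pathlib import Path

# Hypothetical output directory; replace with the actual training output path.
output_dir = Path("output/t5-large-spider")

# trainer_state.json records the checkpoint with the best eval metric when
# load_best_model_at_end=True and metric_for_best_model are set.
state = json.loads((output_dir / "trainer_state.json").read_text())
print("best checkpoint:", state.get("best_model_checkpoint"))
print("best metric:", state.get("best_metric"))
```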
Can you run the following command on the same machine (which means that the previous checkpoints are still there) and see if the results are different?
@eyuansu62 Hi, any new progress on your side? Hope we can figure this out together!
Could you double-check the evaluation and prediction JSON files?
I checked the evaluation and prediction JSON files and found that they are indeed different, regardless of whether do_train=False or num_train_epoch=0 is used. The differing SQL queries look like the following; just a few conditions are wrong:
Okay, I will keep this issue active and see if anyone finds a similar problem!
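For anyone who wants to reproduce this comparison, a minimal sketch of diffing two prediction files; the file names and the "prediction" field are assumptions and should be adjusted to the actual output schema:

```python
import json

# Hypothetical file names; point these at the eval-mode and test-mode outputs.
with open("predictions_eval.json") as f:
    eval_preds = json.load(f)
with open("predictions_test.json") as f:
    test_preds = json.load(f)

# Assumes each file is a JSON list of records with a "prediction" field
# holding the generated SQL; adjust the key to the actual schema.
for i, (a, b) in enumerate(zip(eval_preds, test_preds)):
    if a["prediction"] != b["prediction"]:
        print(f"example {i}")
        print("  eval:", a["prediction"])
        print("  test:", b["prediction"])
```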
I just realized that the command you provided is for T5-3b without using deepspeed. I remember that we didn't manage to run it without deepspeed even on an A100. What kind of GPU are you using, if you remember?
Well, it is actually t5-large in this cfg file. I forgot to change the file name.
Hey, we asked someone else to help test it on their side, and they did not get different results between eval mode and test mode (which is consistent with ours). Therefore we think it may be because of the machine on your side. Could you provide more info about your hardware and system, then?
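One generic way to help rule out hardware-dependent nondeterminism (this is a standard PyTorch recipe, not code from this repository) is to pin seeds and force deterministic kernels before running generation:

```python
import random

import numpy as np
import torch

# Generic PyTorch determinism recipe (not part of this repository's code).
seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
# Warn (rather than error) on ops without deterministic implementations;
# warn_only requires a reasonably recent PyTorch version.
torch.use_deterministic_algorithms(True, warn_only=True)
```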
Why do I get different results between eval mode and test mode?