Query about GSM8K evaluation #73
Comments
Hi @HCY123902, yes, it's the default. Best,
Thank you for your reply. I am trying to reproduce the results on GSM8K, and this is what I observed using
These results seem significantly higher than what is reported in Table 9 of the paper. Therefore, may I know your
Hi @HCY123902, apologies for the delayed response! For testing GSM8K, we used the following command and evaluated with the git version
Let me know if you have any further questions!
Hi @xiamengzhou @yumeng5, Regarding this issue, I have a quick question: Between the two evaluation metrics for GSM8K, strict-match and flexible-extract, which one would you prefer to highlight as the primary evaluation metric in your paper? Best,
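For readers unfamiliar with the two metric names, here is a minimal sketch of how they typically differ. It is not the harness's actual code: the regular expressions are simplified assumptions, and `strict_match`, `flexible_extract`, and `normalize` are hypothetical helper names. The idea is that strict-match only accepts an answer written in the GSM8K `#### <number>` format, while flexible-extract falls back to the last number found anywhere in the generation.

```python
import re

# Assumed, simplified patterns; the real task config may differ in detail.
STRICT_PATTERN = re.compile(r"#### (-?[0-9.,]+)")
FLEXIBLE_PATTERN = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def normalize(num: str) -> str:
    """Drop thousands separators and a trailing period."""
    return num.replace(",", "").rstrip(".")


def strict_match(generation: str):
    """Return the number after '#### ', or None if the format is absent."""
    m = STRICT_PATTERN.search(generation)
    return normalize(m.group(1)) if m else None


def flexible_extract(generation: str):
    """Return the last number appearing anywhere in the generation."""
    matches = FLEXIBLE_PATTERN.findall(generation)
    return normalize(matches[-1]) if matches else None


if __name__ == "__main__":
    formatted = "16 - 3 - 4 = 9 eggs, 9 * 2 = 18.\n#### 18"
    free_form = "She has 3 + 4 = 7 apples. The answer is 7."
    print(strict_match(formatted), flexible_extract(formatted))
    print(strict_match(free_form), flexible_extract(free_form))
```

Under these assumptions, a model that reasons correctly but does not emit the `####` marker scores zero on strict-match and can still score on flexible-extract, which is one reason the two numbers can diverge noticeably.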
In Table 9 of the paper, your evaluation on GSM8K seems to use the 5-shot setting. May I know which evaluation library you used? Is it lm-evaluation-harness or another existing GitHub implementation?