Dear authors, thank you for the amazing work and sharing your code and data!
I wanted to ask about your evaluation code: currently, if the model outputs an answer with a decimal point, the answer is automatically rounded to the nearest integer.
As a result, a wrong answer (e.g. 8.5) can be counted as correct (e.g. as 9) despite a calculation error, which does occur often in some model generations.
Given this, I believe stricter evaluation code may be needed.
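
To make the concern concrete, here is a minimal sketch of the two behaviours. It is not taken from the repository: `parse_number`, `lenient_match`, and `strict_match` are hypothetical names, and the half-up rounding in `lenient_match` is only an assumption about how the current rounding behaves.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP
from typing import Optional


def parse_number(text: str) -> Optional[Decimal]:
    """Parse a model answer into a finite Decimal, or return None."""
    try:
        # Drop surrounding whitespace and thousands separators such as "1,000".
        value = Decimal(text.strip().replace(",", ""))
    except InvalidOperation:
        return None
    # Reject NaN and infinities.
    return value if value.is_finite() else None


def lenient_match(prediction: str, gold: int) -> bool:
    """Round to the nearest integer before comparing (assumed current behaviour)."""
    value = parse_number(prediction)
    if value is None:
        return False
    try:
        rounded = value.quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    except InvalidOperation:
        # Raised when the value is too large to round at default precision.
        return False
    return rounded == Decimal(gold)


def strict_match(prediction: str, gold: int) -> bool:
    """Accept only answers numerically equal to the gold integer."""
    value = parse_number(prediction)
    return value is not None and value == Decimal(gold)
```

With this sketch, `lenient_match("8.5", 9)` accepts the answer while `strict_match("8.5", 9)` rejects it; `strict_match("9.0", 9)` still passes, since 9.0 is numerically equal to 9.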