Potential error in eval_gsm8k.py #23

hbin0701 · 2023-12-27T08:12:09Z

Dear authors, thank you for the amazing work and sharing your code and data!

I wanted to ask about your evaluation code: currently, if the model outputs an answer with a decimal point, it is automatically rounded to the nearest integer.

As a result, a wrong answer (e.g. 8.5) can be counted as correct (e.g. as 9) despite a calculation error, which does occur fairly often in some model generations.

For this reason, I believe stricter evaluation code may be needed.
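To illustrate the concern, here is a minimal sketch contrasting a rounding comparison with an exact one. It is not the code in `eval_gsm8k.py`: the function names `parse_number`, `is_correct_rounded`, and `is_correct_strict` are hypothetical, and the half-up rounding mode is an assumption chosen so that 8.5 rounds to 9 as in the example above.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP


def parse_number(text):
    """Parse a numeric string such as '1,234' or '8.5'; return None if it is not a finite number."""
    try:
        value = Decimal(text.replace(",", "").strip())
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def is_correct_rounded(pred, gold):
    """Lenient check (hypothetical): round both values to the nearest integer first."""
    p, g = parse_number(pred), parse_number(gold)
    if p is None or g is None:
        return False
    one = Decimal("1")
    # Assumption: ties round up, so 8.5 becomes 9.
    return (p.quantize(one, rounding=ROUND_HALF_UP)
            == g.quantize(one, rounding=ROUND_HALF_UP))


def is_correct_strict(pred, gold):
    """Strict check (hypothetical): values must be numerically equal, with no rounding."""
    p, g = parse_number(pred), parse_number(gold)
    if p is None or g is None:
        return False
    return p == g


print(is_correct_rounded("8.5", "9"))  # lenient check accepts the wrong answer
print(is_correct_strict("8.5", "9"))   # strict check rejects it
print(is_correct_strict("9.0", "9"))   # equal values in different notation still match
```

Comparing `Decimal` values rather than raw strings keeps answers such as `9.0` and `1,234` from being rejected purely because of formatting, while still rejecting `8.5` against a gold answer of `9`.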
