Hi, I tried to evaluate the Llama-3-Instruct-8B-SimPO-v0.2 checkpoint with arena-hard-auto, and I only got:
Llama-3-Instruct-8B-SimPO-v0.2 | score: 35.4 | 95% CI: (-3.2, 2.0) | average #tokens: 530
while your paper reports 36.5.
So I am wondering if my vLLM API server settings are right:
python3 -m vllm.entrypoints.openai.api_server \
    --model path-to-SimPO-v0.2 \
    --host 0.0.0.0 --port 5001 \
    --served-model-name SimPO-v0.2 \
    --chat-template templates/llama3.jinja
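For completeness, here is a sketch of what the matching entry in arena-hard-auto's `config/api_config.yaml` would look like for this server (the endpoint URL, API key, and `parallel` value are illustrative assumptions for a local vLLM instance, not values from the paper):

```yaml
# Sketch of an api_config.yaml entry pointing at the local vLLM server above.
SimPO-v0.2:
    model_name: SimPO-v0.2
    endpoints:
        - api_base: http://localhost:5001/v1
          api_key: empty  # vLLM does not check the key unless one is configured
    api_type: openai
    parallel: 8
```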
I have checked that there is no '<|eot_id|>' at the end of the generated answers.
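In case anyone wants to reproduce that check, here is a minimal sketch; it assumes the default arena-hard-auto answer layout, where each JSONL record has a `question_id` and `choices[*].turns[*].content` (adjust the path and keys if your checkout differs):

```python
import json

# Assumed default location of the generated answers; adjust as needed.
answer_file = "data/arena-hard-v0.1/model_answer/SimPO-v0.2.jsonl"

with open(answer_file) as f:
    for line in f:
        record = json.loads(line)
        # Flag any answer turn whose text still ends with the Llama-3 EOT token.
        for choice in record.get("choices", []):
            for turn in choice.get("turns", []):
                if turn["content"].rstrip().endswith("<|eot_id|>"):
                    print(f"question {record['question_id']}: answer ends with <|eot_id|>")
```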
I also found that there was an update to questions.jsonl in arena-hard 5 months ago; I don't know if that is the reason: lmarena/arena-hard-auto@d989e6f#diff-9a6dd9530bef3f149817dfb224c99c9d6432597c11a9ce88ffe220ad61c201fb
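For reference, one way to inspect that change locally is sketched below; I am assuming the question file sits at `data/arena-hard-v0.1/question.jsonl` in the repo, so the path may need adjusting:

```bash
# Sketch: inspect the question-file update in a local clone.
git clone https://github.com/lmarena/arena-hard-auto
cd arena-hard-auto
# Show the diff that commit d989e6f made to the question file
# (path assumed from the default repo layout).
git show d989e6f -- data/arena-hard-v0.1/question.jsonl
```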
Hi @jimmy19991222
I think your result is reasonably close to our reported one (a ~1-point difference can probably be attributed to randomness).
Best,
Yu