I evaluated the polyglot-ko-1.3b model on HellaSwag and WiC from KoBEST, and my results differ from both the paper and the Hugging Face model card.

Environment
Here is the notebook I tested with:
https://colab.research.google.com/drive/1lyQQisuB5JzuGk72haSdxXfXP20q4YGr?usp=sharing
1. WiC
The paper reports a score of 0.486, but I got only 0.4541.
hf-causal-experimental (pretrained=EleutherAI/polyglot-ko-1.3b), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8
2. HellaSwag
The paper reports a score of 0.526, but I got only 0.3984.
hf-causal-experimental (pretrained=EleutherAI/polyglot-ko-1.3b), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8
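For reference, the settings above correspond to an lm-evaluation-harness run; a command line reproducing them would look roughly like this (a sketch only: the `main.py` entry point and the `kobest_wic`/`kobest_hellaswag` task names are assumptions, not taken from the notebook):

```shell
# Hypothetical lm-evaluation-harness invocation matching the reported settings:
# model backend hf-causal-experimental, 5-shot, batch size 8, no example limit.
python main.py \
    --model hf-causal-experimental \
    --model_args pretrained=EleutherAI/polyglot-ko-1.3b \
    --tasks kobest_wic,kobest_hellaswag \
    --num_fewshot 5 \
    --batch_size 8
```

If the harness version or task naming differs between the notebook and the paper's evaluation code, that alone could account for part of the score gap.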
I also found a W&B report, Polyglot-Ko: Open-Source Korean Autoregressive Language Model, and it lists a HellaSwag score that matches my result: 0.3984.

In case of other models
There are also differences for kakaobrain/kogpt and skt/ko-gpt-trinity-1.2B-v0.5. Note that I tested kakaobrain/kogpt with an Int8-quantized model.