
Incoherent models in blocks of range [3 999 081, 3 999 227] #73

Closed
apoplexyes opened this issue Oct 8, 2024 · 3 comments

Comments

@apoplexyes

I've tested the model crybit/role_172840312514 locally.
The scores were:

vibe_score: 0.2573
latency_score: 0.9118
coherence_score: 0.9961
creativity_score: 0.7425
model_size_score: 0.8373
qualitative_score: 0.3953

But on the leaderboard it is flagged as an incoherent model.
I think an issue occurred all of a sudden.
Please check it.

@donaldknoller
Contributor

This is the status of models submitted between 3999081 and 3999227:

| status | hash | notes | block | repo_namespace | coherence_score |
|---|---|---|---|---|---|
| COMPLETED | 17050060913830473142 | Incoherent model submitted | 3999095 | rockdrop | 0 |
| COMPLETED | 15847721952423487803 | Incoherent model submitted | 3999095 | rockdrop | 0 |
| COMPLETED | 8835647222478815873 | Incoherent model submitted | 3999098 | rockdrop | 0 |
| COMPLETED | 7972011408484919977 | Incoherent model submitted | 3999102 | rockdrop | 0 |
| COMPLETED | 13477148470819370782 | Incoherent model submitted | 3999108 | irusl | 0 |
| COMPLETED | 14119649159871166031 | Incoherent model submitted | 3999112 | rockdrop | 0 |
| COMPLETED | 828694671848585522 | Incoherent model submitted | 3999140 | rockdrop | 0 |
| COMPLETED | 10070964153989073139 | Incoherent model submitted | 3999144 | crybit | 0 |
| COMPLETED | 7685907597803088846 |  | 3999163 | crybit | 0.99609375 |
| COMPLETED | 6897270142212053831 |  | 3999176 | whizzzzkid | 0.9765625 |
| COMPLETED | 1192291879426447438 | Incoherent model submitted | 3999176 | aks1s | 0 |
| COMPLETED | 6897270142212053831 |  | 3999178 | whizzzzkid | 0.9765625 |
| COMPLETED | 6897270142212053831 |  | 3999182 | whizzzzkid | 0.9765625 |
| COMPLETED | 10327789119060379515 |  | 3999192 | crybit | 0.9765625 |
| COMPLETED | 10327789119060379515 |  | 3999194 | crybit | 0.9765625 |

From initial observations, there does not seem to be an explicit issue with the coherence score testing outside of potentially higher variability.
The current implementation of the coherence score (which has been running for many epochs without issue) uses 256 samples and calls OpenAI to judge the coherency of the generated text. There are some known parameters that could use improvement, which are already outlined in another issue: #64
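For reference, the shape of the check described above (generate text for N samples, have a judge rate each one, average the verdicts) could be sketched roughly as follows. All names here are illustrative, not the subnet's actual code, and the real pipeline calls OpenAI rather than a local stub:

```python
# Hypothetical sketch of a sample-based coherence score. In the real
# implementation the judge is an OpenAI call over 256 generated samples;
# here it is a trivial stub so the sketch is self-contained.
from typing import Callable, List

N_SAMPLES = 256  # the comment above mentions 256 samples

def coherence_score(
    generations: List[str],
    judge: Callable[[str], bool],
) -> float:
    """Fraction of generations the judge marks coherent (0.0 to 1.0)."""
    if not generations:
        return 0.0
    verdicts = [judge(text) for text in generations]
    return sum(verdicts) / len(verdicts)

# Stub judge for illustration: treat empty or single-token output as incoherent.
def stub_judge(text: str) -> bool:
    return len(text.split()) > 1

samples = ["the cat sat on the mat", "hello world", ""]
print(coherence_score(samples, stub_judge))  # 2 of 3 judged coherent
```

With a small, noisy judge over a finite sample set, some run-to-run variability in the averaged score is expected, which is consistent with the "potentially higher variability" observation above.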

Thus, we will close this issue in favor of highlighting the more specific domain of reducing variability for the coherence score. There will be no adjustment to these specific submitted models at this time.

@apoplexyes
Author

apoplexyes commented Oct 8, 2024

In my experience, the coherence score shouldn't be 0 even when a model is completely incoherent.
The case where the coherence score is 0 is when there's a problem with the API call.

I think this is an urgent issue, because the models evaluated as incoherent are deregistered almost immediately due to their 0 score.
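If the hypothesis above is right, the failure mode could come from error handling along these lines (purely illustrative, not the subnet's actual code): a judge API error that is silently counted as "incoherent" collapses the score of an otherwise fine model to exactly 0.

```python
# Illustrative only: if judge API errors are swallowed and counted as
# incoherent samples, a transient outage yields a score of 0, which is
# indistinguishable from a genuinely incoherent model.
from typing import Callable, List

def score_with_silent_failures(
    generations: List[str],
    judge: Callable[[str], bool],
) -> float:
    coherent = 0
    for text in generations:
        try:
            if judge(text):
                coherent += 1
        except Exception:
            # Bug: an API failure is treated the same as an incoherent sample.
            pass
    return coherent / len(generations) if generations else 0.0

def failing_judge(text: str) -> bool:
    raise ConnectionError("judge API unavailable")

print(score_with_silent_failures(["perfectly coherent text"] * 4, failing_judge))  # 0.0
```

A safer design would distinguish the two cases, e.g. by retrying failed judge calls or excluding errored samples from the denominator instead of scoring them as 0.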

@apoplexyes
Author

[screenshot attached]
