
Incoherent models in blocks of range [3 999 081, 3 999 227] #73

Closed
apoplexyes opened this issue Oct 8, 2024 · 3 comments

Comments

@apoplexyes

I've tested the model crybit/role_172840312514 locally.
The scores were:

vibe_score: 0.2573
latency_score: 0.9118
coherence_score: 0.9961
creativity_score: 0.7425
model_size_score: 0.8373
qualitative_score: 0.3953

But on the leaderboard it is flagged as an incoherent model.
I think an issue occurred all of a sudden.
Please check it.

@donaldknoller
Contributor

This is the status of models submitted between 3999081 and 3999227:

| status | hash | notes | block | repo_namespace | coherence_score |
|---|---|---|---|---|---|
| COMPLETED | 17050060913830473142 | Incoherent model submitted | 3999095 | rockdrop | 0 |
| COMPLETED | 15847721952423487803 | Incoherent model submitted | 3999095 | rockdrop | 0 |
| COMPLETED | 8835647222478815873 | Incoherent model submitted | 3999098 | rockdrop | 0 |
| COMPLETED | 7972011408484919977 | Incoherent model submitted | 3999102 | rockdrop | 0 |
| COMPLETED | 13477148470819370782 | Incoherent model submitted | 3999108 | irusl | 0 |
| COMPLETED | 14119649159871166031 | Incoherent model submitted | 3999112 | rockdrop | 0 |
| COMPLETED | 828694671848585522 | Incoherent model submitted | 3999140 | rockdrop | 0 |
| COMPLETED | 10070964153989073139 | Incoherent model submitted | 3999144 | crybit | 0 |
| COMPLETED | 7685907597803088846 |  | 3999163 | crybit | 0.99609375 |
| COMPLETED | 6897270142212053831 |  | 3999176 | whizzzzkid | 0.9765625 |
| COMPLETED | 1192291879426447438 | Incoherent model submitted | 3999176 | aks1s | 0 |
| COMPLETED | 6897270142212053831 |  | 3999178 | whizzzzkid | 0.9765625 |
| COMPLETED | 6897270142212053831 |  | 3999182 | whizzzzkid | 0.9765625 |
| COMPLETED | 10327789119060379515 |  | 3999192 | crybit | 0.9765625 |
| COMPLETED | 10327789119060379515 |  | 3999194 | crybit | 0.9765625 |

From initial observations, there does not seem to be an explicit issue with the coherence score testing outside of potentially higher variability.
The current implementation of the coherence score (which has been running for many epochs without issue) uses 256 samples and calls OpenAI to judge the coherency of the generated text. There are some known parameters that could use improvement, which are already outlined in another issue: #64
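For reference, the shape of the check described above (generate text for N samples, have a judge rate each one, average the verdicts) could be sketched roughly as follows. All names here are illustrative, not the subnet's actual code, and the real pipeline calls OpenAI rather than a local stub:

```python
# Hypothetical sketch of a sample-based coherence score. In the real
# implementation the judge is an OpenAI call over 256 generated samples;
# here it is a trivial stub so the sketch is self-contained.
from typing import Callable, List

N_SAMPLES = 256  # the comment above mentions 256 samples

def coherence_score(
    generations: List[str],
    judge: Callable[[str], bool],
) -> float:
    """Fraction of generations the judge marks coherent (0.0 to 1.0)."""
    if not generations:
        return 0.0
    verdicts = [judge(text) for text in generations]
    return sum(verdicts) / len(verdicts)

# Stub judge for illustration: treat empty or single-token output as incoherent.
def stub_judge(text: str) -> bool:
    return len(text.split()) > 1

samples = ["the cat sat on the mat", "hello world", ""]
print(coherence_score(samples, stub_judge))  # 2 of 3 judged coherent
```

With a small, noisy judge over a finite sample set, some run-to-run variability in the averaged score is expected, which is consistent with the "potentially higher variability" observation above.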

Thus, we will close this issue in favor of highlighting the more specific domain of reducing variability for the coherence score. There will be no adjustment to these specific submitted models at this time.

@apoplexyes
Author

apoplexyes commented Oct 8, 2024

In my experience, the coherence score shouldn't be 0 even when a model is completely incoherent.
The case where the coherence score is 0 is when there's a problem with the API call.

I think this is an urgent issue, because the models evaluated as incoherent are deregistered almost immediately due to their 0 score.
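If the hypothesis above is right, the failure mode could come from error handling along these lines (purely illustrative, not the subnet's actual code): a judge API error that is silently counted as "incoherent" collapses the score of an otherwise fine model to exactly 0.

```python
# Illustrative only: if judge API errors are swallowed and counted as
# incoherent samples, a transient outage yields a score of 0, which is
# indistinguishable from a genuinely incoherent model.
from typing import Callable, List

def score_with_silent_failures(
    generations: List[str],
    judge: Callable[[str], bool],
) -> float:
    coherent = 0
    for text in generations:
        try:
            if judge(text):
                coherent += 1
        except Exception:
            # Bug: an API failure is treated the same as an incoherent sample.
            pass
    return coherent / len(generations) if generations else 0.0

def failing_judge(text: str) -> bool:
    raise ConnectionError("judge API unavailable")

print(score_with_silent_failures(["perfectly coherent text"] * 4, failing_judge))  # 0.0
```

A safer design would distinguish the two cases, e.g. by retrying failed judge calls or excluding errored samples from the denominator instead of scoring them as 0.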

@apoplexyes
Author

[screenshot attached]
