Fix typo in Mixtral accuracy #297

Merged
merged 1 commit into from
Jul 2, 2024
Conversation

pgmpablo157321
Contributor

No description provided.

@pgmpablo157321 pgmpablo157321 requested a review from a team as a code owner July 1, 2024 17:13

github-actions bot commented Jul 1, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅


@szutenberg szutenberg left a comment


The typo correction is fine, but I see a few discrepancies in these rules:

  • OpenOrca has only a train split; there is no "GPT-4 split". The dataset description says: "The OpenOrca dataset is a collection of augmented FLAN Collection data. Currently ~1M GPT-4 completions, and ~3.2M GPT-3.5 completions."
  • we used GSM8K samples from the train split (train has 7.47k samples, test has 1.32k)
  • MBXP has only a test split, so there is no need to specify it
  • no need to duplicate "5k samples, max_seq_len=2048". Consider a simpler form: "curated dataset containing 5k samples each from GSM8K, OpenOrca, and MBXP, max_seq_len=2048"
  • tokens per sample should be 145.9
  • there is only one accuracy category: 99%. Reference outputs are for fp16.
  • ROUGE scores are for OpenOrca samples only
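
To make the "only one accuracy category: 99%" point concrete, here is a minimal sketch of how such a check could look, assuming the target means each measured metric must reach at least 99% of its fp16 reference value. The function name, metric names, and numbers below are hypothetical placeholders, not the actual reference scores or the benchmark's checker.

```python
from typing import Dict

# Single accuracy category mentioned in the review: 99% of the fp16 reference.
ACCURACY_TARGET = 0.99


def meets_accuracy_target(
    reference: Dict[str, float],
    measured: Dict[str, float],
    target: float = ACCURACY_TARGET,
) -> Dict[str, bool]:
    """Report, per reference metric, whether measured >= target * reference."""
    results = {}
    for name, ref_value in reference.items():
        if name not in measured:
            raise KeyError(f"missing measured value for metric {name!r}")
        results[name] = measured[name] >= target * ref_value
    return results


# Hypothetical placeholder values, NOT real reference scores.
reference_fp16 = {"metric_a": 100.0, "metric_b": 50.0}
measured = {"metric_a": 99.2, "metric_b": 49.0}

print(meets_accuracy_target(reference_fp16, measured))
```

With these placeholder numbers, `metric_a` passes (99.2 is above 99.0) and `metric_b` fails (49.0 is below 49.5). In a real check the ROUGE metrics would be computed on the OpenOrca samples only, as noted above.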

@mrmhodak
Contributor

mrmhodak commented Jul 1, 2024

We can merge during WG meeting 7/2.

@szutenberg : Can you create a PR for the discrepancies you found?

@mrmhodak mrmhodak merged commit f892cf0 into master Jul 2, 2024
1 check passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 2, 2024
@mrmhodak
Contributor

mrmhodak commented Jul 2, 2024

@pgmpablo157321 to create PR

3 participants