Adding Simplified Coding Tasks #645

Merged (28 commits) on Oct 11, 2023

Conversation

mcarbin (Contributor) commented Oct 3, 2023

Ran the coding tasks on a 3B model: coding-eval-Tzea5x

| Category   | Benchmark                 | Subtask   |   Accuracy | Few-shot setting  | Model        |
|:-----------|:--------------------------|:----------|-----------:|:------------------|:-------------|
|            | human_eval                |           | 0.00762195 | 0-shot            | mosaicml/30b |
|            | human_eval_cpp            |           | 0.00341615 | 0-shot            | mosaicml/30b |
|            | human_eval_js             |           | 0.00640244 | 0-shot            | mosaicml/30b |
|            | human_eval_return_simple  |           | 0.210811   | 0-shot            | mosaicml/30b |
|            | human_eval_return_complex |           | 0.0330709  | 0-shot            | mosaicml/30b |
|            | human_eval_25             |           | 0.00579268 | 0-shot            | mosaicml/30b |
|            | human_eval_50             |           | 0.017378   | 0-shot            | mosaicml/30b |
|            | human_eval_75             |           | 0.0646342  | 0-shot            | mosaicml/30b |

bmosaicml (Contributor) commented

Rather than creating a new YAML file describing these tasks, would you mind adding them to the existing YAMLs alongside the original HumanEval datasets, namely the tasks.yaml file?

bmosaicml requested a review from codestar12 on October 3, 2023, 21:42

bmosaicml (Contributor) commented Oct 3, 2023

@codestar12 I'd like to kick off an eval of this, do you have a 3B model I could run it against?

bmosaicml requested a review from samhavens on October 9, 2023, 17:10

samhavens (Contributor) commented

Since these are new tasks that we came up with, is there anywhere we could give a short description of what .25, return-simple, etc. mean?

samhavens (Contributor) left a comment

Thanks for adding links to the more obscure datasets.

bmosaicml merged commit cdb1c28 into mosaicml:main on Oct 11, 2023
12 checks passed

4 participants