Tessa/callibration script #937

Open: wants to merge 6 commits into main


tbarton16 (Contributor) commented Feb 2, 2024

Here is the code we use to test our benchmark tasks: we run a series of progressively more advanced models to see whether each benchmark effectively differentiates between them, and at which number of shots it performs best.

  • Select an independent variable and a series of models that correspond to the settings of that variable
  • Select the clusters to run on
  • Edit the list of tasks in base_callibration.yaml to include the tasks you want to evaluate
  • Run the launcher script
  • When all runs have finished, run the analyze_output notebook, which collates the results from wandb

maxisawesome (Contributor)

lgtm! I kinda hate checking in notebooks but I do think it's better than a script in this case.

dakinggg (Collaborator) left a comment

Given that this notebook/script is mostly for y'all and has lots of hardcoded stuff, let's note in the README that the calibration scripts are experimental and subject to change at any time.

@@ -0,0 +1,10 @@
# Callibration
Collaborator

Suggested change:
- # Callibration
+ # Calibration

Collaborator

Please apply this spelling fix throughout.

A good benchmark is one that clearly shows which models are better and which are worse. We test our benchmark tasks by running a series of progressively more advanced models to see whether the benchmarks effectively differentiate between them, and at which number of shots they perform best.

To run the code:
* Select an independant variable and a series of models that correspond to the settings of that variable
Collaborator

Suggested change:
- * Select an independant variable and a series of models that correspond to the settings of that variable
+ * Select an independent variable and a series of models that correspond to the settings of that variable

Same here: please fix this spelling throughout.

Collaborator

I suggest clearing the cell output before committing the notebook.

The easiest thing might be to just add the pre-commit hook from Composer for this:

# Entry for the repos: list in .pre-commit-config.yaml
- repo: https://github.com/kynan/nbstripout
  rev: 0.5.0
  hooks:
  - id: nbstripout
    types:
    - "jupyter"
    args:
    # Strip all the metadata that vscode or colab may add to a notebook
    - --strip-empty-cells
    - --extra-keys
    - >
      metadata.colab metadata.interpreter metadata.accelerator
      metadata.kernelspec metadata.language_info.version
      cell.metadata.heading_collapsed metadata.name metadata.nbconvert_exporter
      metadata.version metadata.vscode
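To make the hook options above concrete, here is a simplified, standard-library-only approximation of the cleanup, which could also serve as a one-off fix before the hook is in place. It is not nbstripout and only roughly mirrors the options configured above: it clears code-cell outputs and execution counts, drops empty cells (like `--strip-empty-cells`), and deletes dotted `--extra-keys` paths, where a `cell.` prefix applies the path to every cell. The function names and the default key list are hypothetical.

```python
import json
from typing import Any, Iterable, List

# Hypothetical defaults: a subset of the --extra-keys listed in the hook config.
DEFAULT_EXTRA_KEYS = (
    "metadata.colab",
    "metadata.vscode",
    "metadata.language_info.version",
    "cell.metadata.heading_collapsed",
)


def _pop_path(obj: dict, dotted: str) -> None:
    """Delete the key at a dotted path such as 'metadata.language_info.version'."""
    *parents, last = dotted.split(".")
    node: Any = obj
    for part in parents:
        node = node.get(part)
        if not isinstance(node, dict):
            return  # path absent: nothing to delete
    node.pop(last, None)


def strip_notebook(nb: dict, extra_keys: Iterable[str] = DEFAULT_EXTRA_KEYS) -> dict:
    """Return a stripped copy of a parsed .ipynb; the input is left unchanged."""
    out = json.loads(json.dumps(nb))  # deep copy via a JSON round-trip
    kept: List[dict] = []
    for cell in out.get("cells", []):
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        if not text.strip():
            continue  # drop empty cells, like --strip-empty-cells
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
        kept.append(cell)
    out["cells"] = kept
    for key in extra_keys:
        if key.startswith("cell."):
            for cell in kept:
                _pop_path(cell, key[len("cell."):])
        else:
            _pop_path(out, key)
    return out


def strip_file(path: str) -> None:
    """Strip a notebook file in place."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_notebook(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")
```

The hook remains the better long-term option, since it runs automatically on every commit and nbstripout supports many more cases than this sketch.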

integrations:
- integration_type: git_repo
  git_repo: mosaicml/llm-foundry
  git_branch: main
Collaborator

This should probably be pinned to a release rather than main.

Collaborator

Let's move this to the eval/yamls folder.

bmosaicml (Contributor)

Would you mind adding the MCLI name of a test run you launched, so I can go back later to describe the run and view its logs?

Could you also add a screenshot of the resulting notebook, so that when I come back to this later I can confirm that I got the correct results?

bmosaicml (Contributor) left a comment

Great work! Can you address Daniel's comments and update the description as I requested?

Thx Tessa!!!


4 participants