Tessa/callibration script #937

Open
wants to merge 6 commits into
base: main
10 changes: 10 additions & 0 deletions scripts/eval/callibration/README.md
@@ -0,0 +1,10 @@
# Callibration
Collaborator

Suggested change
# Callibration
# Calibration

Collaborator

throughout


A good benchmark clearly shows which models are better and which are worse. We test our benchmark tasks on a series of progressively more advanced models to see whether the benchmarks effectively differentiate between them, and at which number of shots they perform best.

To run the code:
* Select an independant variable and a series of models that correspond to the settings of that variable
Collaborator

Suggested change
* Select an independant variable and a series of models that correspond to the settings of that variable
* Select an independent variable and a series of models that correspond to the settings of that variable

throughout

* Select clusters
* Edit the list of tasks in the base_callibration.yaml to reflect the ones you want to see
* Run the launcher script
* When everything is done, run the analyze_output notebook which collates the results from wandb
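
The check the README describes (do scores rise as the models get more advanced, and at which shot count?) can be sketched in a few lines. This is only an illustration, not the code in this PR: the function names, the data layout (shot count mapped to scores ordered from weakest to strongest model), the example numbers, and the choice of "largest gap between weakest and strongest model" as the meaning of "best" are all assumptions.

```python
from typing import Dict, List, Optional


def is_monotonic(scores: List[float]) -> bool:
    """True if every more advanced model scores strictly higher than the one before it."""
    return all(later > earlier for earlier, later in zip(scores, scores[1:]))


def best_shot_count(results: Dict[int, List[float]]) -> Optional[int]:
    """Return the shot count that separates the models most clearly.

    `results` maps a shot count to scores ordered from weakest to strongest model.
    Only shot counts whose scores increase strictly are considered; among those,
    the one with the largest strongest-minus-weakest gap wins (an assumed criterion).
    Returns None if no shot count differentiates the models.
    """
    spreads = {
        shots: scores[-1] - scores[0]
        for shots, scores in results.items()
        if len(scores) >= 2 and is_monotonic(scores)
    }
    if not spreads:
        return None
    return max(spreads, key=spreads.get)


# Made-up scores for three models of increasing capability on one task.
example_results = {
    0: [0.25, 0.24, 0.27],  # not monotonic: the second model scores below the first
    1: [0.26, 0.31, 0.40],
    5: [0.30, 0.42, 0.55],
}

if __name__ == "__main__":
    print(best_shot_count(example_results))
```

With these made-up numbers the 0-shot setting is rejected because the scores do not increase with model capability, and the 5-shot setting is preferred over 1-shot because it gives the wider gap.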