Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add lm-eval-harness colab #505

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
155 changes: 155 additions & 0 deletions colabs/lm-eval-harness/lm-eval-harness.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,155 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "d6YuliF8qfsy"
},
"source": [
"<img src=\"https://wandb.me/logo-im-png\" width=\"400\" alt=\"Weights & Biases\" />\n",
"<!--- @wandbcode{lm-eval-harness} -->\n",
"\n",
"# Visualizing Results in Weights and Biases\n",
"\n",
"<!--- @wandbcode{lm-eval-harness-colab} -->\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/lm-eval-harness/lm-eval-harness.ipynb)\n",
"\n",
"With the Weights and Biases integration, you can now spend more time extracting deeper insights into your evaluation results. The integration is designed to streamline the process of logging and visualizing experiment results using the Weights & Biases (W&B) platform.\n",
"\n",
"The integration provide functionalities\n",
"\n",
"- to automatically log the evaluation results,\n",
"- log the samples as W&B Tables for easy visualization,\n",
"- log the `results.json` file as an artifact for version control,\n",
"- log the `<task_name>_eval_samples.json` file if the samples are logged,\n",
"- generate a comprehensive report for analysis and visualization with all the important metric,\n",
"- log task and cli configs,\n",
"- and more out of the box like the command used to run the evaluation, GPU/CPU counts, timestamp, etc.\n",
"\n",
"The integration is super easy to use with the eval harness. Let's see how!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tJRBdlTrqgVg"
},
"outputs": [],
"source": [
"# Install this project if you did not already have it.\n",
"# This is all that is needed to be installed to start using Weights and Biases\n",
"\n",
"!git clone https://github.com/EleutherAI/lm-evaluation-harness\n",
"%cd lm-evaluation-harness\n",
"!pip -qq install -e .[wandb]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Eb4eGgJIxCki"
},
"outputs": [],
"source": [
"# Getting an error `no module named transformers.cache_utils` while running eval.\n",
"# Installing transformers from the repo is solving the issue.\n",
"# Ref: https://huggingface.co/DiscoResearch/mixtral-7b-8expert/discussions/9#6576edcd0370e52e3b2c0620\n",
"!pip uninstall -y transformers\n",
"!pip install git+https://github.com/huggingface/transformers"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5c-t1T2vrxQw"
},
"source": [
"# Run the Eval Harness\n",
"\n",
"Run the eval harness as usual with a `wandb_args` flag. This flag is used to provide arguments for initializing a wandb run ([wandb.init](https://docs.wandb.ai/ref/python/init)) as comma separated string arguments.\n",
"\n",
"If `wandb_args` flag is used, the metrics and all other goodness will be automatically logged to Weights and Biases. In the stdout, you will find the link to the W&B run page as well as link to the generated report."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YkeORhFXry8o"
},
"source": [
"## Set your API Key\n",
"\n",
"Before you can use W&B, you need to authenticate your machine with an authentication key. Visit https://wandb.ai/authorize to get one."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "E7YTd5OUrpnO"
},
"outputs": [],
"source": [
"import wandb\n",
"wandb.login()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_L5yCLtGt5Tu"
},
"source": [
"> Note that if you are using command line you can simply authrnticate your machine by doing `wandb login` in your terminal. For more info check out the [documentation](https://docs.wandb.ai/quickstart#2-log-in-to-wb)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "R4wGEB7ot7OZ"
},
"source": [
"## Run and log to W&B"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "fkMGw712sg8f"
},
"outputs": [],
"source": [
"!lm_eval \\\n",
" --model hf \\\n",
" --model_args pretrained=microsoft/phi-2,trust_remote_code=True \\\n",
" --tasks hellaswag,mmlu_abstract_algebra \\\n",
" --device cuda:0 \\\n",
" --batch_size 8 \\\n",
" --output_path output/phi-2 \\\n",
" --limit 10 \\\n",
" --wandb_args project=lm-eval-harness-integration \\\n",
" --log_samples"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "V100",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
Loading