wandb · ayulockin · Feb 8, 2024 · Feb 8, 2024 · Jun 13, 2024
diff --git a/colabs/lm-eval-harness/lm-eval-harness.ipynb b/colabs/lm-eval-harness/lm-eval-harness.ipynb
@@ -0,0 +1,155 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "d6YuliF8qfsy"
+      },
+      "source": [
+        "<img src=\"https://wandb.me/logo-im-png\" width=\"400\" alt=\"Weights & Biases\" />\n",
+        "<!--- @wandbcode{lm-eval-harness} -->\n",
+        "\n",
+        "# Visualizing Results in Weights and Biases\n",
+        "\n",
+        "<!--- @wandbcode{lm-eval-harness-colab} -->\n",
+        "\n",
+        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/lm-eval-harness/lm-eval-harness.ipynb)\n",
+        "\n",
+        "With the Weights & Biases (W&B) integration, you can spend more time extracting deeper insights from your evaluation results. The integration streamlines logging and visualizing experiment results on the W&B platform.\n",
+        "\n",
+        "The integration provides functionality to:\n",
+        "\n",
+        "- automatically log the evaluation results,\n",
+        "- log the samples as W&B Tables for easy visualization,\n",
+        "- log the `results.json` file as an artifact for version control,\n",
+        "- log the `<task_name>_eval_samples.json` file if the samples are logged,\n",
+        "- generate a comprehensive report for analysis and visualization with all the important metrics,\n",
+        "- log the task and CLI configs,\n",
+        "- log more information out of the box, such as the command used to run the evaluation, GPU/CPU counts, and the timestamp.\n",
+        "\n",
+        "The integration is super easy to use with the eval harness. Let's see how!"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "tJRBdlTrqgVg"
+      },
+      "outputs": [],
+      "source": [
+        "# Install the LM Evaluation Harness if you don't already have it.\n",
+        "# The `wandb` extra is all you need to start logging to Weights & Biases.\n",
+        "\n",
+        "!git clone https://github.com/EleutherAI/lm-evaluation-harness\n",
+        "%cd lm-evaluation-harness\n",
+        "!pip -qq install -e \".[wandb]\""
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "Eb4eGgJIxCki"
+      },
+      "outputs": [],
+      "source": [
+        "# If running the eval fails with `No module named 'transformers.cache_utils'`,\n",
+        "# installing transformers from source fixes the issue.\n",
+        "# Ref: https://huggingface.co/DiscoResearch/mixtral-7b-8expert/discussions/9#6576edcd0370e52e3b2c0620\n",
+        "!pip uninstall -y transformers\n",
+        "!pip install git+https://github.com/huggingface/transformers"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "5c-t1T2vrxQw"
+      },
+      "source": [
+        "# Run the Eval Harness\n",
+        "\n",
+        "Run the eval harness as usual, adding the `--wandb_args` flag. This flag takes a comma-separated string of `key=value` arguments that are used to initialize the W&B run ([wandb.init](https://docs.wandb.ai/ref/python/init)).\n",
+        "\n",
+        "When `--wandb_args` is set, the metrics and everything else listed above are logged to Weights & Biases automatically. The stdout includes a link to the W&B run page as well as a link to the generated report."
+      ]
+    },
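+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "wbArgsLeadIn"
+      },
+      "source": [
+        "To make the comma-separated format concrete, the next cell is a minimal sketch of how a `--wandb_args` string such as `project=lm-eval-harness-integration,job_type=eval` could be split into keyword arguments for `wandb.init`. It assumes a plain `key=value` format. The `parse_wandb_args` helper is hypothetical and not part of the eval harness. The harness's own parser may differ, for example by converting values to other types. In this sketch every value stays a string."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "wbArgsSketch"
+      },
+      "outputs": [],
+      "source": [
+        "# Hypothetical helper (not part of lm-eval): split \"k1=v1,k2=v2\" into a dict.\n",
+        "def parse_wandb_args(arg_string: str) -> dict:\n",
+        "    kwargs = {}\n",
+        "    for pair in arg_string.split(\",\"):\n",
+        "        if not pair.strip():\n",
+        "            continue  # skip empty entries such as a trailing comma\n",
+        "        key, value = pair.split(\"=\", 1)  # split on the first '=' only\n",
+        "        kwargs[key.strip()] = value.strip()\n",
+        "    return kwargs\n",
+        "\n",
+        "kwargs = parse_wandb_args(\"project=lm-eval-harness-integration,job_type=eval\")\n",
+        "print(kwargs)\n",
+        "# These kwargs would then initialize the run, e.g. wandb.init(**kwargs)"
+      ]
+    },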
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "YkeORhFXry8o"
+      },
+      "source": [
+        "## Set your API Key\n",
+        "\n",
+        "Before you can use W&B, you need to authenticate your machine with an API key. Visit https://wandb.ai/authorize to get one."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "E7YTd5OUrpnO"
+      },
+      "outputs": [],
+      "source": [
+        "import wandb\n",
+        "wandb.login()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "_L5yCLtGt5Tu"
+      },
+      "source": [
+        "> Note that if you are using the command line, you can authenticate your machine by running `wandb login` in your terminal. For more information, check out the [documentation](https://docs.wandb.ai/quickstart#2-log-in-to-wb)."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "R4wGEB7ot7OZ"
+      },
+      "source": [
+        "## Run and log to W&B"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "fkMGw712sg8f"
+      },
+      "outputs": [],
+      "source": [
+        "!lm_eval \\\n",
+        "    --model hf \\\n",
+        "    --model_args pretrained=microsoft/phi-2,trust_remote_code=True \\\n",
+        "    --tasks hellaswag,mmlu_abstract_algebra \\\n",
+        "    --device cuda:0 \\\n",
+        "    --batch_size 8 \\\n",
+        "    --output_path output/phi-2 \\\n",
+        "    --limit 10 \\\n",
+        "    --wandb_args project=lm-eval-harness-integration \\\n",
+        "    --log_samples"
+      ]
+    }
+  ],
+  "metadata": {
+    "accelerator": "GPU",
+    "colab": {
+      "gpuType": "V100",
+      "provenance": []
+    },
+    "kernelspec": {
+      "display_name": "Python 3",
+      "name": "python3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}