Commit 8fc0526

Merge branch 'main' into clem_bnb_gptq_config_bug

clefourrier authored Jul 17, 2024
2 parents efce6db + d43c9a3
Showing 12 changed files with 234 additions and 316 deletions.
35 changes: 21 additions & 14 deletions README.md
@@ -78,8 +78,8 @@ pre-commit install

We provide two main entry points to evaluate models:

-* `run_evals_accelerate.py`: evaluate models on CPU or one or more GPUs using [🤗 Accelerate](https://github.com/huggingface/accelerate).
-* `run_evals_nanotron.py`: evaluate models in distributed settings using [⚡️ Nanotron](https://github.com/huggingface/nanotron).
+* `lighteval accelerate`: evaluate models on CPU or one or more GPUs using [🤗 Accelerate](https://github.com/huggingface/accelerate).
+* `lighteval nanotron`: evaluate models in distributed settings using [⚡️ Nanotron](https://github.com/huggingface/nanotron).

For most users, we recommend using the 🤗 Accelerate backend - see below for specific commands.

@@ -94,7 +94,8 @@ accelerate config
You can then evaluate a model using data parallelism as follows (the `-m` flag below tells `accelerate launch` to execute `lighteval` as a Python module, equivalent to `python -m lighteval`):

```shell
-accelerate launch --multi_gpu --num_processes=<num_gpus> run_evals_accelerate.py \
+accelerate launch --multi_gpu --num_processes=<num_gpus> -m \
+lighteval accelerate \
--model_args="pretrained=<path to model on the hub>" \
--tasks <task parameters> \
--output_dir output_dir
@@ -109,7 +110,8 @@ suite|task|num_few_shot|{0 or 1 to automatically reduce `num_few_shot` if prompt is too long}
or a file path like [`examples/tasks/recommended_set.txt`](./examples/tasks/recommended_set.txt) which specifies multiple task configurations. In this format, `lighteval|truthfulqa:mc|0|0` means the `truthfulqa:mc` task from the `lighteval` suite, with 0 few-shot examples and no automatic truncation. For example, to evaluate GPT-2 on the Truthful QA benchmark, run:

```shell
-accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
+accelerate launch --multi_gpu --num_processes=8 -m \
+lighteval accelerate \
--model_args "pretrained=gpt2" \
--tasks "lighteval|truthfulqa:mc|0|0" \
--override_batch_size 1 \
@@ -119,7 +121,8 @@ accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
Here, `--override_batch_size` defines the _batch size per device_, so the effective batch size will be `override_batch_size x num_gpus`; with `--override_batch_size 1` on 8 GPUs, for instance, the effective batch size is 8. To evaluate on multiple benchmarks, separate each task configuration with a comma, e.g.

```shell
-accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
+accelerate launch --multi_gpu --num_processes=8 -m \
+lighteval accelerate \
--model_args "pretrained=gpt2" \
--tasks "leaderboard|truthfulqa:mc|0|0,leaderboard|gsm8k|0|0" \
--override_batch_size 1 \
@@ -133,7 +136,8 @@ See the [`examples/tasks/recommended_set.txt`](./examples/tasks/recommended_set.txt)
If you want to evaluate a model by spinning up inference endpoints, use adapter or delta weights, or apply more complex configuration options, you can load models using a configuration file. This is done as follows:

```shell
-accelerate launch --multi_gpu --num_processes=<num_gpus> run_evals_accelerate.py \
+accelerate launch --multi_gpu --num_processes=<num_gpus> -m \
+lighteval accelerate \
--model_config_path="<path to your model configuration>" \
--tasks <task parameters> \
--output_dir output_dir
@@ -147,13 +151,15 @@ To evaluate models larger than ~40B parameters in 16-bit precision, you will need

```shell
# PP=2, DP=4 - good for models < 70B params
-accelerate launch --multi_gpu --num_processes=4 run_evals_accelerate.py \
+accelerate launch --multi_gpu --num_processes=4 -m \
+lighteval accelerate \
--model_args="pretrained=<path to model on the hub>,model_parallel=True" \
--tasks <task parameters> \
--output_dir output_dir

# PP=4, DP=2 - good for huge models >= 70B params
-accelerate launch --multi_gpu --num_processes=2 run_evals_accelerate.py \
+accelerate launch --multi_gpu --num_processes=2 -m \
+lighteval accelerate \
--model_args="pretrained=<path to model on the hub>,model_parallel=True" \
--tasks <task parameters> \
--output_dir output_dir
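
# Note: `--num_processes` sets the number of data-parallel replicas (DP). With
# `model_parallel=True`, each replica is sharded across the remaining GPUs, so on
# the 8-GPU node implied above, num_processes=4 yields PP=2 and num_processes=2 yields PP=4.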
@@ -164,7 +170,8 @@ accelerate launch --multi_gpu --num_processes=2 run_evals_accelerate.py \
To evaluate a model on all the benchmarks of the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) using a single node of 8 GPUs, run:

```shell
-accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
+accelerate launch --multi_gpu --num_processes=8 -m \
+lighteval accelerate \
--model_args "pretrained=<model name>" \
--tasks examples/tasks/open_llm_leaderboard_tasks.txt \
--override_batch_size 1 \
@@ -176,7 +183,7 @@ accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
You can also use `lighteval` to evaluate models on CPU, though note that this will typically be very slow for large models. To do so, run:

```shell
-python run_evals_accelerate.py \
+lighteval accelerate \
--model_args="pretrained=<path to model on the hub>" \
--tasks <task parameters> \
--output_dir output_dir
@@ -211,7 +218,7 @@ Independently of the default tasks provided in `lighteval` that you will find in

For example, to run an extended task like `ifeval`, you can run (`--use_chat_template` is optional; include it to run the evaluation with the model's chat template):
```shell
-python run_evals_accelerate.py \
+lighteval accelerate \
--model_args "pretrained=HuggingFaceH4/zephyr-7b-beta" \
--use_chat_template \
--tasks "extended|ifeval|0|0" \
@@ -221,7 +228,7 @@ python run_evals_accelerate.py \
To run a community or custom task, pass the `--custom_tasks` flag:

```shell
-python run_evals_accelerate.py \
+lighteval accelerate \
--model_args="pretrained=<path to model on the hub>" \
--tasks <task parameters> \
--custom_tasks <path to your custom or community task> \
@@ -231,7 +238,7 @@ python run_evals_accelerate.py \
For example, to launch `lighteval` on `arabic_mmlu:abstract_algebra` for `HuggingFaceH4/zephyr-7b-beta` (again, `--use_chat_template` is optional), run:

```shell
-python run_evals_accelerate.py \
+lighteval accelerate \
--model_args "pretrained=HuggingFaceH4/zephyr-7b-beta" \
--use_chat_template \
--tasks "community|arabic_mmlu:abstract_algebra|5|1" \
@@ -464,7 +471,7 @@ source <path_to_your_venv>/activate # or conda activate yourenv
cd <path_to_your_lighteval>/lighteval

export CUDA_LAUNCH_BLOCKING=1
-srun accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py --model_args "pretrained=your model name" --tasks examples/tasks/open_llm_leaderboard_tasks.txt --override_batch_size 1 --save_details --output_dir=your output dir
+srun accelerate launch --multi_gpu --num_processes=8 -m lighteval accelerate --model_args "pretrained=your model name" --tasks examples/tasks/open_llm_leaderboard_tasks.txt --override_batch_size 1 --save_details --output_dir="your output dir"
```
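
For reference, a minimal sbatch wrapper around this `srun` line might look like the following sketch; the `#SBATCH` values are illustrative placeholders, not settings from this repo:

```shell
#!/bin/bash
#SBATCH --job-name=lighteval          # illustrative job name
#SBATCH --nodes=1                     # single node, matching the 8-GPU example above
#SBATCH --ntasks-per-node=1           # one task; accelerate launch spawns the 8 processes
#SBATCH --gres=gpu:8                  # request 8 GPUs for --num_processes=8

source <path_to_your_venv>/activate # or conda activate yourenv
cd <path_to_your_lighteval>/lighteval

export CUDA_LAUNCH_BLOCKING=1
srun accelerate launch --multi_gpu --num_processes=8 -m lighteval accelerate \
    --model_args "pretrained=your model name" \
    --tasks examples/tasks/open_llm_leaderboard_tasks.txt \
    --override_batch_size 1 --save_details --output_dir="your output dir"
```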

## Releases
2 changes: 1 addition & 1 deletion examples/model_configs/endpoint_model.yaml
@@ -16,7 +16,7 @@ model:
endpoint_type: "protected"
namespace: null # The namespace under which to launch the endpoint. Defaults to the current user's namespace
image_url: null # Optionally specify the docker image to use when launching the endpoint model. E.g., launching models with later releases of the TGI container with support for newer models.
-env_vars: 
+env_vars:
null # Optional environment variables to include when launching the endpoint. e.g., `MAX_INPUT_LENGTH: 2048`
generation:
add_special_tokens: true
2 changes: 1 addition & 1 deletion examples/model_configs/tgi_model.yaml
@@ -3,4 +3,4 @@ model:
instance:
inference_server_address: ""
inference_server_auth: null
-model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory
+model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -102,4 +102,4 @@ Issues = "https://github.com/huggingface/lighteval/issues"
# Changelog = "https://github.com/huggingface/lighteval/blob/master/CHANGELOG.md"

[project.scripts]
lighteval = "lighteval.commands.lighteval_cli:main"
lighteval = "lighteval.__main__:cli_evaluate"
89 changes: 0 additions & 89 deletions run_evals_accelerate.py

This file was deleted.

55 changes: 0 additions & 55 deletions run_evals_nanotron.py

This file was deleted.

65 changes: 65 additions & 0 deletions src/lighteval/__main__.py
@@ -0,0 +1,65 @@
#!/usr/bin/env python

# MIT License

# Copyright (c) 2024 Taratra D. RAHARISON and The HuggingFace Team

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import argparse

from lighteval.parsers import parser_accelerate, parser_nanotron
from lighteval.tasks.registry import Registry


def cli_evaluate():
    parser = argparse.ArgumentParser(description="CLI tool for lighteval, a lightweight framework for LLM evaluation")
    subparsers = parser.add_subparsers(help="help for subcommand", dest="subcommand")

    # create the parser for the "accelerate" command
    parser_a = subparsers.add_parser("accelerate", help="use accelerate and transformers as backend for evaluation.")
    parser_accelerate(parser_a)

    # create the parser for the "nanotron" command
    parser_b = subparsers.add_parser("nanotron", help="use nanotron as backend for evaluation.")
    parser_nanotron(parser_b)

    parser.add_argument("--list-tasks", action="store_true", help="List available tasks")

    args = parser.parse_args()

    if args.subcommand == "accelerate":
        from lighteval.main_accelerate import main as main_accelerate

        main_accelerate(args)
        return

    if args.subcommand == "nanotron":
        from lighteval.main_nanotron import main as main_nanotron

        main_nanotron(args.checkpoint_config_path, args.lighteval_override, args.cache_dir)
        return

    # no subcommand given: handle the top-level --list-tasks flag
    if args.list_tasks:
        Registry(cache_dir="").print_all_tasks()
        return


if __name__ == "__main__":
    cli_evaluate()
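
Since the file also defines a `__main__` guard, `python -m lighteval` behaves the same as the `lighteval` console script; for example, reusing flags from the README examples above:

```shell
# equivalent invocations of the same entry point
lighteval accelerate --model_args "pretrained=gpt2" --tasks "lighteval|truthfulqa:mc|0|0" --override_batch_size 1 --output_dir output_dir
python -m lighteval accelerate --model_args "pretrained=gpt2" --tasks "lighteval|truthfulqa:mc|0|0" --override_batch_size 1 --output_dir output_dir
```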