redo logging #415

Merged: 39 commits, Dec 5, 2024

Commits (changes below are from 28 of the 39 commits)
8bda6bb
adding closed source models
NathanHB Nov 21, 2024
8122fb8
refacto CLI
NathanHB Nov 26, 2024
8e6e615
use correct parallelism manager for each model we use
NathanHB Nov 26, 2024
e85d31a
adds typer
NathanHB Nov 26, 2024
f6e18e8
adds typer
NathanHB Nov 29, 2024
372b89b
use typer as cli tool
NathanHB Nov 29, 2024
94031db
redo logging
NathanHB Nov 29, 2024
8c7f67c
redo logging
NathanHB Dec 2, 2024
c18f1be
lazy load rouge scorer
NathanHB Dec 2, 2024
4960dd5
fixes
NathanHB Dec 2, 2024
a3ac7a3
change log level of missing task
NathanHB Dec 2, 2024
00f3962
remove unused variable
NathanHB Dec 2, 2024
1360abe
fixes
NathanHB Dec 2, 2024
e99c268
fix from review
NathanHB Dec 3, 2024
85d2ef0
Merge branch 'main' into nathan-refacto-cli
NathanHB Dec 3, 2024
e950828
remove uneeded files
NathanHB Dec 3, 2024
7864c6b
Merge branch 'nathan-refacto-cli' of github.com:huggingface/lighteval…
NathanHB Dec 3, 2024
f989ed8
add typer to deps
NathanHB Dec 3, 2024
2c2748c
fix docs
NathanHB Dec 3, 2024
a80a1db
fix docs
NathanHB Dec 3, 2024
ae4caba
fix docs
NathanHB Dec 3, 2024
a79453a
fix tests
NathanHB Dec 3, 2024
3481562
fix tests
NathanHB Dec 3, 2024
b0ca7f1
fix tests
NathanHB Dec 3, 2024
39ba282
fix tests
NathanHB Dec 3, 2024
9110d96
Update src/lighteval/metrics/metrics_sample.py
NathanHB Dec 3, 2024
39d70a5
Merge branch 'nathan-refacto-cli' into nathan-refacto-logging
NathanHB Dec 3, 2024
7b9ab20
Merge branch 'nathan-refacto-logging' of github.com:huggingface/light…
NathanHB Dec 3, 2024
7339568
Update src/lighteval/metrics/metrics_sample.py
NathanHB Dec 3, 2024
6ef4e81
Merge remote-tracking branch 'origin/main' into nathan-refacto-logging
NathanHB Dec 4, 2024
5280fc0
fix dependencies
NathanHB Dec 4, 2024
1c30dec
rm hirarchical logger file
NathanHB Dec 4, 2024
38ce291
fix logging level
NathanHB Dec 4, 2024
4a1b94a
fix readme
NathanHB Dec 4, 2024
2e55920
fix dependencies and readme
NathanHB Dec 4, 2024
c96c6ca
Update src/lighteval/pipeline.py
NathanHB Dec 5, 2024
051f5e2
Update src/lighteval/tasks/registry.py
NathanHB Dec 5, 2024
aac6c82
Merge branch 'main' into nathan-refacto-logging
NathanHB Dec 5, 2024
f1013bb
fix styling
NathanHB Dec 5, 2024
2 changes: 2 additions & 0 deletions .github/workflows/tests.yaml
@@ -36,6 +36,8 @@ jobs:
- name: Test
env:
HF_TEST_TOKEN: ${{ secrets.HF_TEST_TOKEN }}
HF_HOME: "cache/models"
HF_DATASETS_CACHE: "cache/datasets"
run: | # PYTHONPATH="${PYTHONPATH}:src" HF_DATASETS_CACHE="cache/datasets" HF_HOME="cache/models"
python -m pytest --disable-pytest-warnings
- name: Write cache
7 changes: 3 additions & 4 deletions docs/source/adding-a-custom-task.mdx
@@ -191,8 +191,7 @@ Once your file is created you can then run the evaluation with the following command

```bash
lighteval accelerate \
--model_args "pretrained=HuggingFaceH4/zephyr-7b-beta" \
--tasks "community|{custom_task}|{fewshots}|{truncate_few_shot}" \
--custom_tasks {path_to_your_custom_task_file} \
--output_dir "./evals"
"pretrained=HuggingFaceH4/zephyr-7b-beta" \
"community|{custom_task}|{fewshots}|{truncate_few_shot}" \
--custom-tasks {path_to_your_custom_task_file} \
```
8 changes: 7 additions & 1 deletion docs/source/available-tasks.mdx
@@ -3,7 +3,13 @@
You can get a list of all the available tasks by running:

```bash
lighteval tasks --list
lighteval tasks list
```

You can also inspect a specific task by running:

```bash
lighteval tasks inspect <task_name>
```
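
For example, to inspect one of the leaderboard tasks referenced elsewhere in these docs (the task string below is illustrative, and the exact format accepted by `inspect` may differ):

```bash
# Illustrative task name; substitute any task printed by `lighteval tasks list`
lighteval tasks inspect "leaderboard|truthfulqa:mc"
```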

## List of tasks
23 changes: 19 additions & 4 deletions docs/source/evaluate-the-model-on-a-server-or-container.mdx
@@ -6,10 +6,9 @@ to the server. The command is the same as before, except you specify a path to
a yaml config file (detailed below):

```bash
lighteval accelerate \
--model_config_path="/path/to/config/file"\
--tasks <task parameters> \
--output_dir output_dir
lighteval endpoint {tgi,inference-endpoint} \
"/path/to/config/file"\
<task parameters>
```
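
For instance, a TGI evaluation run might look like the following sketch (the config path and task are illustrative; the config file format is detailed below):

```bash
# Illustrative config path and task
lighteval endpoint tgi \
    "examples/model_configs/tgi_model.yaml" \
    "leaderboard|truthfulqa:mc|0|0"
```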

There are two types of configuration files that can be provided for running on
@@ -65,3 +64,19 @@ model:
inference_server_auth: null
model_id: null # Optional, only required if the TGI container was launched with model_id pointing to a local directory
```
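
Pulling these fields together, a minimal TGI config file might look like this sketch (the server address is an illustrative placeholder):

```yaml
model:
  instance:
    inference_server_address: "http://localhost:8080"  # illustrative address of a running TGI container
    inference_server_auth: null
    model_id: null  # only required if the TGI container was launched with model_id pointing to a local directory
```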

### OpenAI API

Lighteval also supports evaluating models on the OpenAI API. To do so you need to set your OpenAI API key in the environment variable.

```bash
export OPENAI_API_KEY={your_key}
```

And then run the following command:

```bash
lighteval endpoint openai \
{model-name} \
<task parameters>
```
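
For instance, to run the TruthfulQA task used in the quicktour against an OpenAI model (the model name is illustrative):

```bash
# Model name is illustrative; use any model available through your OpenAI account
lighteval endpoint openai \
    "gpt-4o-mini" \
    "leaderboard|truthfulqa:mc|0|0"
```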
2 changes: 1 addition & 1 deletion docs/source/index.mdx
@@ -5,7 +5,7 @@ backends—whether it's
[transformers](https://github.com/huggingface/transformers),
[tgi](https://github.com/huggingface/text-generation-inference),
[vllm](https://github.com/vllm-project/vllm), or
[nanotron](https://github.com/huggingface/nanotron)with
[nanotron](https://github.com/huggingface/nanotron) with
ease. Dive deep into your model’s performance by saving and exploring detailed,
sample-by-sample results to debug and see how your models stack-up.

2 changes: 0 additions & 2 deletions docs/source/package_reference/model_config.mdx
@@ -8,5 +8,3 @@
[[autodoc]] models.model_config.InferenceModelConfig
[[autodoc]] models.model_config.TGIModelConfig
[[autodoc]] models.model_config.VLLMModelConfig

[[autodoc]] models.model_config.create_model_config
39 changes: 23 additions & 16 deletions docs/source/quicktour.mdx
@@ -1,11 +1,24 @@
# Quicktour

We provide two main entry points to evaluate models:

> [!TIP]
> We recommend using the `--help` flag to get more information about the
> available options for each command.
> `lighteval --help` and `lighteval accelerate --help`

Lighteval can be used with a few different commands.

- `lighteval accelerate` : evaluate models on CPU or one or more GPUs using [🤗
Accelerate](https://github.com/huggingface/accelerate)
- `lighteval nanotron`: evaluate models in distributed settings using [⚡️
Nanotron](https://github.com/huggingface/nanotron)
- `lighteval vllm`: evaluate models on one or more GPUs using [🚀
VLLM](https://github.com/vllm-project/vllm)
- `lighteval endpoint`
- `inference-endpoint`: evaluate models on one or more GPUs using [🔗
Inference Endpoint](https://huggingface.co/inference-endpoints/dedicated)
- `tgi`: evaluate models on one or more GPUs using [🔗 Text Generation Inference](https://huggingface.co/docs/text-generation-inference/en/index)
- `openai`: evaluate models on one or more GPUs using [🔗 OpenAI API](https://platform.openai.com/)

## Accelerate

@@ -15,10 +28,8 @@ To evaluate `GPT-2` on the Truthful QA benchmark, run:

```bash
lighteval accelerate \
--model_args "pretrained=gpt2" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--override_batch_size 1 \
--output_dir="./evals/"
"pretrained=gpt2" \
"leaderboard|truthfulqa:mc|0|0"
```

Here, `--tasks` refers to either a comma-separated list of supported tasks from
@@ -51,10 +62,8 @@ You can then evaluate a model using data parallelism on 8 GPUs like follows:
```bash
accelerate launch --multi_gpu --num_processes=8 -m \
lighteval accelerate \
--model_args "pretrained=gpt2" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--override_batch_size 1 \
--output_dir="./evals/"
"pretrained=gpt2" \
"leaderboard|truthfulqa:mc|0|0"
```

Here, `--override_batch_size` defines the batch size per device, so the effective
@@ -66,10 +75,8 @@ To evaluate a model using pipeline parallelism on 2 or more GPUs, run:

```bash
lighteval accelerate \
--model_args "pretrained=gpt2,model_parallel=True" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--override_batch_size 1 \
--output_dir="./evals/"
"pretrained=gpt2,model_parallel=True" \
"leaderboard|truthfulqa:mc|0|0"
```

This will automatically use accelerate to distribute the model across the GPUs.
@@ -81,7 +88,7 @@ GPUs.

### Model Arguments

The `--model_args` argument takes a string representing a list of model
The `model-args` argument takes a string representing a list of model
argument. The arguments allowed vary depending on the backend you use (vllm or
accelerate).

@@ -150,8 +157,8 @@ To evaluate a model trained with nanotron on a single gpu.
```bash
torchrun --standalone --nnodes=1 --nproc-per-node=1 \
src/lighteval/__main__.py nanotron \
--checkpoint_config_path ../nanotron/checkpoints/10/config.yaml \
--lighteval_config_path examples/nanotron/lighteval_config_override_template.yaml
--checkpoint-config-path ../nanotron/checkpoints/10/config.yaml \
--lighteval-config-path examples/nanotron/lighteval_config_override_template.yaml
```

The `nproc-per-node` argument should match the data, tensor and pipeline
16 changes: 8 additions & 8 deletions docs/source/saving-and-reading-results.mdx
@@ -3,30 +3,30 @@
## Saving results locally

Lighteval will automatically save results and evaluation details in the
directory set with the `--output_dir` argument. The results will be saved in
directory set with the `--output-dir` option. The results will be saved in
`{output_dir}/results/{model_name}/results_{timestamp}.json`. [Here is an
example of a result file](#example-of-a-result-file). The output path can be
any [fsspec](https://filesystem-spec.readthedocs.io/en/latest/index.html)
compliant path (local, s3, hf hub, gdrive, ftp, etc).
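
As a quick sketch, a saved results file can be read back with the standard library (the concrete path is illustrative and simply follows the pattern above):

```python
import json
from pathlib import Path

# Pattern: {output_dir}/results/{model_name}/results_{timestamp}.json (values below are illustrative)
results_path = Path("evals/results/gpt2/results_2024-12-05T12-00-00.json")
with results_path.open() as f:
    results = json.load(f)

print(results.keys())  # top-level sections of the results file
```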

To save the details of the evaluation, you can use the `--save_details`
argument. The details will be saved in a parquet file
To save the details of the evaluation, you can use the `--save-details`
option. The details will be saved in a parquet file
`{output_dir}/details/{model_name}/{timestamp}/details_{task}_{timestamp}.parquet`.

## Pushing results to the HuggingFace hub

You can push the results and evaluation details to the HuggingFace hub. To do
so, you need to set the `--push_to_hub` as well as the `--results_org`
argument. The results will be saved in a dataset with the name at
so, you need to set the `--push-to-hub` as well as the `--results-org`
option. The results will be saved in a dataset with the name at
`{results_org}/{model_org}/{model_name}`. To push the details, you need to set
the `--save_details` argument.
the `--save-details` option.
The dataset created will be private by default, you can make it public by
setting the `--public_run` argument.
setting the `--public-run` option.


## Pushing results to Tensorboard

You can push the results to Tensorboard by setting `--push_to_tensorboard`.
You can push the results to Tensorboard by setting `--push-to-tensorboard`.


## How to load and investigate details
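
As a minimal sketch, the per-sample details saved above can be loaded with pandas (the concrete path is illustrative and follows the pattern described earlier):

```python
import pandas as pd

# Pattern: {output_dir}/details/{model_name}/{timestamp}/details_{task}_{timestamp}.parquet
# The concrete path below is illustrative.
details = pd.read_parquet(
    "evals/details/gpt2/2024-12-05T12-00-00/details_leaderboard|truthfulqa:mc|0_2024-12-05T12-00-00.parquet"
)

print(details.columns)  # per-sample fields available for inspection
print(details.head())   # first few evaluated samples
```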
22 changes: 9 additions & 13 deletions docs/source/use-vllm-as-backend.mdx
@@ -4,10 +4,9 @@ Lighteval allows you to use `vllm` as backend allowing great speedups.
To use, simply change the `model_args` to reflect the arguments you want to pass to vllm.

```bash
lighteval accelerate \
--model_args="vllm,pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--output_dir="./evals/"
lighteval vllm \
"pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16" \
"leaderboard|truthfulqa:mc|0|0"
```

`vllm` is able to distribute the model across multiple GPUs using data
@@ -17,19 +16,17 @@ You can choose the parallelism method by setting in the the `model_args`.
For example if you have 4 GPUs you can split it across using `tensor_parallelism`:

```bash
export VLLM_WORKER_MULTIPROC_METHOD=spawn && lighteval accelerate \
--model_args="vllm,pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,tensor_parallel_size=4" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--output_dir="./evals/"
export VLLM_WORKER_MULTIPROC_METHOD=spawn && lighteval vllm \
"pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,tensor_parallel_size=4" \
"leaderboard|truthfulqa:mc|0|0"
```

Or, if your model fits on a single GPU, you can use `data_parallelism` to speed up the evaluation:

```bash
lighteval accelerate \
--model_args="vllm,pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,data_parallel_size=4" \
--tasks "leaderboard|truthfulqa:mc|0|0" \
--output_dir="./evals/"
lighteval vllm \
"pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,data_parallel_size=4" \
"leaderboard|truthfulqa:mc|0|0"
```

Available arguments for `vllm` can be found in the `VLLMModelConfig`:
Expand All @@ -50,4 +47,3 @@ Available arguments for `vllm` can be found in the `VLLMModelConfig`:
> [!WARNING]
> In the case of OOM issues, you might need to reduce the context size of the
> model as well as reduce the `gpu_memory_utilisation` parameter.
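
For example, lowering `gpu_memory_utilisation` might look like this (the value 0.8 is illustrative; tune it to your hardware):

```bash
# gpu_memory_utilisation value is illustrative
lighteval vllm \
    "pretrained=HuggingFaceH4/zephyr-7b-beta,dtype=float16,gpu_memory_utilisation=0.8" \
    "leaderboard|truthfulqa:mc|0|0"
```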

1 change: 0 additions & 1 deletion examples/model_configs/base_model.yaml
@@ -1,5 +1,4 @@
model:
type: "base" # can be base, tgi, or endpoint
base_params:
model_args: "pretrained=HuggingFaceH4/zephyr-7b-beta,revision=main" # pretrained=model_name,trust_remote_code=boolean,revision=revision_to_use,model_parallel=True ...
dtype: "bfloat16"
1 change: 0 additions & 1 deletion examples/model_configs/endpoint_model.yaml
@@ -1,5 +1,4 @@
model:
type: "endpoint" # can be base, tgi, or endpoint
base_params:
endpoint_name: "llama-2-7B-lighteval" # needs to be lower case without special characters
model: "meta-llama/Llama-2-7b-hf"
1 change: 0 additions & 1 deletion examples/model_configs/peft_model.yaml
@@ -1,5 +1,4 @@
model:
type: "base"
base_params:
model_args: "pretrained=predibase/customer_support,revision=main" # pretrained=model_name,trust_remote_code=boolean,revision=revision_to_use,model_parallel=True ... For a PEFT model, the pretrained model should be the one trained with PEFT and the base model below will contain the original model on which the adapters will be applied.
dtype: "4bit" # Specifying the model to be loaded in 4 bit uses BitsAndBytesConfig. The other option is to use "8bit" quantization.
1 change: 0 additions & 1 deletion examples/model_configs/quantized_model.yaml
@@ -1,5 +1,4 @@
model:
type: "base"
base_params:
model_args: "pretrained=HuggingFaceH4/zephyr-7b-beta,revision=main" # pretrained=model_name,trust_remote_code=boolean,revision=revision_to_use,model_parallel=True ...
dtype: "4bit" # Specifying the model to be loaded in 4 bit uses BitsAndBytesConfig. The other option is to use "8bit" quantization.
1 change: 0 additions & 1 deletion examples/model_configs/tgi_model.yaml
@@ -1,5 +1,4 @@
model:
type: "tgi" # can be base, tgi, or endpoint
instance:
inference_server_address: ""
inference_server_auth: null
4 changes: 1 addition & 3 deletions examples/nanotron/lighteval_config_override_template.yaml
@@ -4,9 +4,7 @@ generation: null
logging:
output_dir: "outputs"
save_details: false
push_results_to_hub: false
push_details_to_hub: false
push_results_to_tensorboard: false
push_to_hub: false
public_run: false
results_org: null
tensorboard_metric_prefix: "eval"
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -61,6 +61,7 @@ dependencies = [
"datasets>=2.14.0",
"numpy<2", # pinned to avoid incompatibilities
# Prettiness
"typer",
"termcolor==2.3.0",
"pytablewriter",
"colorama",
@@ -114,4 +115,4 @@ Issues = "https://github.com/huggingface/lighteval/issues"
# Changelog = "https://github.com/huggingface/lighteval/blob/master/CHANGELOG.md"

[project.scripts]
lighteval = "lighteval.__main__:cli_evaluate"
lighteval = "lighteval.__main__:app"
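
The console script now points at a Typer application object named `app` rather than the old `cli_evaluate` function. A minimal sketch of the shape this implies for `lighteval.__main__` (a hypothetical illustration, not the actual module):

```python
# Hypothetical sketch of a Typer-based entry point; not the actual lighteval source.
import typer

app = typer.Typer()


@app.command()
def accelerate(
    model_args: str = typer.Argument(..., help="e.g. 'pretrained=gpt2'"),
    tasks: str = typer.Argument(..., help="e.g. 'leaderboard|truthfulqa:mc|0|0'"),
):
    """Evaluate a model on CPU or GPUs with the accelerate backend."""
    ...


if __name__ == "__main__":
    app()
```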