Now only uses functions for prompt definition #213

Merged (24 commits, Jul 9, 2024)

Commits (file changes below are shown from 23 of the 24 commits)
58fee65  add function prompt assigment (hynky1999, Jun 20, 2024)
57d40f9  add json casting (hynky1999, Jun 20, 2024)
cb8be21  fix ruff setting + fmt (hynky1999, Jun 20, 2024)
f275f60  Merge branch 'main' into function_prompts (clefourrier, Jul 3, 2024)
cde6c04  Merge branch 'main' into function_prompts (clefourrier, Jul 4, 2024)
ced2945  replaced json tasks by python tasks, step 1 (clefourrier, Jul 4, 2024)
82e9815  wip (clefourrier, Jul 4, 2024)
a6aa133  Merge branch 'main' into simplify_task_system (clefourrier, Jul 4, 2024)
c5f428c  simplification part 1 (clefourrier, Jul 4, 2024)
4410c23  fix extended tasks + typo (clefourrier, Jul 4, 2024)
c93a2fa  fix (clefourrier, Jul 4, 2024)
9676756  fix nanotron example (clefourrier, Jul 4, 2024)
b84f006  small fix (clefourrier, Jul 4, 2024)
c656d64  Merge branch 'main' into function_prompts (clefourrier, Jul 5, 2024)
770f67e  Merge branch 'simplify_task_system' into hynek_function (clefourrier, Jul 5, 2024)
d43ffac  now use function, not string, to pass prompts in examples (clefourrier, Jul 5, 2024)
e10a84c  moved everyone to function calling (clefourrier, Jul 5, 2024)
c927d14  LightevalTask now only takes functions (clefourrier, Jul 5, 2024)
e4182b4  removed templated type which messed up the test suite (clefourrier, Jul 5, 2024)
9f518ad  last fix + doc udpate (clefourrier, Jul 5, 2024)
1a0efde  Update src/lighteval/tasks/registry.py (clefourrier, Jul 9, 2024)
da016b7  Merge branch 'main' into simplify_task_system (clefourrier, Jul 9, 2024)
8f98337  Merge branch 'simplify_task_system' into hynek_function (clefourrier, Jul 9, 2024)
7d2afa4  Merge branch 'main' into hynek_function (clefourrier, Jul 9, 2024)
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -34,7 +34,7 @@ repos:

- repo: https://github.com/charliermarsh/ruff-pre-commit
# Ruff version.
rev: 'v0.1.6'
rev: 'v0.2.2'
hooks:
- id: ruff
args: ['--fix']
12 changes: 6 additions & 6 deletions README.md
@@ -139,7 +139,7 @@ accelerate launch --multi_gpu --num_processes=<num_gpus> run_evals_accelerate.py
--output_dir output_dir
```

You can find the template of the expected model configuration in [examples/model_configs/base_model.yaml_](./examples/model_configs/base_model.yaml).

### Evaluating a large model with pipeline parallelism

@@ -197,7 +197,7 @@ There are two types of configuration files that can be provided for running on t

1. [endpoint_model.yaml](./examples/model_configs/endpoint_model.yaml): This configuration allows you to launch the model using [HuggingFace's Inference Endpoints](https://huggingface.co/inference-endpoints/dedicated). You can specify in the configuration file all the relevant parameters, and then `lighteval` will automatically deploy the endpoint, run the evaluation, and finally delete the endpoint (unless you specify an endpoint that was already launched, in which case the endpoint won't be deleted afterwards).

2. [tgi_model.yaml](./examples/model_configs/tgi_model.yaml): This configuration lets you specify the URL of a model running in a TGI container, such as one deployed on HuggingFace's serverless inference.

Templates for these configurations can be found in [examples/model_configs](./examples/model_configs/).

@@ -266,7 +266,7 @@ However, we are very grateful to the Harness and HELM teams for their continued
- [logging](https://github.com/huggingface/lighteval/tree/main/src/lighteval/logging): Our loggers, to display experiment information and push it to the hub after a run
- [metrics](https://github.com/huggingface/lighteval/tree/main/src/lighteval/metrics): All the available metrics you can use. They are described in metrics, and divided between sample metrics (applied at the sample level, such as prediction accuracy) and corpus metrics (applied over the whole corpus). You'll also find available normalisation functions.
- [models](https://github.com/huggingface/lighteval/tree/main/src/lighteval/models): Possible models to use. We cover transformers (base_model), with adapter or delta weights, as well as TGI models locally deployed (it's likely the code here is out of date though), and brrr/nanotron models.
- [tasks](https://github.com/huggingface/lighteval/tree/main/src/lighteval/tasks): Available tasks. The complete list is in `tasks_table.jsonl`, and you'll find all the prompts in `tasks_prompt_formatting.py`. Popular tasks requiring custom logic are exceptionally added in the [extended tasks](https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/extended).
- [tasks](https://github.com/huggingface/lighteval/tree/main/src/lighteval/tasks): Available tasks. The complete list is in `default_tasks.py`, and you'll find all the prompts in `tasks_prompt_formatting.py`. Popular tasks requiring custom logic are exceptionally added in the [extended tasks](https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/extended).
- [examples/tasks](https://github.com/huggingface/lighteval/tree/main/examples/tasks) contains a list of available tasks you can launch. We advise using tasks in the `recommended_set`, as it's possible that some of the other tasks need double checking.
- [tests](https://github.com/huggingface/lighteval/tree/main/tests) contains our test suite, which we run at each PR to prevent regressions in metrics/prompts/tasks, for a subset of important tasks.

@@ -285,10 +285,10 @@ A popular community evaluation can move to become an extended or core evaluation
#### Core evaluations
Prompt function: **find a suitable prompt function** in `src.lighteval.tasks.task_prompt_formatting.py`, or code your own. This function must output a `Doc` object, which should contain the `query`, your prompt, and either `gold`, the gold output, or `choices` and `gold_index`, the list of choices and index or indices of correct answers. If your query contains an instruction that should not be repeated in a few shot setup, add it to an `instruction` field.

Summary: create a **line summary** of your evaluation, in `src/lighteval/tasks/tasks_table.jsonl`. This summary should contain the following fields:
Summary: create a `LightevalTaskConfig` summary of your evaluation, in `src/lighteval/tasks/default_tasks.py`. This summary should contain the following fields:
- `name` (str), your evaluation name
- `suite` (list), the suite(s) to which your evaluation should belong. This field allows us to compare different task implementations and is used as a task selection to differentiate the versions to launch. At the moment, you'll find the keywords ["helm", "bigbench", "original", "lighteval", "community", "custom"]; for core evals, please choose `lighteval`.
- `prompt_function` (str), the name of the prompt function you defined in the step above
- `prompt_function` (Callable), the prompt function you defined in the step above
- `hf_repo` (str), the path to your evaluation dataset on the hub
- `hf_subset` (str), the specific subset you want to use for your evaluation (note: when the dataset has no subset, fill this field with `"default"`, not with `None` or `""`)
- `hf_avail_splits` (list), all the splits available for your dataset (train, valid or validation, test, other...)
@@ -310,7 +310,7 @@ Summary: create a **line summary** of your evaluation, in `src/lighteval/tasks/t
Make sure you can launch your model with your new task using `--tasks lighteval|yournewtask|2|0`.
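
To make the contract concrete, here is a minimal sketch of such a prompt function; the dataset columns `question`, `choices`, and `label` are placeholders rather than fields from a real task:

```python
from lighteval.tasks.requests import Doc


def yournewtask_prompt(line, task_name: str = None):
    """Maps one dataset row to a Doc; the column names used here are illustrative."""
    return Doc(
        task_name=task_name,
        query=f"Question: {line['question']}\nAnswer:",
        choices=[f" {choice}" for choice in line["choices"]],  # one string per candidate answer
        gold_index=line["label"],  # index (or list of indices) of the correct choice(s)
        instruction="",  # only set this if the instruction must not be repeated in few-shot examples
    )
```

The function object itself is then assigned to `prompt_function` in the `LightevalTaskConfig` entry, which is exactly the change this PR applies throughout the codebase.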

#### Community evaluations
Copy the `community_tasks/_template.yml` to `community_tasks/yourevalname.py` and edit it to add your custom tasks (the parameters you can use are explained above). It contains an interesting mechanism if the dataset you are adding contains a lot of subsets.
Copy the `community_tasks/_template.py` to `community_tasks/yourevalname.py` and edit it to add your custom tasks (the parameters you can use are explained above). It contains an interesting mechanism if the dataset you are adding contains a lot of subsets.

Make sure you can launch your model with your new task using `--tasks community|yournewtask|2|0 --custom_tasks community_tasks/yourevalname.py`.
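
A filled-in community task file then follows the shape of the updated template shown below; as a rough sketch (the import path for `LightevalTaskConfig`, the dataset path, column names, and metric are placeholders or assumptions, not values taken from this PR):

```python
from lighteval.tasks.lighteval_task import LightevalTaskConfig  # assumed import path; mirror the template's imports
from lighteval.tasks.requests import Doc


def yourevalname_prompt(line, task_name: str = None):
    # Placeholder column names; adapt them to your dataset.
    return Doc(task_name=task_name, query=line["question"], choices=[str(line["answer"])], gold_index=0)


yourevalname_task = LightevalTaskConfig(
    name="yourevalname",
    prompt_function=yourevalname_prompt,  # the function object itself, no longer a string naming it
    suite=["community"],
    hf_repo="your_org/your_dataset",  # placeholder dataset path on the Hub
    hf_subset="default",
    hf_avail_splits=["train", "test"],
    evaluation_splits=["test"],
    metric=[""],  # fill in your metric(s), as in the template below
    stop_sequence=None,
)

# lighteval discovers custom tasks through this module-level list of LightevalTaskConfig objects.
TASKS_TABLE = [yourevalname_task]
```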

42 changes: 20 additions & 22 deletions community_tasks/_template.py
@@ -39,12 +39,28 @@
from lighteval.tasks.tasks_prompt_formatting import LETTER_INDICES


# DEFINE YOUR PROMPT FUNCTIONS
# Define as many as you need for your different tasks
def prompt_fn(line, task_name: str = None):
"""Defines how to go from a dataset line to a doc object.
Follow examples in src/lighteval/tasks/tasks_prompt_formatting.py, or get more info
about what this function should do in the README.
"""
return Doc(
task_name=task_name,
query="",
choices="",
gold_index=0,
instruction="",
)


# EVAL WITH NO SUBSET ##
# This is how you create a simple task (like hellaswag) which has one single subset
# attached to it, and one evaluation possible.
task = LightevalTaskConfig(
name="myothertask",
prompt_function="prompt_fn", # must be defined in the file or imported from src/lighteval/tasks/tasks_prompt_formatting.py
prompt_function=prompt_fn, # must be defined in the file or imported from src/lighteval/tasks/tasks_prompt_formatting.py
suite=["community"],
hf_repo="",
hf_subset="default",
@@ -73,7 +73,7 @@ def __init__(
super().__init__(
name=name,
hf_subset=hf_subset,
prompt_function="prompt_fn", # must be defined in the file
prompt_function=prompt_fn, # must be defined in the file or imported from src/lighteval/tasks/tasks_prompt_formatting.py
hf_repo="",
metric=[""],
hf_avail_splits=[],
@@ -88,25 +88,9 @@ def __init__(
)


# DEFINE YOUR PROMPT FUNCTIONS
# Define as many as you need for your different tasks
def prompt_fn(line, task_name: str = None):
"""Defines how to go from a dataset line to a doc object.
Follow examples in src/lighteval/tasks/tasks_prompt_formatting.py, or get more info
about what this function should do in the README.
"""
return Doc(
task_name=task_name,
query="",
choices="",
gold_index=0,
instruction="",
)


# STORE YOUR EVALS
SUBSET_TASKS = [CustomSubsetTask(name=f"mytask:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
_TASKS = SUBSET_TASKS + [task]
TASKS_TABLE = SUBSET_TASKS + [task]


# CUSTOM METRIC IF NEEDED
@@ -124,8 +124,6 @@ def prompt_fn(line, task_name: str = None):
# MODULE LOGIC
# You should not need to touch this
# Convert to dict for lighteval
TASKS_TABLE = [task.as_dict() for task in _TASKS]

if __name__ == "__main__":
print(t["name"] for t in TASKS_TABLE)
print(t.name for t in TASKS_TABLE)
print(len(TASKS_TABLE))
27 changes: 12 additions & 15 deletions community_tasks/aimo_evals.py
@@ -29,9 +29,18 @@
from lighteval.tasks.requests import Doc


def aimo_prompt(line, task_name: str = None):
return Doc(
task_name=task_name,
choices=[str(line["answer"])],
gold_index=0,
query=line["problem"],
)


task = LightevalTaskConfig(
name="aimo_progress_prize_1",
prompt_function="aimo_prompt",
prompt_function=aimo_prompt,
suite=["community"],
hf_subset="",
hf_repo="lighteval/aimo_progress_prize_1",
@@ -44,25 +44,13 @@
stop_sequence=None,
)


def aimo_prompt(line, task_name: str = None):
return Doc(
task_name=task_name,
choices=[str(line["answer"])],
gold_index=0,
query=line["problem"],
)


# STORE YOUR EVALS
_TASKS = [task]
TASKS_TABLE = [task]


# MODULE LOGIC
# You should not need to touch this
# Convert to dict for lighteval
TASKS_TABLE = [task.as_dict() for task in _TASKS]

if __name__ == "__main__":
print(t["name"] for t in TASKS_TABLE)
print(t.name for t in TASKS_TABLE)
print(len(TASKS_TABLE))