Gauntlet v0.3: Fix chain-of-thought tasks #824

Merged: 66 commits, merged on Feb 6, 2024

Commits
dbf5535 · Skip flaky lion8b test (#598) · dblalock, Sep 15, 2023
315afb5 · add eval output logging · bmosaicml, Sep 19, 2023
215b802 · add back tasks · bmosaicml, Sep 19, 2023
ebef847 · foo · bmosaicml, Sep 19, 2023
467ac3a · add rlhf prompts · bmosaicml, Sep 19, 2023
ef91472 · add rlhf prompts · bmosaicml, Sep 19, 2023
c1db48c · add rlhf prompts · bmosaicml, Sep 19, 2023
ff63cfd · add rlhf prompts · bmosaicml, Sep 19, 2023
0dc30b0 · add rlhf prompts · bmosaicml, Sep 20, 2023
6d93ba6 · fix prompt · bmosaicml, Sep 20, 2023
5254833 · fix prompt · bmosaicml, Sep 20, 2023
af32824 · modify mcli · bmosaicml, Nov 15, 2023
29c297a · Merge branch 'main' into output_eval_logging · bmosaicml, Nov 15, 2023
91c6c71 · test · bmosaicml, Nov 27, 2023
b28fd6e · test · bmosaicml, Nov 27, 2023
1e6e923 · fix · bmosaicml, Nov 27, 2023
0ff6598 · added math dataset · bmosaicml, Nov 27, 2023
335e087 · edit yaml · bmosaicml, Nov 28, 2023
b028545 · Merge branch 'main' into output_eval_logging · bmosaicml, Dec 15, 2023
e787bfe · prep gsm8k identically to eleuther · bmosaicml, Dec 18, 2023
8831f32 · prep gsm8k identically to eleuther · bmosaicml, Dec 18, 2023
704cbc4 · Merge branch 'output_eval_logging' into debug_gsm8k · bmosaicml, Dec 19, 2023
b862d94 · Merge branch 'debug_gsm8k' of github.com:mosaicml/llm-foundry into de… · bmosaicml, Dec 19, 2023
5a27497 · add early stopping criteria · bmosaicml, Dec 20, 2023
0895057 · finish · bmosaicml, Dec 20, 2023
cc57015 · debug · bmosaicml, Dec 20, 2023
2476f39 · fix · bmosaicml, Dec 23, 2023
7ccc52a · bug · bmosaicml, Dec 23, 2023
43c25aa · remove eval output logging callback · bmosaicml, Dec 27, 2023
2a83bdd · modify other cot tasks · bmosaicml, Dec 28, 2023
fd9daf0 · restore · bmosaicml, Dec 28, 2023
2f43c9f · fix · bmosaicml, Dec 28, 2023
1fdd0d4 · fix · bmosaicml, Dec 28, 2023
6673694 · Merge branch 'debug_gsm8k' of github.com:mosaicml/llm-foundry into de… · bmosaicml, Jan 18, 2024
45314e7 · fix composer verion · bmosaicml, Jan 18, 2024
a4dee8a · Merge branch 'main' into debug_gsm8k · bmosaicml, Jan 19, 2024
e39adbc · gauntlet v0.2.1 · bmosaicml, Jan 19, 2024
baaad4d · gauntlet v0.2.1 · bmosaicml, Jan 19, 2024
8b1c6d3 · Merge branch 'debug_gsm8k' of github.com:mosaicml/llm-foundry into de… · bmosaicml, Jan 24, 2024
50c89b9 · Merge branch 'debug_gsm8k' of github.com:mosaicml/llm-foundry into de… · bmosaicml, Jan 24, 2024
e53356b · prep · bmosaicml, Jan 24, 2024
fe64eb9 · prep · bmosaicml, Jan 24, 2024
6c046cf · prep · bmosaicml, Jan 24, 2024
556d645 · Merge branch 'main' into debug_gsm8k · bmosaicml, Jan 24, 2024
76afa80 · foo · bmosaicml, Jan 25, 2024
c96bbef · Merge branch 'main' into debug_gsm8k · bmosaicml, Jan 25, 2024
3f0bc04 · Merge branch 'debug_gsm8k' of github.com:mosaicml/llm-foundry into de… · bmosaicml, Jan 25, 2024
a9e8610 · restore · bmosaicml, Jan 25, 2024
2f0638b · restore · bmosaicml, Jan 25, 2024
53a66a8 · finish merhe · bmosaicml, Jan 29, 2024
db3986b · Merge branch 'main' into debug_gsm8k · bmosaicml, Jan 31, 2024
9d4adca · restore mcli · bmosaicml, Feb 1, 2024
bfe0cfd · Merge branch 'debug_gsm8k' of github.com:mosaicml/llm-foundry into de… · bmosaicml, Feb 1, 2024
6276cf2 · merge · bmosaicml, Feb 2, 2024
9e31ec0 · fix precommit · bmosaicml, Feb 2, 2024
59f1de0 · fix · bmosaicml, Feb 2, 2024
c6f2109 · Update hf_eval.yaml · bmosaicml, Feb 2, 2024
40acdb4 · fix · bmosaicml, Feb 2, 2024
2d681aa · Merge branch 'main' into debug_gsm8k · bmosaicml, Feb 2, 2024
d8ff7d4 · fix · bmosaicml, Feb 2, 2024
d651630 · Merge branch 'main' into debug_gsm8k · dakinggg, Feb 3, 2024
4f7240f · Merge branch 'main' into debug_gsm8k · bmosaicml, Feb 5, 2024
ea355cb · remove programming · bmosaicml, Feb 5, 2024
b8eac04 · update readme · bmosaicml, Feb 6, 2024
718ebe3 · Merge branch 'debug_gsm8k' of github.com:mosaicml/llm-foundry into de… · bmosaicml, Feb 6, 2024
7f7855d · Merge branch 'main' into debug_gsm8k · bmosaicml, Feb 6, 2024
10 changes: 8 additions & 2 deletions llmfoundry/utils/builders.py
```diff
@@ -501,7 +501,11 @@ def _validate_cfg(icl_cfg: DictConfig):
 if dist.get_local_rank() == 0 and os.path.exists(destination_path):
     os.remove(destination_path)
 dist.barrier()
-
+early_stopping_criteria = icl_cfg.get('early_stopping_criteria',
+                                      None)
+early_stopping_criteria = list(
+    early_stopping_criteria
+) if early_stopping_criteria is not None else None
 dataloaders = get_icl_task_dataloader(
     icl_cfg.icl_task_type,
     icl_cfg.dataset_uri,
@@ -518,7 +522,9 @@ def _validate_cfg(icl_cfg: DictConfig):
     pass_at_k=icl_cfg.pass_at_k,
     generations_per_sample=icl_cfg.num_beams,
     has_categories=icl_cfg.get('has_categories', False),
-    cot_delimiter=icl_cfg.get('cot_delimiter', ''))
+    cot_delimiter=icl_cfg.get('cot_delimiter', ''),
+    early_stopping_criteria=early_stopping_criteria,
+    do_normalization=icl_cfg.get('do_normalization', True))
 if hasattr(
         icl_cfg,
         'has_categories') and icl_cfg.has_categories and isinstance(
```
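The builders.py change passes three options to the ICL dataloader for chain-of-thought generation tasks: `early_stopping_criteria` (read from the task config and converted to a plain `list`, or left as `None`), `cot_delimiter` (default `''`), and `do_normalization` (default `True`). The sketch below shows one plausible way such options combine when scoring a generated answer: cut the generation at the first stop sequence, keep the text after the chain-of-thought delimiter, then optionally normalize before comparing with the gold answers. It is a hypothetical, stdlib-only illustration, not the Composer dataloader or metric. The helper names, the SQuAD-style normalization, and the example delimiter and stop strings are assumptions, not values taken from `tasks_v0.3.yaml`.

```python
import re
import string
from typing import List, Optional, Sequence

# Hypothetical helpers for illustration only; not Composer's or llm-foundry's code.
_PUNCTUATION = set(string.punctuation)


def truncate_at_stop(text: str, stop_sequences: Optional[Sequence[str]]) -> str:
    """Cut the generation at the earliest occurrence of any stop sequence."""
    if not stop_sequences:
        return text
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def extract_final_answer(text: str, cot_delimiter: str) -> str:
    """Keep only what follows the last chain-of-thought delimiter, if present."""
    if cot_delimiter and cot_delimiter in text:
        text = text.split(cot_delimiter)[-1]
    return text.strip()


def normalize(text: str) -> str:
    """Assumed SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = ''.join(ch for ch in text.lower() if ch not in _PUNCTUATION)
    text = re.sub(r'\b(a|an|the)\b', ' ', text)
    return ' '.join(text.split())


def is_correct(generation: str,
               gold_answers: List[str],
               cot_delimiter: str = '',
               early_stopping_criteria: Optional[Sequence[str]] = None,
               do_normalization: bool = True) -> bool:
    # Defaults mirror the icl_cfg.get(...) fallbacks used in builders.py.
    answer = extract_final_answer(
        truncate_at_stop(generation, early_stopping_criteria), cot_delimiter)
    if do_normalization:
        answer = normalize(answer)
        gold_answers = [normalize(g) for g in gold_answers]
    return answer in gold_answers


if __name__ == '__main__':
    generation = ('16 - 3 - 4 = 9 eggs, sold at $2 each: 9 * 2 = 18.\n'
                  'The answer is 18.\n\nQuestion: ...')
    print(is_correct(generation, ['18'],
                     cot_delimiter='The answer is ',
                     early_stopping_criteria=['\n\n', 'Question:']))  # True
```

Without normalization the extracted answer is `18.`, which does not match `18`; that trailing punctuation is the kind of mismatch a `do_normalization` switch addresses. Note that this particular normalization also strips decimal points (`3.5` becomes `35`), so a real metric may need to treat numeric answers more carefully.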
4 changes: 2 additions & 2 deletions mcli/mcli-hf-eval.yaml
```diff
@@ -50,5 +50,5 @@ parameters:
     limit_all_gathers: True


-  icl_tasks: 'eval/yamls/tasks_v0.2.yaml'
-  eval_gauntlet: 'eval/yamls/eval_gauntlet_v0.2.yaml'
+  icl_tasks: 'eval/yamls/tasks_v0.3.yaml'
+  eval_gauntlet: 'eval/yamls/eval_gauntlet_v0.3.yaml'
```
490 changes: 245 additions & 245 deletions scripts/eval/local_data/symbolic_problem_solving/aqua.jsonl

Large diffs are not rendered by default.

1,319 changes: 1,319 additions & 0 deletions scripts/eval/local_data/symbolic_problem_solving/gsm8k_prepended_8shot.jsonl

Large diffs are not rendered by default.

5,000 changes: 5,000 additions & 0 deletions scripts/eval/local_data/symbolic_problem_solving/math.jsonl

Large diffs are not rendered by default.

1,904 changes: 1,904 additions & 0 deletions scripts/eval/local_data/symbolic_problem_solving/math_complex_soln.jsonl

Large diffs are not rendered by default.

3,096 changes: 3,096 additions & 0 deletions scripts/eval/local_data/symbolic_problem_solving/math_simple_soln.jsonl

Large diffs are not rendered by default.

144 changes: 144 additions & 0 deletions scripts/eval/yamls/eval_gauntlet_v0.3.yaml
@@ -0,0 +1,144 @@
```yaml
eval_gauntlet:
  weighting: EQUAL
  subtract_random_baseline: true
  rescale_accuracy: true
  averages:
    core_average:
    - world_knowledge
    - commonsense_reasoning
    - language_understanding
    - symbolic_problem_solving
    - reading_comprehension
  categories:
  - name: world_knowledge
    benchmarks:
    - name: jeopardy
      num_fewshot: 3
      random_baseline: 0
    - name: bigbench_qa_wikidata
      num_fewshot: 3
      random_baseline: 0
    - name: arc_easy
      num_fewshot: 3
      random_baseline: 0.25
    - name: arc_challenge
      num_fewshot: 3
      random_baseline: 0.25
    - name: mmlu
      num_fewshot: 5
      random_baseline: 0.25
    - name: triviaqa_sm_sub
      num_fewshot: 3
      random_baseline: 0.0
  - name: commonsense_reasoning
    benchmarks:
    - name: copa
      num_fewshot: 0
      random_baseline: 0.5
    - name: siqa
      num_fewshot: 3
      random_baseline: 0.5
    - name: commonsense_qa
      num_fewshot: 0
      random_baseline: 0.25
    - name: piqa
      num_fewshot: 0
      random_baseline: 0.5
    - name: openbook_qa
      num_fewshot: 10
      random_baseline: 0.25
    - name: bigbench_strange_stories
      num_fewshot: 0
      random_baseline: 0.5
    - name: bigbench_strategy_qa
      num_fewshot: 0
      random_baseline: 0.5
  - name: language_understanding
    benchmarks:
    - name: lambada_openai
      num_fewshot: 0
      random_baseline: 0.0
    - name: hellaswag
      num_fewshot: 0
      random_baseline: 0.25
    - name: winograd
      num_fewshot: 3
      random_baseline: 0.5
    - name: winogrande
      num_fewshot: 5
      random_baseline: 0.5
  - name: symbolic_problem_solving
    benchmarks:
    - name: bigbench_elementary_math_qa
      num_fewshot: 1
      random_baseline: 0.25
    - name: bigbench_dyck_languages
      num_fewshot: 5
      random_baseline: 0
    - name: bigbench_operators
      num_fewshot: 3
      random_baseline: 0.0
    - name: simple_arithmetic_withspaces
      num_fewshot: 5
      random_baseline: 0.0
    - name: simple_arithmetic_nospaces
      num_fewshot: 5
      random_baseline: 0.0
    - name: aqua
      num_fewshot: 3
      random_baseline: 0.0
    - name: gsm8k
      num_fewshot: 0
      random_baseline: 0.0
    - name: svamp
      num_fewshot: 5
      random_baseline: 0
    - name: agi_eval_sat_math
      num_fewshot: 3
      random_baseline: 0.0
    - name: agi_eval_lsat_ar
      num_fewshot: 5
      random_baseline: 0.25
    - name: math
      num_fewshot: 4
      random_baseline: 0.0
    - name: math_simple
      num_fewshot: 4
      random_baseline: 0.0
  - name: reading_comprehension
    benchmarks:
    - name: squad
      num_fewshot: 3
      random_baseline: 0
    - name: boolq
      num_fewshot: 0
      random_baseline: 0.5
    - name: coqa
      num_fewshot: 0
      random_baseline: 0.0
    - name: agi_eval_lsat_rc
      num_fewshot: 5
      random_baseline: 0.25
    - name: agi_eval_lsat_lr
      num_fewshot: 5
      random_baseline: 0.25
    - name: agi_eval_sat_en
      num_fewshot: 5
      random_baseline: 0.25
  - name: programming
    benchmarks:
    - name: human_eval
      num_fewshot: 0
      random_baseline: 0
    - name: human_eval_cpp
      num_fewshot: 0
      random_baseline: 0
    - name: human_eval_js
      num_fewshot: 0
      random_baseline: 0
    - name: human_eval_return_simple
      num_fewshot: 0
      random_baseline: 0
    - name: human_eval_25
      num_fewshot: 0
      random_baseline: 0
```
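The three top-level flags in `eval_gauntlet_v0.3.yaml` control how raw accuracies become gauntlet scores. The sketch below is a hedged reading of them, assuming (as the flag names suggest, and as we understand Composer's `EvalGauntlet` callback to behave) that a benchmark's score is `(accuracy - random_baseline) / (1 - random_baseline)`, that `weighting: EQUAL` averages benchmarks within a category, and that each entry under `averages` is the mean of its listed categories. The function names are hypothetical, and the callback itself may differ in details such as other weighting modes or how missing results are handled.

```python
from typing import Any, Dict, List

# Hypothetical re-implementation for illustration; Composer's EvalGauntlet
# callback is the authoritative source for how these flags are applied.


def benchmark_score(accuracy: float, random_baseline: float,
                    subtract_random_baseline: bool,
                    rescale_accuracy: bool) -> float:
    score = accuracy
    if subtract_random_baseline:
        score -= random_baseline
        if rescale_accuracy:
            # Map [random_baseline, 1] onto [0, 1]; below-chance scores go negative.
            score /= 1.0 - random_baseline
    return score


def gauntlet_scores(cfg: Dict[str, Any],
                    accuracies: Dict[str, float]) -> Dict[str, float]:
    """Compute per-category scores plus any configured averages."""
    g = cfg['eval_gauntlet']
    if g['weighting'] != 'EQUAL':
        raise NotImplementedError('this sketch only handles EQUAL weighting')
    scores: Dict[str, float] = {}
    for category in g['categories']:
        per_benchmark: List[float] = [
            benchmark_score(accuracies[b['name']], b['random_baseline'],
                            g['subtract_random_baseline'],
                            g['rescale_accuracy'])
            for b in category['benchmarks']
        ]
        # EQUAL weighting: plain mean over the category's benchmarks.
        scores[category['name']] = sum(per_benchmark) / len(per_benchmark)
    for name, members in g.get('averages', {}).items():
        # e.g. core_average: mean of the listed category scores.
        scores[name] = sum(scores[m] for m in members) / len(members)
    return scores
```

For example, `arc_easy` at 0.625 accuracy with `random_baseline: 0.25` scores 0.5, and a benchmark sitting exactly at its random baseline scores 0. Categories not listed under `averages`, such as `programming` here, still get their own score but do not feed `core_average`.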
4 changes: 2 additions & 2 deletions scripts/eval/yamls/hf_eval.yaml
```diff
@@ -43,5 +43,5 @@ device_eval_batch_size: 4
 # forward_prefetch: True
 # limit_all_gathers: True

-icl_tasks: 'eval/yamls/tasks_v0.1.yaml'
-eval_gauntlet: 'eval/yamls/eval_gauntlet_v0.1.yaml'
+icl_tasks: 'eval/yamls/tasks_v0.3.yaml'
+eval_gauntlet: 'eval/yamls/eval_gauntlet_v0.3.yaml'
```
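Both eval configs now point `icl_tasks` and `eval_gauntlet` at the matching v0.3 files (previously `hf_eval.yaml` used v0.1 and `mcli-hf-eval.yaml` used v0.2). Because the gauntlet names each benchmark together with a shot count, the tasks file presumably has to provide every such pair. A minimal pre-flight check is sketched below. It assumes each entry under `icl_tasks` carries a `label` and a list-valued `num_fewshot`; that is an assumption about the tasks file, which is not shown in this diff. The helper name is hypothetical, and in practice both files could be loaded with PyYAML's `yaml.safe_load` before calling it.

```python
from typing import Any, Dict, List, Set, Tuple


def missing_gauntlet_benchmarks(
        tasks_cfg: Dict[str, Any],
        gauntlet_cfg: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Return (benchmark, num_fewshot) pairs the gauntlet lists but the tasks file lacks.

    Hypothetical helper; assumes each task entry has a 'label' and a
    list-valued 'num_fewshot'.
    """
    available: Set[Tuple[str, int]] = set()
    for task in tasks_cfg['icl_tasks']:
        for shots in task['num_fewshot']:
            available.add((task['label'], shots))

    missing: List[Tuple[str, int]] = []
    for category in gauntlet_cfg['eval_gauntlet']['categories']:
        for bench in category['benchmarks']:
            key = (bench['name'], bench['num_fewshot'])
            if key not in available:
                missing.append(key)
    return missing
```

Running a check like this before launching an eval would surface a version mismatch between the two files as a short list of missing pairs rather than as a failure partway through the gauntlet.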