
Add new ICL kwargs in eval.py and long_context yamls #925

Merged: 56 commits on Feb 12, 2024. The changes shown below are from 39 of the 56 commits.

Commits
340b79e
add yamls w/ old links
maxisawesome Oct 30, 2023
26dc067
load from max's public hf and parse hf datasets
maxisawesome Nov 1, 2023
31851a5
update rest of tasks
maxisawesome Nov 1, 2023
203be47
add better logging
maxisawesome Nov 1, 2023
33b6513
implemented leval tasks
maxisawesome Nov 1, 2023
089c392
move level
maxisawesome Nov 1, 2023
b644df1
add level yaml
maxisawesome Nov 1, 2023
5adf77e
add str parsing to hf
maxisawesome Nov 3, 2023
79810f3
wip
maxisawesome Nov 14, 2023
44a209a
llm-foundry working with new parser
maxisawesome Nov 14, 2023
657fa13
working w/ new parsing
maxisawesome Nov 14, 2023
2629f75
fix old long context tasks
maxisawesome Nov 14, 2023
c019ea1
wip
maxisawesome Nov 20, 2023
0608ea2
wip
maxisawesome Nov 20, 2023
cebb487
wip
maxisawesome Nov 28, 2023
fcbeba8
wip
maxisawesome Nov 28, 2023
56ae289
update to hf_parsing_map
maxisawesome Dec 5, 2023
4aee1ec
rm defaults
maxisawesome Dec 7, 2023
3440348
Merge branch 'main' into hf_parsing_with_icl_refactor
maxisawesome Dec 7, 2023
23ca0ba
fix parsing vars
maxisawesome Dec 7, 2023
c10698f
update defaults again
maxisawesome Dec 7, 2023
4e05385
rm merge conflict
maxisawesome Dec 7, 2023
6b7d13f
fix gen_kwargs
maxisawesome Jan 19, 2024
871bd9a
Merge branch 'mosaicml:main' into hf_parsing_with_icl_refactor
maxisawesome Jan 19, 2024
d9c6a28
rm old code path
maxisawesome Jan 19, 2024
eda47f2
Merge branch 'mosaicml:main' into hf_parsing_with_icl_refactor
maxisawesome Jan 26, 2024
d9b284c
fixups
maxisawesome Jan 27, 2024
393adfb
wip
maxisawesome Jan 27, 2024
9d12917
Merge branch 'hf_parsing_with_icl_refactor' of github.com:maxisawesom…
maxisawesome Jan 29, 2024
7b23a93
Merge branch 'main' into hf_parsing_with_icl_refactor
maxisawesome Jan 29, 2024
662af67
rm leval from pr
maxisawesome Jan 30, 2024
c9e0ef5
fix comments in yamls
maxisawesome Jan 30, 2024
09ffafd
add cot params
maxisawesome Jan 30, 2024
fb782db
add fewshot_random_seed
maxisawesome Jan 30, 2024
e735ae7
fix early_stopping_criteria, fewshot_num_seed default
maxisawesome Jan 30, 2024
35641df
undo rm hf_eval
maxisawesome Jan 30, 2024
f1282bc
add fewshot_random_seed to test
maxisawesome Jan 30, 2024
4a9a8b0
add 64k tasks
maxisawesome Feb 6, 2024
65ee617
add longer context, update composer versin
maxisawesome Feb 6, 2024
5ba5e30
address comments
maxisawesome Feb 6, 2024
b7884de
mixed
maxisawesome Feb 6, 2024
ff31e72
use seed by default
maxisawesome Feb 6, 2024
fca3d35
rm long_context_eval_8k.yaml
maxisawesome Feb 7, 2024
f1b65f7
add longer context evals
maxisawesome Feb 7, 2024
0b494bb
mv yamls
maxisawesome Feb 7, 2024
bd6048b
eval gauntlet wip
maxisawesome Feb 8, 2024
51c3ea8
update niah and wikiqa
maxisawesome Feb 8, 2024
3c1e344
Merge branch 'main' into hf_parsing_with_icl_refactor
maxisawesome Feb 8, 2024
6ce8cc6
Merge branch 'main' into hf_parsing_with_icl_refactor
dakinggg Feb 12, 2024
7849528
fix linting
maxisawesome Feb 12, 2024
0ffab21
Merge branch 'hf_parsing_with_icl_refactor' of github.com:maxisawesom…
maxisawesome Feb 12, 2024
124b60a
add default option
maxisawesome Feb 12, 2024
6b37a8c
change defaults
maxisawesome Feb 12, 2024
3f08d92
fix linting
maxisawesome Feb 12, 2024
cee8256
fix linting 2
maxisawesome Feb 12, 2024
c2810ef
Merge branch 'main' into hf_parsing_with_icl_refactor
maxisawesome Feb 12, 2024
18 changes: 17 additions & 1 deletion llmfoundry/utils/builders.py
@@ -52,6 +52,7 @@ def build_evaluators(
tokenizer: PreTrainedTokenizerBase,
device_eval_batch_size: int,
icl_seq_len: int,
fewshot_random_seed: Optional[int],
icl_subset_num_batches: Optional[int],
) -> Tuple[List[Evaluator], List[str], Optional[EvalGauntlet]]:

@@ -72,6 +73,7 @@ def build_icl_data_and_gauntlet(
tokenizer,
device_eval_batch_size,
icl_seq_len,
fewshot_random_seed,
icl_subset_num_batches,
)
evaluators.extend(icl_evaluators)
@@ -129,13 +131,15 @@ def build_icl_data_and_gauntlet(
tokenizer: PreTrainedTokenizerBase,
device_eval_batch_size: int,
icl_seq_len: int,
fewshot_random_seed: Optional[int],
icl_subset_num_batches: Optional[int] = None
) -> Tuple[List[Evaluator], List[str], Optional[EvalGauntlet]]:
icl_evaluators, logger_keys = build_icl_evaluators(
icl_tasks_config,
tokenizer,
icl_seq_len,
device_eval_batch_size,
fewshot_random_seed=fewshot_random_seed,
icl_subset_num_batches=icl_subset_num_batches)
eval_gauntlet_cb = None
if eval_gauntlet_config is not None:
@@ -427,6 +431,7 @@ def build_icl_evaluators(
default_max_seq_len: int,
default_batch_size: int,
destination_dir: Optional[str] = None,
fewshot_random_seed: Optional[int] = None,
icl_subset_num_batches: Optional[int] = None,
) -> Tuple[List[Evaluator], List[str]]:
if destination_dir is None:
@@ -485,6 +490,7 @@ def _validate_cfg(icl_cfg: DictConfig):
if 'num_beams' not in icl_cfg:
icl_cfg.num_beams = 20


for icl_cfg in icl_tasks_list:
assert isinstance(icl_cfg, DictConfig)
_validate_cfg(icl_cfg)
@@ -502,6 +508,9 @@ def _validate_cfg(icl_cfg: DictConfig):
os.remove(destination_path)
dist.barrier()

hf_parsing_map = icl_cfg.get('hf_parsing_map', {})
hf_loading_vars = icl_cfg.get('hf_loading_vars', {})

dataloaders = get_icl_task_dataloader(
icl_cfg.icl_task_type,
icl_cfg.dataset_uri,
@@ -512,13 +521,20 @@ def _validate_cfg(icl_cfg: DictConfig):
num_fewshot=num_fewshot,
prompt_string=icl_cfg.prompt_string,
example_delimiter=icl_cfg.example_delimiter,
hf_loading_vars=hf_loading_vars,
hf_parsing_map=hf_parsing_map,
continuation_delimiter=icl_cfg.continuation_delimiter,
question_prelimiter=icl_cfg.get('question_prelimiter', ''),
destination_path=destination_path,
fewshot_random_seed=fewshot_random_seed,
pass_at_k=icl_cfg.pass_at_k,
generations_per_sample=icl_cfg.num_beams,
has_categories=icl_cfg.get('has_categories', False),
cot_delimiter=icl_cfg.get('cot_delimiter', ''))
cot_delimiter=icl_cfg.get('cot_delimiter', ''),
generation_kwargs=icl_cfg.get('generation_kwargs', {}),
early_stopping_criteria=icl_cfg.get('early_stopping_criteria'),
do_normalization=icl_cfg.get('do_normalization', True),
)
if hasattr(
icl_cfg,
'has_categories') and icl_cfg.has_categories and isinstance(
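Taken together, the optional ICL task fields read above can be summarized in one place. A minimal sketch of the default-resolution logic (a plain `dict` stands in for the `DictConfig`; this mirrors the `icl_cfg.get(...)` calls in the diff, not the exact foundry API):

```python
def resolve_icl_kwargs(icl_cfg: dict) -> dict:
    """Collect the optional ICL task kwargs with the defaults used above."""
    return {
        "hf_loading_vars": icl_cfg.get("hf_loading_vars", {}),
        "hf_parsing_map": icl_cfg.get("hf_parsing_map", {}),
        "question_prelimiter": icl_cfg.get("question_prelimiter", ""),
        "has_categories": icl_cfg.get("has_categories", False),
        "cot_delimiter": icl_cfg.get("cot_delimiter", ""),
        "generation_kwargs": icl_cfg.get("generation_kwargs", {}),
        "early_stopping_criteria": icl_cfg.get("early_stopping_criteria"),
        "do_normalization": icl_cfg.get("do_normalization", True),
    }

# A task that only overrides the HF parsing map keeps every other default.
kwargs = resolve_icl_kwargs({"hf_parsing_map": {"context": ["passage"]}})
```

Bundling the optional keys this way keeps the `get_icl_task_dataloader` call site readable as more knobs accumulate.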
8 changes: 8 additions & 0 deletions scripts/eval/eval.py
@@ -118,6 +118,7 @@ def evaluate_model(
python_log_level: Optional[str],
precision: str,
eval_gauntlet_df: Optional[pd.DataFrame],
fewshot_random_seed: Optional[int],
eval_subset_num_batches: int,
icl_subset_num_batches: Optional[int],
metadata: Optional[Dict[str, str]],
@@ -141,6 +142,7 @@ def evaluate_model(
tokenizer=tokenizer,
device_eval_batch_size=device_eval_batch_size,
icl_seq_len=max_seq_len,
fewshot_random_seed=fewshot_random_seed,
icl_subset_num_batches=icl_subset_num_batches,
)

@@ -301,6 +303,10 @@ def main(cfg: DictConfig) -> Tuple[List[Trainer], pd.DataFrame]:
'loggers',
must_exist=False,
default_value={})
fewshot_random_seed: int = pop_config(cfg,
'fewshot_random_seed',
must_exist=False,
default_value=1234)
eval_subset_num_batches: int = pop_config(cfg,
'eval_subset_num_batches',
must_exist=False,
@@ -318,6 +324,7 @@ def main(cfg: DictConfig) -> Tuple[List[Trainer], pd.DataFrame]:
'log_config',
must_exist=False,
default_value=True)


# Pop out interpolation variables.
pop_config(cfg, 'model_name_or_path', must_exist=False, default_value=None)
@@ -362,6 +369,7 @@ def main(cfg: DictConfig) -> Tuple[List[Trainer], pd.DataFrame]:
python_log_level=python_log_level,
precision=precision,
eval_gauntlet_df=eval_gauntlet_df,
fewshot_random_seed=fewshot_random_seed,
eval_subset_num_batches=eval_subset_num_batches,
icl_subset_num_batches=icl_subset_num_batches,
metadata=metadata,
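The seed is read with the same `pop_config` pattern as the other top-level keys, so omitting `fewshot_random_seed` from a YAML silently falls back to 1234. A simplified stand-in for `pop_config` (illustrative; not the foundry implementation):

```python
from typing import Any


def pop_config(cfg: dict, key: str, must_exist: bool = True,
               default_value: Any = None) -> Any:
    """Remove `key` from `cfg`, returning its value or `default_value`."""
    if key in cfg:
        return cfg.pop(key)
    if must_exist:
        raise ValueError(f"Missing required config key: {key}")
    return default_value


cfg = {"eval_subset_num_batches": -1}
# Key absent and must_exist=False, so the 1234 default applies.
seed = pop_config(cfg, "fewshot_random_seed", must_exist=False,
                  default_value=1234)
```

Popping (rather than reading) each key lets the script verify at the end that no unrecognized keys remain in the config.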
74 changes: 74 additions & 0 deletions scripts/eval/yamls/eval_gauntlet_8k_length.yaml
@@ -0,0 +1,74 @@
eval_gauntlet:
weighting: EQUAL
subtract_random_baseline: true
rescale_accuracy: true
categories:
- name: 2k
benchmarks:
- name: hotpotqa_beginning_2k
num_fewshot: 0
random_baseline: 0
- name: hotpotqa_middle_2k
num_fewshot: 0
random_baseline: 0
- name: hotpotqa_end_2k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_beginning_2k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_middle_2k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_end_2k
num_fewshot: 0
random_baseline: 0
- name: wikiqa_2k
num_fewshot: 0
random_baseline: 0
- name: 4k
benchmarks:
- name: hotpotqa_beginning_4k
num_fewshot: 0
random_baseline: 0
- name: hotpotqa_middle_4k
num_fewshot: 0
random_baseline: 0
- name: hotpotqa_end_4k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_beginning_4k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_middle_4k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_end_4k
num_fewshot: 0
random_baseline: 0
- name: wikiqa_4k
num_fewshot: 0
random_baseline: 0
- name: 8k
benchmarks:
- name: hotpotqa_beginning_8k
num_fewshot: 0
random_baseline: 0
- name: hotpotqa_middle_8k
num_fewshot: 0
random_baseline: 0
- name: hotpotqa_end_8k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_beginning_8k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_middle_8k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_end_8k
num_fewshot: 0
random_baseline: 0
- name: wikiqa_8k
num_fewshot: 0
random_baseline: 0
76 changes: 76 additions & 0 deletions scripts/eval/yamls/eval_gauntlet_8k_section.yaml
@@ -0,0 +1,76 @@
eval_gauntlet:
weighting: EQUAL
subtract_random_baseline: true
rescale_accuracy: true
categories:
- name: beginning
benchmarks:
- name: hotpotqa_beginning_2k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_beginning_2k
num_fewshot: 0
random_baseline: 0
- name: hotpotqa_beginning_4k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_beginning_4k
num_fewshot: 0
random_baseline: 0
- name: hotpotqa_beginning_8k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_beginning_8k
num_fewshot: 0
random_baseline: 0
- name: middle
benchmarks:
- name: hotpotqa_middle_2k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_middle_2k
num_fewshot: 0
random_baseline: 0
- name: hotpotqa_middle_4k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_middle_4k
num_fewshot: 0
random_baseline: 0
- name: hotpotqa_middle_8k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_middle_8k
num_fewshot: 0
random_baseline: 0
- name: end
benchmarks:
- name: hotpotqa_end_2k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_end_2k
num_fewshot: 0
random_baseline: 0
- name: hotpotqa_end_4k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_end_4k
num_fewshot: 0
random_baseline: 0
- name: hotpotqa_end_8k
num_fewshot: 0
random_baseline: 0
- name: kv_pairs_end_8k
num_fewshot: 0
random_baseline: 0
- name: full
benchmarks:
- name: wikiqa_2k
num_fewshot: 0
random_baseline: 0
- name: wikiqa_4k
num_fewshot: 0
random_baseline: 0
- name: wikiqa_8k
num_fewshot: 0
random_baseline: 0
12 changes: 6 additions & 6 deletions scripts/eval/yamls/hf_eval.yaml
@@ -12,7 +12,7 @@ models:
model:
name: hf_causal_lm
pretrained_model_name_or_path: ${model_name_or_path}
init_device: mixed
init_device: cpu
pretrained: true
tokenizer:
name: ${model_name_or_path}
@@ -37,11 +37,11 @@ models:
device_eval_batch_size: 4

# FSDP config for model sharding
# fsdp_config:
# sharding_strategy: FULL_SHARD
# mixed_precision: FULL
# forward_prefetch: True
# limit_all_gathers: True
fsdp_config:
sharding_strategy: FULL_SHARD
mixed_precision: FULL
forward_prefetch: True
limit_all_gathers: True

icl_tasks: 'eval/yamls/tasks_v0.1.yaml'
eval_gauntlet: 'eval/yamls/eval_gauntlet_v0.1.yaml'
27 changes: 27 additions & 0 deletions scripts/eval/yamls/long_context_eval_8k.yaml
@@ -0,0 +1,27 @@
max_seq_len: 8196
seed: 1
precision: amp_bf16

models:
-
model_name: EleutherAI/gpt-neo-125m
model:
name: hf_causal_lm
pretrained_model_name_or_path: EleutherAI/gpt-neo-125m
init_device: cpu
pretrained: true
tokenizer:
name: EleutherAI/gpt-neo-125m
kwargs:
model_max_length: ${max_seq_len}

device_eval_batch_size: 1
icl_subset_num_batches: 2

# FSDP config for model sharding
fsdp_config:
sharding_strategy: FULL_SHARD
mixed_precision: FULL

icl_tasks: 'eval/yamls/long_context_tasks.yaml'
eval_gauntlet: 'eval/yamls/eval_gauntlet_8k_section.yaml'
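Because this YAML sets `icl_subset_num_batches: 2`, each task is scored on just its first two batches, which makes the 8k-context config cheap to smoke-test with a 125M-parameter model. The truncation behaves like the following sketch (illustrative; the actual limiting happens inside the evaluator setup, not with this helper):

```python
from itertools import islice


def subset_batches(dataloader, icl_subset_num_batches=None):
    """Yield at most `icl_subset_num_batches` batches; None means no limit."""
    if icl_subset_num_batches is None:
        yield from dataloader
    else:
        yield from islice(dataloader, icl_subset_num_batches)


batches = [{"input_ids": [i]} for i in range(10)]
limited = list(subset_batches(batches, icl_subset_num_batches=2))  # 2 batches
```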