Refactor in_context_learning_evaluation.py (#2713) · ShashankMosaicML/composer@d0f0be9

Refactor in_context_learning_evaluation.py (mosaicml#2713)

* extremely wip commit w/ ICLdataset class

* more extremely broken wip

* add split keys

* first pass at moving QA to new format

* linting

* linting

* tests pass!

* fix repeated defaults, gold_idx --> gold

* basic HF parsing but test not passing

* fix cot. wip

* del device and world_size from tests

* change to .map

* fix schema

* tests passing w/ collate refactor

* finish HF tests

* add hf batch parsing

* linting

* add doc strings, rm hf_parsing_vars

* revert question_prelimiter back to prelimiter

* fix tests

* add more docstrings

* add doc strings, fix hf w/ categories

* add doc strings and default check

* linting

* add temperature

* remove need for hf:// on hf links

* Update composer/datasets/in_context_learning_evaluation.py

Co-authored-by: Daniel King <[email protected]>

* Update composer/datasets/in_context_learning_evaluation.py

Co-authored-by: Daniel King <[email protected]>

* Update composer/datasets/in_context_learning_evaluation.py

Co-authored-by: Daniel King <[email protected]>

* Update composer/datasets/in_context_learning_evaluation.py

Co-authored-by: Daniel King <[email protected]>

* Update composer/datasets/in_context_learning_evaluation.py

Co-authored-by: Daniel King <[email protected]>

* Update composer/datasets/in_context_learning_evaluation.py

Co-authored-by: Daniel King <[email protected]>

* fix comments, add test for check hf uri, still wip

* add gpu tests back

* update fix_eos_on_preamble

* update comments

* add return types

* typing, comments

* init RAG Generation task

* init _construct_context for RAG eval

* fix context key, move hf test dataset, few docstrings

* fix docstrings, add second path for schema

* init collate_fn, _tokenize_example functions (bug exists)

* fix typo in warning error

* remove canonical_solution from batch

* missed one canonical_solution

* remove encoded dataset to have just one dataset var

* rename sample to example

* improve comment

* edit RAGtask

* rm hf parsing func

* fix docstring, rename fewshot fun

* docstring

* change default split_batch to check types

* remove need to set split_keys

* doc string update

* improve comments

* rm stacked_keys for tokenize_labels bool

* initial wip in comments

* make _conv_tokens_to_tensors func

* wip - sketch out batch_mappings

* linting and debugging statements to help me remember where I'm doing wip

* all tests except one sus schema test passing

* fix missing fewshot for schema

* rm temperature add generation_kwargs

* add defaults that are currently set in llm-foundry builders.py

* fix defaults in tests, add some comments

* tests wip

* tests for new funcs

* rm RAG task

* more docstring

* tests passing

* wip

* wip

* add dict to data_spec

* Update composer/datasets/in_context_learning_evaluation.py

Co-authored-by: Daniel King <[email protected]>

* Update composer/datasets/in_context_learning_evaluation.py

Co-authored-by: Daniel King <[email protected]>

* Apply suggestions from code review

comment improvements

Co-authored-by: Daniel King <[email protected]>

* default_batch to base_batch and some docstrings

* update comments and fix test. move spacing to default get_answer

* improved docstrings

* finish schema/mc tests

* address pr review comments

* linting

* fixing import, add type

* update comments

* update keys

* add typechecks for token ids

* rm outdated test

* fix tests

* add microbatch test

* pyright fixes

* linting attempts

* linting wip

* fix linting

* add early stopping and do_normalization documentation

* fix linting

* fix linting

* fix final dist test issue

* fix isort

* fix linting

* fix docstrings

* fix docstrings

* add warning filters

* fix warnings

* Update composer/datasets/in_context_learning_evaluation.py

fix spelling

Co-authored-by: Daniel King <[email protected]>

* Update composer/datasets/in_context_learning_evaluation.py

fix spelling

Co-authored-by: Daniel King <[email protected]>

* Update composer/datasets/in_context_learning_evaluation.py

fix spelling

Co-authored-by: Daniel King <[email protected]>

* Update composer/datasets/in_context_learning_evaluation.py

fix spelling

Co-authored-by: Daniel King <[email protected]>

* Update composer/datasets/in_context_learning_evaluation.py

fix spelling

Co-authored-by: Daniel King <[email protected]>

* Update composer/datasets/in_context_learning_evaluation.py

fix spelling

Co-authored-by: Daniel King <[email protected]>

* add capitalization

* revert default changes

* change update_generate_kwargs to public

* fix type

* move pad_tok_id error

---------

Co-authored-by: Daniel King <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Eitan Turok <[email protected]>

4 people authored and ShashankMosaicML committed Feb 3, 2024

1 parent b8df31c commit d0f0be9

composer/datasets/__init__.py

@@ -11,6 +11,11 @@
                                          build_streaming_cifar10_dataloader, build_synthetic_cifar10_dataloader)
     from composer.datasets.imagenet import (build_ffcv_imagenet_dataloader, build_imagenet_dataloader,
                                             build_streaming_imagenet1k_dataloader, build_synthetic_imagenet_dataloader)
+    from composer.datasets.in_context_learning_evaluation import (InContextLearningCodeEvalDataset,
+                                                                  InContextLearningDataset, InContextLearningLMTaskDataset,
+                                                                  InContextLearningMultipleChoiceTaskDataset,
+                                                                  InContextLearningQATaskDataset,
+                                                                  InContextLearningSchemaTaskDataset)
     from composer.datasets.lm_dataset import build_lm_dataloader
     from composer.datasets.mnist import build_mnist_dataloader, build_synthetic_mnist_dataloader
     from composer.datasets.synthetic import (SyntheticBatchPairDataset, SyntheticDataLabelType, SyntheticDataType,
@@ -24,6 +29,12 @@
         'SyntheticDataLabelType',
         'SyntheticDataType',
         'SyntheticPILDataset',
+        'InContextLearningDataset',
+        'InContextLearningQATaskDataset',
+        'InContextLearningLMTaskDataset',
+        'InContextLearningCodeEvalDataset',
+        'InContextLearningMultipleChoiceTaskDataset',
+        'InContextLearningSchemaTaskDataset',
         'build_ade20k_dataloader',
         'build_streaming_ade20k_dataloader',
         'build_streaming_c4_dataloader',
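
The hunk above both imports the six in-context-learning dataset classes into the package and lists their names in `__all__`. As a rough illustration of why both steps matter, the sketch below builds a hypothetical stand-in module (`toy_datasets`, not the real `composer.datasets`) and shows that a star import only picks up names listed in `__all__`. The placeholder class and `build_toy_dataloader` are assumptions made for this example.

```python
import sys
import types

# Hypothetical stand-in for a package __init__ module (not composer itself).
toy = types.ModuleType("toy_datasets")


class InContextLearningDataset:
    """Placeholder; the real class is defined in composer."""


def build_toy_dataloader():
    """Hypothetical helper that is deliberately left out of __all__."""
    return []


# Step 1: make the names attributes of the module (the `from ... import` lines).
toy.InContextLearningDataset = InContextLearningDataset
toy.build_toy_dataloader = build_toy_dataloader

# Step 2: list the public names (the `__all__` entries).
toy.__all__ = ["InContextLearningDataset"]

# Register the module so the import system can find it by name.
sys.modules["toy_datasets"] = toy

# A star import copies only the names listed in __all__.
namespace = {}
exec("from toy_datasets import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)

# Names outside __all__ are still reachable through an explicit import.
from toy_datasets import build_toy_dataloader as explicit_helper
```

An explicit `from package import Name` only needs step 1; step 2 affects star imports and signals which names the package treats as public.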