Commit
Refactor in_context_learning_evaluation.py (mosaicml#2713)
* extremely wip commit w/ ICLdataset class
* more extremely broken wip
* add split keys
* first pass at moving QA to new format
* linting
* linting
* tests pass!
* fix repeated defaults, gold_idx --> gold
* basic HF parsing but test not passing
* fix cot. wip
* del device and world_size from tests
* change to .map
* fix schema
* tests passing w/ collate refactor
* finish HF tests
* add hf batch parsing
* linting
* add doc strings, rm hf_parsing_vars
* revert question_prelimiter back to prelimiter
* fix tests
* add more docstrings
* add doc strings, fix hf w/ categories
* add doc strings and default check
* linting
* add temperature
* remove need for hf:// on hf links
* Update composer/datasets/in_context_learning_evaluation.py
  Co-authored-by: Daniel King <[email protected]>
* Update composer/datasets/in_context_learning_evaluation.py
  Co-authored-by: Daniel King <[email protected]>
* Update composer/datasets/in_context_learning_evaluation.py
  Co-authored-by: Daniel King <[email protected]>
* Update composer/datasets/in_context_learning_evaluation.py
  Co-authored-by: Daniel King <[email protected]>
* Update composer/datasets/in_context_learning_evaluation.py
  Co-authored-by: Daniel King <[email protected]>
* Update composer/datasets/in_context_learning_evaluation.py
  Co-authored-by: Daniel King <[email protected]>
* fix comments, add test for check hf uri, still wip
* add gpu tests back
* update fix_eos_on_preamble
* update comments
* add return types
* typing, comments
* init RAG Generation task
* init _construct_context for RAG eval
* fix context key, move hf test dataset, few docstrings
* fix docstrings, add second path for schema
* init collate_fn, _tokenize_example functions (bug exists)
* fix typo in warning error
* remove canonical_solution from batch
* missed one canonical_sllution
* remove encoded dataset to have just one dataset var
* rename sample to example
* improve comment
* edit RAGtask
* rm hf parsing func
* fix docstring, rename fewshot fun
* docstring
* change default split_batch to check types
* remove need to set split_keys
* doc string update
* improve comments
* rm stacked_keys for tokenize_labels bool
* initial wip in comments
* make _conv_tokens_to_tensors func
* wip - sketch out batch_mappings
* linting and debugging statements to help me remember where I'm doing wip
* all tests except one sus schema test passing
* fix missing fewshot for schema
* rm temperature add generation_kwargs
* add defaults that are currently set in llm-foundry builders.py
* fix defaults in tests, add some comments
* tests wip
* tests for new funcs
* rm RAG task
* more docstring
* tests passing
* wip
* wip
* add dict to data_spec
* Update composer/datasets/in_context_learning_evaluation.py
  Co-authored-by: Daniel King <[email protected]>
* Update composer/datasets/in_context_learning_evaluation.py
  Co-authored-by: Daniel King <[email protected]>
* Apply suggestions from code review
  comment improvements
  Co-authored-by: Daniel King <[email protected]>
* default_batch to base_batch and some docstrings
* update comments and fix test. move spacing to default get_answer
* improved docstrings
* finish schema/mc tests
* address pr review comments
* lintign
* fixing import, add type
* update comments
* update keys
* add typechecks for token ids
* rm outdated test
* fix tests
* add microbatch test
* pyright fixes
* linting attempts
* linting wip
* fix linting
* add early stopping and do_normalization documentation
* fix linting
* fix linting
* fix final dist test issue
* fix isort
* fix linting
* fix docstrings
* fix docstrings
* add warning filters
* fix warnings
* Update composer/datasets/in_context_learning_evaluation.py
  fix spelling
  Co-authored-by: Daniel King <[email protected]>
* Update composer/datasets/in_context_learning_evaluation.py
  fix spelling
  Co-authored-by: Daniel King <[email protected]>
* Update composer/datasets/in_context_learning_evaluation.py
  fix spelling
  Co-authored-by: Daniel King <[email protected]>
* Update composer/datasets/in_context_learning_evaluation.py
  fix spelling
  Co-authored-by: Daniel King <[email protected]>
* Update composer/datasets/in_context_learning_evaluation.py
  fix spelling
  Co-authored-by: Daniel King <[email protected]>
* Update composer/datasets/in_context_learning_evaluation.py
  fix spelling
  Co-authored-by: Daniel King <[email protected]>
* add capitalization
* revert default changes
* change update_generate_kwargs to public
* fix type
* move pad_tok_id error

---------

Co-authored-by: Daniel King <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: Eitan Turok <[email protected]>