-
Notifications
You must be signed in to change notification settings - Fork 538
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Adds support for chat formatted finetuning input data. (#884)
* fix conflicting formatting linting guidelines * used older union operator for legacy support * did the same thing in another place * isort ignore specific lines * fixes * isort do not skip line * address comments * renamed some more things * split tests and add some verification for tokenization split * fix formatting * added docstrings * added end-to-end-test with HF dataset * fix code style * renamed file and fixed tests * use chat template diff * addressed comment * Update llmfoundry/data/finetuning/tasks.py Co-authored-by: Daniel King <[email protected]> * Update llmfoundry/data/finetuning/tasks.py Co-authored-by: Daniel King <[email protected]> * fixed type of TokenizedExample * use cast * use _ALLOWED_{PROMPT, RESPONSE}_KEYS * updated tests * fix * fix? * Update llmfoundry/data/finetuning/tasks.py Co-authored-by: Daniel King <[email protected]> * Update llmfoundry/data/finetuning/tasks.py Co-authored-by: Daniel King <[email protected]> --------- Co-authored-by: Daniel King <[email protected]>
- Loading branch information
Showing
3 changed files
with
307 additions
and
50 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.