Add finetuning streaming dataset conversion #933

Merged (16 commits) on Feb 6, 2024
Update llmfoundry/data/finetuning/tasks.py
Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
bigning and dakinggg authored Feb 6, 2024
commit 0d72f7615b6976d80910cee4d4212b2840248516
2 changes: 1 addition & 1 deletion llmfoundry/data/finetuning/tasks.py
@@ -234,7 +234,7 @@ def is_valid_ift_example(pad_token_id: int, max_seq_len: int,
"""Check if the example is a valid ift example.

This functions does the following check:
- a. Length of input_ids should less than max_seq_len
+ a. Length of input_ids should be less than max_seq_len
b. Both input_ids and labels should not be empty
c. Labels should have at least 1 non-padding token.
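
The three checks listed in this docstring can be sketched as follows. This is a minimal illustration, not the body of `is_valid_ift_example`: the function name `check_ift_example`, its parameter list, and the assumption that padded label positions hold `pad_token_id` are all hypothetical, since the diff shows only part of the real signature.

```python
from typing import Sequence


def check_ift_example(
    input_ids: Sequence[int],
    labels: Sequence[int],
    pad_token_id: int,
    max_seq_len: int,
) -> bool:
    """Hypothetical stand-in that mirrors checks a, b, and c from the docstring."""
    # a. Length of input_ids should be less than max_seq_len.
    if len(input_ids) >= max_seq_len:
        return False
    # b. Both input_ids and labels should not be empty.
    if len(input_ids) == 0 or len(labels) == 0:
        return False
    # c. Labels should have at least 1 non-padding token
    #    (assumption: padding positions in labels equal pad_token_id).
    return any(token != pad_token_id for token in labels)
```

Under these assumptions, an example whose labels consist only of padding tokens fails check c even when its length is acceptable.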
