docs: add note to note that file extension is required in training data path (#447)

* docs: add note to note that file extension is required in training data path

Signed-off-by: Will Johnson <[email protected]>

* docs: clarify what is being checked

Signed-off-by: Will Johnson <[email protected]>

* docs: make note, clarification

Signed-off-by: Will Johnson <[email protected]>

* docs: clarify specifically what code does

Signed-off-by: Will Johnson <[email protected]>

---------

Signed-off-by: Will Johnson <[email protected]>
willmj authored Jan 23, 2025
1 parent c0362ad commit f22e243
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -80,6 +80,11 @@ ARROW | ✅

As noted above, we also support passing an HF dataset ID directly via the `--training_data_path` argument.

**NOTE**: Due to the variety of supported data formats and file types, `--training_data_path` is handled as follows:
- If `--training_data_path` ends in a valid file extension (e.g., .json, .csv), it is treated as a file.
- If `--training_data_path` points to a valid folder, it is treated as a folder.
- If neither of these is true, the data preprocessor tries to load `--training_data_path` as a Hugging Face (HF) dataset ID.
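
The resolution order in the note can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `classify_training_data_path` and the `KNOWN_EXTENSIONS` set are hypothetical, and the extensions listed are an assumption based on the examples in the note (`.json`, `.csv`) and the formats table.

```python
import os

# Assumed set of recognized extensions for illustration only;
# the authoritative list lives in the project's own code.
KNOWN_EXTENSIONS = {".json", ".jsonl", ".csv", ".parquet", ".arrow"}


def classify_training_data_path(path: str) -> str:
    """Hypothetical helper mirroring the three checks in the note."""
    # 1. A recognized file extension means the path is treated as a file.
    _, ext = os.path.splitext(path)
    if ext.lower() in KNOWN_EXTENSIONS:
        return "file"
    # 2. An existing directory is treated as a folder.
    if os.path.isdir(path):
        return "folder"
    # 3. Anything else falls through to being tried as an HF dataset ID.
    return "hf_dataset_id"
```

One consequence of this ordering is that a local file passed without its extension (for example `data/train` instead of `data/train.json`) would not match the first check and, unless it is a folder, would end up being tried as an HF dataset ID, which is why the extension is required.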

## Use cases supported with `training_data_path` argument

### 1. Data formats with a single sequence and a specified response_template to use for masking on completion.