Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix and document test_datasets #1228

Merged
merged 4 commits into from
Jan 31, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -607,6 +607,17 @@ datasets:
# For `completion` datsets only, uses the provided field instead of `text` column
field:

# A list of one or more datasets to eval the model with.
# You can use either test_datasets, or val_set_size, but not both.
test_datasets:
winglian marked this conversation as resolved.
Show resolved Hide resolved
- path: /workspace/data/eval.jsonl
ds_type: json
# You need to specify a split. For "json" datasets the default split is called "train".
split: train
type: completion
data_files:
- /workspace/data/eval.jsonl

# use RL training: dpo, ipo, kto_pair
rl:

Expand Down
3 changes: 2 additions & 1 deletion src/axolotl/core/trainer_builder.py
Original file line number Diff line number Diff line change
Expand Up @@ -735,7 +735,7 @@ def build(self, total_num_steps):
elif self.cfg.sample_packing and self.cfg.eval_sample_packing is False:
training_arguments_kwargs["dataloader_drop_last"] = True

if self.cfg.val_set_size == 0:
if not self.cfg.test_datasets and self.cfg.val_set_size == 0:
# no eval set, so don't eval
training_arguments_kwargs["evaluation_strategy"] = "no"
elif self.cfg.eval_steps:
Expand Down Expand Up @@ -822,6 +822,7 @@ def build(self, total_num_steps):
self.cfg.load_best_model_at_end is not False
or self.cfg.early_stopping_patience
)
and not self.cfg.test_datasets
and self.cfg.val_set_size > 0
and self.cfg.save_steps
and self.cfg.eval_steps
Expand Down
2 changes: 1 addition & 1 deletion src/axolotl/utils/data.py
Original file line number Diff line number Diff line change
Expand Up @@ -440,7 +440,7 @@ def load_prepare_datasets(
split="train",
) -> Tuple[Dataset, Dataset, List[Prompter]]:
dataset, prompters = load_tokenized_prepared_datasets(
tokenizer, cfg, default_dataset_prepared_path
tokenizer, cfg, default_dataset_prepared_path, split=split
winglian marked this conversation as resolved.
Show resolved Hide resolved
)

if cfg.dataset_shard_num and cfg.dataset_shard_idx is not None:
Expand Down
Loading