Fix bug in dataset loading (#284)
* Fix bug in dataset loading

This fixes a bug when loading datasets. `d.data_files` can be a list, so it cannot be passed directly to `hf_hub_download`, which expects a single filename.

* Check type of data_files, and load accordingly
emmatyping authored Sep 27, 2023
1 parent d1236f2 commit 8fe0e63
Showing 1 changed file with 20 additions and 5 deletions.
25 changes: 20 additions & 5 deletions src/axolotl/utils/data.py
@@ -205,11 +205,26 @@ def for_d_in_datasets(dataset_configs):
                     use_auth_token=use_auth_token,
                 )
             else:
-                fp = hf_hub_download(
-                    repo_id=d.path,
-                    repo_type="dataset",
-                    filename=d.data_files,
-                )
+                if isinstance(d.data_files, str):
+                    fp = hf_hub_download(
+                        repo_id=d.path,
+                        repo_type="dataset",
+                        filename=d.data_files,
+                    )
+                elif isinstance(d.data_files, list):
+                    fp = []
+                    for file in d.data_files:
+                        fp.append(
+                            hf_hub_download(
+                                repo_id=d.path,
+                                repo_type="dataset",
+                                filename=file,
+                            )
+                        )
+                else:
+                    raise ValueError(
+                        "data_files must be either a string or list of strings"
+                    )
                 ds = load_dataset(
                     "json", name=d.name, data_files=fp, streaming=False, split=None
                 )
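
The type check introduced by this change can be read as a small standalone helper. The sketch below is illustrative only and is not part of the commit: `resolve_data_files`, its `download` parameter, and `fake_download` are hypothetical names, and the downloader is injected so the branching can be exercised without network access.

```python
from typing import Callable, List, Union


def resolve_data_files(
    data_files: Union[str, List[str]],
    download: Callable[[str], str],
) -> Union[str, List[str]]:
    """Download one file or several, mirroring the branching in the diff.

    `download` is a hypothetical callable that maps a filename inside the
    dataset repo to a local path.
    """
    if isinstance(data_files, str):
        # Single filename: return a single local path.
        return download(data_files)
    if isinstance(data_files, list):
        # List of filenames: download each one and keep the order.
        return [download(name) for name in data_files]
    raise ValueError("data_files must be either a string or list of strings")


def fake_download(filename: str) -> str:
    # Stand-in for a real downloader; it only builds a made-up cache path.
    return f"/cache/{filename}"


single = resolve_data_files("train.jsonl", fake_download)
several = resolve_data_files(["a.jsonl", "b.jsonl"], fake_download)
```

In real use, `download` could presumably be a wrapper such as `lambda name: hf_hub_download(repo_id=d.path, repo_type="dataset", filename=name)`, which matches the keyword arguments used in the diff. Either return shape (a string or a list of strings) is then passed on as `data_files` to `load_dataset`, as in the changed code.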