We should make it possible to specify different splits for the same dataset.
This avoids the need to re-"prep" a dataset every time; a dataset will just be a set of files in a folder--without sub-directories for "train"/"val"/"test"--and the splits will be in a separate file in that directory.
Dataset/datapipe classes should accept a `splits_path` argument that defaults to `None`.
If `splits_path` is `None`, the datapipe class looks in a default location for a single splits file, and raises a `FileNotFoundError` if it does not find one.
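To make the lookup behavior concrete, here is a minimal sketch. The function name `resolve_splits_path`, the glob pattern `*splits.json`, and the `ValueError` raised when more than one candidate exists are all assumptions made for illustration; this issue does not fix any of them.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical pattern for the default splits file; the actual name is undecided.
DEFAULT_SPLITS_GLOB = "*splits.json"


def resolve_splits_path(dataset_path: str | Path, splits_path: str | Path | None = None) -> Path:
    """Return the splits file to use for the dataset in ``dataset_path``.

    If ``splits_path`` is given, it is used as-is (after checking it exists).
    Otherwise, look in the dataset directory for a single default splits file.
    """
    dataset_path = Path(dataset_path)

    if splits_path is not None:
        splits_path = Path(splits_path)
        if not splits_path.exists():
            raise FileNotFoundError(f"splits_path not found: {splits_path}")
        return splits_path

    # No explicit path: search the default location.
    candidates = sorted(dataset_path.glob(DEFAULT_SPLITS_GLOB))
    if not candidates:
        raise FileNotFoundError(
            f"No splits file matching '{DEFAULT_SPLITS_GLOB}' found in: {dataset_path}"
        )
    if len(candidates) > 1:
        # Assumption: ambiguity is an error, so the caller must pass splits_path.
        raise ValueError(
            f"Found more than one splits file in {dataset_path}; specify splits_path explicitly."
        )
    return candidates[0]
```

A dataset class could call a helper like this in its constructor, so that passing no `splits_path` still works for the common case of one splits file per dataset.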
The `splits_path` will be distinct from what we now call `dataset_csv_path`. It will point to a JSON file, basically metadata, that declares not only what we now call `dataset_csv_path` but also any other paths needed for a split. In the case of a frame classification dataset, this includes the vectors of sample IDs and of indices within each sample.
We should probably rename `dataset_csv_path` to something like `inputs_targets_paths_csv` for clarity.
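As a rough illustration of what such a splits file might contain and how a dataset class could read it, consider the sketch below. The layout (one entry per split), the keys `sample_ids_path` and `inds_in_sample_path`, the file names, and the helper `load_split_paths` are all hypothetical; only `inputs_targets_paths_csv` comes from the rename suggested above.

```python
import json
from pathlib import Path

# Hypothetical contents of a splits file for a frame classification dataset.
# All paths are relative to the dataset directory that holds the splits file.
EXAMPLE_SPLITS = {
    "train": {
        "inputs_targets_paths_csv": "train-inputs-targets-paths.csv",
        "sample_ids_path": "train-sample-ids.npy",
        "inds_in_sample_path": "train-inds-in-sample.npy",
    },
    "val": {
        "inputs_targets_paths_csv": "val-inputs-targets-paths.csv",
        "sample_ids_path": "val-sample-ids.npy",
        "inds_in_sample_path": "val-inds-in-sample.npy",
    },
}


def load_split_paths(splits_path, split):
    """Return a dict mapping each declared name to an absolute-ish Path for ``split``."""
    splits_path = Path(splits_path)
    with splits_path.open() as fp:
        splits = json.load(fp)
    if split not in splits:
        raise KeyError(f"Split '{split}' not declared in {splits_path}")
    # Resolve every declared path against the directory containing the splits file.
    root = splits_path.parent
    return {name: root / rel_path for name, rel_path in splits[split].items()}
```

Because the splits file only references files in the dataset directory, a second splits file with different assignments could sit next to the first without re-running prep.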
So we'll need to:
- add `splits_path` to dataset classes
- modify `prep.frame_classification` so that it no longer makes split sub-directories
Related to #748