ENH: Make it possible to specify different splits for datasets #749

NickleDave · 2024-05-04T13:51:00Z

Related to #748

We should make it possible to specify different splits for the same dataset.
This avoids the need to re-"prep" a dataset every time we want a different split: a dataset will just be a set of files in a folder, without sub-directories for "train"/"val"/"test", and the splits will be declared in a separate file in that directory.

Dataset/datapipe classes should accept a splits_path argument that defaults to None.
If splits_path is None, the datapipe class looks in a default location for a single splits file (and raises a FileNotFoundError if it's not found).
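A rough sketch of the lookup described above, just to pin down the behavior. The helper name `resolve_splits_path` and the default filename `splits.json` are placeholders I made up for illustration; neither exists in the codebase and the default location is still to be decided.

```python
from pathlib import Path

# Hypothetical default filename; the actual default location is not decided yet.
DEFAULT_SPLITS_FILENAME = "splits.json"


def resolve_splits_path(dataset_path, splits_path=None):
    """Return the splits file to use for the dataset in ``dataset_path``.

    Hypothetical helper, not part of the current API.
    """
    dataset_path = Path(dataset_path)
    if splits_path is None:
        # No explicit path given: fall back to the default location
        # inside the dataset directory.
        splits_path = dataset_path / DEFAULT_SPLITS_FILENAME
    else:
        splits_path = Path(splits_path)
    # Assumption: an explicit path that does not exist is also an error.
    if not splits_path.is_file():
        raise FileNotFoundError(f"splits file not found: {splits_path}")
    return splits_path
```

A dataset class could call something like this in its constructor, so that the same folder of files can be reused with different splits by passing a different splits_path.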

The splits_path will be distinct from what we now call dataset_csv_path. It will be a JSON file, basically metadata, that declares not only what we now call dataset_csv_path but also any other paths needed for a split. In the case of a frame classification dataset, this includes the vectors of sample IDs and indices within each sample.
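To make the idea concrete, here is one possible shape for that JSON file and a loader for it. Every key and filename below (`splits`, `sample_ids`, `inds_in_sample`, the `.npy` names) is an assumption for the sake of the example, not a settled schema.

```python
import json
from pathlib import Path

# Hypothetical schema: all keys and filenames are placeholders.
EXAMPLE_SPLITS = {
    "dataset_csv_path": "dataset.csv",
    "splits": {
        "train": {
            "sample_ids": "train-sample-ids.npy",
            "inds_in_sample": "train-inds-in-sample.npy",
        },
        "val": {
            "sample_ids": "val-sample-ids.npy",
            "inds_in_sample": "val-inds-in-sample.npy",
        },
    },
}


def load_split_paths(splits_path, split):
    """Return the paths declared for one split, relative to the dataset folder."""
    splits_path = Path(splits_path)
    metadata = json.loads(splits_path.read_text())
    if split not in metadata["splits"]:
        raise KeyError(f"split {split!r} not declared in {splits_path}")
    # Paths in the file are taken to be relative to the folder holding it.
    dataset_dir = splits_path.parent
    paths = {"dataset_csv_path": dataset_dir / metadata["dataset_csv_path"]}
    for name, relative_path in metadata["splits"][split].items():
        paths[name] = dataset_dir / relative_path
    return paths
```

Keeping the paths relative to the dataset folder means the whole folder can be moved or shared without rewriting the splits file.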

Probably we should rename dataset_csv_path to something like inputs_targets_paths_csv for clarity.

So we'll need to:

add splits_path to dataset classes
modify how prep.frame_classification works to not make split sub-directories

This was referenced May 4, 2024

DEV: Version 1.0 to-do list #614

Open

ENH: Add dataset sub-table to config file, remove other dataset/transform param keys #748

Closed

NickleDave added the "ENH: enhancement" label (enhancement; new feature or request) Oct 25, 2024

NickleDave self-assigned this Oct 25, 2024
