ENH: Make it possible to specify different splits for datasets #749

Open
NickleDave opened this issue May 4, 2024 · 0 comments
Labels: ENH: enhancement (enhancement; new feature or request)


@NickleDave
Collaborator

Related to #748

We should make it possible to specify different splits for the same dataset.
This avoids the need to re-"prep" a dataset every time we want a different split. A dataset will just be a set of files in a folder (without "train"/"val"/"test" sub-directories), and the splits will be declared in a separate file in that directory.
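As a rough illustration of why this avoids re-prepping: if the files stay in one flat folder, a "split" is just a set of file-name lists, so several different splits can be drawn from the same files without moving or copying anything. The sketch below assumes splits are made by file count and a random seed; that is a simplification for illustration (a real split might target durations instead), and `assign_splits` is a hypothetical helper, not an existing vak function.

```python
import random


def assign_splits(filenames, n_val, n_test, seed):
    """Assign each file in a flat dataset folder to train/val/test.

    Files are never moved or copied; each split is only a list of file names.
    Splitting by file count (not duration) is a simplifying assumption.
    """
    if n_val < 0 or n_test < 0 or n_val + n_test > len(filenames):
        raise ValueError("n_val and n_test must be non-negative and fit in the dataset")
    # Sort first so the result depends only on the seed, not on input order.
    shuffled = sorted(filenames)
    random.Random(seed).shuffle(shuffled)
    return {
        "test": sorted(shuffled[:n_test]),
        "val": sorted(shuffled[n_test:n_test + n_val]),
        "train": sorted(shuffled[n_test + n_val:]),
    }
```

Calling it again with a different `seed` gives another split of the same files, which could then be saved as another splits file in the same directory.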

Dataset/datapipe classes should accept a splits_path argument that defaults to None.
If splits_path is None, the datapipe class looks for a single splits file in a default location (and raises a FileNotFoundError if it is not found).
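A minimal sketch of how a datapipe could resolve splits_path, assuming a default file name of splits.json in the dataset directory (the issue does not name the default). The class name FrameClassificationDatapipe and the constant DEFAULT_SPLITS_FILENAME are hypothetical. The sketch also raises if an explicitly passed file is missing, which goes slightly beyond what the issue specifies.

```python
import pathlib

# Hypothetical default file name; the issue does not specify one.
DEFAULT_SPLITS_FILENAME = "splits.json"


class FrameClassificationDatapipe:
    """Sketch of a datapipe that takes an optional splits_path."""

    def __init__(self, dataset_path, split, splits_path=None):
        self.dataset_path = pathlib.Path(dataset_path)
        self.split = split
        self.splits_path = self.resolve_splits_path(self.dataset_path, splits_path)

    @staticmethod
    def resolve_splits_path(dataset_path, splits_path=None):
        """Return the splits file to use, falling back to the default location."""
        if splits_path is None:
            # No explicit file given: look for the single default splits file.
            splits_path = pathlib.Path(dataset_path) / DEFAULT_SPLITS_FILENAME
        else:
            splits_path = pathlib.Path(splits_path)
        if not splits_path.exists():
            raise FileNotFoundError(f"splits file not found: {splits_path}")
        return splits_path
```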

The splits_path will be distinct from what we now call dataset_csv_path. It will point to a JSON file, essentially metadata, that declares not only what we now call dataset_csv_path but also any other paths needed for a split. For a frame classification dataset, this includes the vectors of sample IDs and of indices within each sample.
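One way the splits file and the frame classification vectors could fit together, as a sketch only: the JSON keys, file names, and the helpers `frame_vectors` and `load_split` below are hypothetical. Here the sample-ID vector records, for every frame in a split, which sample it came from, and the index vector records that frame's position within its sample; paths in the JSON are assumed to be relative to the directory holding the splits file.

```python
import json
import pathlib


def frame_vectors(n_frames_per_sample):
    """Map every frame to (sample ID, index within that sample).

    n_frames_per_sample[i] is the number of frames in sample i.
    """
    sample_ids = []
    inds_in_sample = []
    for sample_id, n_frames in enumerate(n_frames_per_sample):
        sample_ids.extend([sample_id] * n_frames)
        inds_in_sample.extend(range(n_frames))
    return sample_ids, inds_in_sample


# Hypothetical schema: one entry per split, each declaring every path
# the datapipe needs. Keys and file names are illustrative only.
EXAMPLE_SPLITS = {
    "train": {
        "inputs_targets_paths_csv": "inputs_targets_paths.csv",
        "sample_ids_path": "train-sample_ids.npy",
        "inds_in_sample_path": "train-inds_in_sample.npy",
    },
}


def load_split(splits_path, split):
    """Return the paths declared for one split, resolved relative to
    the directory that contains the splits file."""
    splits_path = pathlib.Path(splits_path)
    with splits_path.open() as fp:
        splits = json.load(fp)
    if split not in splits:
        raise KeyError(f"split {split!r} not in {splits_path}; found {sorted(splits)}")
    return {key: splits_path.parent / name for key, name in splits[split].items()}
```

For example, two samples with 2 and 3 frames give sample IDs [0, 0, 1, 1, 1] and indices [0, 1, 0, 1, 2].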

We should probably also rename dataset_csv_path to something clearer, like inputs_targets_paths_csv.

So we'll need to:

  • add splits_path to dataset classes
  • modify prep.frame_classification so that it no longer makes split sub-directories
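The second task could end with prep writing a splits file into the flat dataset directory instead of creating sub-directories. A minimal sketch, assuming the splits are already computed as a mapping from split name to file names relative to the dataset directory; `write_splits_file` and the default name splits.json are hypothetical, not part of vak's current prep code.

```python
import json
import pathlib


def write_splits_file(dataset_path, splits, filename="splits.json", overwrite=False):
    """Write split declarations as JSON into a flat dataset directory.

    splits maps a split name (e.g. "train") to a mapping of path keys to
    file names relative to dataset_path. No sub-directories are created and
    no data files are moved or copied.
    """
    dataset_path = pathlib.Path(dataset_path)
    if not dataset_path.is_dir():
        raise NotADirectoryError(f"not a directory: {dataset_path}")
    splits_path = dataset_path / filename
    if splits_path.exists() and not overwrite:
        raise FileExistsError(f"splits file already exists: {splits_path}")
    # Every declared file must already live in the dataset directory.
    for split, paths in splits.items():
        for key, name in paths.items():
            if not (dataset_path / name).exists():
                raise FileNotFoundError(f"{split}/{key}: {dataset_path / name} does not exist")
    with splits_path.open("w") as fp:
        json.dump(splits, fp, indent=2)
    return splits_path
```

Refusing to overwrite by default means a second split of the same dataset has to be written under a new file name, so earlier splits are not silently lost.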