Refactor frame classification models to use single WindowedFramesDatapipe
#574
Started to make a new issue, but am just changing this one: what we call …
Thinking about this more. The WindowedFrameClassification class implicitly assumes it's getting batches from a different DataSet during the validation step. There's a good reason for this: we want to compute a per-file metric like segment error rate, to get an estimate of those metrics per file, since this is what a user wants to know (among other things). If each of my files is one bout of vocalizations, one song for example, how well will I do per bout?

However, it also represents a kind of tight coupling between the model class and the dataset class. And in doing so it conflates the way we load the data with the concept of a "dataset" as discussed in #667; here is where a … But for now we just need to clarify what a …

We can convert this class to use a memmap or in-memory array by representing sample number with an ID vector, like we do now for the current WindowDataset class. There will be some vector …
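As a rough illustration of the ID-vector idea (a hypothetical sketch, not the actual WindowDataset code; `frames_per_file` and `window_size` are assumed names):

```python
import numpy as np

def make_id_vector(frames_per_file, window_size):
    """Map every valid window-start index to a position in the
    concatenated array of frames, one entry per allowed window."""
    ids = []
    offset = 0
    for n_frames in frames_per_file:
        # a window may start anywhere it still fits inside this file,
        # so windows never cross a file boundary
        ids.extend(range(offset, offset + n_frames - window_size + 1))
        offset += n_frames
    return np.asarray(ids)

# usage: the idx-th training sample is a slice of the concatenated frames
# id_vec = make_id_vector([500, 750, 620], window_size=88)
# window = frames[:, id_vec[idx] : id_vec[idx] + 88]
```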
Renaming / hijacking this issue to be about other classes for frame classification too. Some of this is needed for #630.
After reading … The thing that confused me was the differences between the two dataset classes, because … But we can see that:

```python
if self.shuffle:
    # pts = np.random.randint(self.first_sample / self.stride, (self.last_sample - self.x_hist - 1) / self.stride, self.batch_size)
    pts = np.random.choice(self.allowed_batches, size=self.batch_size, replace=False)
else:
    pts = range(
        int(self.first_sample / self.stride) + idx * self.batch_size,
        int(self.first_sample / self.stride) + (idx + 1) * self.batch_size,
    )
```

(Incidentally, I think this implementation allows for returning the same window across multiple batches, i.e. repeats in the training set? Unless keras somehow tracks …)
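For what it's worth, one way to rule out such repeats (a hypothetical standalone sketch, not a patch to the original class) is to permute the allowed indices once per epoch instead of drawing a fresh random sample every batch:

```python
import numpy as np

class EpochSampler:
    """Permute all allowed window indices once per epoch and slice per
    batch, so every window appears at most once per epoch."""

    def __init__(self, allowed_batches, batch_size, seed=None):
        self.allowed = np.asarray(allowed_batches)
        self.batch_size = batch_size
        self.rng = np.random.default_rng(seed)
        self.on_epoch_end()

    def on_epoch_end(self):
        # reshuffle once per epoch; the batches then partition the indices
        self.order = self.rng.permutation(self.allowed)

    def __getitem__(self, idx):
        return self.order[idx * self.batch_size:(idx + 1) * self.batch_size]
```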
The other thing I get out of reading the … We are very careful in the current … There are also a couple of drawbacks to respecting these boundaries: …
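For instance, one concrete cost of respecting file boundaries is easy to quantify (a toy illustration; the numbers are made up):

```python
# Windows that would span a file boundary are excluded, so the pool of
# valid window-start indices shrinks versus windowing one concatenated array.
frames_per_file = [500, 750, 620]
window_size = 88

within_files = sum(n - window_size + 1 for n in frames_per_file)
concatenated = sum(frames_per_file) - window_size + 1
print(within_files, concatenated)  # 1609 vs. 1783
```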
WindowedFramesDatapipe
Renamed this issue (again?). After working with these datasets more, I think I am understanding that: …

So we can refactor to use a single WindowedFramesDatapipe.
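A minimal sketch of what that single datapipe might look like (hypothetical; names like `frames_list` and `mode` are assumptions, and a real implementation would draw windows via the ID vector discussed above):

```python
import numpy as np

class WindowedFramesDatapipe:
    """Hypothetical sketch: one pipe that yields fixed-size windows for
    training and whole files for validation, so the train and eval steps
    no longer need two separate dataset classes."""

    def __init__(self, frames_list, labels_list, window_size, mode="train"):
        self.frames_list = frames_list  # one (n_freq, n_frames) array per file
        self.labels_list = labels_list  # one (n_frames,) label vector per file
        self.window_size = window_size
        self.mode = mode

    def __iter__(self):
        if self.mode == "train":
            for frames, labels in zip(self.frames_list, self.labels_list):
                # random window per file; a real pipeline would sample
                # uniformly across all valid windows instead
                start = np.random.randint(frames.shape[1] - self.window_size + 1)
                yield (frames[:, start:start + self.window_size],
                       labels[start:start + self.window_size])
        else:
            # validation: yield each file whole, so per-file metrics like
            # segment error rate can still be computed
            yield from zip(self.frames_list, self.labels_list)
```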
I think VocalDataset can be rewritten to be more general, with a lot of the logic moved into transforms. This gives us more flexibility while also making the code more concise.

E.g., the following much simpler version of VocalDataset could be combined with the right transforms to give us what we have now, and optionally work with other things, e.g. a model that uses audio as input. The transform should include loading audio, spectrogram files, etc. This would also make it easier to move to DataPipes should we decide to do so.
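A minimal sketch of such a simplified VocalDataset (hypothetical; assumes a torch-style Dataset where all loading lives in an `item_transform`):

```python
import torch

class VocalDataset(torch.utils.data.Dataset):
    """Minimal sketch: the dataset only tracks paths and annotations;
    loading audio or spectrograms is delegated to ``item_transform``."""

    def __init__(self, source_paths, annots=None, item_transform=None):
        self.source_paths = source_paths  # e.g. audio or spectrogram file paths
        self.annots = annots              # optional annotations, parallel to paths
        self.item_transform = item_transform

    def __len__(self):
        return len(self.source_paths)

    def __getitem__(self, idx):
        item = {
            "source_path": self.source_paths[idx],
            "annot": self.annots[idx] if self.annots is not None else None,
        }
        if self.item_transform is not None:
            # the transform decides *how* to load: audio, spectrogram file, etc.
            item = self.item_transform(item)
        return item
```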