Speed up prediction - batch_size & num_workers #731
-
Hi there, I just wanted to double-check with you the best way to speed up prediction. I am working with one-hour-long audio files, and I would like to run a model I trained on these files to predict the presence of a species in 5 s clips. I am using CPUs at prediction time.
Playing around with these three parameters, it seems that the fastest approach was to loop over a list of files, one file at a time, passing a list of audio file paths as input, something like:
I doubt that's the best way to do it. What would you recommend? For example, should I split the one-hour-long file before calling …?

Many thanks in advance!
-
@sammlapp: Wondering if you might be able to help us here? Many thanks in advance!
-
Hey @charleygros, prediction on long files with OpenSoundscape >=0.8.0 is actually very simple (even simpler than my message above). All you need to do is pass a list of files (or a dataframe with file paths in the index), and the `CNN.predict()` method will take care of splitting your files into clips of the appropriate length. The CNN object's `.preprocessor` attribute will use the same clip duration that was used to train the model. You can use the `predict` method's `num_workers` argument to parallelize preprocessing of samples over parallel CPU processes, and `batch_size` to increase prediction speed by preparing and running many samples at once. For example, with OpenSoundscape 0.9.0:

In general, increasing `batch_size` and `num_workers` will speed up prediction up to some threshold where you run out of memory, CPUs, or I/O speed. In particular, `num_workers` will be limited by CPUs and I/O speed, while `batch_size` will be limited by memory.
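As a rough, library-independent illustration of what "splitting your files into the appropriate length clips" and `batch_size` mean for a one-hour file, here is a minimal sketch. The helpers `clip_intervals` and `batches` are hypothetical and are not part of OpenSoundscape; the sketch assumes back-to-back, non-overlapping 5 s clips and simply drops a trailing remainder shorter than one clip (check your OpenSoundscape version's documentation for how it actually handles the last, shorter clip).

```python
import math
from typing import Iterator, List, Sequence, Tuple


def clip_intervals(duration_s: float, clip_s: float) -> List[Tuple[float, float]]:
    """Hypothetical helper (not OpenSoundscape API): split a recording of
    `duration_s` seconds into back-to-back, non-overlapping clips of `clip_s`
    seconds. Assumption: a trailing remainder shorter than `clip_s` is dropped."""
    n_clips = int(duration_s // clip_s)
    return [(i * clip_s, (i + 1) * clip_s) for i in range(n_clips)]


def batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Hypothetical helper: yield consecutive groups of at most `batch_size` items,
    i.e. the groups of clips that would go through the model in one forward pass."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# A one-hour (3600 s) file split into 5 s clips
clips = clip_intervals(3600, 5)
print(len(clips))            # 720
print(clips[0], clips[-1])   # (0, 5) (3595, 3600)

# With batch_size=64, the 720 clips are processed in ceil(720 / 64) = 12 batches;
# the last batch holds the remaining 16 clips.
n_batches = len(list(batches(clips, 64)))
print(n_batches, math.ceil(len(clips) / 64))  # 12 12
```

A larger `batch_size` means fewer, bigger forward passes (bounded by memory), while `num_workers`, which this sketch does not model, controls how many processes load and preprocess those clips in parallel (bounded by CPUs and I/O).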