Cropping signals or padding #5

Open
JuanFMontesinos opened this issue Mar 8, 2024 · 1 comment

Comments

@JuanFMontesinos

Hi again, after a long time :)

I had a quick question I hope you could help me with.
When simulating signals, the simulated signals come out longer than the dry (source) signals. This makes sense, as it's probably the reverberation still bouncing around for a while.
When training with batches, the signals need to be padded, since all the acoustic signals must be the same length to be stacked.
It turns out I've always been training with batch size 1 and gradient accumulation, so I never looked into this in depth.

I was considering two options,

  1. The complex approach of padding and keeping track of the padded length to mask the loss and so on. This would also require padding the trajectory signals.
    a) Do you have any experience with how the model would react to padding?
    b) Padding with zeros feels pretty bad. Padding the trajectories with the last value seems ok, but not the audio.
  2. Cropping the extra length so that they are all the same length.
    c) Is this bad in terms of modelling?
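
To make the two options concrete, here is a minimal NumPy sketch. It assumes each audio signal is an array of shape `(n_mics, n_samples)` and each trajectory an array of shape `(n_frames, 3)`; the function names (`pad_audio`, `pad_trajectories`, `crop_audio`, `masked_mse`) and the shapes are hypothetical and not taken from any library discussed here.

```python
import numpy as np


def pad_audio(signals):
    """Option 1: zero-pad (n_mics, n_samples) arrays to a common length.

    Returns the stacked batch and a boolean mask that is True on real samples.
    """
    lengths = np.array([s.shape[1] for s in signals])
    max_len = int(lengths.max())
    batch = np.stack(
        [np.pad(s, ((0, 0), (0, max_len - s.shape[1]))) for s in signals]
    )
    mask = np.arange(max_len)[None, :] < lengths[:, None]
    return batch, mask


def pad_trajectories(trajectories):
    """Option 1: pad (n_frames, 3) arrays by repeating the last position."""
    max_len = max(t.shape[0] for t in trajectories)
    return np.stack(
        [
            np.pad(t, ((0, max_len - t.shape[0]), (0, 0)), mode="edge")
            for t in trajectories
        ]
    )


def masked_mse(pred, target, mask):
    """Mean squared error computed over unpadded positions only."""
    err = (pred - target) ** 2
    return (err * mask).sum() / mask.sum()


def crop_audio(signals, target_len):
    """Option 2: crop every signal to target_len samples.

    Assumes every signal has at least target_len samples.
    """
    return np.stack([s[:, :target_len] for s in signals])


# Two hypothetical 4-microphone signals of different lengths.
rng = np.random.default_rng(0)
signals = [rng.standard_normal((4, 1000)), rng.standard_normal((4, 1200))]
trajectories = [rng.standard_normal((5, 3)), rng.standard_normal((7, 3))]

padded, mask = pad_audio(signals)            # shape (2, 4, 1200)
padded_traj = pad_trajectories(trajectories)  # shape (2, 7, 3)
cropped = crop_audio(signals, 1000)          # shape (2, 4, 1000)
```

With padding, the mask has to follow the data through the loss (as in `masked_mse`); with cropping, nothing else in the training loop changes.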

What is the best in your experience?

Thanks,
Juan

@DavidDiazGuerra
Owner

Hi Juan,

For sound source localization/tracking, I think cropping the signals is good enough, especially when training causal systems such as icoDOA. This might be different in cases where cropping could lose part of the desired information (as in speech recognition) or where the reverberant tail is especially important (as in reverberation estimation).

In the case of causal systems, I'm pretty sure of this because the part of the signal you're cropping can't affect the outputs you keep. In the case of non-causal systems, I wouldn't expect a big impact, but this is just an intuition since I haven't run any experiments to evaluate it.
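
The causality argument can be checked numerically on a toy example. The sketch below uses a causal FIR filter as a stand-in for a causal model (it is not icoDOA, and the signal lengths and filter are made-up values): the outputs for the retained samples are the same whether or not the tail is cropped first.

```python
import numpy as np


def causal_fir(x, h):
    """Causal FIR filter: y[n] depends only on x[n], x[n-1], ..."""
    return np.convolve(x, h)[: len(x)]


rng = np.random.default_rng(0)
x_full = rng.standard_normal(1200)  # hypothetical signal including the tail
h = rng.standard_normal(32)         # hypothetical causal impulse response
n_keep = 1000                       # hypothetical dry-signal length

y_full = causal_fir(x_full, h)           # process everything, then look at the start
y_crop = causal_fir(x_full[:n_keep], h)  # crop first, then process

# For a causal system the first n_keep outputs agree up to rounding.
same = np.allclose(y_full[:n_keep], y_crop)
```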

Best,
David
