Is resampling needed when using EnCodec? #218
Hi,
Thanks for this amazing repository.
I'm starting to play around with the models, and I would like to use EnCodec to save me the trouble of training SoundStream (obviously).
I was wondering: how is the sample-rate mismatch between EnCodec and HuBERT handled? I see from fairseq that HuBERT Base was trained on LibriSpeech, which is 16 kHz.
Given that, do I need to use 24 kHz audio or 16 kHz audio? Does the code have some automatic internal resampling that takes care of the mismatch between the two models? I've looked for it, but couldn't find it.
Thanks a lot!

Comments
@m-pana oh hey Michele! Thanks, and no problem; you caught me at the right time, as I am about to return to some audio work. Do you want to see if this commit solves your issue?
```python
import torch
from audiolm_pytorch import EncodecWrapper

encodec = EncodecWrapper()

x = torch.randn(1, 48000)  # 3 seconds of audio at 16 kHz

# declare the true input rate so the wrapper can resample as needed
out = encodec(x, input_sample_hz = 16000)
```
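For reference, passing input_sample_hz presumably has the wrapper resample the waveform to EnCodec's native 24 kHz before encoding. Below is a minimal sketch of the equivalent manual path using torchaudio's resampler; this is an assumption about the internals, not a confirmed mirror of the commit, and it assumes the wrapper treats input as already at its native rate when input_sample_hz is omitted:

```python
import torch
import torchaudio.functional as F
from audiolm_pytorch import EncodecWrapper

encodec = EncodecWrapper()

x = torch.randn(1, 48000)  # 3 seconds at 16 kHz

# resample to EnCodec's native 24 kHz by hand, then encode
x_24k = F.resample(x, orig_freq = 16000, new_freq = 24000)  # (1, 72000)
out = encodec(x_24k)
```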
Awesome, thanks for the feedback.
@m-pana I think the scenario I have not covered yet is the prompt waveforms at inference: the input sample hz should be specified, and it should auto-convert for the semantic and acoustic models. Maybe we should leave this issue open so I can resolve it this week.
Oh, I had completely missed that!
Alright, thanks!
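Until that prompt handling lands, one workaround is to resample the prompt yourself before inference, once per target model. A minimal sketch, assuming a 16 kHz prompt, HuBERT's 16 kHz input rate, and EnCodec's 24 kHz rate; how the resampled prompts are then fed to the semantic and acoustic stages is deliberately left as a comment, since the thread does not pin down that API:

```python
import torch
import torchaudio.functional as F

PROMPT_HZ   = 16000  # rate of the prompt waveform as loaded
SEMANTIC_HZ = 16000  # HuBERT base was trained on 16 kHz LibriSpeech
ACOUSTIC_HZ = 24000  # EnCodec's native sample rate

prompt = torch.randn(1, 3 * PROMPT_HZ)  # stand-in for a loaded 3 s prompt

# one resampled copy per target model; resample is a no-op when rates match
semantic_prompt = F.resample(prompt, PROMPT_HZ, SEMANTIC_HZ)
acoustic_prompt = F.resample(prompt, PROMPT_HZ, ACOUSTIC_HZ)

# the semantic stage (HuBERT-based) would then consume semantic_prompt,
# and the acoustic stage (EnCodec-based) would consume acoustic_prompt
```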
Regarding EnCodec, can it process a batched input of wav files? Where do we mask the wav files?
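On batching: one common pattern is to right-pad each waveform to the longest clip in the batch and carry a boolean mask marking real samples, which downstream losses or attention can use to ignore padding. A minimal sketch of that pattern; whether EncodecWrapper itself accepts such a mask is not confirmed in this thread:

```python
import torch

def batch_waveforms(waves):
    """Right-pad 1-D waveforms into a (batch, max_len) tensor plus a mask."""
    lengths = torch.tensor([w.shape[-1] for w in waves])
    max_len = int(lengths.max())

    batch = torch.zeros(len(waves), max_len)
    for i, w in enumerate(waves):
        batch[i, : w.shape[-1]] = w

    # True over real samples, False over the padding
    mask = torch.arange(max_len)[None, :] < lengths[:, None]
    return batch, mask

waves = [torch.randn(48000), torch.randn(32000)]
batch, mask = batch_waveforms(waves)  # batch: (2, 48000), mask: (2, 48000)
```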