
Is resampling needed when using EnCodec? #218

Open · m-pana opened this issue Jul 31, 2023 · 7 comments

Comments


m-pana commented Jul 31, 2023

Hi,
Thanks for this amazing repository.

I'm starting to play around with the models and I would like to use EnCodec to save me the trouble of training SoundStream (obviously).
I was wondering how the sample rate mismatch between EnCodec and HuBERT is handled. I see from fairseq that HuBERT base was trained on LibriSpeech, which is sampled at 16 kHz.

Given that, do I need to use 24k audio or 16k audio? Does the code have some automatic internal resampling that takes care of the mismatch between the two models? I've looked for it, but couldn't find it.

Thanks a lot!

lucidrains (Owner) commented

@m-pana oh hey Michele! thanks and no problem

you caught me at the right time, as I am about to return to some audio work

do you want to see if this commit solves your issue?

lucidrains (Owner) commented

import torch
from audiolm_pytorch import EncodecWrapper

encodec = EncodecWrapper()

# 3 seconds of 16 kHz audio; input_sample_hz tells the wrapper to resample
# to EnCodec's native 24 kHz before encoding
x = torch.randn(1, 48000)
out = encodec(x, input_sample_hz = 16000)
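
For reference, this is roughly what the wrapper does under the hood - a minimal sketch using torchaudio's resampler (the 24 kHz target is EnCodec's native rate; the wrapper's actual internals may differ):

import torch
import torchaudio.functional as F

x = torch.randn(1, 48000)                                   # 3 s of audio at 16 kHz
x_24k = F.resample(x, orig_freq = 16000, new_freq = 24000)  # -> shape (1, 72000)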

m-pana (Author) commented Aug 1, 2023

Awesome, thanks for the feedback.
So, if I understand correctly, I should now be able to train both the coarse and the fine transformers on 16 kHz data, provided that I pass input_sample_hz = 16000 on every forward call to the codec, right?
And I believe that should happen only within the wrapper classes of the two transformers, if I'm not mistaken - I've only found here and here for the coarse one, and here and here for the fine one. Please correct me if I'm wrong.

lucidrains (Owner) commented

@m-pana ohh, actually, if you are training, the dataset should automatically figure out what sampling frequencies it needs, and internally the properly resampled audio is forwarded to the correct module
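
A minimal sketch of that idea - a dataset that returns one resampled copy of each file per consumer, e.g. 16 kHz for HuBERT and 24 kHz for EnCodec. The class and parameter names here are illustrative, not the library's actual API:

import torchaudio
import torchaudio.functional as F
from torch.utils.data import Dataset

class MultiRateSoundDataset(Dataset):
    # hypothetical dataset: yields one resampled copy of the waveform
    # per target rate, e.g. 16 kHz for HuBERT, 24 kHz for EnCodec
    def __init__(self, files, target_sample_hz = (16000, 24000)):
        self.files = files
        self.target_sample_hz = target_sample_hz

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        wave, sr = torchaudio.load(self.files[idx])
        return tuple(F.resample(wave, sr, hz) for hz in self.target_sample_hz)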

lucidrains (Owner) commented Aug 1, 2023

@m-pana i think the scenario i have not covered yet is prompt waveforms at inference - the input sample hz should be specified and it should auto-convert for the semantic and acoustic models

maybe we should leave this issue open so i can resolve it this week
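
Until that lands, one workaround is to resample the prompt yourself before handing it to the models - a sketch assuming a 16 kHz prompt file ('prompt.wav' is a placeholder path):

import torchaudio
import torchaudio.functional as F

prompt, sr = torchaudio.load('prompt.wav')   # e.g. a 16 kHz recording
prompt_16k = F.resample(prompt, sr, 16000)   # rate the semantic (HuBERT) model expects
prompt_24k = F.resample(prompt, sr, 24000)   # rate the acoustic (EnCodec) models expect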

m-pana (Author) commented Aug 1, 2023

> @m-pana ohh, actually, if you are training, the dataset should automatically figure out what sampling frequencies it needs, and internally the properly resampled audio is forwarded to the correct module

Oh, I had completely missed that!

> @m-pana i think the scenario i have not covered yet is prompt waveforms at inference - the input sample hz should be specified and it should auto-convert for the semantic and acoustic models
>
> maybe we should leave this issue open so i can resolve it this week

Alright, thanks!

p0p4k commented Nov 16, 2023

Regarding EnCodec, can it process batched input of wav files? Where do we mask the wav files?
