about model input #2

enhancer12 · 2020-12-05T09:03:37Z

Hello Tsun-An, I have read your paper "IMPROVING PERCEPTUAL QUALITY BY PHONE-FORTIFIED PERCEPTUAL LOSS FOR SPEECH ENHANCEMENT", which was well written. I have a question about your code in 'dataset.py', lines 41 and 42: why do you use the constant 16384 to constrain the input length? Does this constant have any special meaning? Thank you~

PhoneFortifiedPerceptualLoss/dataset.py

Line 41 in d763760

start = torch.randint(0, length - 16384 - 1, (1, ))

PhoneFortifiedPerceptualLoss/dataset.py

Line 42 in d763760

end = start + 16384
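
For readers following along, the two quoted lines pick a random fixed-length window from a longer clip. Below is a minimal sketch of that idea in plain Python; `random_crop_bounds` is a hypothetical helper, not a function from the repository, and it uses the standard-library `random` module instead of `torch.randint`. Because the upper bound of `torch.randint` is exclusive, the quoted expression `length - 16384 - 1` would appear to skip the last two valid start positions and to fail for clips of 16385 samples or fewer; the sketch handles those edge cases explicitly, which is an assumption about the desired behaviour rather than the authors' intent.

```python
import random

SEGMENT_LEN = 16384  # constant taken from the quoted dataset.py lines


def random_crop_bounds(length, segment_len=SEGMENT_LEN, rng=random):
    """Return (start, end) of a random window of `segment_len` samples.

    Hypothetical helper for illustration; not part of the repository.
    """
    if length < segment_len:
        # Assumption: short clips are rejected here; padding is another option.
        raise ValueError("clip is shorter than the requested segment")
    # random.randint is inclusive on both ends, so the last valid
    # start index is exactly length - segment_len.
    start = rng.randint(0, length - segment_len)
    return start, start + segment_len


if __name__ == "__main__":
    start, end = random_crop_bounds(48000)
    print(start, end, end - start)
```

The same index arithmetic applies when slicing a tensor, for example `wave[start:end]`.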

aleXiehta · 2021-01-29T05:13:48Z

Hi, thank you for your interest in our work!
The input is truncated because of VRAM limits: DCU-Net20 is quite large.
For the output to have the same length as the input, the input length must be a power of two (2^n), so we chose 16384 (2^14) samples, which is about 1 second of audio.
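
The length argument above can be illustrated with a toy calculation. The sketch below assumes a simplified encoder-decoder in which each encoder stage halves the length with floor division and each decoder stage doubles it; `unet_output_length`, the `depth` value of 10, and the 16 kHz sample rate are all assumptions for illustration, not details confirmed in this thread or taken from DCU-Net20.

```python
def unet_output_length(n, depth):
    """Toy model of a U-Net-style length round trip.

    Assumes `depth` stride-2 downsamplings (floor division) followed by
    `depth` stride-2 upsamplings. Hypothetical, not the real DCU-Net20.
    """
    for _ in range(depth):
        n //= 2  # each encoder stage halves the length, dropping any remainder
    for _ in range(depth):
        n *= 2   # each decoder stage doubles the length
    return n


SAMPLE_RATE = 16000  # assumption: 16 kHz audio; the thread does not state the rate

print(unet_output_length(16384, depth=10))  # 16384: the length survives the round trip
print(unet_output_length(16000, depth=10))  # 15360: the remainders are lost
print(16384 / SAMPLE_RATE)                  # 1.024 seconds under the 16 kHz assumption
```

In this toy model any length divisible by 2^depth round-trips exactly, and a power of two such as 16384 is a convenient choice that satisfies this for every depth up to 14.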
