garbled audio #161

toninog · 2024-11-02T16:28:16Z

Hi

Not sure why, but when I run either of the example scripts I get garbled audio. When I use the Mini model I get the "correct" audio.

I have the following setup

Ubuntu 22.04
Python 3.11.4

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

python3 -m pip show torch
Name: torch
Version: 2.5.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3-Clause
Location: /data/parler-tts/.venv/lib/python3.11/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: descript-audio-codec, descript-audiotools, julius, parler_tts, torch-stoi, torchaudio

Thanks

SummerXIATIAN · 2024-11-04T10:56:54Z

same here, any solutions?

wizche · 2024-11-04T14:15:29Z

same here, mini model works, large model generates just noise.
Debian 11 / python 3.9.2, Cuda 12.6, torch 2.5.1

DerXter · 2024-11-15T23:44:46Z

Same here too, but with the mini model: my audio_arr vector always has a length of 5632. Once the sample rate is applied, the audio duration drops to 0. Could someone please help me solve this issue?
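
As a rough check of those numbers, the duration implied by an array length can be computed directly. This is only a sketch: `duration_seconds` is a hypothetical helper, and the 44100 Hz rate is an assumption (it is what the Parler-TTS checkpoints usually report as `model.config.sampling_rate`, but it is not stated in this thread).

```python
def duration_seconds(num_samples: int, sampling_rate: int) -> float:
    """Return the playback length in seconds of a mono audio array."""
    if sampling_rate <= 0:
        raise ValueError("sampling_rate must be positive")
    return num_samples / sampling_rate


# 5632 is the array length reported above; 44100 Hz is an assumed sampling rate.
clip = duration_seconds(5632, 44100)
print(f"{clip:.3f} s")  # roughly an eighth of a second
```

Under that assumption the clip is well under one second, so a player that rounds to whole seconds would show 0. That would point to generation stopping almost immediately rather than to a problem when writing the file.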

FireMMDC · 2024-11-23T18:37:38Z

Looking at thread #157, it seems the change of transformers version may have broken inference with the larger model. Using 5d0aca9, I was able to generate audio with the larger model.
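
Since the suspicion here is a dependency version change, it may help to include the installed versions when reporting results. A minimal sketch using only the standard library; `installed_versions` is a hypothetical helper, and the package list is just the set of names mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(["torch", "transformers", "parler_tts"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```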

sswam · 2024-11-24T21:05:11Z

This issue seems important: the large model just does not work. I suggest adding a test suite that runs the output through Whisper or something similar.
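
One way such a check could be scored is a word error rate between the prompt and whatever a speech recognizer returns. The sketch below covers only the scoring step in plain Python; the transcription itself (Whisper or otherwise) is left out, and `word_error_rate`, `looks_garbled`, and the 0.5 threshold are hypothetical choices rather than anything from the repository.

```python
import re


def normalize(text):
    """Lowercase and strip punctuation so only the words are compared."""
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the previous prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)


def looks_garbled(reference, transcript, threshold=0.5):
    """Flag a sample whose transcript differs too much from the prompt (threshold is arbitrary)."""
    return word_error_rate(reference, transcript) > threshold
```

A test would then synthesize a fixed prompt, transcribe the result, and fail when `looks_garbled(prompt, transcript)` is true.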

toninog · 2024-11-25T11:27:58Z

Yeah, I use that, but the number of failures is a LOT: almost 95% of runs fail or generate garbled audio.

So there must be an underlying issue. I'm just not sure where, as I followed the instructions.

ylacombe · 2024-11-29T10:32:31Z

Hey everyone, thanks for opening the issue and helping to identify where it breaks.

It actually comes from the audio encoder. I still have to figure out how to fix it while staying backward compatible for people using the previous repo version, but in the meantime you can find fixed weights for the current version by using:
model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-large-v1", revision="refs/pr/9")
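
For context, here is how that line might slot into a full generation script. This is a sketch modeled on the usual Parler-TTS usage pattern, not a confirmed recipe: the `synthesize` wrapper and `out_path` are hypothetical, `soundfile` is assumed to be installed, and loading the tokenizer from the default revision assumes the fix only touched the model weights.

```python
MODEL_ID = "parler-tts/parler-tts-large-v1"
FIXED_REVISION = "refs/pr/9"  # revision quoted in the comment above


def synthesize(prompt, description, out_path="parler_tts_out.wav"):
    """Generate speech for `prompt` in the voice described by `description`."""
    # Heavy imports stay inside the function so this file can be imported cheaply.
    import soundfile as sf
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    model = ParlerTTSForConditionalGeneration.from_pretrained(
        MODEL_ID, revision=FIXED_REVISION
    ).to(device)
    # Assumption: the tokenizer is unaffected by the fix, so the default revision is used.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio_arr = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio_arr, model.config.sampling_rate)
    return out_path
```

Calling `synthesize("Hey, how are you doing today?", "A female speaker with a clear voice.")` would download the weights on first use and write a WAV file to `out_path`.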

toninog · 2024-11-29T11:05:20Z

Amazing @ylacombe - this fixed things perfectly!

toninog changed the title ~~gabled audio~~ garbled audio Nov 2, 2024

toninog closed this as completed Dec 4, 2024
