
garbled audio #161

Closed
toninog opened this issue Nov 2, 2024 · 8 comments

Comments

@toninog

toninog commented Nov 2, 2024

Hi

Not sure why, but when I run either of the example scripts, I get garbled audio. When I use the Mini model, I get the correct audio.

I have the following setup

Ubuntu 22.04
Python 3.11.4

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

python3 -m pip show torch
Name: torch
Version: 2.5.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3-Clause
Location: /data/parler-tts/.venv/lib/python3.11/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvjitlink-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: descript-audio-codec, descript-audiotools, julius, parler_tts, torch-stoi, torchaudio

Thanks

toninog changed the title from "gabled audio" to "garbled audio" on Nov 2, 2024
@SummerXIATIAN

Same here. Any solutions?

@wizche

wizche commented Nov 4, 2024

Same here: the Mini model works, but the Large model generates only noise.
Debian 11 / Python 3.9.2, CUDA 12.6, torch 2.5.1

@DerXter

DerXter commented Nov 15, 2024

Same here, but with the Mini model: my audio_arr vector always has a length of 5632. Once the sample rate is applied, the audio duration drops to 0. Could someone help me solve this issue, please?
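For scale, the duration of a mono waveform is its sample count divided by the sampling rate. A minimal sketch, assuming a 44.1 kHz sampling rate (the real value should be read from `model.config.sampling_rate`); `duration_seconds` and `ASSUMED_SR` are illustrative names, not part of parler-tts. At that rate, 5632 samples last only about 0.13 s, which truncates to 0 when expressed in whole seconds, so the short array itself looks like the symptom rather than the sample-rate step.

```python
def duration_seconds(num_samples: int, sampling_rate: int) -> float:
    """Duration in seconds of a mono waveform with num_samples samples."""
    # True division: integer division (//) would truncate anything shorter
    # than one second to 0.
    return num_samples / sampling_rate


# 5632 is the audio_arr length reported above; 44_100 Hz is an ASSUMED
# sampling rate, check model.config.sampling_rate for the real one.
ASSUMED_SR = 44_100

print(f"{duration_seconds(5632, ASSUMED_SR):.3f} s")  # 0.128 s
print(5632 // ASSUMED_SR)  # 0, the same length truncated to whole seconds
```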

@FireMMDC

Looking at #157, it seems the change in the transformers version may have broken inference for the Large model. Using commit 5d0aca9, I was able to generate audio with the Large model.

@sswam

sswam commented Nov 24, 2024

This issue seems important: the Large model just does not work. I suggest adding a test suite that runs the generated audio through Whisper (or a similar speech recognizer) to check it.
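The scoring half of the Whisper-based check suggested here can be sketched with word error rate (WER): transcribe the generated audio, then compare the transcript against the prompt that was spoken. A minimal sketch, assuming the transcript has already been produced by an ASR model such as Whisper; `word_error_rate`, `looks_intelligible` and the 0.3 threshold are hypothetical, not part of parler-tts.

```python
import re


def _words(text: str) -> list[str]:
    # Lowercase and drop punctuation so "Hey," and "hey" compare equal.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = _words(reference), _words(hypothesis)
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or 0 on a match
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


MAX_WER = 0.3  # hypothetical pass threshold, tune on known-good outputs


def looks_intelligible(prompt: str, transcript: str, max_wer: float = MAX_WER) -> bool:
    """Flag a generation as garbled when its transcript drifts too far from the prompt."""
    return word_error_rate(prompt, transcript) <= max_wer
```

Garbled or noise-only audio tends to yield an empty or unrelated transcript, which pushes WER toward 1.0 and fails the check.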

@toninog
Author

toninog commented Nov 25, 2024

Yeah, I use that, but the failure rate is very high: almost 95% of runs fail or generate garbled audio.

So there must be an underlying issue; I'm just not sure where, as I followed the instructions.

@ylacombe
Collaborator

Hey everyone, thanks for opening the issue and helping identify where it breaks.

The problem actually comes from the audio encoder. I still have to figure out how to fix it while staying backward compatible for people using the previous version of the repo, but in the meantime you can load fixed weights for the current version with:
from parler_tts import ParlerTTSForConditionalGeneration
model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-large-v1", revision="refs/pr/9")
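Combined with the usual Parler-TTS generation flow, the pinned revision might be used roughly as follows. A minimal sketch, assuming `parler_tts`, `transformers`, `torch` and `soundfile` are installed; `generate_with_fixed_weights` is a hypothetical helper, and the tokenizer and `generate` calls follow the pattern of the project's README example rather than anything stated in this thread. Only the model weights are pinned to `refs/pr/9`; loading the tokenizer from the default branch is an assumption.

```python
LARGE_REPO = "parler-tts/parler-tts-large-v1"
# Revision holding the fixed audio-encoder weights, per the maintainer's comment.
FIXED_REVISION = "refs/pr/9"


def generate_with_fixed_weights(prompt, description, out_path="parler_tts_out.wav"):
    """Hypothetical helper: synthesize `prompt` with the Large model and save a WAV."""
    # Heavy imports are deferred so this module can be imported on its own.
    import soundfile as sf
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    # Weights come from the PR revision; the tokenizer from the default branch.
    model = ParlerTTSForConditionalGeneration.from_pretrained(
        LARGE_REPO, revision=FIXED_REVISION
    ).to(device)
    tokenizer = AutoTokenizer.from_pretrained(LARGE_REPO)

    # The description conditions the voice; the prompt is the text to speak.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio_arr = generation.cpu().numpy().squeeze()
    sf.write(out_path, audio_arr, model.config.sampling_rate)
    return out_path
```

Calling `generate_with_fixed_weights("Hey, how are you doing today?", "A clear, expressive female voice.")` downloads the Large model on first use and writes the result to `parler_tts_out.wav`.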

@toninog
Author

toninog commented Nov 29, 2024

Amazing, @ylacombe, this fixed things perfectly!

toninog closed this as completed on Dec 4, 2024
7 participants