wav output has no sound #1

kudzaijaure-dot · 2022-09-14T13:17:34Z

@souvikg544
I've tried running this part of the script, but it raises a "no argument" error for these two:

  --model_path $test_ckpt \
  --config_path $test_config \

I manually copied the paths of the Best Model.pth and the config.json under tts_train_dir, and then it runs with no error, but the output wav file has no speech, just a monotone buzzing sound.
Also, Tensorboard wouldn't launch, so I just skipped that step; this could be related.
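
A "no argument" error for both flags is what one would expect if `$test_ckpt` and `$test_config` were never set in the shell. As a minimal sketch (not part of the original script), the paths could be resolved from Python instead of copied by hand. The helper name `find_latest_run` is hypothetical, and it assumes each run folder under tts_train_dir holds a `best_model.pth` and a `config.json`, as the paths later in this thread suggest.

```python
from pathlib import Path


def find_latest_run(train_dir):
    """Return (model_path, config_path) for the most recently modified run folder.

    Hypothetical helper: assumes every usable run folder contains
    best_model.pth and config.json directly inside it.
    """
    runs = [
        p for p in Path(train_dir).iterdir()
        if p.is_dir()
        and (p / "best_model.pth").exists()
        and (p / "config.json").exists()
    ]
    if not runs:
        raise FileNotFoundError(f"no finished run found under {train_dir}")
    # Pick the folder with the newest modification time.
    latest = max(runs, key=lambda p: p.stat().st_mtime)
    return latest / "best_model.pth", latest / "config.json"
```

In a notebook, the two returned paths can then be passed to `--model_path` and `--config_path` in place of the unset shell variables.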


souvikg544 · 2022-09-14T15:25:15Z

Thank you for raising the issue. You have added the right paths. The problem is that TTS speech generation from text requires at least 100000 epochs to produce suitable output. It also requires a large audio dataset. You can use Colab Pro or AWS to achieve the results.

This is the same issue you are describing. Refer to the comments on the answer:

https://stackoverflow.com/questions/66307611/how-do-i-get-started-training-a-custom-voice-model-with-mozilla-tts-on-ubuntu-20

souvikg544 · 2022-09-14T15:27:48Z

Also, if anyone gets this working on Colab, do let me know how.

kudzaijaure-dot · 2022-09-20T14:51:09Z

@souvikg544 I tried again with over an hour of cleaned data. Training took about 50 minutes, but out.wav still has no speech, just a buzzing sound for a second. Every text extraction was successful, producing 483 ten-second clips. Tensorboard would not launch, so I skipped that stage. AudioProcessor from TTS.utils.audio shows an error the first time I run the command, but runs normally the second time with no changes. Inference command below:

  !tts --text "Text for TTS, to test how well the president of the united states speaks. Maybe what it requires is a verly long sentence that does the job" \
    --model_path '/content/tts_train_dir/run-September-20-2022_01+46PM-3de0986/best_model.pth' \
    --config_path '/content/tts_train_dir/run-September-20-2022_01+46PM-3de0986/config.json' \
    --out_path out.wav
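
To describe the symptom more precisely than "a buzzing sound for a second", the output file can be measured rather than only listened to. The following is a rough sketch using only the Python standard library. The function name `wav_stats` is hypothetical, and it assumes out.wav is 16-bit PCM, which has not been confirmed in this thread.

```python
import array
import math
import sys
import wave


def wav_stats(path):
    """Return duration, peak and RMS amplitude of a 16-bit PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        samples = array.array("h", wf.readframes(n_frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
    return {"duration_s": n_frames / rate, "peak": peak, "rms": rms}
```

Calling `wav_stats("out.wav")` after inference would show whether the file is much shorter than the input sentence should take to speak, which is one possible sign that the model has not learned usable alignments yet.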

The model recorded 100 epochs of training on 1 hour of data, so would the suggested 100000 epochs require 1000 hours of audio?
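
An epoch is one pass over the same dataset, so a higher epoch count by itself means more training time, not more audio. As a back-of-the-envelope sketch using only the numbers reported above, and assuming the 100 epochs and the 50 minutes belong to the same run and that time scales linearly with epochs on the same hardware:

```python
def projected_training_hours(epochs_done, minutes_taken, target_epochs):
    """Linear extrapolation of training time; a rough estimate, not a benchmark."""
    minutes_per_epoch = minutes_taken / epochs_done
    return target_epochs * minutes_per_epoch / 60


# Dataset size from the thread: 483 clips of 10 seconds each.
audio_minutes = 483 * 10 / 60  # 80.5 minutes of audio

# 100 epochs took about 50 minutes; extrapolate to the suggested 100000 epochs.
hours = projected_training_hours(100, 50, 100_000)
print(f"audio: {audio_minutes:.1f} min, projected training: {hours:.0f} h")
```

Under these assumptions the suggested epoch count would mean roughly 833 hours of training on the same 80 minutes of audio. Whether that many epochs is actually needed is a separate question that this estimate does not answer.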
