Questions about usage for a new comer in the field #180
altineller started this conversation in General
Hello,
I would like to use coqui-ai-TTS to narrate the robot videos I make. I have gone through the documentation and successfully synthesized and also cloned voices. I have been running tests cloning the voice of Q from James Bond, explaining to students how robots work. So far it is going well, but there are a few usage questions I would like to ask.
I am building a bash array of sentences and then running `tts` on it. The first array is full of individual sentences; a second array has 3 elements, each of which joins together multiple sentences from the first array.
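A minimal sketch of that setup, assuming the XTTS CLI; the sentences, model name, and reference wav here are placeholders, not the originals:

```shell
# Placeholder sentences standing in for the original array.
sentences=(
  "Welcome to the robotics lab."
  "Today we will look at how the drive train works."
  "Pay attention to the encoder feedback loop."
)

# Join several sentences into one chunk; synthesizing a longer chunk in
# one call tends to give better prosody than going sentence by sentence.
chunk="${sentences[*]}"
echo "$chunk"

# Then synthesize the whole chunk in one call (model name is an example):
# tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
#     --text "$chunk" --speaker_wav 01.wav --language_idx en \
#     --out_path chunk.wav
```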
Joining more sentences together yields better results. Why is that? Also, is there any markup for sentences, or should the input be plain English?
For example, a space before the beginning of a sentence alters the sound, as does a period. Are there any tricks, such as markers, to modify the sound, so that when the model generates less-than-perfect speech one can correct it with a bit of markup?
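One way to pin down how punctuation affects the output is a small A/B sweep over variants of the same sentence; this is only a sketch, with a placeholder sentence and model name:

```shell
# Hypothetical A/B test: the same text with different leading/trailing
# punctuation, so the resulting wavs can be compared by ear.
text="robots use encoders to measure wheel rotation"
variants=("$text" "$text." " $text." "$text,")

for i in "${!variants[@]}"; do
  echo "variant $i: [${variants[$i]}]"
  # tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
  #     --text "${variants[$i]}" --speaker_wav 01.wav \
  #     --language_idx en --out_path "variant_$i.wav"
done
```

Keeping everything fixed except the punctuation makes it easy to hear which markers actually change the prosody.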
Also, is there any documentation I can read about tuning (other than the documentation linked from idiap/coqui-ai-TTS)? Since I don't fully understand the parameters, I cannot do any tuning except blindly.
Another question is about voice cloning. I extracted 11 segments, denoised them, and used them as `--speaker_wav 01.wav 02.wav ... 11.wav`. Then I wrote a script to generate the same speech while removing one speaker_wav each time. Needless to say, some outputs are better than others. If I had a sense of which input audio is good for sampling and which is not, I could do much better.
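The leave-one-out sweep described above can be sketched like this; the clip names and the model name are assumptions, and the actual `tts` call is left commented:

```shell
# Hypothetical leave-one-out sweep over 11 reference clips. Dropping one
# clip per run reveals which reference degrades the cloned voice.
clips=(01.wav 02.wav 03.wav 04.wav 05.wav 06.wav
       07.wav 08.wav 09.wav 10.wav 11.wav)

for drop in "${!clips[@]}"; do
  # All clips except the one at index $drop.
  subset=("${clips[@]:0:drop}" "${clips[@]:drop+1}")
  echo "run $drop without ${clips[$drop]}: ${subset[*]}"
  # tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
  #     --text "Test sentence." --speaker_wav "${subset[@]}" \
  #     --language_idx en --out_path "out_without_${clips[$drop]}"
done
```

Listening to the 11 outputs side by side then points at the clip whose removal improves the result most.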
Thanks for developing this software and sharing it as open source. It is a monumental amount of work, and despite these few points, it is perfect.
Best Regards,
Can Altineller