Trying to fine-tune for Brazilian Portuguese language #184

abhisirka2001 · 2024-10-20T10:33:48Z

I'm fine-tuning the F5 base model for Brazilian Portuguese using 160 hours of audio data with a custom tokenizer. After 100 epochs on a single RTX 4090, I've encountered the following issues:

When using an English audio prompt and reference text, the output captures the accent but doesn't produce the expected words. You can listen to the output here.

When using a Brazilian Portuguese audio prompt and reference text, the model generates an empty audio file.

What should I consider or adjust to improve the model's performance from here? Any advice on resolving these issues would be appreciated!

LucianoDaluz · 2024-10-20T21:23:32Z

Please let us know if you get it.

3 replies

jeanfredson Nov 12, 2024

SilentAntagonist/F5ttsptbr

LucianoDaluz Nov 12, 2024

404 page :(

dougrhis Nov 12, 2024

404 page

grebsu · 2024-10-25T00:10:59Z

I hope you manage it!!!!

13 replies

LucianoDaluz Oct 29, 2024

It is 100% better. It just seems to insert some commas (pauses) where it shouldn't. But now we can understand the words perfectly.
The only word it gets wrong is "País" (country), where there was a slight lack of intonation on the letter "i".

LRhaunter Oct 29, 2024

As a Brazilian, I’d say the voice sounds very real and natural, but it’s still a bit confusing. Even as a native speaker, it can be a little hard to understand. Some words are accurate but lack the accent.

Hey, I trained the model on more datasets, and here is one sample. Please let me know if the words and accent are better now. I hope the model is good at learning a new accent and a new language. https://drive.google.com/file/d/1Adz4EXNa0yC-ZBs_-V3u_ndb-Zv582FV/view?usp=sharing

Thanks

Now it’s much better! The accent sounds really good, but it still needs a few tweaks. There was a small error in one of the words where a letter was missing.

LucianoDaluz Nov 1, 2024

Hey friend! Any update on this one? If you have any new file to check just let me know. All the best.

dougrhis Nov 7, 2024

@abhisirka2001 Can you share this model?

Falkker Nov 19, 2024

Did you manage to train it in pt-BR, friend?

x4080 · 2024-10-29T21:29:43Z

x4080
Oct 29, 2024

What are the steps to train another language, and how much RAM is needed?

2 replies

abhisirka2001 Oct 30, 2024
Author

The essential things are preparing the dataset and the vocab file for your language and adjusting the model hyperparameters.
I am not sure about the minimum VRAM requirement. You can use the Gradio fine-tuning instructions given in the repo.
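
As a rough illustration of the vocab-file step, here is a minimal sketch that collects every distinct character from a set of transcripts. It assumes a character-level vocab with one token per line and the space character first; the function names and sample sentences are hypothetical, so check the repo's own data-preparation scripts for the format it actually expects.

```python
import unicodedata
from pathlib import Path


def build_char_vocab(transcripts):
    """Return the distinct characters used in the transcripts, space first."""
    chars = set()
    for line in transcripts:
        # NFC keeps accented letters such as "á" as single code points.
        chars.update(unicodedata.normalize("NFC", line.strip()))
    chars.discard(" ")
    # Assumption: the space token goes on the first line; the rest are sorted
    # only to make the output stable between runs.
    return [" "] + sorted(chars)


def write_vocab(vocab, path):
    """Write one token per line (hypothetical helper)."""
    Path(path).write_text("\n".join(vocab) + "\n", encoding="utf-8")


# Hypothetical sample transcripts; replace with your dataset's text.
transcripts = ["Olá, mundo!", "O país é grande."]
vocab = build_char_vocab(transcripts)
```

Calling `write_vocab(vocab, "vocab.txt")` would then save the result next to the dataset.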

x4080 Oct 30, 2024

@abhisirka2001 thanks

abhisirka2001 · 2024-11-19T04:53:45Z

Author

https://huggingface.co/ModelsLab/F5-tts-brazilian/tree/main

5 replies

traderpedroso Nov 19, 2024

https://huggingface.co/ModelsLab/F5-tts-brazilian/tree/main

I used this as a starting point and have been training for 2 days on over 300 hours of data, but I still haven't achieved a good result, and the voice cloning is still terrible. I'm going to add another 1k hours and train on 8 H100s, and we'll see if it can clone with quality. If I manage to make it work, I will share the model, but if it only works with samples that contain private data, sharing won't be possible. I hope it works. Personally, I've fine-tuned about 3 models, but I wasn't satisfied with any of them. I will share the best samples I have, but remember that you cannot use the samples without the owner's permission, so they are for demonstration purposes only. Some reference audio samples produce good-quality output with few errors, but the cloning is still terrible. Keep in mind that when generating audio, the reference must always be in the same language as the target output. Even so, up to this point, it fails to produce correct pronunciation for unseen speakers.

LONG AUDIO GEN

SHORT AUDIO GEN

Here are some crucial points for generating quality audio. First, always use reference samples shorter than 10 seconds; otherwise the excess carries over and the output mixes the reference text with the generated text. Second, always make sure the reference text matches the audio sample exactly; Whisper sometimes transcribes incorrectly, so check it carefully. Third, for the best quality, use NFE Step 64.
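
The first two points can be checked before running inference. The sketch below is only an illustration using Python's standard `wave` module: it assumes the reference clip is an uncompressed WAV file, and the helper names and the 10-second default are hypothetical choices taken from the advice above, not part of F5-TTS.

```python
import wave


def wav_duration_seconds(path):
    """Length of a WAV file in seconds (frames divided by sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def reference_problems(wav_path, ref_text, max_seconds=10.0):
    """Return a list of human-readable problems with a reference clip."""
    problems = []
    duration = wav_duration_seconds(wav_path)
    if duration >= max_seconds:
        problems.append(
            f"clip is {duration:.1f}s long; trim it below {max_seconds:.0f}s"
        )
    if not ref_text.strip():
        problems.append("reference text is empty; transcribe the clip by hand")
    return problems
```

An empty list means the clip passed both checks. The text check only catches a missing transcript, so an automatic Whisper transcription still needs to be compared against the audio by ear.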

sendwebpush Nov 19, 2024

Hi @traderpedroso, I'm also working on creating a model, but it is very far from perfect. I'm trying to train on a MacBook with a 38-core GPU, but it is very slow. I tried to optimize the code to use the MacBook's GPU, but it didn't help at all. Do you have any suggestions for speeding up training? I don't know if it's possible, but could you share your contact details so we can talk about it?

LucianoDaluz Nov 19, 2024

Although it sounds like Dilma (the former Brazilian president) speaking, it is way better.

traderpedroso Nov 19, 2024

There is a CoreML version in ONNX.

krc1983 Nov 19, 2024

Hi @abhisirka2001, I'm getting poor results with your model. The voice is OK, but the pronunciation is quite muddled. I believe it's because the Portuguese vocab.txt file is missing. Can you tell me where to get it? Thanks!

abhisirka2001 · 2024-11-19T20:39:47Z

Author

Yes, because I didn't train a separate tokenizer for Brazilian Portuguese and kept the original F5-TTS vocab file, since it contained the majority of the tokens. You can further fine-tune my model with a new vocab and less data to get good results.
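
To see which Portuguese characters an existing vocab does not cover, a quick check like the following can help. It is a sketch under assumptions: the vocab is treated as a plain list of single-character tokens, and the toy vocab and transcripts are made up for illustration rather than taken from the F5-TTS vocab file.

```python
import unicodedata


def missing_characters(vocab_tokens, transcripts):
    """Return the characters used in the transcripts that the vocab lacks."""
    known = set(vocab_tokens)
    seen = set()
    for line in transcripts:
        # Normalize so "á" is one code point rather than "a" plus an accent.
        seen.update(unicodedata.normalize("NFC", line))
    return sorted(seen - known)


# Toy vocab without accented letters (hypothetical, for illustration only).
base_vocab = [" ", "a", "e", "o", "p", "s", "t"]
missing = missing_characters(base_vocab, ["está", "país"])
print(missing)  # ['á', 'í']
```

If the list comes back non-empty for a real dataset, those are the tokens a new vocab would need to add.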

0 replies
