Replies: 25 comments 38 replies
-
In the new update, there is an 'Experiment' button in the auto train settings. I'm still working on this part, so please check the settings before starting training to make sure everything is OK for the dataset you use. If you encounter memory issues, try using your own settings.
About multi-GPU: check that everything is OK, including how often checkpoints and the last checkpoint are saved. Note: every checkpoint needs 5 GB of disk space! About epochs: the default is now 10. You may need more or less; see what works for you. I'll run more tests and share some good values soon.
-
What does the Check Vocab step do in practice? I'm sorry, but I can't figure it out; what happens in this step if I want to fine-tune for French or Spanish, etc.?
-
What do you think, would it be a good idea to add a new tab with system and GPU info?
-
Thanks for this great work. I've managed to fine-tune my first model, but a noob question: how do I test the model, whether in the CLI or the GUI?
-
Awesome work! Are you planning to keep improving it and fixing errors?
-
@lpscr |
-
You should make it inform the user when ffmpeg is not available, instead of silently failing on the transcribe step; this was giving me problems on a RunPod instance.
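A minimal pre-flight check along these lines (a sketch, not the project's actual code) could surface the problem early instead of letting transcription fail silently:

```python
import shutil


def ffmpeg_status() -> str:
    """Report whether ffmpeg is on PATH, so the user sees a clear message up front."""
    path = shutil.which("ffmpeg")
    if path is None:
        return ("ffmpeg not found on PATH: install it "
                "(e.g. 'apt install ffmpeg') before transcribing.")
    return "ffmpeg found: " + path


print(ffmpeg_status())
```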
-
Can someone explain to me how to train with languages other than English?
-
How much data do you usually use (how many hours or minutes) to fine-tune and produce great results? I tried it in Colab on a T4 and I guess I also needed to lower the batch size per GPU. Anyway, thanks for this easy-to-use pathway for fine-tuning F5.
-
Hello @lpscr, thank you for sharing this project! I have been following your instructions: after placing WAV files in the newly created project folder and checking user, I clicked the Transcribe button and waited for the process to complete. However, the info field displays "You need to load an audio file." Could you please help me resolve this issue? Thank you!
-
`usage: finetune_cli.py [-h] [--exp_name {F5TTS_Base,E2TTS_Base}] [--dataset_name DATASET_NAME]`
Hello @lpscr, it appears that the update for the parameter `--file_checkpoint_train` has not yet been committed and merged into finetune_cli.py, although it has been merged in finetune_gradio. By the way, is this parameter intended to allow resuming fine-tuning from a previous checkpoint? Could you please provide an example of the exact path to the checkpoint? I assume it would be something like `ckpts/project_name/checkpoint.pt`, correct?
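Until that's confirmed, a small helper like this (hypothetical; it assumes checkpoints are saved as `model_<step>.pt` inside `ckpts/<project_name>/`) can pick the newest checkpoint to pass to `--file_checkpoint_train`:

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(project_dir: str) -> Optional[str]:
    """Return the path of the checkpoint with the highest step number, or None.

    Assumes files are named model_<step>.pt, e.g. ckpts/my_speak/model_120000.pt.
    """
    best = None
    for p in Path(project_dir).glob("model_*.pt"):
        m = re.fullmatch(r"model_(\d+)\.pt", p.name)
        if m and (best is None or int(m.group(1)) > best[0]):
            best = (int(m.group(1)), str(p))
    return best[1] if best else None
```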
-
To create a public link, set `share=True` in `launch()`.
-
@osmania101 Can you check that you have the latest update of the repo? Did you follow the correct installation steps at the beginning?
-
How can I prepare a multi-speaker dataset? If I have one, should I skip the speaker ID and only keep the text and audio pairs?
-
@HuuHuy227 You don't need a speaker ID; just the audio and text will work, for both single and multiple speakers.
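So a multi-speaker `metadata.csv` can simply interleave clips from different speakers, for example (file names invented):

```
alice_001.wav|Hello there.
bob_001.wav|How are you doing today?
alice_002.wav|Fine, thanks for asking.
```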
-
I encountered an error while running finetune_gradio.py
-
Hi @lpscr, I'm currently fine-tuning F5-TTS with the Gradio interface you provided for us; it's really amazing. But while trying to fine-tune from a previous checkpoint, I'm facing the issue below:
Can you please help me with this? Thank you.
-
@lpscr thank you for your contributions to this project. Although I have very limited compute resources (RTX 4070 laptop, 8 GB VRAM, 32 GB RAM), I did some fine-tuning experiments using your Gradio interface (120k steps with ~16 h of German speaker data ... it took 2 days, yet the results seem to be worth the effort). Given the aforementioned resource limitations, I have two questions:
Thanks again for your efforts. 3 cheers, MaMe82
-
It's amazing; I have never seen a model this easy to use that delivers such great results. Thank you so much for your contribution to the community. I hope you will continue to update it with even more interesting features. Do you have any recommendations on settings or the amount of data needed when training a new language? In my case, it's Vietnamese.
-
Hi @lpscr, it's a fantastic job! Thanks a lot.
-
Hey @lpscr, the training works fantastically, but it's quite a long training process. How can I resume it?
-
Hi, when I tried to fine-tune I got the following error:
When I tried removing the arg 'e', the training was stuck for a long time with no logs.
-
I encountered a problem: after successfully training the model and pressing "Test Model", the output audio was empty, silent, no sound at all. I tried again many times. Please help me.
-
Hi @lpscr, I've created a script and successfully built a metadata.csv and a folder with my wav files. I used the 9-hour split, so I'm actually working with just 9 hours of multi-speaker Italian; is that a fair amount for fine-tuning? I tried 10 epochs, but the result isn't good. I don't know how EMA works here, but wouldn't it make sense to add an option on the "Train Data" page to specify how to generate the sample? That way I could better judge the overall result by comparing samples without using EMA (I think it uses EMA by default?). Feel free to give me any tips to improve results, and thanks for the UI :) P.S.
-
Hello everyone!
I have created a Gradio application for easy fine-tuning and training of models. You can find it here:
https://github.com/lpscr/F5-TTS
EDIT: this has been merged into the main repo.
NEW: with the new version, everything is now automatic. You can easily fine-tune any language with just a few clicks. Here's a new, complete step-by-step video, with sound!
Please make sure the video is not muted (click the speaker icon in the video!). Enjoy ;)
amazing.tutorial.mp4
BTW: the male voice in the video was created using f5tts! You can get it from here: https://github.com/SWivid/F5-TTS/tree/main/src/f5_tts/infer/examples/basic
note for the new language
Before you start: if you are going to fine-tune a new language, you will need a substantial number of dataset hours! From what I've seen here, you can fine-tune a single voice with just 10 to 15 hours, but for multiple speakers you'll need more, about 50 hours to start. If you want a good model, aim for at least 100 hours; for something close to perfect, aim for at least 300 hours or more. See what works for you; it might also be possible to achieve good results with fewer hours in your case.
Others have also reported success with just 10 to 15 hours of fine-tuning for one or two voices, in the following languages: Spanish, Indian (Malayalam) with extended tokens, Hungarian.
note for English or Chinese
Regarding English or Chinese: if you want to fine-tune a speaker, first check whether it already works, because the base model is good enough that you may not need to fine-tune at all. You can test with 2 to 5 hours or more and see what works.
Please share any experiments or results about what works and what doesn't, so that others can know as well.
quick start
First create a new project, then see which of the steps below you need.
1 . Transcribe Data Option: Skip this if you already have a `metadata.csv` and a `wavs` folder.
You can simply click the audio button to open Explorer and select one or multiple audio files.
If you check `audio from path`, you need to place all audio files in `data/my_speak/dataset`.
You can click the random sample button to see text and audio.
2 . Vocab Check Option: Use this only when you want to train a new language.
If you need to extend the vocab, simply click "Check Vocab" to see all missing symbols, or write your own symbols like `a,b,c,d` etc.
If you click "Extend", this creates a new `model_1200000.pt` and `vocab.txt` file.
3 . Prepare Data Option: Skip this if you already have `raw.arrow`, `duration.json`, and `vocab.txt`. You can click the random sample button to see tokens and audio.
If you have the files `raw.arrow`, `duration.json`, and `vocab.txt`, make sure they are in the correct path.
In case you skip Transcribe, place your dataset (`wavs` folder and `metadata.csv` file) yourself.
Supported audio formats: `"wav", "mp3", "aac", "flac", "m4a", "alac", "ogg", "aiff", "wma", "amr"`.
This is how `metadata.csv` should look:
line 1. `audio1|text1` or `audio1.wav|text1` or `your_path/audio1.wav|text1`
line 2. `audio2|text2` or `audio2.mp3|text2` or `your_path/audio2.mp3|text2`
...
Click "Prepare" to create `raw.arrow`, `duration.json`, and `vocab.txt`.
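Since a malformed `metadata.csv` or uncovered characters are common failure points before "Prepare", here is a small stdlib sketch (the helper itself is hypothetical; it assumes one token per line in `vocab.txt`) that checks the `audio|text` format and lists characters missing from the vocab:

```python
from pathlib import Path


def check_dataset(metadata_path: str, vocab_path: str):
    """Validate 'audio|text' lines and report characters not covered by the vocab."""
    vocab = set(Path(vocab_path).read_text(encoding="utf-8").splitlines())
    bad_lines, missing = [], set()
    for i, line in enumerate(
        Path(metadata_path).read_text(encoding="utf-8").splitlines(), 1
    ):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("|")
        if len(parts) != 2 or not parts[0] or not parts[1]:
            bad_lines.append(i)  # wrong number of fields or empty field
            continue
        # collect text characters the vocab does not cover (whitespace is fine)
        missing |= {ch for ch in parts[1] if ch not in vocab and not ch.isspace()}
    return bad_lines, sorted(missing)
```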
4 . Train Data:
The auto settings button gives you good defaults, but you need to check that everything is OK. If you encounter memory issues, try your own settings: lower `batch size per gpu`.
For how often checkpoints and the last checkpoint are saved: set `save per update` to something that works for you, and use a smaller value for `last per step` to save `model_last.pt` more often, so that if training crashes or stops you can easily continue where you left off. Note: every checkpoint needs 5 GB of disk space!
About `epochs`: the default is now 10. You may need more or less; see what works for you.
While the model trains, you get sample audio every few steps so you can hear how well the model is doing. Click the refresh button or check the `ckpts/my_speak/sample` folder.
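Given the 5 GB-per-checkpoint note, a quick back-of-the-envelope helper (hypothetical; it assumes one checkpoint every `save per update` steps and that all of them are kept) can estimate disk usage before training starts:

```python
def checkpoint_disk_gb(total_steps: int, save_per_update: int,
                       gb_per_checkpoint: float = 5.0) -> float:
    """Rough disk usage if every saved checkpoint is kept (~5 GB each)."""
    return (total_steps // save_per_update) * gb_per_checkpoint


# e.g. 50k training steps, saving every 10k steps -> 5 checkpoints -> 25 GB
print(checkpoint_disk_gb(50_000, 10_000))
```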
folder.5 . Test Model: Testing your model is simple and easy. Check use_ema to be True or False to see what works best for you.
when you run the train the test model working in
cpu
mode ! you need stop the train to run ingpu
Click the 'Random Sample' button to view get a text and audio. for dataset
You can compare reference (ref) and generated (gen) audio, enter text in 'gen_text,' or load a new reference in 'ref_text.'
To load your audio reference, click the 'X' button. If the ref text is empty, it will automatically transcribe.
6 . Reduce Model Size: You can reduce the model size from 5 GB to 1 GB.
Select the checkpoint you want to reduce; it's all automatic now ;)
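The size reduction essentially comes from dropping optimizer state and keeping only the model (or EMA) weights. Real checkpoints are torch `.pt` files handled with `torch.load`/`torch.save`; this plain-dict sketch, with assumed key names, only illustrates the idea:

```python
def prune_checkpoint(ckpt: dict, use_ema: bool = True) -> dict:
    """Drop optimizer/scheduler state; keep only the weights needed for inference.

    Key names ('ema_model_state_dict', etc.) are assumptions, not the exact
    F5-TTS checkpoint layout.
    """
    weights_key = "ema_model_state_dict" if use_ema else "model_state_dict"
    return {weights_key: ckpt[weights_key]}


full = {
    "model_state_dict": {"w": [1, 2]},
    "ema_model_state_dict": {"w": [1, 2]},
    "optimizer_state_dict": {"momenta": [0] * 1000},  # the bulk of the file size
    "step": 120000,
}
small = prune_checkpoint(full)
print(list(small))  # only the EMA weights remain
```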