Tried it and WAS NOT IMPRESSED, am I doing it wrong?? #226
Unanswered
GPU-server
asked this question in
Q&A
Replies: 2 comments 1 reply
-
#225 https://voca.ro/13QHMeIsGs0e
1 reply
-
@GPU-server in my experience, the reference voice sample plus the speech to be generated should total less than 30 seconds for it to work properly. In my case, a 4 to 5 second reference sample plus about 20 seconds of generated speech worked fine.
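As a rough way to check that rule of thumb before generating, here is a minimal sketch. It is not part of F5-TTS: the function names are made up for illustration, the 150 words-per-minute speaking rate is an assumed average, and the 30-second budget is just the figure from my experience above.

```python
import wave

# Assumption: ~150 words per minute is a rough speaking rate, not an F5-TTS setting.
WORDS_PER_MINUTE = 150
# Rule of thumb from this comment: reference clip + generated speech under 30 s.
BUDGET_SECONDS = 30.0


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds (stdlib only)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def estimate_speech_seconds(text, words_per_minute=WORDS_PER_MINUTE):
    """Roughly estimate how long the text takes to speak."""
    return len(text.split()) * 60.0 / words_per_minute


def fits_budget(ref_seconds, text, budget=BUDGET_SECONDS):
    """True if the reference clip plus the estimated speech stays under the budget."""
    return ref_seconds + estimate_speech_seconds(text) < budget
```

For example, `fits_budget(wav_duration_seconds("ref.wav"), "your text here")` tells you whether to trim the clip or split the text (`ref.wav` is a placeholder path). With a 20-second reference clip, only about 10 seconds remain under this budget, which is roughly 25 words at the assumed rate.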
0 replies
-
I downloaded the stuff and installed the libraries.
I ran Gradio and opened the web server.
I went to inference, inserted audio of a video game character (no music, about 20 seconds), and then wrote some text for it to generate.
The first seconds of the output contained an invented word, and the rest (the actual text I gave it) was not good quality (noisy, or poorly calibrated).
Why is everyone talking about F5 TTS? Surely I configured it badly or I am missing something. Please, someone tell me. I wanna be impressed.