
add api for easy use #186

Merged: 9 commits merged into SWivid:main on Oct 21, 2024
Conversation

lpscr
Contributor

@lpscr lpscr commented Oct 20, 2024

hi @SWivid, I created an API for easy use. Now you can use it anywhere you like with a simple call.

One thing I noticed: CPU inference is not working with the last update (fp16), can you check it?
I know CPU inference is slow, but for me it's important, because while training, when I want to check a checkpoint, I run inference on the CPU so I don't crash GPU memory.

from api import F5TTS

# Simple usage

f5tts = F5TTS()

wav, sr, spect = f5tts.infer(
    ref_file="tests/ref_audio/test_en_1_ref_short.wav",
    ref_text="Some call me nature, others call me mother nature.",
    gen_text="""I don't really care what you call me.""",
    file_wave="/home/F5-TTS/test.wav"
)
# Advanced usage

f5tts = F5TTS(
    model_type="F5-TTS",
    ckpt_file="/home/F5-TTS/ckpts/my_speak/model_1200000.pt",
    vocab_file="/home/F5-TTS/data/my_speak/vocab.txt",
    ode_method="euler",
    use_ema=True,
    local_path="/vecoder",
    device="cuda",
)

wav, sr, spect = f5tts.infer(
    ref_file="tests/ref_audio/test_en_1_ref_short.wav",
    ref_text="Some call me nature, others call me mother nature.",
    gen_text="""I don't really care what you call me.""",
    sway_sampling_coef=-1,
    cfg_strength=2,
    nfe_step=16,
    speed=1.0,
    fix_duration=None,
    remove_silence=False,
    file_wave="/home/F5-TTS/test.wav",
    file_spect="/home/F5-TTS/test.png",
    cross_fade_duration=0.15
)
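
For the checkpoint-on-CPU case mentioned at the top, a minimal sketch (placeholder paths, same constructor arguments as above, just with device="cpu"):

# load a training checkpoint on CPU so GPU memory stays free while training
f5tts_cpu = F5TTS(
    ckpt_file="/home/F5-TTS/ckpts/my_speak/model_1200000.pt",
    vocab_file="/home/F5-TTS/data/my_speak/vocab.txt",
    device="cpu",
)

wav, sr, spect = f5tts_cpu.infer(
    ref_file="tests/ref_audio/test_en_1_ref_short.wav",
    ref_text="Some call me nature, others call me mother nature.",
    gen_text="I don't really care what you call me.",
    file_wave="/home/F5-TTS/test_cpu.wav",
)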

In Jupyter:

# display audio
from IPython.display import Audio, display
display(Audio(data=wav, rate=sr, normalize=True))

# display spectrogram
%matplotlib inline
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 2))
plt.imshow(spect, origin='lower', aspect='auto', cmap='viridis')  # Added colormap for better visualization
plt.colorbar(label='Intensity')  # Label the colorbar for clarity
plt.title('Spectrogram')  # Add a title
plt.xlabel('Time (s)')  # Label the x-axis
plt.ylabel('Frequency (Hz)')  # Label the y-axis
plt.axis('on')  # Optional: Turn on the axes if needed
plt.show()
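
Outside a notebook, the returned waveform can also be written to disk directly (a small sketch assuming soundfile is installed; the file_wave argument above already does the same thing):

import soundfile as sf

# write the returned waveform to a wav file ourselves (placeholder path)
sf.write("/home/F5-TTS/test_manual.wav", wav, sr)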


@SWivid
Owner

SWivid commented Oct 20, 2024

Hi @lpscr
b4f8142 fixes CPU inference; using fp32 on CPU is fine.

will reorganize the inference scripts

@lpscr lpscr closed this Oct 20, 2024
@lpscr lpscr reopened this Oct 20, 2024
@lpscr
Contributor Author

lpscr commented Oct 20, 2024

sorry, I closed it by mistake. I'll try it a little later and let you know.

@lpscr
Contributor Author

lpscr commented Oct 20, 2024

Hi @lpscr b4f8142 fixes CPU inference; using fp32 on CPU is fine.

will reorganize the inference scripts

Working great, thank you for the update!

@SWivid
Owner

SWivid commented Oct 20, 2024

Hi @lpscr
Inference scripts reorganized; see if app.py could borrow some shared funcs from model/utils_infer.py.
Some descriptions in README.md to guide usage would also be appreciated.
Thanks~

@lpscr
Contributor Author

lpscr commented Oct 20, 2024

@SWivid great, thank you. Just one thing: isn't it useful to be able to change these values?

    ode_method = "euler"  # you can also use midpoint or rk4
    sway_sampling_coef = -1
    cfg_strength = 2
    nfe_step = 16
    fix_duration = None

As it is now you can't change these values, they're static:

https://github.com/SWivid/F5-TTS/blob/5600d9079a2813e9b0dc1ef9a52193604eed4828/model/utils_infer.py#L39C1-L52C44

Isn't it useful to change these values? I mean, from what I've seen:
ode_method = "euler": midpoint and rk4 also give nice results
nfe_step: speeds things up, 2x or 3x depending on how many steps you use
cfg_strength: gives interesting results when you change it, more emotion going from 2 to 4
fix_duration: fixes crazy repetition; in my case setting it manually sometimes helps
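
For example, two quick calls through the api.py interface from this PR to compare these knobs (same test assets as in the examples above, assuming f5tts = F5TTS() as before; this is just an illustration):

# fewer ODE steps = faster, draft-quality generation
wav_fast, sr, _ = f5tts.infer(
    ref_file="tests/ref_audio/test_en_1_ref_short.wav",
    ref_text="Some call me nature, others call me mother nature.",
    gen_text="I don't really care what you call me.",
    nfe_step=16,
)

# higher cfg_strength = a more expressive, emotional read
wav_expressive, sr, _ = f5tts.infer(
    ref_file="tests/ref_audio/test_en_1_ref_short.wav",
    ref_text="Some call me nature, others call me mother nature.",
    gen_text="I don't really care what you call me.",
    cfg_strength=4,
)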

Also, when you have free time, please check the api.

Here is how I use it; I think it's good to expose all the limits you set like this, and you have made an amazing app:
https://github.com/lpscr/F5-TTS/blob/6727245752782b372ee48c984e06c604b10eea97/api.py#L161

Thank you very much for the great work! I love this repo.

@SWivid
Owner

SWivid commented Oct 20, 2024

@lpscr yes, we may just separate these parameter settings for gradio and cli.
I was just thinking of changing the values in model/utils_infer.py to control both gradio and cli at once.
There would definitely be a better way ~

Just change it the way that feels good to you 👌.

@SWivid
Owner

SWivid commented Oct 20, 2024

Here is how I use it; I think it's good to expose all the limits you set like this, and you have made an amazing app:
https://github.com/lpscr/F5-TTS/blob/6727245752782b372ee48c984e06c604b10eea97/api.py#L161

i'm about to rest as it's quite approaching dawn here lol.
I think api.py is great, will review and follow up on the updates ~

@lpscr
Contributor Author

lpscr commented Oct 20, 2024

Here is how I use it; I think it's good to expose all the limits you set like this, and you have made an amazing app:
https://github.com/lpscr/F5-TTS/blob/6727245752782b372ee48c984e06c604b10eea97/api.py#L161

i'm about to rest as it's quite approaching dawn here lol. I think api.py is great, will review and follow up on the updates ~

Yes, I understand, me too, I'm in this repo all day :) happy to help.
Also, I think I found a way to fine-tune any language without losing English and Chinese, but it's still an experiment.

@sachin-seisei

@lpscr hey man, thanks for the api script, that's fantastic!! But when I tried to run this script, it's not throwing an error, yet the audio is not being generated and the spectrogram is blank!! Can you tell me what could be the issue??
terminal logs:
E:\ML\envs\coqui-tts\Lib\site-packages\torch\functional.py:666: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\EmptyTensor.cpp:45.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
E:\ML\ml_projects\project_folder\F5-TTS\model\modules.py:359: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.) x = F.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask, dropout_p=0.0, is_causal=False)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:40<00:00, 40.10s/it]

(coqui-tts) E:\ML\ml_projects\project_folder\F5-TTS>
(coqui-tts) E:\ML\ml_projects\project_folder\F5-TTS>python mod_api.py
Downloading vocoder from huggingface charactr/vocos-mel-24khz
Downloading vocoder from huggingface charactr/vocos-mel-24khz
gen_text 0 I don't really care what you call me.
0%| | 0/1 [00:00<?, ?it/s]Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\ASUS\AppData\Local\Temp\jieba.cache
Loading model cost 0.893 seconds.
Prefix dict has been built successfully.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:47<00:00, 47.62s/it]

@lpscr
Contributor Author

lpscr commented Oct 21, 2024

hi @SWivid, I just updated api and utils_infer to pass all the values.
Please check and let me know; I think it's all very good now, you can use everything.

Can you tell me why 'update infer limits' gets an X? I'm new to GitHub ...
Let me know how to fix it.

When I make a new git clone to test, the update works fine, so what's wrong?

@lpscr
Contributor Author

lpscr commented Oct 21, 2024

@lpscr hey man, thanks for the api script, that's fantastic!! ... the audio is not being generated and the spectrogram is blank!! Can you tell me what could be the issue?? [terminal logs quoted above]

Try this: make a new folder and run

git clone https://github.com/lpscr/F5-TTS.git
cd F5-TTS
python api.py

@SWivid
Owner

SWivid commented Oct 21, 2024

Can you tell me why 'update infer limits' gets an X?

you mean the checks? a pre-commit workflow was added, which is basically a format checker
(screenshot of the check status)
you can review it from Details on the right side; i'll also help check your updates.

a4ca14b, see here

@lpscr
Contributor Author

lpscr commented Oct 21, 2024

sorry @SWivid

pip install pre-commit
pre-commit install
pre-commit run --all-files

I don't understand, I did this and everything passes.

(screenshot showing all hooks passed)

What's wrong?

@SWivid
Owner

SWivid commented Oct 21, 2024

What's wrong?

Hi @lpscr, you've basically got it right; just let it also format the utils_infer.py modifications.

@lpscr
Contributor Author

lpscr commented Oct 21, 2024

What's wrong?

Hi @lpscr, you've basically got it right; just let it also format the utils_infer.py modifications.

I think I fixed it, can you please check?

SOS: do I need to run pre-commit run --all-files every time, before I do git add, git commit, git push?

@SWivid SWivid merged commit 25cdc51 into SWivid:main Oct 21, 2024
1 check passed
@SWivid
Owner

SWivid commented Oct 21, 2024

@lpscr there are some pre-commit bots that can do it automatically, I think, but I'm also new to this, lol
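
(A general note on pre-commit itself, not specific to this repo's config: once pre-commit install has been run, the hooks fire automatically on every git commit, so pre-commit run --all-files is only needed to clean up files that were changed before the hook was installed.)

pip install pre-commit
pre-commit install                 # one-time: installs the git hook
git add . && git commit -m "msg"   # the hooks now run automatically on commit
pre-commit run --all-files         # manual run over everything, e.g. for files changed before the hook existed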

@sachin-seisei

@lpscr hey man, thanks for the api script, that's fantastic!! ... the audio is not being generated and the spectrogram is blank!! Can you tell me what could be the issue?? [terminal logs quoted above]

Try this: make a new folder and run

git clone https://github.com/lpscr/F5-TTS.git
cd F5-TTS
python api.py

@lpscr as you said, I created a new folder, cloned the repo, and ran the api.py script; it's still generating empty audio!! Is there any way to debug and find the actual cause of it??
logs:
Using cuda device
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Download Vocos from huggingface charactr/vocos-mel-24khz

vocab : Emilia_ZH_EN
vocab : Emilia_ZH_EN
tokenizer : pinyin
model : C:\Users\ASUS.cache\huggingface\hub\models--SWivid--F5-TTS\snapshots\d0ac03c2b5ded76f302ada887ae0da5675e88a5d\F5TTS_Base\model_1200000.safetensors

gen_text 0 I don't really care what you call me. I've been a silent spectator, watching species evolve, empires rise and fall. But always remember, I am mighty and enduring. Respect me and I'll nurture you; ignore me and you shall face the consequences.
Generating audio in 1 batches...
0%| | 0/1 [00:00<?, ?it/s]Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\ASUS\AppData\Local\Temp\jieba.cache
Loading model cost 0.675 seconds.
Prefix dict has been built successfully.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [02:54<00:00, 174.20s/it]

@juangea

juangea commented Nov 3, 2024

Hello, I'm testing the API and I'm loving it, especially because I can disable EMA with the .pt model; not sure how to get a safetensors without EMA, though.

But when I try "fix_duration" I always get a 0-second audio, any ideas?

Thanks!

@lpscr
Contributor Author

lpscr commented Nov 3, 2024

@juangea when you reduce the model it only takes the EMA dict (ema_model_state_dict); maybe I'll add a use_ema option for this, so it can grab either ema_model_state_dict or model_state_dict.

You can see the relevant part of the code below.
Replace ema_model_state_dict with model_state_dict; that way it behaves like use_ema=False, as you use in the API when testing a model, then export the model again.

import torch
from safetensors.torch import save_file


def extract_and_save_ema_model(checkpoint_path: str, new_checkpoint_path: str, safetensors: bool) -> str:
    # keep only the EMA weights from a training checkpoint and save them
    # either as .safetensors or as a smaller .pt checkpoint
    try:
        checkpoint = torch.load(checkpoint_path)
        print("Original Checkpoint Keys:", checkpoint.keys())
        ema_model_state_dict = checkpoint.get("ema_model_state_dict", None)
        if ema_model_state_dict is None:
            return "No 'ema_model_state_dict' found in the checkpoint."
        if safetensors:
            new_checkpoint_path = new_checkpoint_path.replace(".pt", ".safetensors")
            save_file(ema_model_state_dict, new_checkpoint_path)
        else:
            new_checkpoint_path = new_checkpoint_path.replace(".safetensors", ".pt")
            new_checkpoint = {"ema_model_state_dict": ema_model_state_dict}
            torch.save(new_checkpoint, new_checkpoint_path)
        return f"New checkpoint saved at: {new_checkpoint_path}"
    except Exception as e:
        return f"An error occurred: {e}"

@juangea

juangea commented Nov 4, 2024

Awesome, thanks @lpscr.

About "fix_duration", does anyone know anything?

Each time I try to use it from the Python API it delivers 0-second audio, no matter the other parameters; I change it from None to 3.5, for example, and I get no audio.

Thanks!

@lpscr
Contributor Author

lpscr commented Nov 4, 2024

@juangea This happened because the fix_duration you set is shorter than the reference audio; it needs to be the same length as the reference or longer.

@juangea

juangea commented Nov 4, 2024

Ok! I’ll test it, great to know :)

thanks!

@juangea

juangea commented Nov 4, 2024

It's not working for me. I gave it a 9-second audio file and tried with the exact same duration (using soundfile to measure it), and it delivered 0-second audio; I also tried manually setting 8 seconds and got the same, 0-second audio.

It's like with the API I cannot get any result if I enable the "fix_duration" parameter. Is there anything else I should check? I see no error in the terminal at all, just the resulting wav audio has a duration of 0 seconds.

@SWivid
Owner

SWivid commented Nov 4, 2024

If you provide a 9-second reference audio file and want to generate 6 seconds of audio, you need fix_duration=15.
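
In other words, fix_duration covers the reference plus the generated audio. A small sketch of computing it (placeholder paths; soundfile is only used here to measure the reference length, and the import follows the example at the top of this PR):

import soundfile as sf
from api import F5TTS

ref_file = "tests/ref_audio/test_en_1_ref_short.wav"
data, sr = sf.read(ref_file)
ref_seconds = len(data) / sr              # duration of the reference audio

gen_seconds = 6.0                         # how much new audio you want
f5tts = F5TTS()
wav, sr, spect = f5tts.infer(
    ref_file=ref_file,
    ref_text="Some call me nature, others call me mother nature.",
    gen_text="I don't really care what you call me.",
    fix_duration=ref_seconds + gen_seconds,   # reference + generated
)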

@juangea

juangea commented Nov 4, 2024

thanks! gonna try right now.

@juangea

juangea commented Nov 4, 2024

It's working; however, if the phrase is too long, with the API I get a strange error:

Traceback (most recent call last):
  File "d:\AI\F5_TTS\F5-TTS\f5_inference_test.py", line 44, in <module>
    wav,sr,spect=f5tts.infer(
  File "D:\AI\F5_TTS\F5-TTS\src\f5_tts\api.py", line 109, in infer
    ref_file, ref_text = preprocess_ref_audio_text(ref_file, ref_text, device=self.device)
  File "D:\AI\F5_TTS\F5-TTS\src\f5_tts\infer\utils_infer.py", line 274, in preprocess_ref_audio_text
    if not ref_text.strip():
AttributeError: 'tuple' object has no attribute 'strip'

I think it has no relation to fix_duration, because it worked with a shorter phrase; the weird thing is that one try with that long phrase also worked fine, so I'm not sure why I'm getting this error.

@SWivid
Owner

SWivid commented Nov 4, 2024

AttributeError: 'tuple' object has no attribute 'strip'

It's literally being passed an invalid ref_text format; check that ref_text is correctly passed in as a str.

@juangea

juangea commented Nov 4, 2024

I'm sorry!!!

You are right: at the end of the variable assignment I had a stray trailing comma, and Python doesn't raise an error at definition time, so it was passing the value as a tuple even though it had only one element.
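
A tiny illustration of the trap (in Python a stray trailing comma silently makes a one-element tuple, which is exactly what preprocess_ref_audio_text then chokes on):

ref_text = "Some call me nature, others call me mother nature.",   # <- stray trailing comma
print(type(ref_text))   # <class 'tuple'>, so ref_text.strip() raises AttributeError

ref_text = "Some call me nature, others call me mother nature."    # plain str is what infer() expects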

Thanks, and sorry for that; delete these messages if you want, since it's not related to the API.

@SWivid
Owner

SWivid commented Nov 4, 2024

lol no worries, it's fine as long as it's working
