
add api for easy use #186

Merged: 9 commits merged into SWivid:main on Oct 21, 2024
Conversation

lpscr
Contributor

@lpscr lpscr commented Oct 20, 2024

hi @SWivid, I created an API for easy use. Now you can use it anywhere you like with a simple call.

One thing I noticed: CPU inference is not working with the last update (fp16), can you check it?
I know CPU inference is slow, but for me it's important, because while training, when I want to check a checkpoint, I run inference on the CPU so I don't crash GPU memory.

from api import F5TTS

# Simple usage

f5tts = F5TTS()

wav, sr, spect = f5tts.infer(
    ref_file="tests/ref_audio/test_en_1_ref_short.wav",
    ref_text="Some call me nature, others call me mother nature.",
    gen_text="""I don't really care what you call me.""",
    file_wave="/home/F5-TTS/test.wav"
)
# Advanced usage

f5tts = F5TTS(
    model_type="F5-TTS",
    ckpt_file="/home/F5-TTS/ckpts/my_speak/model_1200000.pt",
    vocab_file="/home/F5-TTS/data/my_speak/vocab.txt",
    ode_method="euler",
    use_ema=True,
    local_path="/vecoder",
    device="cuda",
)

wav, sr, spect = f5tts.infer(
    ref_file="tests/ref_audio/test_en_1_ref_short.wav",
    ref_text="Some call me nature, others call me mother nature.",
    gen_text="""I don't really care what you call me.""",
    sway_sampling_coef=-1,
    cfg_strength=2,
    nfe_step=16,
    speed=1.0,
    fix_duration=None,
    remove_silence=False,
    file_wave="/home/F5-TTS/test.wav",
    file_spect="/home/F5-TTS/test.png",
    cross_fade_duration=0.15
)
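
For the checkpoint-on-CPU case mentioned at the top, a minimal sketch (placeholder paths, same constructor arguments as above, just with device="cpu"):

# load a training checkpoint on CPU so GPU memory stays free while training
f5tts_cpu = F5TTS(
    ckpt_file="/home/F5-TTS/ckpts/my_speak/model_1200000.pt",
    vocab_file="/home/F5-TTS/data/my_speak/vocab.txt",
    device="cpu",
)

wav, sr, spect = f5tts_cpu.infer(
    ref_file="tests/ref_audio/test_en_1_ref_short.wav",
    ref_text="Some call me nature, others call me mother nature.",
    gen_text="I don't really care what you call me.",
    file_wave="/home/F5-TTS/test_cpu.wav",
)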

In Jupyter:

# display audio
from IPython.display import Audio, display
display(Audio(data=wav, rate=sr, normalize=True))

# display spectrogram
%matplotlib inline
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 2))
plt.imshow(spect, origin='lower', aspect='auto', cmap='viridis')  # Added colormap for better visualization
plt.colorbar(label='Intensity')  # Label the colorbar for clarity
plt.title('Spectrogram')  # Add a title
plt.xlabel('Time (s)')  # Label the x-axis
plt.ylabel('Frequency (Hz)')  # Label the y-axis
plt.axis('on')  # Optional: Turn on the axes if needed
plt.show()
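
Outside a notebook, the returned waveform can also be written to disk directly (a small sketch assuming soundfile is installed; the file_wave argument above already does the same thing):

import soundfile as sf

# write the returned waveform to a wav file ourselves (placeholder path)
sf.write("/home/F5-TTS/test_manual.wav", wav, sr)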


@SWivid
Owner

SWivid commented Oct 20, 2024

Hi @lpscr
b4f8142 fixes CPU inference; using fp32 on CPU is fine.

will reorganize the inference scripts

@lpscr lpscr closed this Oct 20, 2024
@lpscr lpscr reopened this Oct 20, 2024
@lpscr
Contributor Author

lpscr commented Oct 20, 2024

sorry, I closed it by mistake. I'll try it a little later and let you know.

@lpscr
Contributor Author

lpscr commented Oct 20, 2024

Hi @lpscr b4f8142 fixes CPU inference; using fp32 on CPU is fine.

will reorganize the inference scripts

Working great, thank you for the update!

@SWivid
Owner

SWivid commented Oct 20, 2024

Hi @lpscr
Inference scripts reorganized; see if app.py could borrow some shared funcs from model/utils_infer.py.
Some descriptions in README.md to guide usage would also be appreciated.
Thanks~

@lpscr
Contributor Author

lpscr commented Oct 20, 2024

@SWivid great, thank you. Just one thing: isn't it useful to be able to change these values?

    ode_method = "euler"  # you can also use midpoint or rk4
    sway_sampling_coef = -1
    cfg_strength = 2
    nfe_step = 16
    fix_duration = None

As it is now you can't change these values, they're static:

https://github.com/SWivid/F5-TTS/blob/5600d9079a2813e9b0dc1ef9a52193604eed4828/model/utils_infer.py#L39C1-L52C44

Isn't it useful to change these values? I mean, from what I've seen:
ode_method = "euler": midpoint and rk4 also give nice results
nfe_step: speeds things up, 2x or 3x depending on how many steps you use
cfg_strength: gives interesting results when you change it, more emotion going from 2 to 4
fix_duration: fixes crazy repetition; in my case setting it manually sometimes helps
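
For example, two quick calls through the api.py interface from this PR to compare these knobs (same test assets as in the examples above, assuming f5tts = F5TTS() as before; this is just an illustration):

# fewer ODE steps = faster, draft-quality generation
wav_fast, sr, _ = f5tts.infer(
    ref_file="tests/ref_audio/test_en_1_ref_short.wav",
    ref_text="Some call me nature, others call me mother nature.",
    gen_text="I don't really care what you call me.",
    nfe_step=16,
)

# higher cfg_strength = a more expressive, emotional read
wav_expressive, sr, _ = f5tts.infer(
    ref_file="tests/ref_audio/test_en_1_ref_short.wav",
    ref_text="Some call me nature, others call me mother nature.",
    gen_text="I don't really care what you call me.",
    cfg_strength=4,
)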

Also, when you have free time, please check the api.

Here is how I use it; I think it's good to expose all the limits you set like this, and you have made an amazing app:
https://github.com/lpscr/F5-TTS/blob/6727245752782b372ee48c984e06c604b10eea97/api.py#L161

Thank you very much for the great work! I love this repo.

@SWivid
Owner

SWivid commented Oct 20, 2024

@lpscr yes, we may just separate these parameter settings for gradio and cli.
I was just thinking of changing the values in model/utils_infer.py to control both gradio and cli at once.
There would definitely be a better way ~

Just change it the way that feels good to you 👌.

@SWivid
Owner

SWivid commented Oct 20, 2024

Here is how I use it; I think it's good to expose all the limits you set like this, and you have made an amazing app:
https://github.com/lpscr/F5-TTS/blob/6727245752782b372ee48c984e06c604b10eea97/api.py#L161

i'm about to rest as it's quite approaching dawn here lol.
I think api.py is great, will review and follow up on the updates ~

@lpscr
Contributor Author

lpscr commented Oct 20, 2024

Here is how I use it; I think it's good to expose all the limits you set like this, and you have made an amazing app:
https://github.com/lpscr/F5-TTS/blob/6727245752782b372ee48c984e06c604b10eea97/api.py#L161

i'm about to rest as it's quite approaching dawn here lol. I think api.py is great, will review and follow up on the updates ~

Yes, I understand, me too, I'm in this repo all day :) happy to help.
Also, I think I found a way to fine-tune any language without losing English and Chinese, but it's still an experiment.

@sachin-seisei

@lpscr hey man, thanks for the api script, that's fantastic!! But when I tried to run this script, it's not throwing an error, yet the audio is not being generated and the spectrogram is blank!! Can you tell me what could be the issue??
terminal logs:
E:\ML\envs\coqui-tts\Lib\site-packages\torch\functional.py:666: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\EmptyTensor.cpp:45.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
E:\ML\ml_projects\project_folder\F5-TTS\model\modules.py:359: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.) x = F.scaled_dot_product_attention(query, key, value, attn_mask=attn_mask, dropout_p=0.0, is_causal=False)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:40<00:00, 40.10s/it]

(coqui-tts) E:\ML\ml_projects\project_folder\F5-TTS>
(coqui-tts) E:\ML\ml_projects\project_folder\F5-TTS>python mod_api.py
Downloading vocoder from huggingface charactr/vocos-mel-24khz
Downloading vocoder from huggingface charactr/vocos-mel-24khz
gen_text 0 I don't really care what you call me.
0%| | 0/1 [00:00<?, ?it/s]Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\ASUS\AppData\Local\Temp\jieba.cache
Loading model cost 0.893 seconds.
Prefix dict has been built successfully.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:47<00:00, 47.62s/it]

@lpscr
Contributor Author

lpscr commented Oct 21, 2024

hi @SWivid, I just updated api and utils_infer to pass all the values.
Please check and let me know; I think it's all very good now, you can use everything.

Can you tell me why 'update infer limits' gets an X? I'm new to GitHub ...
Let me know how to fix it.

When I make a new git clone to test, the update works fine, so what's wrong?

@lpscr
Contributor Author

lpscr commented Oct 21, 2024

@lpscr hey man, thanks for the api script, that's fantastic!! ... the audio is not being generated and the spectrogram is blank!! Can you tell me what could be the issue?? [terminal logs quoted above]

Try this: make a new folder and run

git clone https://github.com/lpscr/F5-TTS.git
cd F5-TTS
python api.py

@SWivid
Owner

SWivid commented Oct 21, 2024

Can you tell me why 'update infer limits' gets an X?

you mean the checks? a pre-commit workflow was added, which is basically a format checker
(screenshot of the check status)
you can review it from Details on the right side; i'll also help check your updates.

a4ca14b, see here

@lpscr
Contributor Author

lpscr commented Oct 21, 2024

sorry @SWivid

pip install pre-commit
pre-commit install
pre-commit run --all-files

I don't understand, I did this and everything passes.

(screenshot showing all hooks passed)

What's wrong?

@SWivid
Owner

SWivid commented Oct 21, 2024

What's wrong?

Hi @lpscr, you've basically got it right; just let it also format the utils_infer.py modifications.

@lpscr
Contributor Author

lpscr commented Oct 21, 2024

What's wrong?

Hi @lpscr, you've basically got it right; just let it also format the utils_infer.py modifications.

I think I fixed it, can you please check?

SOS: do I need to run pre-commit run --all-files every time, before I do git add, git commit, git push?

@SWivid SWivid merged commit 25cdc51 into SWivid:main Oct 21, 2024
1 check passed
@SWivid
Owner

SWivid commented Oct 21, 2024

@lpscr there are some pre-commit bots that can do it automatically, I think, but I'm also new to this, lol
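
(A general note on pre-commit itself, not specific to this repo's config: once pre-commit install has been run, the hooks fire automatically on every git commit, so pre-commit run --all-files is only needed to clean up files that were changed before the hook was installed.)

pip install pre-commit
pre-commit install                 # one-time: installs the git hook
git add . && git commit -m "msg"   # the hooks now run automatically on commit
pre-commit run --all-files         # manual run over everything, e.g. for files changed before the hook existed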

@sachin-seisei

@lpscr hey man, thanks for the api script, that's fantastic!! ... the audio is not being generated and the spectrogram is blank!! Can you tell me what could be the issue?? [terminal logs quoted above]

Try this: make a new folder and run

git clone https://github.com/lpscr/F5-TTS.git
cd F5-TTS
python api.py

@lpscr as you said, I created a new folder, cloned the repo, and ran the api.py script; it's still generating empty audio!! Is there any way to debug and find the actual cause of it??
logs:
Using cuda device
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Download Vocos from huggingface charactr/vocos-mel-24khz

vocab : Emilia_ZH_EN
vocab : Emilia_ZH_EN
tokenizer : pinyin
model : C:\Users\ASUS.cache\huggingface\hub\models--SWivid--F5-TTS\snapshots\d0ac03c2b5ded76f302ada887ae0da5675e88a5d\F5TTS_Base\model_1200000.safetensors

gen_text 0 I don't really care what you call me. I've been a silent spectator, watching species evolve, empires rise and fall. But always remember, I am mighty and enduring. Respect me and I'll nurture you; ignore me and you shall face the consequences.
Generating audio in 1 batches...
0%| | 0/1 [00:00<?, ?it/s]Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\ASUS\AppData\Local\Temp\jieba.cache
Loading model cost 0.675 seconds.
Prefix dict has been built successfully.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [02:54<00:00, 174.20s/it]

@juangea

juangea commented Nov 3, 2024

Hello, I'm testing the API and I'm loving it, especially because I can disable EMA with the .pt model; not sure how to get a safetensors without EMA, though.

But when I try "fix_duration" I always get a 0-second audio, any ideas?

Thanks!

@lpscr
Contributor Author

lpscr commented Nov 3, 2024

@juangea when you reduce the model it only takes the EMA dict (ema_model_state_dict); maybe I'll add a use_ema option for this, so it can grab either ema_model_state_dict or model_state_dict.

You can see the relevant part of the code below.
Replace ema_model_state_dict with model_state_dict; that way it behaves like use_ema=False, as you use in the API when testing a model, then export the model again.

import torch
from safetensors.torch import save_file


def extract_and_save_ema_model(checkpoint_path: str, new_checkpoint_path: str, safetensors: bool) -> str:
    # keep only the EMA weights from a training checkpoint and save them
    # either as .safetensors or as a smaller .pt checkpoint
    try:
        checkpoint = torch.load(checkpoint_path)
        print("Original Checkpoint Keys:", checkpoint.keys())
        ema_model_state_dict = checkpoint.get("ema_model_state_dict", None)
        if ema_model_state_dict is None:
            return "No 'ema_model_state_dict' found in the checkpoint."
        if safetensors:
            new_checkpoint_path = new_checkpoint_path.replace(".pt", ".safetensors")
            save_file(ema_model_state_dict, new_checkpoint_path)
        else:
            new_checkpoint_path = new_checkpoint_path.replace(".safetensors", ".pt")
            new_checkpoint = {"ema_model_state_dict": ema_model_state_dict}
            torch.save(new_checkpoint, new_checkpoint_path)
        return f"New checkpoint saved at: {new_checkpoint_path}"
    except Exception as e:
        return f"An error occurred: {e}"

@juangea

juangea commented Nov 4, 2024

Awesome, thanks @lpscr.

About "fix_duration", does anyone know anything?

Each time I try to use it from the Python API it delivers 0-second audio, no matter the other parameters; I change it from None to 3.5, for example, and I get no audio.

Thanks!

@lpscr
Contributor Author

lpscr commented Nov 4, 2024

@juangea This happened because the fix_duration you set is shorter than the reference audio; it needs to be the same length as the reference or longer.

@juangea

juangea commented Nov 4, 2024

Ok! I’ll test it, great to know :)

thanks!

@juangea

juangea commented Nov 4, 2024

It's not working for me. I gave it a 9-second audio file and tried with the exact same duration (using soundfile to measure it), and it delivered 0-second audio; I also tried manually setting 8 seconds and got the same, 0-second audio.

It's like with the API I cannot get any result if I enable the "fix_duration" parameter. Is there anything else I should check? I see no error in the terminal at all, just the resulting wav audio has a duration of 0 seconds.

@SWivid
Owner

SWivid commented Nov 4, 2024

If you provide a 9-second reference audio file and want to generate 6 seconds of audio, you need fix_duration=15.
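
In other words, fix_duration covers the reference plus the generated audio. A small sketch of computing it (placeholder paths; soundfile is only used here to measure the reference length, and the import follows the example at the top of this PR):

import soundfile as sf
from api import F5TTS

ref_file = "tests/ref_audio/test_en_1_ref_short.wav"
data, sr = sf.read(ref_file)
ref_seconds = len(data) / sr              # duration of the reference audio

gen_seconds = 6.0                         # how much new audio you want
f5tts = F5TTS()
wav, sr, spect = f5tts.infer(
    ref_file=ref_file,
    ref_text="Some call me nature, others call me mother nature.",
    gen_text="I don't really care what you call me.",
    fix_duration=ref_seconds + gen_seconds,   # reference + generated
)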

@juangea

juangea commented Nov 4, 2024

thanks! gonna try right now.

@juangea

juangea commented Nov 4, 2024

It's working; however, if the phrase is too long, with the API I get a strange error:

Traceback (most recent call last):
  File "d:\AI\F5_TTS\F5-TTS\f5_inference_test.py", line 44, in <module>
    wav,sr,spect=f5tts.infer(
  File "D:\AI\F5_TTS\F5-TTS\src\f5_tts\api.py", line 109, in infer
    ref_file, ref_text = preprocess_ref_audio_text(ref_file, ref_text, device=self.device)
  File "D:\AI\F5_TTS\F5-TTS\src\f5_tts\infer\utils_infer.py", line 274, in preprocess_ref_audio_text
    if not ref_text.strip():
AttributeError: 'tuple' object has no attribute 'strip'

I think it has no relation to fix_duration, because it worked with a shorter phrase; the weird thing is that one try with that long phrase also worked fine, so I'm not sure why I'm getting this error.

@SWivid
Owner

SWivid commented Nov 4, 2024

AttributeError: 'tuple' object has no attribute 'strip'

It's literally being passed an invalid ref_text format; check that ref_text is correctly passed in as a str.

@juangea

juangea commented Nov 4, 2024

I'm sorry!!!

You are right: at the end of the variable assignment I had a stray trailing comma, and Python doesn't raise an error at definition time, so it was passing the value as a tuple even though it had only one element.
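
A tiny illustration of the trap (in Python a stray trailing comma silently makes a one-element tuple, which is exactly what preprocess_ref_audio_text then chokes on):

ref_text = "Some call me nature, others call me mother nature.",   # <- stray trailing comma
print(type(ref_text))   # <class 'tuple'>, so ref_text.strip() raises AttributeError

ref_text = "Some call me nature, others call me mother nature."    # plain str is what infer() expects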

Thanks, and sorry for that; delete these messages if you want, since it's not related to the API.

@SWivid
Owner

SWivid commented Nov 4, 2024

lol no worries, it's fine as long as it's working
