Add a TTS engine #3

Open
yacineMTB opened this issue Jun 12, 2023 · 10 comments

@yacineMTB
Owner

This thing needs to respond back to us on some event.
Right now, the strategy to reduce latency is to generate precanned responses constantly. Maybe we can also follow the same strategy with some TTS system?

Ideally this would

  • be abstracted into a single function that takes some text and returns audio

For now we can just save it as a wav file. The scope of this task is figuring out what reasonable candidates we have for TTS, with one of the goals being low latency.
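
A minimal sketch of the shape described above: one function that takes text and saves audio as a WAV file, plus a small cache that generates canned responses ahead of time so an event only needs a file lookup. Every name here (`synthesize_to_wav`, `PrecannedCache`, and the `fake_engine` tone generator standing in for a real TTS model) is hypothetical and not taken from the project; the 16 kHz sample rate is likewise an assumption.

```python
import hashlib
import math
import struct
import wave
from pathlib import Path
from typing import Callable, List

SAMPLE_RATE = 16_000  # assumed; a real engine dictates its own rate

# An "engine" is any callable that maps text to 16-bit mono PCM bytes.
Engine = Callable[[str], bytes]


def fake_engine(text: str) -> bytes:
    """Placeholder for a real TTS model: 50 ms of a 440 Hz tone per character."""
    n_samples = int(SAMPLE_RATE * 0.05) * len(text)
    samples = (
        int(12_000 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE))
        for i in range(n_samples)
    )
    return b"".join(struct.pack("<h", s) for s in samples)


def synthesize_to_wav(text: str, out_path: Path, engine: Engine = fake_engine) -> Path:
    """Take some text, produce audio, and save it as a WAV file."""
    pcm = engine(text)
    with wave.open(str(out_path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return out_path


class PrecannedCache:
    """Generate responses ahead of time so playback never waits on synthesis."""

    def __init__(self, cache_dir: Path, engine: Engine = fake_engine) -> None:
        self.cache_dir = cache_dir
        self.engine = engine
        cache_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, text: str) -> Path:
        # File name derived from the text, so the same phrase maps to the same file.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
        return self.cache_dir / f"{digest}.wav"

    def warm(self, phrases: List[str]) -> None:
        """Synthesize every phrase that is not already on disk."""
        for phrase in phrases:
            self.get(phrase)

    def get(self, text: str) -> Path:
        """Return the cached WAV for this text, synthesizing it on a miss."""
        path = self._path_for(text)
        if not path.exists():
            synthesize_to_wav(text, path, self.engine)
        return path
```

Calling `PrecannedCache(Path("cache")).warm(["On it.", "One moment."])` at startup would leave later `get()` calls for those phrases as plain file lookups; swapping `fake_engine` for a real model only requires a callable with the same signature.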

@yacineMTB yacineMTB added the enhancement New feature or request label Jun 12, 2023
@yacineMTB yacineMTB pinned this issue Jun 12, 2023
@abacaj

abacaj commented Jun 12, 2023

What do you think of https://github.com/rhasspy/piper - was pretty straightforward to set up. I haven't been able to train it but the voices were ok

@yacineMTB
Owner Author

Thank you for sharing!! I think that this is a facade that uses mimic3 under the hood. It's cpp so I should be able to churn out a binding pretty quickly for this

@yacineMTB
Owner Author

Yeah, I looked at the code & some samples, and asked a friend of mine

it is perfect for this

thanks @synesthesiam!!

@iacore

iacore commented Jun 12, 2023

I've heard that the Mycroft model is not really good. Maybe it's better to use Microsoft's TTS.

@ariym

ariym commented Jun 12, 2023

@yacineMTB
Owner Author

yacineMTB commented Jun 12, 2023

> coqui-ai

From a quick glance it seems too bloated

> Microsoft's TTS

Is it locally runnable?

[image]

I think this is why Mimic 3 wins. This is actually ridiculous. Plus I think the project is kinda based.
Also, model quality varies a lot with the quality of the training data. Picking Mimic, but I'll abstract the TTS portion so it's swappable.
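
One way the "swappable" part could look, as a rough sketch: caller code depends only on a small interface, and the concrete backend is chosen by name. The `TTSEngine` protocol, the two stand-in engines, and the `ENGINES` registry are all hypothetical names for illustration; neither stand-in calls a real TTS library.

```python
from typing import Callable, Dict, Protocol


class TTSEngine(Protocol):
    """Anything with this method can serve as the TTS backend."""

    def synthesize(self, text: str) -> bytes:
        """Return raw audio bytes for the given text."""
        ...


class SilenceEngine:
    """Hypothetical stand-in: 100 ms of 16-bit mono silence per word."""

    sample_rate = 16_000

    def synthesize(self, text: str) -> bytes:
        samples_per_word = self.sample_rate // 10
        return b"\x00\x00" * (samples_per_word * len(text.split()))


class NullEngine:
    """Hypothetical stand-in that produces no audio, e.g. for tests."""

    def synthesize(self, text: str) -> bytes:
        return b""


# Name -> factory. A wrapper around a real engine would be registered here too.
ENGINES: Dict[str, Callable[[], TTSEngine]] = {
    "silence": SilenceEngine,
    "null": NullEngine,
}


def make_engine(name: str) -> TTSEngine:
    """Look up a backend by name, failing loudly on an unknown one."""
    try:
        return ENGINES[name]()
    except KeyError:
        known = ", ".join(sorted(ENGINES))
        raise ValueError(f"unknown TTS engine {name!r}; choose from: {known}") from None


def respond(text: str, engine: TTSEngine) -> bytes:
    """Caller code only ever sees the TTSEngine interface."""
    return engine.synthesize(text)
```

With this layout, changing engines is a one-word change at the call site, e.g. `respond("hello there", make_engine("silence"))`, and nothing downstream needs to know which backend produced the bytes.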

@iacore

iacore commented Jun 12, 2023

https://github.com/iacore/nix-tts is pretty good

Microsoft TTS is usable on Windows machines. On Linux there is espeak, although the quality is not good.

@yacineMTB yacineMTB self-assigned this Jun 12, 2023
@synesthesiam

@yacineMTB you're welcome! I wrote both Piper and Mimic 3: Piper is the better choice as it's newer and faster 👍
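
For anyone wanting to try Piper before writing a binding, a hedged sketch that shells out to its command-line binary. The `--model` and `--output_file` flags and the text-on-stdin convention follow the usage shown in the Piper README; the binary name on `PATH` and the model file path are assumptions you would need to adjust, and the helper names are made up for this example.

```python
import subprocess
from pathlib import Path
from typing import List


def piper_command(piper_bin: str, model: Path, out_path: Path) -> List[str]:
    """Build the argv for one Piper invocation (text is passed on stdin)."""
    return [piper_bin, "--model", str(model), "--output_file", str(out_path)]


def piper_to_wav(text: str, model: Path, out_path: Path, piper_bin: str = "piper") -> Path:
    """Run Piper once and return the path of the WAV file it wrote.

    Assumes a `piper` executable and a downloaded voice model exist locally.
    Raises subprocess.CalledProcessError if Piper exits with an error.
    """
    cmd = piper_command(piper_bin, model, out_path)
    subprocess.run(cmd, input=text.encode("utf-8"), check=True)
    return out_path
```

Spawning a process per utterance adds startup cost on every call, so this is more suited to pre-generating responses than to answering in real time.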

@jmanhype

This comment was marked as off-topic.
