
support espeak language (dictionary) and voice (phoneme) data #14

Closed
rhdunn opened this issue Nov 23, 2011 · 2 comments

Comments

@rhdunn
Owner

rhdunn commented Nov 23, 2011

At the moment, the cainteoir engine links to the external espeak library. This has several problems:
1/ espeak contains its own code for handling SSML and HTML tags, but cainteoir processes these at a different level before passing the text to espeak;
2/ there is poor control over where sentence breaks (pauses) and word breaks occur -- espeak is passed text blocks as found, so "1st" can be spoken incorrectly when espeak does not recognise it as "first";
3/ the lack of control over the dictionary makes it difficult to add words or to reload a dictionary while the application is running -- it also makes dictionary verifiers hard to implement;
4/ the lack of fine-grained control over prosody vs pronunciation, and the inability to separate the different phases of espeak, make it difficult to control synthesis at this level;
5/ the lack of buffer support for returning translation data makes it difficult to embed this functionality in applications (the API uses a FILE* redirect).

@rhdunn
Copy link
Owner Author

rhdunn commented Sep 10, 2012

The espeak implementation has a different architecture from the one planned for the Cainteoir engine. It combines several phases (syllable analysis, phoneme morphology, etc.), making it hard to isolate and test them. The implementation is in C, with a poorly maintained codebase (unused variables, all variables declared at the top of functions, int/1/0 instead of bool/true/false, etc.).

Because of this, it would be better to implement the text-to-speech processing phases in the Cainteoir engine, designed the way I want it (separate layers, individually tested, document reader event processing, independent language and voice data, phoneme morphology as a separate phase, etc.). Support for the espeak language (dictionary) and voice (phoneme) files can then be provided within this architecture.

MBROLA voices can be properly supported as a set of voices on an external synthesizer with the correct phoneme compatibility map. The pthreads+fork interaction can also be handled correctly when stopping reading.

@rhdunn
Copy link
Owner Author

rhdunn commented Mar 15, 2013

The language/dictionary data support is now being tracked in issue #34 and the voice/phoneme support in issue #36.

@rhdunn rhdunn closed this as completed Mar 15, 2013