Commit

Merge pull request #3 from eltociear/patch-1
Update README.md
pawel-polyai authored Jan 9, 2024
2 parents 38cc88d + dbf3752 commit ba0376e
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -4,7 +4,7 @@ This repo contains recipes and models used for training Pheme TTS models. It is
Our Pheme TTS framework validates several hypotheses:
1. We can train Transformer-based conversational TTS models with far less training data than e.g. VALL-E or SoundStorm (around 10x less).
2. Training can be performed with conversational, podcast, and noisy data like GigaSpeech.
-3. Efficiency is paramount, which includes parameter efficiency (compact models), data efficiency (fewer training data) and inference effiency (reduced latency).
+3. Efficiency is paramount, which includes parameter efficiency (compact models), data efficiency (fewer training data) and inference efficiency (reduced latency).
4. One fundamental ingredient is the separation of semantic and acoustic tokens, together with an adequate speech tokenizer.
5. Inference can be run in parallel via MaskGit-style decoding, with 15x speed-ups over similarly sized autoregressive models.
6. The single-speaker quality can be improved through student-teacher training with (synthetic) data generated by third-party providers.
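The MaskGit-style parallel inference mentioned in point 5 can be illustrated with a minimal sketch: start from a fully masked token sequence, predict all positions in one forward pass per step, and re-mask the least confident positions on a decaying (cosine) schedule. This is a generic illustration of the technique, not Pheme's actual implementation; `predict_fn`, `toy_predictor`, the `MASK` sentinel, and the vocabulary size are all hypothetical stand-ins.

```python
import numpy as np

MASK = -1  # hypothetical sentinel id for a masked position

def maskgit_decode(predict_fn, seq_len, num_steps=8):
    """Sketch of MaskGit-style iterative parallel decoding.

    predict_fn(tokens) must return a (seq_len, vocab_size) array of
    per-position probabilities; it stands in for a trained model.
    """
    tokens = np.full(seq_len, MASK, dtype=np.int64)  # start fully masked
    for step in range(num_steps):
        still_masked = tokens == MASK
        if not still_masked.any():
            break
        probs = predict_fn(tokens)                   # one parallel forward pass
        pred = probs.argmax(axis=-1)
        conf = probs.max(axis=-1)
        conf[~still_masked] = np.inf                 # never re-mask fixed tokens
        # Cosine schedule: fraction of positions left masked after this step.
        frac = np.cos(np.pi / 2 * (step + 1) / num_steps)
        n_remask = int(np.floor(frac * seq_len))
        tokens[still_masked] = pred[still_masked]    # accept every prediction...
        order = np.argsort(conf)                     # ...then re-mask the least
        tokens[order[:n_remask]] = MASK              # confident positions
    return tokens

def toy_predictor(tokens):
    # Random probabilities standing in for a real acoustic-token model.
    rng = np.random.default_rng(len(tokens))
    p = rng.random((len(tokens), 16))
    return p / p.sum(axis=-1, keepdims=True)

decoded = maskgit_decode(toy_predictor, seq_len=12)
```

The speed-up over autoregressive decoding comes from the loop running a fixed, small number of steps (here 8) regardless of sequence length, instead of one forward pass per token.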
