
Releases: DigitalPhonetics/IMS-Toucan

Support all Types of Languages

20 May 10:04
1ae0202

This release extends the toolkit's functionality and provides new checkpoints.

New Features:

  • support for all phonemes in the IPA standard through an extended lookup of articulatory features
  • support for some suprasegmental markers in the IPA standard through parsing (tone, lengthening, primary stress)
  • praat-parselmouth is now used for greatly improved pitch extraction
  • faster phonemization
  • word boundaries are added; they are invisible to the aligner and the decoder, but can help the encoder in multilingual scenarios
  • tonal languages (Chinese, Vietnamese) added, tested and included in the pretraining
  • Scorer class to inspect data given a trained model and a dataset cache (the provided pretrained models can be used for this)
  • intuitive controls for scaling durations and the variance in pitch and energy
  • diverse bugfixes and speed improvements
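To illustrate what parsing suprasegmental markers can mean in practice, here is a minimal sketch that folds primary stress (ˈ), lengthening (ː) and Chao tone letters (˥˦˧˨˩) into flags on the segment they modify. The `Segment` class, the `parse_suprasegmentals` helper and the one-character-per-segment simplification are hypothetical illustrations, not the toolkit's actual frontend.

```python
from dataclasses import dataclass

PRIMARY_STRESS = "ˈ"
LENGTH_MARK = "ː"
TONE_LETTERS = set("˥˦˧˨˩")  # Chao tone letters, extra-high to extra-low


@dataclass
class Segment:
    phone: str
    stressed: bool = False
    long: bool = False
    tone: str = ""


def parse_suprasegmentals(ipa: str) -> list[Segment]:
    """Split an IPA string into segments (hypothetical helper).

    Simplification: every other character is treated as one segment.
    """
    segments: list[Segment] = []
    pending_stress = False
    for char in ipa:
        if char == PRIMARY_STRESS:
            # The stress mark precedes what it modifies: remember it.
            pending_stress = True
        elif char == LENGTH_MARK and segments:
            # Lengthening follows the segment it lengthens.
            segments[-1].long = True
        elif char in TONE_LETTERS and segments:
            # Tone letters follow and may be combined into contours.
            segments[-1].tone += char
        elif char.isspace():
            continue
        else:
            segments.append(Segment(char, stressed=pending_stress))
            pending_stress = False
    return segments
```

Here the stress flag lands on the segment right after ˈ, which in real IPA marks the start of the stressed syllable rather than its vowel; a real frontend would also group multi-character phonemes (affricates, diacritics) before this step.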
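The controls for durations and for pitch and energy variance amount to simple transformations of the predicted prosody. A minimal sketch, assuming integer per-phoneme durations in frames and a per-phoneme pitch or energy contour; the function names and the choice to scale deviations around the utterance mean are assumptions for illustration, not the toolkit's own code.

```python
def scale_durations(durations: list[int], factor: float) -> list[int]:
    """Multiply per-phoneme durations (in frames) by `factor`.

    factor > 1 slows speech down, factor < 1 speeds it up. Phonemes that
    had frames keep at least one; zero-length entries stay zero.
    """
    return [max(1, round(d * factor)) if d > 0 else 0 for d in durations]


def scale_variance(contour: list[float], factor: float) -> list[float]:
    """Scale each value's deviation from the contour mean by `factor`.

    factor > 1 exaggerates pitch or energy movement, factor < 1 flattens it,
    factor == 0 yields a monotone contour; the mean itself is unchanged.
    """
    mean = sum(contour) / len(contour)
    return [mean + factor * (value - mean) for value in contour]
```

Scaling around the mean keeps the overall pitch level or loudness of the voice intact, so the variance control changes expressiveness without shifting the speaker's register.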

Note:

  • This release breaks backwards compatibility. Make sure you are using the associated pretrained models. Old checkpoints and dataset caches are incompatible; only HiFiGAN remains compatible.
  • Work on upcoming releases is already in progress. Improved voice adaptation will be our next goal.
  • To use the pretrained checkpoints, download them, create the corresponding directories and place the files into your clone as follows (once they are in place, rename the HiFiGAN and FastSpeech2 checkpoints to best.pt):
```
Models
├─ Aligner
│     └─ aligner.pt
├─ FastSpeech2_Meta
│     └─ best.pt
└─ HiFiGAN_combined
      └─ best.pt
```
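The placement step can also be scripted. A minimal sketch using only the Python standard library; the downloaded file names (`aligner.pt`, `fastspeech2.pt`, `hifigan.pt`) are hypothetical placeholders for whatever the release assets are actually called. Only the target layout is taken from the release notes.

```python
import shutil
from pathlib import Path

# Target layout inside the clone, as listed in the release notes.
# The source names on the left are HYPOTHETICAL placeholders for the downloads.
LAYOUT = {
    "aligner.pt": Path("Models/Aligner/aligner.pt"),
    "fastspeech2.pt": Path("Models/FastSpeech2_Meta/best.pt"),
    "hifigan.pt": Path("Models/HiFiGAN_combined/best.pt"),
}


def place_checkpoints(download_dir: Path, clone_dir: Path) -> list[Path]:
    """Create the model directories and copy each checkpoint to its expected name."""
    placed = []
    for source_name, relative_target in LAYOUT.items():
        target = clone_dir / relative_target
        target.parent.mkdir(parents=True, exist_ok=True)  # e.g. Models/Aligner
        shutil.copy2(download_dir / source_name, target)  # copy and rename in one step
        placed.append(target)
    return placed
```

Run from the root of the clone, `place_checkpoints(Path("downloads"), Path("."))` would create the three directories and perform the renaming in one go, assuming the downloads sit in a `downloads` folder under the placeholder names.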

Multi Language and Multi Speaker

01 Mar 20:37
81075a6
  • self-contained aligner to get high-quality durations quickly and easily, without relying on external tools or knowledge distillation
  • modelling speakers and languages jointly but disentangled, so you can use speakers across languages
  • see the demo section for an interactive online demo
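Modelling speakers and languages jointly but disentangled means the speaker and the language are looked up independently, so any speaker can be paired with any language at inference time. A minimal, framework-free sketch of that idea; the embedding tables, their size and the concatenation are illustrative assumptions, not the toolkit's architecture.

```python
import random

EMBED_DIM = 4
_rng = random.Random(0)  # fixed seed so the toy tables are reproducible

# Hypothetical stand-ins for learned embedding tables. Speakers and languages
# get separate tables, so neither vector encodes the other.
speaker_table = {
    name: [_rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
    for name in ("speaker_a", "speaker_b")
}
language_table = {
    code: [_rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
    for code in ("en", "de")
}


def conditioning_vector(speaker: str, language: str) -> list[float]:
    """Concatenate independent speaker and language vectors.

    Because the two lookups are independent, any speaker can be combined
    with any language, including pairs never seen together in training.
    """
    return speaker_table[speaker] + language_table[language]
```

For example, `conditioning_vector("speaker_a", "de")` reuses exactly the same speaker half as `conditioning_vector("speaker_a", "en")`; only the language half changes.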

The pretrained FastSpeech2 model (which can speak many languages in any of its voices), the HiFiGAN model and the Aligner model are attached to this commit.

Articulatory Features and LAML

28 Feb 20:36
6f180c7

This release includes our new text frontend, which uses articulatory features of phonemes instead of phoneme identities. It also includes checkpoints trained with a variant of model-agnostic meta-learning, which are very well suited as a basis for fine-tuning a single-speaker model on very little data in many different languages.
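To see why articulatory features transfer better across languages than phoneme identities, consider a toy lookup in which each phoneme is described by a handful of binary features instead of an index. The feature inventory and values below are a small hypothetical subset for illustration, not the toolkit's actual feature table.

```python
# Hypothetical, heavily reduced articulatory feature inventory.
FEATURES = ["voiced", "bilabial", "alveolar", "velar", "plosive", "nasal", "fricative"]

PHONEME_FEATURES = {
    "p": {"bilabial", "plosive"},
    "b": {"voiced", "bilabial", "plosive"},
    "m": {"voiced", "bilabial", "nasal"},
    "t": {"alveolar", "plosive"},
    "d": {"voiced", "alveolar", "plosive"},
    "k": {"velar", "plosive"},
    "x": {"velar", "fricative"},  # voiceless velar fricative, e.g. German "Bach"
}


def feature_vector(phoneme: str) -> list[int]:
    """Binary vector with one slot per articulatory feature."""
    active = PHONEME_FEATURES[phoneme]
    return [1 if feature in active else 0 for feature in FEATURES]


def feature_distance(a: str, b: str) -> int:
    """Number of features on which two phonemes differ (Hamming distance)."""
    return sum(x != y for x, y in zip(feature_vector(a), feature_vector(b)))
```

With one-hot identities, [x] would be equally far from every other phoneme and its embedding would be untrained if no training language contains it; with features, `feature_distance("x", "k")` is 2 while `feature_distance("x", "m")` is 5, so the model can start from what it learned about [k].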

Tacotron2 FastSpeech2 HiFiGAN basic implementation complete

14 Jan 16:49
17d3dda

The basic versions of Tacotron 2, FastSpeech 2 and HiFiGAN are complete. A pretrained model for HiFiGAN is attached to this release.

Future updates will include different models, new features and changes to existing models that will break backwards compatibility. This version is the most basic one, but it is complete.