ITAcotron 2

Codebase for the papers "ITAcotron 2: the Power of Transfer Learning in Expressive TTS Synthesis" and "ITAcotron 2: Transfering English Speech Synthesis Architectures and Speech Features to Italian". For all the references, contributions and credits, please refer to the papers.

This code was originally developed as part of the M.Sc. thesis in Cognitive Science "Conditional Text to Speech by Means of Transfer Learning". The M.Sc. degree was awarded by the Center for Mind/Brain Sciences (CIMeC) of the Università degli Studi di Trento (UniTn). The thesis was supervised at Politecnico di Milano (PoliMI) by the staff of the ARCSlab.

Usage

To generate Italian clips, you can use the notebook at the following path:
notebooks/ITAcotron-2_synthesis.ipynb

Model weights

Links to download the weights of the trained models:

  • Tacotron 2 [ link ] (trained on Italian data)
  • FB-MelGAN Vocoder [ link ] and vocoder configuration file [ link ] (taken from the original repo, trained on English data)
  • Speaker Encoder [ link ] and speaker encoder configuration file [ link ] (taken from the original repo, trained on English data)

Changes from origin

The code in this repository is based on a fork of the Mozilla TTS repository. Please refer to the source for the documentation.

With respect to the original implementation, we modified the following files:

  • TTS/tts/datasets/preprocess.py
  • TTS/tts/datasets/TTSDataset.py
  • TTS/tts/utils/text/__init__.py
  • TTS/tts/utils/text/cleaners.py
  • TTS/tts/utils/text/symbols.py
  • TTS/bin/train_tacotron.py

The code was taken from this commit.

Added configurations

We added the following configuration files for training the Italian TTS model:

  • TTS/tts/configs/config_first_finetuning.json
  • TTS/tts/configs/config_second_finetuning.json
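
To see what changes between the two fine-tuning stages, it can help to load both configuration files and list the top-level keys whose values differ. The snippet below is a minimal sketch and not part of this repository: `load_json_config` and `diff_configs` are hypothetical helper names, and the comment handling assumes the files may contain `//` comments, as Mozilla TTS configuration files commonly do. A `//` sequence preceded by whitespace inside a string value would be stripped by mistake, so treat the output as a quick inspection aid only.

```python
import json
import re
from pathlib import Path


def load_json_config(path):
    """Load a JSON config, tolerating `//` comments (assumed, not guaranteed)."""
    text = Path(path).read_text(encoding="utf-8")
    # Drop `//` comments that start a line or follow whitespace. URLs such as
    # "https://..." are left alone because their `//` follows a colon.
    text = re.sub(r"(^|\s)//[^\n]*", r"\1", text, flags=re.MULTILINE)
    return json.loads(text)


def diff_configs(first, second):
    """Return {key: (first_value, second_value)} for top-level keys that differ.

    A key that is absent from one of the two configs is reported with None
    on that side.
    """
    missing = object()  # sentinel, so that explicit nulls are not confused with absence
    changed = {}
    for key in sorted(set(first) | set(second)):
        a = first.get(key, missing)
        b = second.get(key, missing)
        if a != b:
            changed[key] = (
                None if a is missing else a,
                None if b is missing else b,
            )
    return changed
```

Run from the repository root, `diff_configs(load_json_config("TTS/tts/configs/config_first_finetuning.json"), load_json_config("TTS/tts/configs/config_second_finetuning.json"))` would return a dictionary of the settings that differ between the first and second fine-tuning runs.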

Cite work

If you use our code, please cite our work with the following BibTeX entries:

@inproceedings{favaro-etal-2021-itacotron,
	Address = {Trento, Italy},
	Author = {Favaro, Anna and Sbattella, Licia and Tedesco, Roberto and Scotti, Vincenzo},
	Booktitle = {Proceedings of The Fourth International Conference on Natural Language and Speech Processing (ICNLSP 2021)},
	Month = {12--13 } # nov,
	Pages = {83--88},
	Publisher = {Association for Computational Linguistics},
	Title = {{ITA}cotron 2: Transfering {E}nglish Speech Synthesis Architectures and Speech Features to {I}talian},
	Url = {https://aclanthology.org/2021.icnlsp-1.10},
	Year = {2021}
}

@inbook{favaro-etal-2022-itacotron,
	Author={Favaro, Anna  and Sbattella, Licia  and Tedesco, Roberto  and Scotti, Vincenzo},
	Editor={Abbas, Mourad},
	Title={ITAcotron 2: the Power of Transfer Learning in Expressive TTS Synthesis},
	BookTitle={Analysis and Application of Natural Language and Speech Processing},
	Year={2022},
	Publisher={Springer International Publishing},
	Address={Cham},
	Pages={1--20},
	Isbn={978-3-031-11034-4},
	Doi={10.1007/978-3-031-11035-1\_1},
	Url={https://doi.org/10.1007/978-3-031-11035-1\_1}
}

Acknowledgments

We wish to thank all the contributors to the "TTS: Text-to-Speech for all" repository for their help.
