Tacotron 2 + Weights & Biases

PyTorch implementation of Natural TTS Synthesis By Conditioning WaveNet On Mel Spectrogram Predictions. This fork has been instrumented with Weights & Biases to enable experiment tracking, prediction logging, dataset and model versioning, and hyperparameter optimization.

This implementation uses the LJSpeech dataset.

Pre-requisites

  1. NVIDIA GPU with CUDA and cuDNN installed
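
A quick way to verify this prerequisite, assuming PyTorch is already installed:

```python
import torch

# Confirm that CUDA and cuDNN are visible to PyTorch.
print("CUDA available:", torch.cuda.is_available())
print("cuDNN enabled:", torch.backends.cudnn.enabled)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```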

Running

  1. Run pip install -r requirements.txt.
  2. Run wandb init to configure your working directory to log to Weights & Biases.
  3. Run python register-data.py to create a reference Artifact pointing to the LJSpeech dataset (see the first sketch after this list).
  4. Run python split-data.py to create a versioned train/validation split of the data.
  5. Run python register-model.py ... to log pre-trained Tacotron 2 and WaveGlow models as Artifacts to Weights & Biases.
  6. Run python train.py <dataset-artifact> to warm-start training of Tacotron 2 on the dataset artifact you created.
  7. Run python inference.py <tacotron-artifact> <waveglow-artifact> <text> to run inference on a text file containing newline-delimited sentences. The inference results are logged to Weights & Biases as a wandb.Table (see the second sketch after this list).
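
As a rough illustration of step 3, here is a minimal sketch of registering a reference Artifact with the standard wandb Python API. The project name and dataset path are placeholders; the fork's register-data.py may differ in its details.

```python
import wandb

# Start a run whose job is to register the dataset.
run = wandb.init(project="tacotron2", job_type="register-data")

# A reference Artifact records a pointer (URI) to the data rather than
# uploading a copy, so the dataset stays where it already lives.
artifact = wandb.Artifact("ljspeech", type="dataset")
artifact.add_reference("file:///path/to/LJSpeech-1.1")  # placeholder path

run.log_artifact(artifact)
run.finish()
```

And a similarly hedged sketch of step 7's table logging. The column names and the zero-filled waveform are assumptions standing in for the real synthesis step, which would run the Tacotron 2 and WaveGlow artifacts.

```python
import numpy as np
import wandb

run = wandb.init(project="tacotron2", job_type="inference")

# Assumed columns; the fork's inference.py may log additional fields.
table = wandb.Table(columns=["text", "audio"])
for sentence in ["Hello world.", "Speech synthesis is fun."]:
    # Placeholder waveform standing in for the synthesized audio.
    waveform = np.zeros(22050, dtype=np.float32)
    table.add_data(sentence, wandb.Audio(waveform, sample_rate=22050))

run.log({"inference_results": table})
run.finish()
```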

Related repos

WaveGlow: a faster-than-real-time flow-based generative network for speech synthesis.

nv-wavenet: a faster-than-real-time WaveNet implementation.

Acknowledgements (Copied)

This implementation uses code from the following repos, as described in our code: Keith Ito and Prem Seetharaman.

We are inspired by Ryuichi Yamamoto's Tacotron PyTorch implementation.

We are thankful to the Tacotron 2 paper authors, especially Jonathan Shen, Yuxuan Wang, and Zongheng Yang.