Tacotron 2 + Weights & Biases

PyTorch implementation of Natural TTS Synthesis By Conditioning WaveNet On Mel Spectrogram Predictions. This fork has been instrumented with Weights & Biases to enable experiment tracking, prediction logging, dataset and model versioning, and hyperparameter optimization.

This implementation uses the LJSpeech dataset.

Pre-requisites

  1. NVIDIA GPU with CUDA and cuDNN installed
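
A quick way to verify this prerequisite, assuming PyTorch is already installed:

```python
import torch

# Confirm that CUDA and cuDNN are visible to PyTorch.
print("CUDA available:", torch.cuda.is_available())
print("cuDNN enabled:", torch.backends.cudnn.enabled)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```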

Running

  1. Run pip install -r requirements.txt.
  2. Run wandb init to configure your working directory to log to Weights & Biases.
  3. Run python register-data.py to create a reference Artifact pointing to the LJSpeech dataset (see the first sketch after this list).
  4. Run python split-data.py to create a versioned train/validation split of the data.
  5. Run python register-model.py ... to log pre-trained Tacotron 2 and WaveGlow models as Artifacts to Weights & Biases.
  6. Run python train.py <dataset-artifact> to warm-start training of Tacotron 2 on the dataset artifact you created.
  7. Run python inference.py <tacotron-artifact> <waveglow-artifact> <text> to run inference on a text file containing newline-delimited sentences. The inference results are logged to Weights & Biases as a wandb.Table (see the second sketch after this list).
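
As a rough illustration of step 3, here is a minimal sketch of registering a reference Artifact with the standard wandb Python API. The project name and dataset path are placeholders; the fork's register-data.py may differ in its details.

```python
import wandb

# Start a run whose job is to register the dataset.
run = wandb.init(project="tacotron2", job_type="register-data")

# A reference Artifact records a pointer (URI) to the data rather than
# uploading a copy, so the dataset stays where it already lives.
artifact = wandb.Artifact("ljspeech", type="dataset")
artifact.add_reference("file:///path/to/LJSpeech-1.1")  # placeholder path

run.log_artifact(artifact)
run.finish()
```

And a similarly hedged sketch of step 7's table logging. The column names and the zero-filled waveform are assumptions standing in for the real synthesis step, which would run the Tacotron 2 and WaveGlow artifacts.

```python
import numpy as np
import wandb

run = wandb.init(project="tacotron2", job_type="inference")

# Assumed columns; the fork's inference.py may log additional fields.
table = wandb.Table(columns=["text", "audio"])
for sentence in ["Hello world.", "Speech synthesis is fun."]:
    # Placeholder waveform standing in for the synthesized audio.
    waveform = np.zeros(22050, dtype=np.float32)
    table.add_data(sentence, wandb.Audio(waveform, sample_rate=22050))

run.log({"inference_results": table})
run.finish()
```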

Related repos

WaveGlow: a faster-than-real-time flow-based generative network for speech synthesis.

nv-wavenet: a faster-than-real-time WaveNet implementation.

Acknowledgements (Copied)

This implementation uses code from the following repos, as described in our code: Keith Ito and Prem Seetharaman.

We are inspired by Ryuichi Yamamoto's Tacotron PyTorch implementation.

We are thankful to the Tacotron 2 paper authors, especially Jonathan Shen, Yuxuan Wang, and Zongheng Yang.