PyTorch implementation of Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This fork has been instrumented with Weights & Biases to enable experiment tracking, prediction logging, dataset and model versioning, and hyperparameter optimization.
This implementation uses the LJSpeech dataset.
- NVIDIA GPU + CUDA + cuDNN
- Run `pip install -r requirements.txt` to install the dependencies.
- Run `wandb init` to configure your working directory to log to Weights & Biases.
- Run `python register-data.py` to create a reference Artifact pointing to the LJSpeech dataset.
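The registration step boils down to a few wandb calls. A minimal sketch of what it might look like; the project name, artifact name, and dataset URL here are assumptions, not values taken from `register-data.py`:

```python
# Minimal sketch of registering LJSpeech as a reference Artifact.
# Project, artifact name, and URL are assumptions, not values from register-data.py.
import wandb

run = wandb.init(project="tacotron2", job_type="register-data")

# A reference Artifact stores a pointer (here, a URL) instead of copying the files,
# so W&B can version the dataset without uploading the ~2.6 GB archive.
artifact = wandb.Artifact("ljspeech", type="dataset")
artifact.add_reference("https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2")
run.log_artifact(artifact)
run.finish()
```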
- Run `python split-data.py` to create a versioned train/validation split of the data.
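A sketch of how a versioned split can be produced, assuming the LJSpeech transcript list has been extracted locally; the paths, seed, and 95/5 ratio are illustrative, not taken from `split-data.py`:

```python
# Minimal sketch of a versioned train/validation split.
# Paths, seed, and ratio are assumptions, not values from split-data.py.
import random
import wandb

run = wandb.init(project="tacotron2", job_type="split-data")

# Shuffle the LJSpeech transcript list (assumed extracted locally).
with open("LJSpeech-1.1/metadata.csv", encoding="utf-8") as f:
    lines = f.readlines()
random.seed(1234)
random.shuffle(lines)
cut = int(0.95 * len(lines))

# Write the two file lists and log them as a new dataset Artifact;
# re-running with different data or ratios yields a new Artifact version.
split = wandb.Artifact("ljspeech-split", type="dataset")
for name, rows in [("train.csv", lines[:cut]), ("val.csv", lines[cut:])]:
    with open(name, "w", encoding="utf-8") as out:
        out.writelines(rows)
    split.add_file(name)
run.log_artifact(split)
run.finish()
```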
- Run `python register-model.py ...` to log the pre-trained Tacotron 2 and WaveGlow models as Artifacts to Weights & Biases.
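Logging a checkpoint as a model Artifact follows the same pattern as the dataset. A sketch, where the checkpoint file names are assumptions based on NVIDIA's published downloads:

```python
# Minimal sketch of versioning pre-trained checkpoints as model Artifacts.
# Checkpoint file names are assumptions based on NVIDIA's published downloads.
import wandb

run = wandb.init(project="tacotron2", job_type="register-model")

for name, checkpoint in [("tacotron2", "tacotron2_statedict.pt"),
                         ("waveglow", "waveglow_256channels.pt")]:
    artifact = wandb.Artifact(name, type="model")
    artifact.add_file(checkpoint)
    run.log_artifact(artifact)
run.finish()
```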
- Run `python train.py <dataset-artifact>` to warm-start training of Tacotron 2 on the dataset you created.
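On the consuming side, a training run can declare the Artifacts it reads so that W&B records the lineage from dataset and checkpoint to trained model. A sketch of the warm-start pattern, with assumed artifact names and checkpoint layout:

```python
# Minimal sketch of consuming Artifacts for a warm start.
# Artifact names and checkpoint file name are assumptions.
import torch
import wandb

run = wandb.init(project="tacotron2", job_type="train")

# use_artifact() marks these as inputs to the run and returns them for download.
data_dir = run.use_artifact("ljspeech-split:latest").download()
ckpt_dir = run.use_artifact("tacotron2:latest").download()

# Warm start: initialize from the pre-trained weights, then continue training
# on the new split (model construction and the training loop live in the repo).
state = torch.load(f"{ckpt_dir}/tacotron2_statedict.pt", map_location="cpu")
# model.load_state_dict(state["state_dict"])
run.finish()
```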
- Run `python inference.py <tacotron-artifact> <waveglow-artifact> <text>` to run inference on a text file containing newline-delimited sentences. The inference results will be logged to Weights & Biases as a `wandb.Table`.
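A sketch of how such a results table might be assembled; the column names are assumptions, and 22,050 Hz is LJSpeech's sample rate:

```python
# Minimal sketch of logging inference results as a wandb.Table with playable audio.
# Column names are assumptions; the placeholder waveform stands in for WaveGlow output.
import numpy as np
import wandb

run = wandb.init(project="tacotron2", job_type="inference")

results = [("Hello world.", np.zeros(22050, dtype=np.float32))]  # (sentence, waveform)

table = wandb.Table(columns=["text", "audio"])
for sentence, waveform in results:
    # wandb.Audio wraps a raw waveform so it renders as a playable clip in the UI.
    table.add_data(sentence, wandb.Audio(waveform, sample_rate=22050))
run.log({"inference_samples": table})
run.finish()
```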
- [WaveGlow](https://github.com/NVIDIA/waveglow): faster-than-real-time flow-based generative network for speech synthesis.
- [nv-wavenet](https://github.com/NVIDIA/nv-wavenet): faster-than-real-time WaveNet.
This implementation uses code from the following repos: Keith Ito and Prem Seetharaman, as described in our code.
We are inspired by Ryuichi Yamamoto's Tacotron PyTorch implementation.
We are thankful to the Tacotron 2 paper authors, especially Jonathan Shen, Yuxuan Wang, and Zongheng Yang.