Intermediate Speech Representations for LibriSpeech
This release contains the intermediate representations that the pipeline produces for the LibriSpeech train-clean-360, dev, and test data of VPC 2024: linguistic content (phonetic transcription), prosody (pitch, energy, duration), and speaker embeddings (GST, trained jointly with the TTS model). Using these precomputed representations instead of computing them from scratch significantly reduces the run time of the pipeline.
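The idea of reusing a precomputed representation instead of recomputing it can be sketched as a simple load-or-compute helper. This is only an illustration under stated assumptions: the function name `load_or_compute`, the pickle storage format, and the file layout are hypothetical and do not describe how this release is actually packaged or how the pipeline reads it.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_compute(cache_file: Path, compute: Callable[[], Any]) -> Any:
    """Return the object stored in cache_file if it exists; otherwise
    call compute(), store the result in cache_file, and return it.

    Hypothetical helper: the pickle format is an assumption made for
    this sketch, not the format used by the release.
    """
    if cache_file.is_file():
        # Precomputed representation found: skip the expensive step.
        with cache_file.open("rb") as f:
            return pickle.load(f)

    # Nothing cached yet: compute from scratch and save for next time.
    result = compute()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as f:
        pickle.dump(result, f)
    return result
```

With a helper like this, the expensive extraction step runs only for utterances that have no stored file; everything covered by the precomputed data is simply read from disk.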