Intermediate Speech Representations for LibriSpeech

@SarinaMeyer released this 14 Mar 14:11
c51e64a

This release contains the intermediate representations that the pipeline computes for the LibriSpeech train-clean-360, dev, and test data of the VPC 2024: linguistic content (phonetic transcriptions), prosody (pitch, energy, duration), and speaker embeddings (GST, trained jointly with the TTS model). Using these precomputed representations instead of computing them from scratch significantly reduces the run time of the pipeline.
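
The general idea of reusing a precomputed representation instead of recomputing it can be sketched as a small load-or-compute helper. This is only an illustration under stated assumptions: `load_or_compute`, the cache path, and the use of `pickle` are hypothetical and are not taken from the pipeline; the actual file format and directory layout of the files in this release are not described here.

```python
import pickle
from pathlib import Path


def load_or_compute(cache_path, compute_fn):
    """Return the object stored at cache_path; compute and store it if missing.

    Hypothetical helper: the pickle format and the path layout are
    assumptions for illustration, not the pipeline's actual conventions.
    """
    cache_path = Path(cache_path)

    # Fast path: a precomputed representation already exists on disk.
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)

    # Slow path: compute from scratch, then save for later runs.
    result = compute_fn()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

With a helper of this kind, the expensive extraction function only runs when no file exists at the given path, which is where the run-time saving comes from.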