2011.03530.md

Large-scale multilingual audio visual dubbing, Yang et al., 2020

Paper, Tags: #audio

Our system translates videos from one language to another. The source language's speech content is transcribed to text, translated, and automatically synthesized into target language speech using the original speaker's voice. The visual content is translated by synthesizing lip movements for the speaker to match the translated audio.
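The four stages described above (transcription, translation, voice-preserving speech synthesis, lip synchronization) can be sketched as a simple chain. This is only an illustration of the data flow, not the authors' implementation: the class `DubbingPipeline`, its field names, and the use of `bytes` for audio and video are all assumptions made here, and each stage is a placeholder callable that a real system would back with a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class DubbingPipeline:
    """Hypothetical container for the four stages; all names are illustrative."""
    transcribe: Callable[[bytes], str]          # source audio -> source-language text
    translate: Callable[[str], str]             # source text -> target-language text
    synthesize: Callable[[str, bytes], bytes]   # target text + reference voice -> target audio
    lipsync: Callable[[bytes, bytes], bytes]    # source video + target audio -> re-rendered video

    def run(self, video: bytes, audio: bytes) -> Tuple[bytes, bytes]:
        text = self.transcribe(audio)
        translated = self.translate(text)
        # The original audio doubles as the voice reference, so the synthesized
        # speech can imitate the original speaker.
        dubbed_audio = self.synthesize(translated, audio)
        # Lip movements are generated last because they depend on the new audio.
        dubbed_video = self.lipsync(video, dubbed_audio)
        return dubbed_video, dubbed_audio
```

The ordering matters: the lip-sync stage consumes the synthesized audio, so it cannot run until the speech stages have finished.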

We collected a large multilingual dataset and used it to train a large multilingual multi-speaker lipsync model. We perform speaker-specific fine-tuning using data from each individual target speaker.
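The two-phase recipe (train on pooled multi-speaker data, then fine-tune per speaker) can be illustrated with a deliberately tiny stand-in. The sketch below assumes a one-parameter linear model and made-up toy data; the function `sgd_fit` and the speaker names are hypothetical, and nothing here reflects the actual lipsync architecture or training setup.

```python
def sgd_fit(weight, data, lr=0.1, steps=200):
    """Fit y ~ weight * x by gradient descent on mean squared error."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


# Toy data (assumed): each "speaker" follows a slightly different slope.
speakers = {
    "speaker_a": [(1.0, 2.0), (2.0, 4.0)],  # slope 2
    "speaker_b": [(1.0, 3.0), (2.0, 6.0)],  # slope 3
}

# Phase 1: train one shared model on the pooled data from all speakers.
pooled = [pair for data in speakers.values() for pair in data]
base_weight = sgd_fit(0.0, pooled)

# Phase 2: start from the shared model and fine-tune on each speaker's own data.
speaker_weights = {
    name: sgd_fit(base_weight, data, steps=50)
    for name, data in speakers.items()
}
```

The shared model settles on a compromise between the speakers, and each fine-tuned copy then moves from that starting point toward its own speaker's behavior.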