Large-scale multilingual audio visual dubbing, Yang et al., 2020

Paper, Tags: #audio

Our system translates videos from one language to another. The source language's speech content is transcribed to text, translated, and automatically synthesized into target language speech using the original speaker's voice. The visual content is translated by synthesizing lip movements for the speaker to match the translated audio.
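The stages described above chain together naturally, so a minimal sketch of the data flow may help. This is not the authors' implementation: `DubbingStages`, `dub`, and the toy stand-in stages are hypothetical names introduced only for illustration, and each callable stands in for a real model (speech recognition, machine translation, voice-preserving speech synthesis, lip synthesis).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DubbingStages:
    """Hypothetical stage callables; each one stands in for a real model."""
    transcribe: Callable[[bytes], str]          # source audio -> source-language text
    translate: Callable[[str], str]             # source text -> target-language text
    synthesize: Callable[[str, bytes], bytes]   # target text + voice reference -> target audio
    lipsync: Callable[[List[bytes], bytes], List[bytes]]  # frames + target audio -> new frames


def dub(frames: List[bytes], audio: bytes, stages: DubbingStages) -> Tuple[List[bytes], bytes]:
    """Run the four stages in the order the summary describes."""
    text = stages.transcribe(audio)
    translated = stages.translate(text)
    # The original audio doubles as the voice reference, so the synthesized
    # speech can keep the original speaker's voice.
    dubbed_audio = stages.synthesize(translated, audio)
    # The lip movements are driven by the translated audio, not the original.
    dubbed_frames = stages.lipsync(frames, dubbed_audio)
    return dubbed_frames, dubbed_audio


# Toy stand-ins (assumptions, not real models) that only make the data flow visible.
toy = DubbingStages(
    transcribe=lambda audio: audio.decode("utf-8"),
    translate=lambda text: text.upper(),
    synthesize=lambda text, voice_ref: text.encode("utf-8"),
    lipsync=lambda frames, audio: [frame + b"|" + audio for frame in frames],
)

new_frames, new_audio = dub([b"f0", b"f1"], b"hello", toy)
```

Passing the stages in as callables keeps the sketch independent of any particular model or library; swapping a stage does not change the pipeline code.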

We collected a large multilingual dataset and used it to train a large multilingual multi-speaker lipsync model. We perform speaker-specific fine-tuning using data from each individual target speaker.
