Authors: Jimin Tan, Chenqin Yang, Yakun Wang, Yash Deshpande
Project Website: https://tanjimin.github.io/unsupervised-video-dubbing/
Training code for the dubbing model is under the root directory. We used a pre-processed LRW dataset for training; see `data.py` for details.

We created a simple deployment pipeline, which can be found under the `post_processing` subdirectory. The pipeline takes the model weights we pre-trained on LRW, plus a base video and an audio segment of equal duration, and outputs a dubbed video driven by the audio. See the instructions below for more details.
Dependencies:

- LibROSA 0.7.2
- dlib 19.19
- OpenCV 4.2.0
- Pillow 6.2.2
- PyTorch 1.2.0
- TorchVision 0.4.0
File structure of the `post_processing` directory:

```
.
├── source
│   ├── audio_driver_mp4        # contains audio drivers (saved in mp4 format)
│   ├── audio_driver_wav        # contains audio drivers (saved in wav format)
│   ├── base_video              # contains base videos (videos you'd like to modify)
│   ├── dlib                    # trained dlib models
│   └── model                   # trained landmark generation models
├── main.py                     # main function for post-processing
├── main_support.py             # support functions used in main.py
├── models.py                   # defines the landmark generation model
├── step_3_vid2vid.sh           # Bash script for running vid2vid
├── step_4_denoise.sh           # Bash script for denoising vid2vid results
├── compare_openness.ipynb      # mouth openness comparison across generated videos
└── README.md
```
- `shape_predictor_68_face_landmarks.dat`

  This model is trained on the ibug 300-W dataset (https://ibug.doc.ic.ac.uk/resources/facial-point-annotations/). The license for this dataset excludes commercial use, and Stefanos Zafeiriou, one of the creators of the dataset, asked that a note be included stating that the trained model therefore cannot be used in a commercial product. You should contact a lawyer or talk to Imperial College London to find out whether it is OK for you to use this model in a commercial product.

  {C. Sagonas, E. Antonakos, G. Tzimiropoulos, S. Zafeiriou, M. Pantic. 300 Faces In-The-Wild Challenge: Database and results. Image and Vision Computing (IMAVIS), Special Issue on Facial Landmark Localisation "In-The-Wild". 2016.}
- Go to the `post_processing` directory
- Run `python3 main.py -r step`, where `step` is the number of the corresponding step below
  - e.g. `python3 main.py -r 1` runs the first step, and so on
Step 1: Generate mouth landmarks

- Input
  - Base video file path (`./source/base_video/base_video.mp4`)
  - Audio driver file path (`./source/audio_driver_wav/audio_driver.wav`)
  - Epoch (`int`)
- Output (`./result`)
  - `keypoints.npy`: generated landmarks in `npy` format
  - `source.txt`: contains information about the base video, audio driver, and model epoch
- Process (a minimal extraction sketch follows this list)
  - Extract facial landmarks from the base video
  - Extract MFCC features from the driver audio
  - Pass MFCC features and facial landmarks into the model to retrieve mouth landmarks
  - Combine facial & mouth landmarks and save them in `npy` format
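For orientation, the extraction in this step looks roughly like the sketch below. It is only an illustration, not the pipeline's exact code: the predictor path, sample rate, `n_mfcc`, and `hop_length` are assumptions, and the authoritative MFCC parameters live in the `extract_mfcc` function mentioned in the notes at the end of this README.

```python
# Sketch of step 1's per-frame landmark and MFCC extraction (illustrative only).
import cv2
import dlib
import librosa
import numpy as np

# 68-point predictor shipped under ./source/dlib (see the license note above)
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("./source/dlib/shape_predictor_68_face_landmarks.dat")

def face_landmarks(frame):
    """Return the 68 (x, y) facial landmarks of the first detected face, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if len(faces) == 0:
        return None
    shape = predictor(gray, faces[0])
    return np.array([[p.x, p.y] for p in shape.parts()])

# MFCC features for the driver audio; sr, n_mfcc and hop_length are assumptions --
# the real values are set in extract_mfcc (see the notes at the end)
audio, sr = librosa.load("./source/audio_driver_wav/audio_driver.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13, hop_length=int(sr / 25))
```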
Step 2: Visualize generated landmarks as frames

- Input
  - None
- Output (`./result`)
  - Folder `save_keypoints`: visualized generated frames
  - Folder `save_keypoints_csv`: landmark coordinates for each frame, saved in `txt` format
  - `openness.png`: mouth openness measured and plotted across all frames
- Process (an openness-plot sketch follows this list)
  - Generate images from the `npy` file
  - Generate the openness plot
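As a rough idea of how the openness plot is produced, the sketch below measures the vertical gap between the inner-lip landmarks (indices 62 and 66 in the 68-point convention) for each frame. The array shape and the exact metric are assumptions; `compare_openness.ipynb` holds the actual comparison code.

```python
# Illustrative mouth-openness plot from the generated landmarks.
# The (n_frames, 68, 2) shape and the 62/66 inner-lip indices are assumptions.
import numpy as np
import matplotlib.pyplot as plt

keypoints = np.load("./result/keypoints.npy")
openness = np.abs(keypoints[:, 66, 1] - keypoints[:, 62, 1])  # vertical lip gap per frame

plt.figure()
plt.plot(openness)
plt.xlabel("frame index")
plt.ylabel("mouth openness (pixels)")
plt.savefig("./result/openness.png")
```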
Step 3: Run vid2vid

- Input
  - None
- Output
  - Folder: vid2vid generated images
  - The path of the fake images generated by vid2vid is shown at the end; please copy them back to `/result/vid2vid_frames/` (see the copy sketch after this list)
- Process
  - Run vid2vid
  - Copy the vid2vid results back to the main folder
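The copy-back itself is just a directory copy, for example something like the following (the vid2vid output path is a placeholder for the path printed at the end of the vid2vid run):

```python
# Copy the fake images produced by vid2vid back into the pipeline's result folder.
# The source path is a placeholder; the destination must not already exist.
import shutil

vid2vid_output = "/path/printed/by/vid2vid"  # machine-specific, printed by step 3
shutil.copytree(vid2vid_output, "./result/vid2vid_frames/")
```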
Step 4: Denoise the vid2vid results

- Input
  - vid2vid generated images folder path
  - Original base images folder path
- Output
  - Folder: modified images (base image + vid2vid mouth regions)
  - Folder: denoised and smoothed frames
- Process (a blending sketch follows this list)
  - Crop the mouth areas from the vid2vid generated images and paste them back onto the base images -> modified images
  - Generate circularly smoothed images using gradient masking
  - Take `(modified image, circularly smoothed image)` pairs and denoise them
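The circular gradient masking can be pictured with the sketch below: vid2vid pixels are blended into the base frame with a radial mask that fades from 1 at the centre to 0 at the edge. The file names, centre, and radius are illustrative assumptions; the real region of interest comes from the `frame_crop` function mentioned in the notes.

```python
# Minimal sketch of the paste-back / circular gradient blending idea (assumptions:
# the two frames are the same size, and centre/radius are placeholder values).
import cv2
import numpy as np

def blend_mouth(base_img, vid2vid_img, center, radius):
    """Blend vid2vid pixels into the base frame with a radial (circular) mask."""
    h, w = base_img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.sqrt((xx - center[0]) ** 2 + (yy - center[1]) ** 2)
    # 1.0 at the circle centre, fading linearly to 0.0 at the radius and beyond
    mask = np.clip(1.0 - dist / radius, 0.0, 1.0)[..., None]
    blended = mask * vid2vid_img.astype(np.float32) + (1.0 - mask) * base_img.astype(np.float32)
    return blended.astype(np.uint8)

base = cv2.imread("./result/save_keypoints/frame_0000.png")   # hypothetical file name
fake = cv2.imread("./result/vid2vid_frames/frame_0000.png")   # hypothetical file name
out = blend_mouth(base, fake, center=(128, 180), radius=40)
cv2.imwrite("modified_0000.png", out)
```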
Step 5: Combine audio and video

- Input
  - Saved frames folder path
    - By default, frames are saved in `./result/save_keypoints`; enter `d` to go with the default path, otherwise input the frames folder path
  - Audio driver file path (`./source/audio_driver_wav/audio_driver.wav`)
- Output (`./result/save_keypoints/result/`)
  - `video_without_sound.mp4`: modified video without sound
  - `audio_only.mp4`: audio driver
  - `final_output.mp4`: modified video with sound
- Process (a minimal sketch follows this list)
  - Generate the modified video without sound at the defined fps
  - Extract `wav` audio from the audio driver
  - Combine audio and video to generate the final output
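Conceptually, this step writes the frames to a silent video and then muxes in the driver audio. Below is a minimal sketch, assuming 25 fps, the default paths, PNG frames, and an `ffmpeg` binary on the PATH; the pipeline's own logic sits in the `extract_audio` / `combine_audio_video` functions noted at the end of this README.

```python
# Rough sketch of step 5: frames -> silent video -> mux driver audio with ffmpeg.
import glob
import subprocess
import cv2

fps = 25  # must match your base video fps (see the fps snippet in the notes)
frames = sorted(glob.glob("./result/save_keypoints/*.png"))
h, w = cv2.imread(frames[0]).shape[:2]

writer = cv2.VideoWriter("video_without_sound.mp4",
                         cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
for path in frames:
    writer.write(cv2.imread(path))
writer.release()

# Combine the silent video with the driver audio into the final output
subprocess.run(["ffmpeg", "-y", "-i", "video_without_sound.mp4",
                "-i", "./source/audio_driver_wav/audio_driver.wav",
                "-c:v", "copy", "-c:a", "aac", "-shortest", "final_output.mp4"],
               check=True)
```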
Notes:

- You may need to modify how MFCC features are extracted in the `extract_mfcc` function
  - Be careful about the sample rate, window_length, and hop_length
  - Good resource: https://www.mathworks.com/help/audio/ref/mfcc.html
- You may need to modify the region of interest (mouth area) in the `frame_crop` function
- You may need to modify the frame rate defined in step 3 of `main.py`, which should match your base video's fps:
```python
# How to check your base video fps
# source: https://www.learnopencv.com/how-to-find-frame-rate-or-frames-per-second-fps-in-opencv-python-cpp/
import cv2

video = cv2.VideoCapture("video.mp4")

# Find OpenCV version
(major_ver, minor_ver, subminor_ver) = (cv2.__version__).split('.')

if int(major_ver) < 3:
    fps = video.get(cv2.cv.CV_CAP_PROP_FPS)
    print("Frames per second using video.get(cv2.cv.CV_CAP_PROP_FPS): {0}".format(fps))
else:
    fps = video.get(cv2.CAP_PROP_FPS)
    print("Frames per second using video.get(cv2.CAP_PROP_FPS): {0}".format(fps))

video.release()
```
- You may need to modify the shell path (check yours with `echo $SHELL`)
- You may need to modify the audio sampling rate in the `extract_audio` function
- You may need to customize your parameters in the `combine_audio_video` function
- March 22, 2020: Drafted documentation