lipsync-mediapipe is a Python 3.8 project that learns facial landmark shapes from speech audio. Its approach maps mel-spectrogram slices to 1D arrays of facial landmark coordinates laid out as [x1, y1, z1, x2, y2, z2, ...].
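To make that layout concrete, here is a minimal sketch (not taken from this repository; `frame_to_landmark_vector` is a hypothetical helper) of how a MediaPipe Face Mesh result for a single frame can be flattened into that 1D vector:

```python
# Hypothetical helper illustrating the [x1, y1, z1, x2, y2, z2, ...] layout.
# MediaPipe Face Mesh returns ~468 landmarks per frame; each frame's target
# vector is the flattened list of their coordinates.
import cv2
import mediapipe as mp
import numpy as np

mp_face_mesh = mp.solutions.face_mesh

def frame_to_landmark_vector(frame_bgr):
    """Return a 1D float32 array [x1, y1, z1, x2, y2, z2, ...], or None if no face is found."""
    with mp_face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as face_mesh:
        results = face_mesh.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_face_landmarks:
        return None
    landmarks = results.multi_face_landmarks[0].landmark
    return np.array([c for lm in landmarks for c in (lm.x, lm.y, lm.z)],
                    dtype=np.float32)
```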
- Ensure your environment is set up with Python 3.8.
- Install the PyTorch build that matches your system from the official PyTorch installation page.
- Run the following command in your terminal to install the required Python packages:
pip install -r requirements.txt
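Optionally, a quick import check can confirm the core dependencies are in place (torch and mediapipe are assumed here; consult requirements.txt for the full list):

```python
# Quick sanity check that the core dependencies import correctly.
# (torch and mediapipe are assumed; see requirements.txt for the actual list.)
import torch
import mediapipe as mp

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("MediaPipe:", mp.__version__)
```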
Provide a .mov or .mp4 video of a talking head; ideally, the video should be longer than 10 minutes. Use the following command to build the dataset:
python dataset.py 'example.mov' 'dataset_name'
Replace 'example.mov' with the path to your video file and 'dataset_name' with the desired name for your dataset.
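For orientation, the sketch below outlines the kind of audio/landmark pairing a dataset script like this performs. The actual dataset.py may differ; librosa and the parameter values here (fps, sample rate, mel bands) are assumptions:

```python
# Conceptual sketch of pairing mel-spectrogram slices with per-frame landmark
# vectors (illustrative only; the real dataset.py may work differently).
import numpy as np
import librosa  # assumed here for mel-spectrogram extraction

def build_pairs(audio_path, landmark_vectors, fps=30.0, sr=16000, n_mels=80):
    """landmark_vectors: one flattened landmark array per video frame (see sketch above)."""
    audio, _ = librosa.load(audio_path, sr=sr)
    hop = int(sr / fps)  # roughly one mel column per video frame
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=n_mels, hop_length=hop)
    mel = librosa.power_to_db(mel, ref=np.max)  # log scale for numerical stability

    pairs = []
    for i, landmarks in enumerate(landmark_vectors):
        if landmarks is None or i >= mel.shape[1]:
            continue  # skip frames with no detected face or past the audio end
        pairs.append((mel[:, i], landmarks))  # (mel slice, flattened landmarks)
    return pairs
```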
Use the following command to train the model:
python train.py 'dataset_name' --batch_size=32 --epochs=30
Replace 'dataset_name' with the name of your dataset. The default batch size is 32 and the default number of epochs is 30; adjust them to suit your hardware and dataset size.
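As a rough picture of the training step, the sketch below regresses each mel-spectrogram slice onto its flattened landmark vector with a small fully connected network. The architecture, loss, and function names are illustrative assumptions, not the repository's actual model:

```python
# Illustrative training loop: mel-spectrogram slice -> flattened landmark vector.
# Not the repository's actual train.py; shapes and architecture are assumed.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train(mel_slices, landmark_vectors, batch_size=32, epochs=30, lr=1e-3):
    """mel_slices: array (num_frames, n_mels); landmark_vectors: array (num_frames, 3 * num_landmarks)."""
    dataset = TensorDataset(torch.as_tensor(mel_slices, dtype=torch.float32),
                            torch.as_tensor(landmark_vectors, dtype=torch.float32))
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    model = nn.Sequential(
        nn.Linear(mel_slices.shape[1], 256), nn.ReLU(),
        nn.Linear(256, landmark_vectors.shape[1]),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()

    for epoch in range(epochs):
        for mel, target in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(mel), target)
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch + 1}/{epochs}  loss {loss.item():.4f}")
    return model
```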