👄 lipsync-mediapipe 🗣️

lipsync-mediapipe is a project that learns facial landmark shapes from speech audio using Python 3.8. It maps mel-spectrogram slices to 1D arrays of facial landmark coordinates laid out as [x1, y1, z1, x2, y2, z2, ...].
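To make that layout concrete, here is a minimal sketch of how per-frame landmarks could be flattened into that interleaved vector using MediaPipe's FaceMesh solution (468 landmarks, each with x, y, z). The repo's actual extraction code isn't shown here; the helper name and structure are assumptions for illustration:

    # Hypothetical sketch: flatten MediaPipe FaceMesh landmarks into the
    # [x1, y1, z1, x2, y2, z2, ...] layout described above.
    import cv2
    import mediapipe as mp
    import numpy as np

    face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=True)

    def landmarks_to_vector(frame_bgr):
        """Return a flat (468 * 3,) float array for one video frame, or None."""
        rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        results = face_mesh.process(rgb)
        if not results.multi_face_landmarks:
            return None  # no face detected in this frame
        lms = results.multi_face_landmarks[0].landmark
        # Interleave coordinates: [x1, y1, z1, x2, y2, z2, ...]
        return np.array([c for lm in lms for c in (lm.x, lm.y, lm.z)],
                        dtype=np.float32)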


💻 Setup Environment

  1. Ensure your environment is set up with Python 3.8.
  2. Install a PyTorch build that matches your system from the official selector at https://pytorch.org/get-started/locally/.
  3. Run the following command in your terminal to install the required Python packages:
    pip install -r requirements.txt

🗄️ Build Dataset

Provide a .mov or .mp4 video of a talking head; ideally, the video should be longer than 10 minutes. Run the following command to start building the dataset:

python dataset.py 'example.mov' 'dataset_name'

Replace 'example.mov' with the path to your video file and 'dataset_name' with the desired name for your dataset.
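The internals of dataset.py aren't reproduced in this README, but a plausible sketch of the audio side is below: compute a log mel-spectrogram with librosa and pair a fixed-width slice, centred on each video frame, with that frame's landmark vector. The function name, slice width, and hop length are assumptions, and landmarks_to_vector is the helper sketched above:

    # Hedged sketch of how audio/landmark pairs might be built.
    import librosa
    import numpy as np

    def build_pairs(audio_path, landmark_vectors, fps=30.0, sr=16000, n_mels=80):
        y, _ = librosa.load(audio_path, sr=sr)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                             hop_length=160)  # 10 ms hops
        log_mel = librosa.power_to_db(mel)
        pairs = []
        for i, vec in enumerate(landmark_vectors):
            if vec is None:
                continue  # skip frames where no face was found
            # Centre a fixed-width spectrogram slice on this video frame.
            centre = int(i / fps * sr / 160)
            window = 8  # mel frames on each side (hypothetical width)
            if centre - window < 0 or centre + window > log_mel.shape[1]:
                continue
            pairs.append((log_mel[:, centre - window:centre + window], vec))
        return pairs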

🚀 Train Model

Use the following command to train the model:

python train.py 'dataset_name' --batch_size=32 --epochs=30

Replace 'dataset_name' with the name of your dataset.
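For orientation, a minimal PyTorch training loop consistent with the flags above might look like the following. The network architecture (a small MLP regressing the flattened mel slice to the 468 × 3 landmark vector with MSE loss) is an assumption for illustration, not the repo's actual model:

    # Minimal PyTorch sketch: regress mel slices to landmark vectors.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    def train(mel_slices, landmark_vectors, batch_size=32, epochs=30):
        # mel_slices: (N, n_mels, width) tensor; landmark_vectors: (N, 468 * 3)
        X = mel_slices.flatten(1)
        ds = TensorDataset(X, landmark_vectors)
        loader = DataLoader(ds, batch_size=batch_size, shuffle=True)
        model = nn.Sequential(nn.Linear(X.shape[1], 512), nn.ReLU(),
                              nn.Linear(512, landmark_vectors.shape[1]))
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.MSELoss()
        for epoch in range(epochs):
            for xb, yb in loader:
                opt.zero_grad()
                loss = loss_fn(model(xb), yb)
                loss.backward()
                opt.step()
            print(f"epoch {epoch + 1}: loss {loss.item():.5f}")
        return model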

📝 Note

The default batch size is 32 and the default number of epochs is 30; adjust them to suit your hardware and dataset size.
