first stylistic-gesture model commit
rltonoli committed Oct 21, 2024
1 parent f6efad5 commit d6c0534
Showing 33 changed files with 1,318 additions and 1,297 deletions.
21 changes: 12 additions & 9 deletions Dockerfile
@@ -1,18 +1,15 @@
FROM nvidia/cuda:12.2.0-devel-ubuntu22.04
FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel

RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub

RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub

ENV PATH="/root/miniconda3/bin:${PATH}"
ARG PATH="/root/miniconda3/bin:${PATH}"

RUN apt-get update
RUN apt-get install -y wget git nano ffmpeg

RUN wget \
https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh \
&& mkdir /root/.conda \
&& bash Miniconda3-py37_4.8.3-Linux-x86_64.sh -b \
&& rm -f Miniconda3-py37_4.8.3-Linux-x86_64.sh

RUN conda --version

WORKDIR /root
@@ -24,5 +21,11 @@ RUN conda install pip
RUN conda --version
RUN conda env create -f environment.yml

SHELL ["conda", "run", "-n", "ggvad", "/bin/bash", "-c"]
RUN pip install git+https://github.com/openai/CLIP.git
SHELL ["conda", "run", "-n", "stylistic-env", "/bin/bash", "-c"]
RUN python -m spacy download en_core_web_sm
RUN pip install blobfile
RUN pip install PyYAML
RUN pip install librosa
RUN pip install python_speech_features
RUN pip install einops
RUN pip install wandb
42 changes: 13 additions & 29 deletions README.md
@@ -8,82 +8,66 @@ Official repository for the paper Stylistic Co-Speech Gesture Generation: Modeli
2. Enter the repo and create the docker image using

```sh
docker build -t ggvad .
docker build -t stylistic-gesture .
```

3. Run the container using

```sh
docker run --rm -it --gpus device=GPU_NUMBER --userns=host --shm-size 64G -v /MY_DIR/ggvad-genea2023:/workspace/ggvad/ -p PORT_NUMBER --name CONTAINER_NAME ggvad:latest /bin/bash
nvidia-docker run --rm -it -e NVIDIA_VISIBLE_DEVICES={GPU} --runtime=nvidia --userns=host --shm-size 64G -v {LOCAL_DIR}:{CONTAINER_DIR} -p {PORT} --name {CONTAINER_NAME} stylistic-gesture:latest /bin/bash
```

for example:
```sh
docker run --rm -it --gpus device=0 --userns=host --shm-size 64G -v C:\ProgramFiles\ggvad-genea2023:/workspace/my_repo -p '8888:8888' --name my_container ggvad:latest /bin/bash
```

> ### Cuda version < 12.0:
>
> If you have a previous cuda or nvcc release version you will need to adjust the Dockerfile. Change the first line to `FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel` and remove lines 10-14 (conda is already installed in the pytorch image). Then, run the container using:
>
> ```sh
> nvidia-docker run --rm -it -e NVIDIA_VISIBLE_DEVICES=GPU_NUMBER --runtime=nvidia --userns=host --shm-size 64G -v /work/rodolfo.tonoli/GestureDiffusion:/workspace/gesture-diffusion/ -p $port --name gestdiff_container$number multimodal-research-group-mdm:latest /bin/bash
> ```
OR use the shell script ggvad_container.sh (don't forget to change the volume) using the flags -g, -n, and -p
example:
```sh
sh ggvad_container.sh -g 0 -n my_container -p '8888:8888'
docker run --rm -it --gpus device=0 --userns=host --shm-size 64G -v C:\ProgramFiles\stylistic-gesture:/workspace/stylistic-gesture -p '8888:8888' --name stylistic-gesture-container stylistic-gesture:latest /bin/bash
```

4. Activate the conda environment:
```sh
source activate ggvad
source activate stylistic-env
```

## Data pre-processing

1. Get the GENEA Challenge 2023 dataset and put it into `./dataset/`
(Our system is monadic so you'll only need the main-agent's data)
1. Get the BRG-Unicamp dataset following the instructions from [here](https://ai-unicamp.github.io/BRG-Unicamp/) and put it into `./dataset/`

2. Download the [WavLM Base +](https://github.com/microsoft/unilm/tree/master/wavlm) and put it into the folder `/wavlm/`

3. Inside the folder `/workspace/ggvad`, run
3. In the container, with the environment active, enter the folder `/workspace/stylistic-gesture` and run

```sh
python -m data_loaders.gesture.scripts.genea_prep
python -m data_loaders.gesture.scripts.ptbrgesture_prep
```

This will convert the bvh files to npy representations, downsample the wav files to 16 kHz and save them as npy arrays, and convert these arrays to WavLM representations (a minimal sketch of this audio pipeline is shown below). The VAD data must be processed separately due to Python library incompatibilities.
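
The following is only an illustration of what the audio side of this step does, assuming librosa for the 16 kHz downsampling and the WavLM Base+ checkpoint from step 2; the paths, file naming, and helper function are hypothetical, and the actual logic lives in `data_loaders.gesture.scripts.ptbrgesture_prep`:

```python
import numpy as np
import librosa
import torch
from wavlm.WavLM import WavLM, WavLMConfig  # assumes the WavLM model code sits next to the checkpoint in ./wavlm/


def preprocess_audio(wav_path, ckpt_path="wavlm/WavLM-Base+.pt"):
    # Downsample the recording to 16 kHz mono and store it as a npy array
    audio, _ = librosa.load(wav_path, sr=16000, mono=True)
    np.save(wav_path.replace(".wav", "_16k.npy"), audio)

    # Load the WavLM Base+ checkpoint (usage follows the microsoft/unilm WavLM README)
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    cfg = WavLMConfig(checkpoint["cfg"])
    model = WavLM(cfg)
    model.load_state_dict(checkpoint["model"])
    model.eval()

    # Extract frame-level WavLM representations and store them as a npy array
    with torch.no_grad():
        wav_tensor = torch.from_numpy(audio).float().unsqueeze(0)  # shape (1, samples)
        features = model.extract_features(wav_tensor)[0]           # shape (1, frames, feature_dim)
    np.save(wav_path.replace(".wav", "_wavlm.npy"), features.squeeze(0).numpy())
```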

4. (Optional) Process VAD data

We provide the speech activity information (from speechbrain's VAD) data, but if you wish to process them yourself you should redo the steps of "Preparing environment" as before, but for the speechbrain environment: Build the image using the Dockerfile inside speechbrain (`docker build -t speechbrain .`), run the container (`docker run ... --name CONTAINER_NAME speechbrain:latest /bin/bash`) and run:
BRG-Unicamp provides the speech activity information (from speechbrain's VAD), but if you wish to process it yourself, redo the "Preparing environment" steps as before, but for the speechbrain environment: build the image using the Dockerfile inside speechbrain (`docker build -t speechbrain .`), run the container (`docker run ... --name CONTAINER_NAME speechbrain:latest /bin/bash`), and run the command below (a sketch of the VAD step follows it):

```sh
python -m data_loaders.gesture.scripts.genea_prep_vad
python -m data_loaders.gesture.scripts.ptbrgesture_prep_vad
```
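
As an illustration only, the kind of speech-activity extraction this step performs can be sketched with SpeechBrain's pretrained CRDNN VAD; the model source, file paths, and output format below are assumptions, not necessarily what `ptbrgesture_prep_vad` uses:

```python
import numpy as np
from speechbrain.pretrained import VAD  # speechbrain.inference.VAD in SpeechBrain >= 1.0

# Fetch SpeechBrain's pretrained CRDNN voice-activity-detection model
vad = VAD.from_hparams(
    source="speechbrain/vad-crdnn-libriparty",
    savedir="pretrained_models/vad-crdnn-libriparty",
)

# The VAD expects 16 kHz mono audio, matching the downsampled files from step 3
boundaries = vad.get_speech_segments("dataset/audio/sample_16k.wav")  # (N, 2) start/end times in seconds

# Store the speech segments as a npy array of [start, end] pairs (illustrative output format)
np.save("dataset/vad/sample_vad.npy", boundaries.cpu().numpy())
```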

## Train model

To train the model described in the paper, use the following command inside the repo:

```sh
python -m train.train_mdm --save_dir save/my_model_run --dataset genea2023+ --step 10 --use_text --use_vad True --use_wavlm True
python -m train.train_mdm --save_dir save/my_model_run --dataset ptbr --step 10 --use_vad True --use_wavlm True --use_style_enc True
```

## Gesture Generation

Generate motion using the trained model by running the following command. If you wish to generate gestures with the pretrained model of the Genea Challenge, use `--model_path ./save/default_vad_wavlm/model000290000.pt`
Generate motion using the trained model by running the following command. If you wish to generate gestures with the pretrained model of the Genea Challenge, use `--model_path ./save/stylistic-gesture/model000600000.pt`

```sh
python -m sample.generate --model_path ./save/my_model_run/model000XXXXXX.pt
python -m sample.ptbrgenerate --model_path ./save/my_model_run/model000XXXXXX.pt
```

## Render

To render the official Genea 2023 visualizations follow the instructions provided [here](https://github.com/TeoNikolov/genea_visualizer/)
In our perceptual evaluation, we used the rendering procedure from the official GENEA Challenge 2023 visualizations. Instructions are provided [here](https://github.com/TeoNikolov/genea_visualizer/)

## Cite
