first stylistic-gesture model commit
rltonoli committed Oct 21, 2024
1 parent f6efad5 commit d6c0534
Showing 33 changed files with 1,318 additions and 1,297 deletions.
21 changes: 12 additions & 9 deletions Dockerfile
@@ -1,18 +1,15 @@
FROM nvidia/cuda:12.2.0-devel-ubuntu22.04
FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel

RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub

RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub

ENV PATH="/root/miniconda3/bin:${PATH}"
ARG PATH="/root/miniconda3/bin:${PATH}"

RUN apt-get update
RUN apt-get install -y wget git nano ffmpeg

RUN wget \
https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh \
&& mkdir /root/.conda \
&& bash Miniconda3-py37_4.8.3-Linux-x86_64.sh -b \
&& rm -f Miniconda3-py37_4.8.3-Linux-x86_64.sh

RUN conda --version

WORKDIR /root
@@ -24,5 +21,11 @@ RUN conda install pip
RUN conda --version
RUN conda env create -f environment.yml

SHELL ["conda", "run", "-n", "ggvad", "/bin/bash", "-c"]
RUN pip install git+https://github.com/openai/CLIP.git
SHELL ["conda", "run", "-n", "stylistic-env", "/bin/bash", "-c"]
RUN python -m spacy download en_core_web_sm
RUN pip install blobfile
RUN pip install PyYAML
RUN pip install librosa
RUN pip install python_speech_features
RUN pip install einops
RUN pip install wandb
42 changes: 13 additions & 29 deletions README.md
@@ -8,82 +8,66 @@ Official repository for the paper Stylistic Co-Speech Gesture Generation: Modeli
2. Enter the repo and create the docker image using

```sh
docker build -t ggvad .
docker build -t stylistic-gesture .
```

3. Run the container using

```sh
docker run --rm -it --gpus device=GPU_NUMBER --userns=host --shm-size 64G -v /MY_DIR/ggvad-genea2023:/workspace/ggvad/ -p PORT_NUMBER --name CONTAINER_NAME ggvad:latest /bin/bash
nvidia-docker run --rm -it -e NVIDIA_VISIBLE_DEVICES={GPU} --runtime=nvidia --userns=host --shm-size 64G -v {LOCAL_DIR}:{CONTAINER_DIR} -p {PORT} --name {CONTAINER_NAME} stylistic-gesture:latest /bin/bash
```

for example:
```sh
docker run --rm -it --gpus device=0 --userns=host --shm-size 64G -v C:\ProgramFiles\ggvad-genea2023:/workspace/my_repo -p '8888:8888' --name my_container ggvad:latest /bin/bash
```

> ### Cuda version < 12.0:
>
> If you have a previous cuda or nvcc release version you will need to adjust the Dockerfile. Change the first line to `FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel` and remove lines 10-14 (conda is already installed in the pytorch image). Then, run the container using:
>
> ```sh
> nvidia-docker run --rm -it -e NVIDIA_VISIBLE_DEVICES=GPU_NUMBER --runtime=nvidia --userns=host --shm-size 64G -v /work/rodolfo.tonoli/GestureDiffusion:/workspace/gesture-diffusion/ -p $port --name gestdiff_container$number multimodal-research-group-mdm:latest /bin/bash
> ```
OR use the shell script ggvad_container.sh (don't forget to change the volume) using the flags -g, -n, and -p
example:
```sh
sh ggvad_container.sh -g 0 -n my_container -p '8888:8888'
docker run --rm -it --gpus device=0 --userns=host --shm-size 64G -v C:\ProgramFiles\stylistic-gesture:/workspace/stylistic-gesture -p '8888:8888' --name stylistic-gesture-container stylistic-gesture:latest /bin/bash
```

4. Activate the conda environment:
```sh
source activate ggvad
source activate stylistic-env
```

## Data pre-processing

1. Get the GENEA Challenge 2023 dataset and put it into `./dataset/`
(Our system is monadic so you'll only need the main-agent's data)
1. Get the BRG-Unicamp dataset following the instructions from [here](https://ai-unicamp.github.io/BRG-Unicamp/) and put it into `./dataset/`

2. Download the [WavLM Base +](https://github.com/microsoft/unilm/tree/master/wavlm) and put it into the folder `/wavlm/`

3. Inside the folder `/workspace/ggvad`, run
3. In the container, with the environment active, enter the folder `/workspace/stylistic-gesture` and run

```sh
python -m data_loaders.gesture.scripts.genea_prep
python -m data_loaders.gesture.scripts.ptbrgesture_prep
```

This will convert the bvh files to npy representations, downsample the wav files to 16 kHz and save them as npy arrays, and convert these arrays to WavLM representations (a minimal sketch of this audio pipeline is shown below). The VAD data must be processed separately due to Python library incompatibilities.
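
The following is only an illustration of what the audio side of this step does, assuming librosa for the 16 kHz downsampling and the WavLM Base+ checkpoint from step 2; the paths, file naming, and helper function are hypothetical, and the actual logic lives in `data_loaders.gesture.scripts.ptbrgesture_prep`:

```python
import numpy as np
import librosa
import torch
from wavlm.WavLM import WavLM, WavLMConfig  # assumes the WavLM model code sits next to the checkpoint in ./wavlm/


def preprocess_audio(wav_path, ckpt_path="wavlm/WavLM-Base+.pt"):
    # Downsample the recording to 16 kHz mono and store it as a npy array
    audio, _ = librosa.load(wav_path, sr=16000, mono=True)
    np.save(wav_path.replace(".wav", "_16k.npy"), audio)

    # Load the WavLM Base+ checkpoint (usage follows the microsoft/unilm WavLM README)
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    cfg = WavLMConfig(checkpoint["cfg"])
    model = WavLM(cfg)
    model.load_state_dict(checkpoint["model"])
    model.eval()

    # Extract frame-level WavLM representations and store them as a npy array
    with torch.no_grad():
        wav_tensor = torch.from_numpy(audio).float().unsqueeze(0)  # shape (1, samples)
        features = model.extract_features(wav_tensor)[0]           # shape (1, frames, feature_dim)
    np.save(wav_path.replace(".wav", "_wavlm.npy"), features.squeeze(0).numpy())
```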

4. (Optional) Process VAD data

We provide the speech activity information (from speechbrain's VAD) data, but if you wish to process them yourself you should redo the steps of "Preparing environment" as before, but for the speechbrain environment: Build the image using the Dockerfile inside speechbrain (`docker build -t speechbrain .`), run the container (`docker run ... --name CONTAINER_NAME speechbrain:latest /bin/bash`) and run:
BRG-Unicamp provides the speech activity information (from speechbrain's VAD), but if you wish to process it yourself, redo the "Preparing environment" steps as before, but for the speechbrain environment: build the image using the Dockerfile inside speechbrain (`docker build -t speechbrain .`), run the container (`docker run ... --name CONTAINER_NAME speechbrain:latest /bin/bash`), and run the command below (a sketch of the VAD step follows it):

```sh
python -m data_loaders.gesture.scripts.genea_prep_vad
python -m data_loaders.gesture.scripts.ptbrgesture_prep_vad
```
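
As an illustration only, the kind of speech-activity extraction this step performs can be sketched with SpeechBrain's pretrained CRDNN VAD; the model source, file paths, and output format below are assumptions, not necessarily what `ptbrgesture_prep_vad` uses:

```python
import numpy as np
from speechbrain.pretrained import VAD  # speechbrain.inference.VAD in SpeechBrain >= 1.0

# Fetch SpeechBrain's pretrained CRDNN voice-activity-detection model
vad = VAD.from_hparams(
    source="speechbrain/vad-crdnn-libriparty",
    savedir="pretrained_models/vad-crdnn-libriparty",
)

# The VAD expects 16 kHz mono audio, matching the downsampled files from step 3
boundaries = vad.get_speech_segments("dataset/audio/sample_16k.wav")  # (N, 2) start/end times in seconds

# Store the speech segments as a npy array of [start, end] pairs (illustrative output format)
np.save("dataset/vad/sample_vad.npy", boundaries.cpu().numpy())
```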

## Train model

To train the model described in the paper, use the following command inside the repo:

```sh
python -m train.train_mdm --save_dir save/my_model_run --dataset genea2023+ --step 10 --use_text --use_vad True --use_wavlm True
python -m train.train_mdm --save_dir save/my_model_run --dataset ptbr --step 10 --use_vad True --use_wavlm True --use_style_enc True
```

## Gesture Generation

Generate motion using the trained model by running the following command. If you wish to generate gestures with the pretrained model of the Genea Challenge, use `--model_path ./save/default_vad_wavlm/model000290000.pt`
Generate motion using the trained model by running the following command. If you wish to generate gestures with the pretrained model of the Genea Challenge, use `--model_path ./save/stylistic-gesture/model000600000.pt`

```sh
python -m sample.generate --model_path ./save/my_model_run/model000XXXXXX.pt
python -m sample.ptbrgenerate --model_path ./save/my_model_run/model000XXXXXX.pt
```

## Render

To render the official Genea 2023 visualizations follow the instructions provided [here](https://github.com/TeoNikolov/genea_visualizer/)
In our perceptual evaluation, we used the rendering procedure from the official GENEA Challenge 2023 visualizations. Instructions are provided [here](https://github.com/TeoNikolov/genea_visualizer/)

## Cite
