first commit based on ggvad-genea2023
rltonoli committed Oct 21, 2024
1 parent fa8b9c8 commit f6efad5
Showing 58 changed files with 11,590 additions and 1 deletion.
135 changes: 135 additions & 0 deletions .gitignore
@@ -0,0 +1,135 @@
# Removing datasets

wavlm/*.pt

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
generate_wavlm_reps.ipynb
28 changes: 28 additions & 0 deletions Dockerfile
@@ -0,0 +1,28 @@
FROM nvidia/cuda:12.2.0-devel-ubuntu22.04

# Make Miniconda available on PATH during the build (ARG) and at runtime (ENV)
ENV PATH="/root/miniconda3/bin:${PATH}"
ARG PATH="/root/miniconda3/bin:${PATH}"

# System packages: wget/git for fetching code, nano for editing, ffmpeg for audio/video processing
RUN apt-get update
RUN apt-get install -y wget git nano ffmpeg

# Install Miniconda (Python 3.7) non-interactively and remove the installer afterwards
RUN wget \
https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh \
&& mkdir /root/.conda \
&& bash Miniconda3-py37_4.8.3-Linux-x86_64.sh -b \
&& rm -f Miniconda3-py37_4.8.3-Linux-x86_64.sh

RUN conda --version

WORKDIR /root
COPY environment.yml /root

# Update conda tooling, then create the ggvad environment from environment.yml
RUN conda install tqdm -f
RUN conda update conda
RUN conda install pip
RUN conda --version
RUN conda env create -f environment.yml

# Run the remaining commands inside the ggvad environment; CLIP is installed from source
SHELL ["conda", "run", "-n", "ggvad", "/bin/bash", "-c"]
RUN pip install git+https://github.com/openai/CLIP.git
99 changes: 98 additions & 1 deletion README.md
@@ -1,2 +1,99 @@
# stylistic-gesture
Official repository for the paper "Stylistic Co-Speech Gesture Generation: Modeling Personality and Communicative Styles in Virtual Agents".

## Preparing environment

1. Git clone this repo

2. Enter the repo and create docker image using

```sh
docker build -t ggvad .
```

3. Run container using

```sh
docker run --rm -it --gpus device=GPU_NUMBER --userns=host --shm-size 64G -v /MY_DIR/ggvad-genea2023:/workspace/ggvad/ -p PORT_NUMBER --name CONTAINER_NAME ggvad:latest /bin/bash
```

for example:
```sh
docker run --rm -it --gpus device=0 --userns=host --shm-size 64G -v C:\ProgramFiles\ggvad-genea2023:/workspace/my_repo -p '8888:8888' --name my_container ggvad:latest /bin/bash
```

> ### CUDA version < 12.0:
>
> If you have an earlier CUDA or nvcc release, you will need to adjust the Dockerfile. Change the first line to `FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel` and remove the Miniconda installation step (the `RUN wget ... Miniconda3 ...` block), since conda is already installed in the PyTorch image. Then, run the container using:
>
> ```sh
> nvidia-docker run --rm -it -e NVIDIA_VISIBLE_DEVICES=GPU_NUMBER --runtime=nvidia --userns=host --shm-size 64G -v /work/rodolfo.tonoli/GestureDiffusion:/workspace/gesture-diffusion/ -p $port --name gestdiff_container$number multimodal-research-group-mdm:latest /bin/bash
> ```
OR use the shell script `ggvad_container.sh` (don't forget to change the volume path) with the flags `-g`, `-n`, and `-p`. For example:
```sh
sh ggvad_container.sh -g 0 -n my_container -p '8888:8888'
```
4. Activate cuda environment:
```sh
source activate ggvad
```
## Data pre-processing
1. Get the GENEA Challenge 2023 dataset and put it into `./dataset/`
(Our system is monadic, so you'll only need the main-agent's data.)
2. Download the [WavLM Base +](https://github.com/microsoft/unilm/tree/master/wavlm) and put it into the folder `/wavlm/`
3. Inside the folder `/workspace/ggvad`, run
```sh
python -m data_loaders.gesture.scripts.genea_prep
```
This will convert the BVH files to npy representations, downsample the wav files to 16 kHz and save them as npy arrays, and convert these arrays to WavLM representations (see the sketch after this list). The VAD data must be processed separately due to Python library incompatibilities.
4. (Optional) Process VAD data
We provide the speech activity data (from SpeechBrain's VAD), but if you wish to process it yourself, redo the "Preparing environment" steps for the speechbrain environment: build the image using the Dockerfile inside `speechbrain` (`docker build -t speechbrain .`), run the container (`docker run ... --name CONTAINER_NAME speechbrain:latest /bin/bash`), and run the command below (a minimal VAD sketch also follows this list):
```sh
python -m data_loaders.gesture.scripts.genea_prep_vad
```
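
For reference, the sketch below illustrates the audio side of step 3: downsampling a wav file to 16 kHz and extracting WavLM representations. This is a minimal sketch, not the actual `genea_prep` code; the module path `wavlm.WavLM`, the checkpoint name `WavLM-Base+.pt`, and the file paths are assumptions.

```python
# Minimal sketch (not the actual genea_prep code): wav -> 16 kHz -> WavLM features.
# Assumes WavLM/WavLMConfig come from https://github.com/microsoft/unilm/tree/master/wavlm
import numpy as np
import librosa
import torch
from wavlm.WavLM import WavLM, WavLMConfig  # module path is an assumption

# Load the audio resampled to 16 kHz and save it as an npy array (paths are assumptions)
audio, sr = librosa.load("dataset/main-agent/wav/sample.wav", sr=16000)
np.save("dataset/main-agent/audio16k_npy/sample.npy", audio)

# Load the WavLM Base+ checkpoint and extract frame-level representations
checkpoint = torch.load("wavlm/WavLM-Base+.pt", map_location="cpu")
cfg = WavLMConfig(checkpoint["cfg"])
model = WavLM(cfg)
model.load_state_dict(checkpoint["model"])
model.eval()

wav = torch.from_numpy(audio).float().unsqueeze(0)  # (1, num_samples)
if cfg.normalize:
    wav = torch.nn.functional.layer_norm(wav, wav.shape)
with torch.no_grad():
    reps = model.extract_features(wav)[0]  # (1, num_frames, feature_dim)
np.save("dataset/main-agent/wavlm_npy/sample.npy", reps.squeeze(0).numpy())
```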
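
Similarly, here is a minimal sketch of obtaining speech-activity information with SpeechBrain's pretrained VAD (not the actual `genea_prep_vad` code); the `vad-crdnn-libriparty` model, the 16 kHz mono input, and the file paths are assumptions.

```python
# Minimal sketch (not the actual genea_prep_vad code): speech activity with SpeechBrain's VAD.
# In newer SpeechBrain releases the import lives under speechbrain.inference instead.
import numpy as np
from speechbrain.pretrained import VAD

vad = VAD.from_hparams(
    source="speechbrain/vad-crdnn-libriparty",
    savedir="pretrained_models/vad-crdnn-libriparty",
)

# Frame-level speech probabilities and the resulting speech segments (paths are assumptions)
prob_chunks = vad.get_speech_prob_file("dataset/main-agent/wav_16k/sample.wav")
boundaries = vad.get_speech_segments("dataset/main-agent/wav_16k/sample.wav")
np.save("dataset/main-agent/vad_npy/sample.npy", prob_chunks.squeeze().detach().cpu().numpy())
print(boundaries)  # rows of [start_sec, end_sec] for detected speech
```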
## Train model
To train the model described in the paper, use the following command inside the repo:
```sh
python -m train.train_mdm --save_dir save/my_model_run --dataset genea2023+ --step 10 --use_text --use_vad True --use_wavlm True
```
## Gesture Generation
Generate motion using the trained model by running the following command. If you wish to generate gestures with the pretrained GENEA Challenge model, use `--model_path ./save/default_vad_wavlm/model000290000.pt`
```sh
python -m sample.generate --model_path ./save/my_model_run/model000XXXXXX.pt
```
## Render
To render the official GENEA 2023 visualizations, follow the instructions provided [here](https://github.com/TeoNikolov/genea_visualizer/)
## Cite
If you wish to cite this repo or the paper:
```text
@article{tonoli2024stylistic,
  author  = {Tonoli, Rodolfo L. and Costa, Paula D. P.},
  title   = {Stylistic Co-Speech Gesture Generation: Modeling Personality and Communicative Styles in Virtual Agents},
  journal = {N/A},
  year    = {N/A},
}
```
Empty file added data_loaders/__init__.py
Empty file.
