Zan Wang, Yixin Chen, Baoxiong Jia, Puhao Li, Jinlu Zhang, Jingze Zhang, Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang
This repository is the official implementation of paper "Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance".
We introduce a novel two-stage framework that employs scene affordance as an intermediate representation, effectively linking 3D scene grounding and conditional motion generation.
Despite significant advancements in text-to-motion synthesis, generating language-guided human motion within 3D environments poses substantial challenges. These challenges stem primarily from (i) the absence of powerful generative models capable of jointly modeling natural language, 3D scenes, and human motion, and (ii) the generative models' intensive data requirements contrasted with the scarcity of comprehensive, high-quality, language-scene-motion datasets. To tackle these issues, we introduce a novel two-stage framework that employs scene affordance as an intermediate representation, effectively linking 3D scene grounding and conditional motion generation. Our framework comprises an Affordance Diffusion Model (ADM) for predicting explicit affordance maps and an Affordance-to-Motion Diffusion Model (AMDM) for generating plausible human motions. By leveraging scene affordance maps, our method overcomes the difficulty of generating human motion under multimodal condition signals, especially when training with limited data lacking extensive language-scene-motion pairs. Our extensive experiments demonstrate that our approach consistently outperforms all baselines on established benchmarks, including HumanML3D and HUMANISE. Additionally, we validate our model's exceptional generalization capabilities on a specially curated evaluation set featuring previously unseen descriptions and scenes.
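At inference time, this amounts to a simple two-stage pipeline: the ADM first predicts an affordance map on the scene point cloud from the language description, and the AMDM then generates the motion conditioned on that map (together with the scene and text). The sketch below is illustrative pseudocode only; the function names, tensor shapes, and sampling details are placeholders, not this repository's API.

```python
# Illustrative pseudocode of the two-stage pipeline described above.
# All names and shapes are placeholders, not the actual API of this repo.
import torch

def predict_affordance(scene_points: torch.Tensor, text: str) -> torch.Tensor:
    """Stage 1 (ADM): predict a per-point affordance map from the scene and language."""
    return torch.rand(scene_points.shape[0])  # placeholder for the diffusion sampling loop

def generate_motion(affordance: torch.Tensor, scene_points: torch.Tensor, text: str) -> torch.Tensor:
    """Stage 2 (AMDM): generate a motion sequence conditioned on the affordance map."""
    return torch.rand(120, 263)  # placeholder: e.g., 120 frames of HumanML3D-style features

scene = torch.rand(8192, 3)  # placeholder point cloud of the 3D scene
prompt = "walk to the chair and sit down"
affordance_map = predict_affordance(scene, prompt)
motion = generate_motion(affordance_map, scene, prompt)
print(affordance_map.shape, motion.shape)
```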
- Create a new conda environment and install `pytorch` with conda. Our `pytorch` version is 1.12.0 and `cuda` version is 11.3.
conda create -n afford python=3.8
conda activate afford
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
- Install other dependencies with `pip`.
pip install -r requirements.txt
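To quickly verify the environment, you can check that the installed `pytorch` and `cuda` versions match the ones above:

```python
# Optional sanity check: verify the PyTorch / CUDA setup installed above.
import torch

print(torch.__version__)          # expected: 1.12.0
print(torch.version.cuda)         # expected: 11.3
print(torch.cuda.is_available())  # should be True for GPU training
```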
You can directly download our preprocessed data from OneDrive / Baidu Disk, or follow data preparation to preprocess the data yourself. Download the data and put it in the `data/` folder. Organize the folder as follows:
- afford-motion/
- body_models/
- ...
- data/
- ...
- outputs/
- ...
- configs/
- datasets/
- ...
We also provide pre-trained models on OneDrive / Baidu Disk. Download the models and put them in the `outputs/` folder.
See more details about the folder structure in `prepare/README.md`.
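As a quick check (assuming you run it from the repository root), the snippet below reports whether the expected top-level folders from the layout above are in place:

```python
# Minimal folder-structure check; run from the afford-motion/ repository root.
from pathlib import Path

for folder in ["body_models", "data", "outputs", "configs", "datasets"]:
    status = "ok" if Path(folder).is_dir() else "MISSING"
    print(f"{folder}/ ... {status}")
```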
bash scripts/t2m_contact/train_ddp.sh ${EXP_NAME} ${PORT}
# or, bash scripts/t2m_contact/train.sh ${EXP_NAME} # for single GPU training
# e.g., bash scripts/t2m_contact/train_ddp.sh CDM-Perceiver-H3D 29500
- `EXP_NAME`: the name of the experiment
- `PORT`: the port number for parallel training
- Our default setting is to train the model with multiple GPUs. You can use `train.py` to train your model with a single GPU.
bash scripts/t2m_contact_motion/train_ddp.sh ${EXP_NAME} ${PORT}
# or, bash scripts/t2m_contact_motion/train.sh ${EXP_NAME} # for single GPU training
# e.g., bash scripts/t2m_contact_motion/train_ddp.sh CMDM-Enc-H3D-mixtrain0.5 29500
- the arguments are the same as above
- Note: please make sure the model can correctly load the pre-generated affordance maps. (You can insert a `print` to check whether the code reaches Line 772 in `dataset/humanml3d.py`; see also the sanity-check sketch below.)
Note: We refer to ADM/AMDM as CDM/CMDM in our early implementation, so you may find the names CDM/CMDM in the code.
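As a simpler standalone alternative to the in-code `print` suggested above, you can first confirm that the directory of pre-generated affordance maps exists and is non-empty. The path below is only the example used later in this README; point it to your own ADM output folder.

```python
# Rough sanity check before AMDM training: the pre-generated affordance maps
# should exist and be non-empty. Adjust the path to your own ADM output folder.
from pathlib import Path

afford_dir = Path("outputs/CDM-Perceiver-H3D/eval/test-0413-205430")  # example path from this README
entries = list(afford_dir.rglob("*"))
print(f"{afford_dir} contains {len(entries)} entries")
assert afford_dir.is_dir() and entries, "affordance maps not found; check the path"
```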
bash scripts/t2m_contact/test.sh ${MODEL_DIR} ${EVAL_MODE} ${RAND_SEED}
# e.g., bash scripts/t2m_contact/test.sh ./outputs/CDM-Perceiver-H3D/ wo_mm 2023
- `MODEL_DIR`: the directory of the checkpoint
- `EVAL_MODE`: `wo_mm` for evaluation without the MM metric, `w_mm` for evaluation with the MM metric
- `RAND_SEED`: random seed, can be left empty
bash scripts/t2m_contact_motion/test.sh ${MODEL_DIR} ${AFFORD_DIR} ${EVAL_MODE} ${RAND_SEED}
# e.g., bash scripts/t2m_contact_motion/test.sh outputs/CMDM-Enc-H3D-mixtrain0.5/ outputs/CDM-Perceiver-H3D/eval/test-0413-205430 wo_mm 2023
- `AFFORD_DIR`: the directory of the pre-generated affordance maps
- other arguments are the same as above
We calculate the metrics based on the code of MDM, so you first need to clone the MDM repository and set up its environment. (Note that MDM generates motions and computes the metrics in a single program; we instead first generate the motions with our code and then compute the metrics with the MDM code.) After generating the motions following the above two steps, use the steps below to calculate the metrics. Assume you have cloned the MDM repository to `${PATH}/motion-diffusion-model/`.
a. Copy the folder `save/` in `h3d_eval` to `${PATH}/motion-diffusion-model/`.
b. Copy `eval_h3d_dataset_offline.py` and `eval_h3d_offline.py` to `${PATH}/motion-diffusion-model/eval/`.
c. Put the absolute paths of the generated motion folders in Line 72 and Line 73 of `eval_h3d_offline.py`, respectively. (One folder is generated with `wo_mm` and the other with `w_mm`; the `w_mm` folder can be an empty string if you don't want to calculate the MM metric.)
d. Run the following command to calculate the metrics:
python -m eval.eval_h3d_offline --model ./save/cmdm_h3d/model --eval_mode mm_short
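If helpful, here is a rough sketch of steps a and b (the source locations of the two eval scripts within this repository are assumptions; adjust the paths to your local layout and set `MDM_PATH` to your MDM clone):

```python
# Sketch of steps a and b: copy the evaluation assets into the MDM repository.
# Source/destination paths are assumptions; adjust them to your local layout.
import shutil
from pathlib import Path

MDM_PATH = Path("/path/to/motion-diffusion-model")  # i.e., ${PATH}/motion-diffusion-model/

shutil.copytree("h3d_eval/save", MDM_PATH / "save", dirs_exist_ok=True)  # step a
shutil.copy("eval_h3d_dataset_offline.py", MDM_PATH / "eval")            # step b
shutil.copy("eval_h3d_offline.py", MDM_PATH / "eval")                    # step b
# Step c: edit Line 72 / Line 73 of eval/eval_h3d_offline.py to point to your
# generated motion folders (wo_mm and w_mm), then run step d inside the MDM repo.
```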
bash scripts/ts2m_contact/train_ddp.sh ${EXP_NAME} ${PORT}
# or, bash scripts/ts2m_contact/train.sh ${EXP_NAME} # for single GPU training
# e.g., bash scripts/ts2m_contact/train_ddp.sh CDM-Perceiver-HUMANISE-step200k 29500
- the arguments are the same as above
bash scripts/ts2m_contact_motion/train_ddp.sh ${EXP_NAME} ${PORT}
# or, bash scripts/ts2m_contact_motion/train.sh ${EXP_NAME} # for single GPU training
# e.g., bash scripts/ts2m_contact_motion/train_ddp.sh CMDM-Enc-HUMANISE-step400k 29500
- the arguments are the same as above
bash scripts/ts2m_contact/test.sh ${MODEL_DIR} ${RAND_SEED}
# e.g., bash scripts/ts2m_contact/test.sh ./outputs/CDM-Perceiver-HUMANISE-step200k/ 2023
- the arguments are the same as above
bash scripts/ts2m_contact_motion/test.sh ${MODEL_DIR} ${AFFORD_DIR} ${RAND_SEED}
# e.g., bash scripts/ts2m_contact_motion/test.sh outputs/CMDM-Enc-HUMANISE-step400k/ outputs/CDM-Perceiver-HUMANISE-step200k/eval/test-0415-214721/ 2023
- the arguments are the same as above
- The calculated metrics are stored in `${MODEL_DIR}/eval/${test-MMDD-HHMMSS}/metrics.txt`
bash scripts/novel_contact/train_ddp.sh ${EXP_NAME} ${PORT}
# or, bash scripts/novel_contact/train.sh ${EXP_NAME} # for single GPU training
# e.g., bash scripts/novel_contact/train_ddp.sh CDM-Perceiver-ALL 29500
- the arguments are the same as above
bash scripts/novel_contact_motion/train_ddp.sh ${EXP_NAME} ${PORT}
# or, bash scripts/novel_contact_motion/train.sh ${EXP_NAME} # for single GPU training
# e.g., bash scripts/novel_contact_motion/train_ddp.sh CMDM-Enc-ALL 29500
- the arguments are the same as above
bash scripts/novel_contact/test.sh ${MODEL_DIR} ${RAND_SEED}
# e.g., bash scripts/novel_contact/test.sh outputs/CDM-Perceiver-ALL/ 2023
- the arguments are the same as above
bash scripts/novel_contact_motion/test.sh ${MODEL_DIR} ${AFFORD_DIR} ${RAND_SEED}
# e.g., bash scripts/novel_contact_motion/test.sh outputs/CMDM-Enc-ALL/ outputs/CDM-Perceiver-ALL/eval/test-0611-153206/ 2023
- the arguments are the same as above
- The calculated metrics are stored in `${MODEL_DIR}/eval/${test-MMDD-HHMMSS}/metrics.txt`
If you have any questions, please feel free to contact me via email: [email protected].
If you find our project useful, please consider citing us:
@inproceedings{wang2024move,
title={Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance},
author={Wang, Zan and Chen, Yixin and Jia, Baoxiong and Li, Puhao and Zhang, Jinlu and Zhang, Jingze and Liu, Tengyu and Zhu, Yixin and Liang, Wei and Huang, Siyuan},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
Some of the code is borrowed from MDM, HumanML3D, and HUMANISE.
This project is licensed under the MIT License. See LICENSE for more details.