GenH2R is the official code for the following CVPR 2024 paper:
GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation
Zifan Wang*, Junyu Chen*, Ziqing Chen, Pengwei Xie, Rui Chen, Li Yi
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[ website ] [ arXiv ] [ video ]
GenH2R is a framework for learning generalizable vision-based human-to-robot (H2R) handover skills. The goal is to equip robots with the ability to reliably receive objects with unseen geometry handed over by humans in various complex trajectories.
We acquire such generalizability by learning H2R handover at scale with a comprehensive solution including procedural simulation asset creation, automated demonstration generation, and effective imitation learning. We leverage large-scale 3D model repositories, dexterous grasp generation methods, and curve-based 3D animation to create an H2R handover simulation environment named GenH2R-Sim, surpassing the number of scenes in existing simulators by three orders of magnitude. We further introduce a distillation-friendly demonstration generation method that automatically generates a million high-quality demonstrations suitable for learning. Finally, we present a 4D imitation learning method augmented by a future forecasting objective to distill demonstrations into a visuo-motor handover policy.
Building upon handover-sim, GA-DDPG, and OMG-Planner, our original codebase is a bit bulky. For better readability and extensibility, we have decided to refactor it and provide a simplified version.
2024.06.20
We have released the evaluation scripts and the pre-trained models.
We are actively cleaning the code for simulation scene construction, demonstration generation, and policy training, and will release it as soon as possible.
git clone --recursive [email protected]:chenjy2003/genh2r.git
conda create -n genh2r python=3.10
conda activate genh2r
pip install -r requirements.txt
# install pytorch according to your cuda version (https://pytorch.org/get-started/previous-versions/)
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
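After installation, a quick generic check (not specific to this repo) that the installed PyTorch build can see the GPU:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"  # expect a +cu version string and True on a CUDA machine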
We highly recommend installing this third-party package for robot kinematics. However, inference and evaluation of the pre-trained models can also be done without it by adding env.panda.IK_solver=pybullet to the following evaluation commands, with slightly different results.
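For example, the override is simply appended to the evaluation command described later (model_dir is explained in the evaluation section):
CUDA_VISIBLE_DEVICES=0 python -m evaluate \
setup=s0 split=test num_runners=16 \
policy=pointnet2 pointnet2.processor.flow_frame_num=3 pointnet2.model.obj_pose_pred_frame_num=3 \
pointnet2.model.pretrained_dir=${model_dir} \
pointnet2.model.pretrained_source=handoversim \
env.panda.IK_solver=pybullet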
cd env/third_party/orocos_kinematics_dynamics
sudo apt-get update
sudo apt-get install libeigen3-dev libcppunit-dev
cd orocos_kdl
mkdir build
cd build
cmake .. -DENABLE_TESTS:BOOL=ON
make
sudo make install
make check
cd ../../python_orocos_kdl
mkdir build
cd build
ROS_PYTHON_VERSION=3.10 cmake ..
make
sudo make install
cp PyKDL.so $CONDA_PREFIX/lib/python3.10/site-packages/
## test
python3 ../tests/PyKDLtest.py
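To additionally confirm that the copy into site-packages is picked up by the conda environment (an optional, minimal check):
python3 -c "import PyKDL"  # should exit without an ImportError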
cd third_party/Pointnet2_PyTorch
pip install pointnet2_ops_lib/.
cd ../..
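To verify that the ops installed correctly (assuming the package is exposed under the name pointnet2_ops, as in Pointnet2_PyTorch), one can try:
python -c "import pointnet2_ops"  # should exit without errors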
Download dex-ycb-cache-20220323.tar.gz (from handover-sim) to env/data/tmp.
Download assets.tar.gz (the object and hand models are from handover-sim) to env/data.
Then run
cd env/data/tmp
tar -xvf dex-ycb-cache-20220323.tar.gz
cd ..
tar -xvf assets.tar.gz
cd ../..
python -m env.tools.process_dexycb
The 1000 processed scenes will be stored in data/scene/00/00, from data/scene/00/00/00/00000000.npz to data/scene/00/00/09/00000999.npz.
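As an optional sanity check (not part of the repo's scripts), one can count the generated scene files and peek at the arrays stored in one of them:
find data/scene/00/00 -name "*.npz" | wc -l  # expect 1000
python -c "import numpy as np; print(np.load('data/scene/00/00/00/00000000.npz').files)"  # list the stored array names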
Our pre-trained models can be downloaded here.
We use ray for parallel evaluation in order to support larger test sets. Feel free to adjust CUDA_VISIBLE_DEVICES (the GPUs to use) and num_runners (the total number of runners) according to the local machine; this does not change the evaluation results.
We observed that the evaluation results can differ slightly across devices, which unfortunately originates from some third-party packages. Our evaluation was done on an NVIDIA GeForce RTX 3090.
CUDA_VISIBLE_DEVICES=0 python -m evaluate \
setup=s0 split=test num_runners=16 \
policy=pointnet2 pointnet2.processor.flow_frame_num=3 pointnet2.model.obj_pose_pred_frame_num=3 \
pointnet2.model.pretrained_dir=${model_dir} \
pointnet2.model.pretrained_source=handoversim
Here model_dir should be the path of the folder containing the model parameters.
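For example (a hypothetical path; point it to wherever the downloaded checkpoint folder was extracted):
model_dir=data/models/genh2r_pretrained  # hypothetical location of the downloaded model folder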
The evaluation can also be run with a waiting time for the policy by additionally setting pointnet2.wait_time:
CUDA_VISIBLE_DEVICES=0 python -m evaluate \
setup=s0 split=test num_runners=16 \
policy=pointnet2 pointnet2.processor.flow_frame_num=3 pointnet2.model.obj_pose_pred_frame_num=3 \
pointnet2.model.pretrained_dir=${model_dir} \
pointnet2.model.pretrained_source=handoversim \
pointnet2.wait_time=3
To visualize the handover process, there are two options:
To use the GUI, only a single process can be started, so parallel evaluation should be disabled by setting use_ray=False. The GUI can then be enabled by setting env.visualize=True. Note that one can have finer control over which scenes to evaluate by setting scene_ids instead of setup and split.
CUDA_VISIBLE_DEVICES=0 python -m evaluate \
scene_ids=[214,219] use_ray=False \
env.visualize=True \
policy=pointnet2 pointnet2.processor.flow_frame_num=3 pointnet2.model.obj_pose_pred_frame_num=3 \
pointnet2.model.pretrained_dir=${model_dir} \
pointnet2.model.pretrained_source=handoversim
To record videos, we need to set demo_dir (where to store the videos), record_ego_video=True, and record_third_person_video=True.
CUDA_VISIBLE_DEVICES=0 python -m evaluate \
setup=s0 split=test num_runners=16 \
policy=pointnet2 pointnet2.processor.flow_frame_num=3 pointnet2.model.obj_pose_pred_frame_num=3 \
pointnet2.model.pretrained_dir=${model_dir} \
pointnet2.model.pretrained_source=handoversim \
demo_dir=data/tmp record_ego_video=True record_third_person_video=True
There is an argument demo_structure controlling how the demonstration data are arranged. If set to hierarchical (the default), the data will be stored hierarchically, e.g. data/tmp/00/00/02/00000209_ego_rgb.mp4. If set to flat, all the data will be stored directly in the folder data/tmp.
One can also adjust the position and orientation of the third-person camera by setting env.third_person_camera.pos (default [1.5,-0.1,1.8]), env.third_person_camera.target (default [0.6,-0.1,1.3]), and env.third_person_camera.up_vector (default [0.,0.,1.]).
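For example, combining the recording options above (the camera value below is hypothetical; everything else matches the recording command above):
CUDA_VISIBLE_DEVICES=0 python -m evaluate \
setup=s0 split=test num_runners=16 \
policy=pointnet2 pointnet2.processor.flow_frame_num=3 pointnet2.model.obj_pose_pred_frame_num=3 \
pointnet2.model.pretrained_dir=${model_dir} \
pointnet2.model.pretrained_source=handoversim \
demo_dir=data/tmp record_ego_video=True record_third_person_video=True \
demo_structure=flat env.third_person_camera.pos=[1.2,0.5,1.6]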
This repo is built upon handover-sim, GA-DDPG, OMG-Planner, acronym, orocos_kinematics_dynamics, and Pointnet2_PyTorch. We sincerely appreciate their contributions to open source.
If GenH2R is useful or relevant to your research, please kindly recognize our contributions by citing our paper:
@article{wang2024genh2r,
title={GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation},
author={Wang, Zifan and Chen, Junyu and Chen, Ziqing and Xie, Pengwei and Chen, Rui and Yi, Li},
journal={arXiv preprint arXiv:2401.00929},
year={2024}
}