Update: we have released our code and paper for our new vision system here, which took 1st place in the stowing task at the Amazon Robotics Challenge 2017.
This repository contains toolbox code for our vision system that took 3rd and 4th place at the Amazon Picking Challenge 2016. It includes RGB-D Realsense sensor drivers (standalone and ROS package), a deep learning ROS package for 2D object segmentation (training and testing), and a ROS package for 6D pose estimation. This is the reference implementation of models and code for our paper:
Multi-view Self-supervised Deep Learning for 6D Pose Estimation in the Amazon Picking Challenge (pdf, arxiv, webpage)
Andy Zeng, Kuan-Ting Yu, Shuran Song, Daniel Suo, Ed Walker Jr., Alberto Rodriguez and Jianxiong Xiao
IEEE International Conference on Robotics and Automation (ICRA) 2017
Warehouse automation has attracted significant interest in recent years, perhaps most visibly by the Amazon Picking Challenge (APC). Achieving a fully autonomous pick-and-place system requires a robust vision system that reliably recognizes objects and their 6D poses. However, a solution eludes the warehouse setting due to cluttered environments, self-occlusion, sensor noise, and a large variety of objects. In this paper, we present a vision system that took 3rd- and 4th- place in the stowing and picking tasks, respectively at APC 2016. Our approach leverages multi-view RGB-D data and data-driven, self-supervised learning to overcome the aforementioned difficulties. More specifically, we first segment and label multiple views of a scene with a fully convolutional neural network, and then fit pre-scanned 3D object models to the resulting segmentation to get the 6D object pose. Training a deep neural network for segmentation typically requires a large amount of training data with manual labels. We propose a self-supervised method to generate a large labeled dataset without tedious manual segmentation that could be scaled up to more object categories easily. We demonstrate that our system can reliably estimate the 6D pose of objects under a variety of scenarios.
If you find this code useful in your work, please consider citing:
@inproceedings{zeng2016multi,
title={Multi-view Self-supervised Deep Learning for 6D Pose Estimation in the Amazon Picking Challenge},
author={Zeng, Andy and Yu, Kuan-Ting and Song, Shuran and Suo, Daniel and Walker Jr, Ed and Rodriguez, Alberto and Xiao, Jianxiong},
booktitle={ICRA},
year={2017}
}
This code is released under the Simplified BSD License (refer to the LICENSE file for details).
All relevant dataset information and downloads can be found here.
If you have any questions or find any bugs, please let me know: Andy Zeng andyz[at]princeton[dot]edu
- A Quick Start: Matlab Demo
- 6D Pose Estimation ROS Package
- Realsense Standalone
- Realsense ROS Package
- Deep Learning FCN ROS Package
- FCN Training with Marvin
- Evaluation Code
- 3D Annotation Tool
Estimates 6D object poses on the sample scene data (in data/sample) with pre-computed object segmentation results from Deep Learning FCN ROS Package:
- git clone https://github.com/andyzeng/apc-vision-toolbox.git (Note: source repository size is ~300mb, cloning may take a while)
- cd apc-vision-toolbox/ros-packages/catkin_ws/src/pose_estimation/src/
- Start Matlab and run mdemo
A Matlab ROS Package for estimating 6D object poses by model-fitting with ICP on RGB-D object segmentation results. 3D point cloud models of objects and bins can be found here.
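For readers new to the term, a 6D pose combines a 3D rotation with a 3D translation. The sketch below is illustrative only and is not taken from the toolbox: it builds a 4x4 rigid transform from a hypothetical yaw angle and translation, then applies it to a few model points, which is the kind of transform a model-fitting step would return. The function names are assumptions, and only one of the three rotational degrees of freedom is shown to keep the example short.

```python
import math

def pose_matrix(yaw_rad, translation):
    """Build a 4x4 rigid transform: rotate about Z by yaw_rad, then translate.

    A full 6D pose has three rotational degrees of freedom; yaw alone is
    used here only for brevity (an assumption of this sketch).
    """
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [
        [c,   -s,  0.0, tx],
        [s,    c,  0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform_points(pose, points):
    """Apply a 4x4 rigid transform to a list of (x, y, z) points."""
    out = []
    for x, y, z in points:
        out.append(tuple(
            pose[r][0] * x + pose[r][1] * y + pose[r][2] * z + pose[r][3]
            for r in range(3)
        ))
    return out

# Two points of a hypothetical object model, in the model's own frame.
model_points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pose = pose_matrix(math.pi / 2, (0.5, 0.0, 0.2))
scene_points = transform_points(pose, model_points)
print(scene_points)
```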
- Deep Learning FCN ROS Package and all of its respective dependencies.
- Matlab 2015b or later
- Copy the ROS package ros-packages/.../pose_estimation into your catkin workspace source directory (e.g. catkin_ws/src)
- Follow the instructions on the top of pose_estimation/src/make.m to compile ROS custom messages for Matlab
- Compile a GPU CUDA kernel function in pose_estimation/src:
nvcc -ptx KNNSearch.cu
- Start roscore
- To start the pose estimation service, run pose_estimation/src/startService.m. At each call (see service request format described in pose_estimation/srv/EstimateObjectPose.srv), the service:
  - Calibrates the camera poses of the scene using calibration data
  - Performs 3D background subtraction
  - For each object in the scene, uses model-fitting to estimate its 6D pose
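As a rough illustration of the background subtraction step, the sketch below discards every scene point that lies within a distance threshold of some point in a background cloud. This is an assumption about how such a step can work, not a description of the toolbox's Matlab code; the function name, the threshold value, and the brute-force nearest-neighbor search are all hypothetical choices made for clarity.

```python
import math

def subtract_background(scene, background, threshold=0.01):
    """Keep scene points farther than `threshold` from every background point.

    Brute-force O(N*M) search, chosen for readability. The threshold
    (here in the same units as the points) is an illustrative value.
    """
    kept = []
    for p in scene:
        # Distance to the closest background point; infinity if there is none.
        nearest = min((math.dist(p, b) for b in background), default=math.inf)
        if nearest > threshold:
            kept.append(p)
    return kept

# Hypothetical data: three background points (e.g. an empty bin) and a scene
# in which only the middle point belongs to an object.
background = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)]
scene = [(0.0, 0.0, 0.001), (0.1, 0.05, 0.0), (0.2, 0.0, 0.0)]
foreground = subtract_background(scene, background)
print(foreground)
```

The remaining foreground points are what a model-fitting step would then align the pre-scanned object models against.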
- Install all dependencies and compile this package
- Start roscore in terminal
- Create a temporary directory to be used by marvin_convnet for reading RGB-D data and saving segmentation masks:
mkdir /path/to/your/data/tmp
- Start the FCN service, pointing it at that directory:
rosrun marvin_convnet detect _read_directory:="/path/to/your/data/tmp"
- Navigate to pose_estimation/src
- Edit file paths and options on the top of demo.m
- Open Matlab and run:
startService.m
demo.m
A standalone C++ executable for streaming and capturing data (RGB-D frames and 3D point clouds) in real-time using librealsense. Tested on Ubuntu 14.04 and 16.04 with an Intel® RealSense™ F200 Camera.
See realsense_standalone
- librealsense v1 (important: this code only works with librealsense version 1 - installation instructions can be found here)
- Install with the Video4Linux backend
- OpenCV (tested with OpenCV 3.1)
- Used for saving images
cd realsense_standalone
./compile.sh
After compiling, run ./stream to begin streaming RGB-D frames from the Realsense device. While the stream window is active, press the space-bar key to capture and save the current RGB-D frame to disk. Relevant camera information and captured RGB-D frames are saved to a randomly named folder under data.
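Because the capture folder name is random, it can be handy to locate the most recent capture programmatically. The helper below is a small sketch, not part of the toolbox: it assumes each capture session is a subdirectory of the data folder and simply picks the one with the latest modification time. The function name is hypothetical.

```python
import os

def latest_capture_dir(data_root):
    """Return the most recently modified subdirectory of data_root, or None."""
    subdirs = [
        os.path.join(data_root, name)
        for name in os.listdir(data_root)
        if os.path.isdir(os.path.join(data_root, name))
    ]
    if not subdirs:
        return None
    return max(subdirs, key=os.path.getmtime)

# Only runs when a "data" folder exists in the current directory.
if os.path.isdir("data"):
    print(latest_capture_dir("data"))
```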
If your Realsense device is plugged in but remains undetected, try using a different USB port. If that fails, run the following script while the device is unplugged to refresh your USB ports:
sudo ./scripts/resetUSBports.sh
A C++ ROS package for streaming and capturing data (RGB-D frames and 3D point clouds) in real-time using librealsense. Tested on Ubuntu 14.04 and 16.04 with an Intel® RealSense™ F200 Camera.
This ROS package comes in two different versions. Which version is installed depends on your system's available software:
- Version #1: only returns RGB-D frame data on service calls (does not require OpenCV or PCL)
- Version #2: returns RGB-D frame data on service calls and publishes 3D point clouds (requires OpenCV and PCL)
See ros-packages/realsense_camera
- librealsense v1 (important: this code only works with librealsense version 1 - installation instructions can be found here)
- Install with the Video4Linux backend
- [Optional] OpenCV (tested with OpenCV 2.4.11)
- Used for saving images
- [Optional] Point Cloud Library (tested with PCL 1.7.1)
- Used for saving point clouds
- Copy the ROS package ros-packages/.../realsense_camera into your catkin workspace source directory (e.g. catkin_ws/src)
- If necessary, configure realsense_camera/CMakeLists.txt according to your respective dependencies
- In your catkin workspace, compile the package with catkin_make
- Source devel/setup.sh
- Start roscore
- To start the RGB-D data capture service and stream data from the sensor, run:
rosrun realsense_camera capture
- The service /realsense_camera returns data from the sensor (response data format described in realsense_camera/srv/StreamSensor.srv)
- If you need a GL window to see the streamed RGB-D data, run:
rosrun realsense_camera capture _display:=True
A C++ ROS package for deep learning based object segmentation using FCNs (Fully Convolutional Networks) with Marvin, a lightweight GPU-only neural network framework. This package feeds RGB-D data forward through a pre-trained ConvNet to retrieve object segmentation results. The neural networks are trained offline with Marvin (see FCN Training with Marvin).
See ros-packages/marvin_convnet
- Realsense ROS Package needs to be compiled first.
- CUDA 7.5 and cuDNN 5. You may need to register with NVIDIA. Below are some additional steps to set up cuDNN 5. NOTE: We highly recommend that you install different versions of cuDNN to different directories (e.g., /usr/local/cudnn/vXX) because different software packages may require different versions.
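LIB_DIR=lib$([[ $(uname) == "Linux" ]] && echo 64)
CUDNN_LIB_DIR=/usr/local/cudnn/v5/$LIB_DIR
# Persist the library path for future shells, then load it into the current one
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:$CUDNN_LIB_DIR" >> ~/.profile && source ~/.profile
tar zxvf cudnn*.tgz
# Make sure the target directories exist before copying
sudo mkdir -p $CUDNN_LIB_DIR /usr/local/cudnn/v5/include
sudo cp cuda/$LIB_DIR/* $CUDNN_LIB_DIR/
sudo cp cuda/include/* /usr/local/cudnn/v5/include/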
- OpenCV (tested with OpenCV 2.4.11)
- Used for saving images
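The cuDNN setup above relies on the cuDNN library directory being listed in LD_LIBRARY_PATH. If the package later fails to find cuDNN, a quick check like the following can confirm whether the directory is actually on the path. This is a convenience sketch, not part of the toolbox; the function name is hypothetical and the directory is the example location used above.

```python
import os

def on_library_path(lib_dir, ld_library_path=None):
    """Return True if lib_dir is one of the entries in LD_LIBRARY_PATH.

    If ld_library_path is None, the value is read from the environment.
    Paths are normalized so that a trailing slash does not matter.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    wanted = os.path.normpath(lib_dir)
    entries = [e for e in ld_library_path.split(os.pathsep) if e]
    return any(os.path.normpath(e) == wanted for e in entries)

# Example location from the setup steps above (lib64 on Linux).
print(on_library_path("/usr/local/cudnn/v5/lib64"))
```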
- Copy the ROS package ros-packages/.../marvin_convnet into your catkin workspace source directory (e.g. catkin_ws/src)
- If necessary, configure realsense_camera/CMakeLists.txt according to your respective dependencies
- In your catkin workspace, compile the package with catkin_make
- Source devel/setup.sh
- Navigate to ros-packages/.../marvin_convnet/models/competition/ and run the bash script ./download_weights.sh to download our trained weights for object segmentation (trained on our training dataset)
- Edit marvin_convnet/src/detect.cu: towards the top of the file, specify the filepath to the network architecture .json file and .marvin weights.
- Create a folder called tmp in apc-vision-toolbox/data (e.g. apc-vision-toolbox/data/tmp). This is where marvin_convnet will read/write RGB-D data. The format of the data in tmp follows the format of the scenes in our datasets and the format of the data saved by Realsense Standalone.
- marvin_convnet offers two services: save_images and detect. The former retrieves RGB-D data from the Realsense ROS Package and writes it to disk in the tmp folder, while the latter reads from disk in the tmp folder, feeds the RGB-D data forward through the FCN, and saves the response images to disk.
- To start the RGB-D data saving service, run:
rosrun marvin_convnet save_images _write_directory:="/path/to/your/data/tmp" _camera_service_name:="/realsense_camera"
- To start the FCN service, run:
rosrun marvin_convnet detect _read_directory:="/path/to/your/data/tmp" _service_name:="/marvin_convnet"
- Example ROS service call to do object segmentation for glue bottle and expo marker box (assuming the scene's RGB-D data is in the tmp folder):
rosservice call /marvin_convnet ["elmers_washable_no_run_school_glue","expo_dry_erase_board_eraser"] 0 0
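When the list of objects changes from scene to scene, the call above can be assembled from a script. The sketch below only builds the argument list in the same shape as the example; it does not invoke ROS. The function name is hypothetical, and the two trailing arguments are copied verbatim from the example call, since their meaning is defined by the service's .srv file rather than here.

```python
import json

def build_segmentation_call(object_names, trailing_args=("0", "0"),
                            service="/marvin_convnet"):
    """Build the argv list for a `rosservice call` like the example above.

    The object names are encoded as a compact JSON list, which is also
    valid YAML. `trailing_args` mirrors the example's final two arguments.
    """
    names_arg = json.dumps(list(object_names), separators=(",", ":"))
    return ["rosservice", "call", service, names_arg, *trailing_args]

cmd = build_segmentation_call([
    "elmers_washable_no_run_school_glue",
    "expo_dry_erase_board_eraser",
])
print(" ".join(cmd))
# With ROS sourced and the FCN service running, the list could be passed
# to subprocess.run(cmd, check=True).
```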
Code and models for training object segmentation using FCNs (Fully Convolutional Networks) with Marvin, a lightweight GPU-only neural network framework. Includes network architecture .json files in convnet-training/models and a Marvin data layer in convnet-training/apc.hpp that randomly samples RGB-D images (RGB and HHA) from our segmentation training dataset.
See convnet-training
- CUDA 7.5 and cuDNN 5. You may need to register with NVIDIA. Below are some additional steps to set up cuDNN 5. NOTE: We highly recommend that you install different versions of cuDNN to different directories (e.g., /usr/local/cudnn/vXX) because different software packages may require different versions.
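LIB_DIR=lib$([[ $(uname) == "Linux" ]] && echo 64)
CUDNN_LIB_DIR=/usr/local/cudnn/v5/$LIB_DIR
# Persist the library path for future shells, then load it into the current one
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:$CUDNN_LIB_DIR" >> ~/.profile && source ~/.profile
tar zxvf cudnn*.tgz
# Make sure the target directories exist before copying
sudo mkdir -p $CUDNN_LIB_DIR /usr/local/cudnn/v5/include
sudo cp cuda/$LIB_DIR/* $CUDNN_LIB_DIR/
sudo cp cuda/include/* /usr/local/cudnn/v5/include/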
- OpenCV (tested with OpenCV 2.4.11)
- Used for reading images
- Download our segmentation training dataset
- Navigate to directory convnet-training/
- Specify training dataset filepath in APCData layer of network architecture in models/train_shelf_color.json
- Navigate to models/weights/ and run the bash script ./download_weights.sh to download VGG pre-trained weights on ImageNet (see Marvin for more pre-trained weights)
- Navigate to convnet-training/ and run ./compile.sh in a terminal to compile Marvin.
- In a terminal, run
./marvin train models/rgb-fcn/train_shelf_color.json models/weights/vgg16_imagenet_half.marvin
to train a segmentation model on RGB-D data with objects in the shelf (for objects in the tote, use network architecture models/rgb-fcn/train_shelf_color.json).
Code used to perform the experiments in our paper; tests the full vision system on the 'Shelf & Tote' benchmark dataset.
See evaluation
- Download our 'Shelf & Tote' benchmark dataset from here and extract its contents to apc-vision-toolbox/data/benchmark (e.g. apc-vision-toolbox/data/benchmark/office, apc-vision-toolbox/data/benchmark/warehouse, etc.)
- In evaluation/getError.m, change the variable benchmarkPath to point to the filepath of your benchmark dataset directory
- We have provided our vision system's predictions in a saved Matlab .mat file evaluation/predictions.mat. To compute the accuracy of these predictions against the ground truth labels of the 'Shelf & Tote' benchmark dataset, run evaluation/getError.m
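For intuition about what comparing a predicted 6D pose against ground truth can involve, the sketch below computes two commonly used quantities: the Euclidean distance between translations and the angle of the relative rotation. This is a generic illustration and not necessarily what evaluation/getError.m computes; the function names and the sample poses are hypothetical.

```python
import math

def translation_error(t_pred, t_true):
    """Euclidean distance between predicted and ground truth translations."""
    return math.dist(t_pred, t_true)

def rotation_error_deg(R_pred, R_true):
    """Angle (degrees) of the relative rotation between two 3x3 matrices.

    Uses trace(R_pred^T * R_true) = sum of element-wise products, and
    angle = acos((trace - 1) / 2), clamped for numerical safety.
    """
    trace = sum(R_pred[i][j] * R_true[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

# Hypothetical prediction: identity rotation; ground truth: 90 degrees about Z.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rot_z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

print(translation_error((0.0, 0.0, 0.0), (0.03, 0.04, 0.0)))
print(rotation_error_deg(identity, rot_z_90))
```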
An online WebGL-based tool for annotating ground truth 6D object poses on RGB-D data. Follows an implementation of RGB-D Annotator with small changes. Here's a download link to our exact copy of the annotator.