Dex-Net 5.0 - A PyTorch implementation to train on the Dex-Net Datasets

Dex-Net 5.0 is a PyTorch implementation for training on the original Dex-Net 2.0 parallel jaw grasp and Dex-Net 3.0 suction grasp datasets. It offers improved performance and ease of use over the original codebase. Dex-Net grasp quality models take normalized single-channel depth images as input and output grasp confidences. This repo implements a model similar to the original GQ-CNN (Grasp Quality Convolutional Neural Network) architecture along with a new EfficientNet-based GQ-CNN architecture. The EfficientNet-based suction GQ-CNN achieves 95.4% accuracy on a validation set, 1.9 percentage points higher than Dex-Net 3.0's reported accuracy. This repo also provides an FC-GQ-CNN (Fully Convolutional Grasp Quality Neural Network) architecture for generating grasp quality heatmaps, along with training and analysis code for the models.

This project is created and maintained by the AUTOLab at UC Berkeley.

📚 Original Work

Dex-Net 5.0 is an extension of previous work which can be found here:

📋 Dex-Net Project Website | 📚 Dex-Net Documentation | 📦 Dex-Net Package GitHub

🚧 Project Setup

See PyTorch's Get Started page for PyTorch installation options.

git clone https://github.com/Apgoldberg1/Dex-Net-5.0.git
cd Dex-Net-5.0
pip install -e .

The version of the Dex-Net 2.0 and 3.0 datasets used for training in this repository can be downloaded here.

The model weights for suction and parallel jaw grasping can be found here.

Other published datasets and mesh files from previous works can be found here.

🛠️ Usage

Getting Started Notebook (GettingStarted.ipynb)

A quickstart notebook to run GQ-CNN inference on the Dex-Net dataset and FC-GQ-CNN inference on an example depth image.

train_model.py

Runs a train-eval loop based on the given config.

python3 scripts/train_model.py --config PATH_TO_DESIRED_CONFIG_FILE

grasp_model.py

Defines models. Models take in a batch of normalized single channel depth images and output grasp confidence(s) (see Grasp Models section for more details).

torch_dataset.py

Provides PyTorch datasets for loading the Dex-Net 3.0 and Dex-Net 2.0 datasets. The datasets contain cropped 32x32 depth images of single objects, each paired with a grasp confidence score. For Dex-Net 2.0 data, this grasp metric corresponds to a parallel jaw grasp centered in the image with the grasp axis horizontal in the image. For Dex-Net 3.0, the score corresponds to a suction grasp at the center of the image with the approach axis aligned to the middle column. See Dex-Net 2.0 and Dex-Net 3.0 for more details on dataset generation.

convert_weights.py

Converts GQ-CNN weights to FC-GQ-CNN weights and saves the converted model to outputs/{file_name}_conversion.pt.

python3 scripts/convert_weights.py --model_path PATH_TO_MODEL

This is only supported for DexNetBase weights.
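Converting GQ-CNN weights to FC-GQ-CNN weights generally rests on one identity: a fully connected layer applied to a flattened C x H x W feature map computes the same value as a convolution with a C x H x W kernel evaluated at a single position. The plain-Python sketch below demonstrates that identity for one output unit. It illustrates the general technique rather than the code in convert_weights.py; the function names and the C-order flattening are assumptions.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed [channel][row][col]


def fc_row_to_kernel(weights: List[float], c: int, h: int, w: int) -> Tensor3:
    """Reshape one row of a fully connected weight matrix (length c*h*w,
    assumed flattened in C order) into a c x h x w convolution kernel."""
    assert len(weights) == c * h * w
    return [
        [[weights[(ch * h + r) * w + col] for col in range(w)] for r in range(h)]
        for ch in range(c)
    ]


def dense(weights: List[float], features: Tensor3) -> float:
    """Fully connected output: dot product with the flattened feature map."""
    flat = [v for channel in features for row in channel for v in row]
    return sum(wt * x for wt, x in zip(weights, flat))


def conv_valid(kernel: Tensor3, features: Tensor3) -> List[List[float]]:
    """Valid (no padding, stride 1) cross-correlation summed over channels."""
    c, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    fh, fw = len(features[0]), len(features[0][0])
    out = []
    for i in range(fh - kh + 1):
        row = []
        for j in range(fw - kw + 1):
            row.append(sum(
                kernel[ch][r][col] * features[ch][i + r][j + col]
                for ch in range(c) for r in range(kh) for col in range(kw)
            ))
        out.append(row)
    return out


if __name__ == "__main__":
    weights = [float(k) for k in range(8)]  # one FC output unit over a 2x2x2 input
    features = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
    kernel = fc_row_to_kernel(weights, 2, 2, 2)
    print(conv_valid(kernel, features), dense(weights, features))  # [[168.0]] 168.0
```

On a feature map larger than H x W, the same kernel slides across it and yields one score per position, which is how a single-output GQ-CNN can be turned into a heatmap-producing FC-GQ-CNN.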

analyze.py

Contains functions to generate precision-recall curves and compute the mean and standard deviation over the dataset.

python3 scripts/analyze.py --model_path PATH_TO_MODEL --model_name [DexNetGQCNN, EfficientNet]
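One common way to compute a dataset mean and standard deviation without holding every image in memory is to accumulate running statistics batch by batch, then use them to normalize each depth image. The sketch below uses Welford's algorithm in plain Python; the names RunningStats and normalize are hypothetical, and normalizing as (x - mean) / std is an assumption about how the models' "normalized" inputs are produced, not a description of analyze.py.

```python
import math
from typing import Iterable, List


class RunningStats:
    """Streaming mean and standard deviation using Welford's algorithm."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, values: Iterable[float]) -> None:
        for x in values:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation over every pixel seen so far.
        return math.sqrt(self._m2 / self.count) if self.count else 0.0


def normalize(image: List[List[float]], mean: float, std: float) -> List[List[float]]:
    """Standardize a depth image with dataset-level statistics."""
    return [[(v - mean) / std for v in row] for row in image]


if __name__ == "__main__":
    stats = RunningStats()
    toy_images = [[[0.5, 0.6], [0.7, 0.8]], [[0.6, 0.6], [0.9, 0.5]]]  # toy 2x2 depth images
    for img in toy_images:
        stats.update(v for row in img for v in row)
    print(stats.mean, stats.std)
    print(normalize(toy_images[0], stats.mean, stats.std))
```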

time_benchmark.py

Script to benchmark the inference speed of models on random image-shaped data.
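A throughput benchmark of this kind usually follows the pattern below: build random image-shaped input, run a few warm-up passes, time a fixed number of batched passes, and report images per second. This framework-agnostic sketch uses a hypothetical toy model in place of a GQ-CNN. When timing a PyTorch model on a GPU, torch.cuda.synchronize() should be called before reading the clock, because CUDA kernels run asynchronously.

```python
import random
import time
from typing import Callable, List

Batch = List[List[List[float]]]  # batch of single-channel H x W images


def random_batch(batch_size: int, size: int = 32) -> Batch:
    """Random image-shaped data standing in for normalized depth images."""
    return [[[random.random() for _ in range(size)] for _ in range(size)]
            for _ in range(batch_size)]


def inferences_per_second(infer: Callable[[Batch], List[float]],
                          batch_size: int, iters: int = 20, warmup: int = 3) -> float:
    batch = random_batch(batch_size)
    for _ in range(warmup):  # keep one-time setup costs out of the timing
        infer(batch)
    start = time.perf_counter()
    for _ in range(iters):
        infer(batch)
    # On a GPU, synchronize the device here before reading the clock.
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed


def mean_pixel_model(batch: Batch) -> List[float]:
    """Toy stand-in for a model: one score per image (its mean pixel value)."""
    return [sum(map(sum, img)) / (len(img) * len(img[0])) for img in batch]


if __name__ == "__main__":
    for bs in (8, 64):
        print(bs, round(inferences_per_second(mean_pixel_model, bs)))
```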

visualize_dataset.py

Script to visualize the Dex-Net 2.0 or Dex-Net 3.0 dataset. Saves 10 plots, each with 25 random images and their labels, to the "outputs" folder. Adjust the dataset path as necessary.
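The sampling behind those plots can be sketched as drawing 250 distinct dataset indices and laying out each group of 25 on a 5x5 grid. The helper below is hypothetical and only computes the layout; the plotting itself (for example with matplotlib subplots) is omitted.

```python
import random
from typing import Dict, List, Tuple


def plot_layout(dataset_len: int, num_plots: int = 10, per_plot: int = 25,
                seed: int = 0) -> List[Dict[Tuple[int, int], int]]:
    """Return, for each plot, a mapping (row, col) -> dataset index."""
    side = int(per_plot ** 0.5)
    assert side * side == per_plot, "per_plot must be a perfect square"
    rng = random.Random(seed)
    # Sample without replacement so no image appears twice across all plots.
    indices = rng.sample(range(dataset_len), num_plots * per_plot)
    layouts = []
    for p in range(num_plots):
        chunk = indices[p * per_plot:(p + 1) * per_plot]
        layouts.append({(k // side, k % side): idx for k, idx in enumerate(chunk)})
    return layouts
```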

fcgqcnn.py

Script to run inference on a grayscale depth image. The output image is saved to outputs/fcgqcnn_out.png.

python3 scripts/fcgqcnn.py --model_path PATH_TO_MODEL --img PATH_TO_IMG
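To turn a grasp confidence heatmap into a single grasp, one simple approach is to take the highest-scoring cell and map it back to the image pixel at the center of the 32x32 window that produced it. The sketch below is hypothetical and assumes the heatmap comes from valid (unpadded) windows with a fixed stride, so cell (i, j) covers image rows i*stride to i*stride+31; the effective stride of a given FC-GQ-CNN depends on its pooling layers and should be checked against the model.

```python
from typing import List, Tuple


def best_grasp_pixel(heatmap: List[List[float]], window: int = 32,
                     stride: int = 1) -> Tuple[Tuple[int, int], float]:
    """Return ((row, col) image pixel, score) for the best heatmap cell."""
    best_score, best_i, best_j = float("-inf"), 0, 0
    for i, row in enumerate(heatmap):
        for j, score in enumerate(row):
            if score > best_score:
                best_score, best_i, best_j = score, i, j
    # Pixel at offset window // 2 inside the window (its center, rounded
    # toward the bottom-right for even window sizes).
    center = (best_i * stride + window // 2, best_j * stride + window // 2)
    return center, best_score
```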

Configs

The configs are YAML files specifying the model name, save name, dataset path, optimizer, Wandb logging, batch size, and more. The dataset path should point to the directory containing the "tensors" folder for either the Dex-Net 2.0 or Dex-Net 3.0 dataset.
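Because pointing the config at the wrong directory is an easy mistake, a small check like the one below can confirm, before training starts, that the configured path is the directory containing the "tensors" folder. The function name is hypothetical, and reading the path out of the YAML file (for example with PyYAML's yaml.safe_load) is left out because the exact config key names are not listed here.

```python
from pathlib import Path


def check_dataset_path(dataset_path: str) -> Path:
    """Raise if dataset_path is not a directory that contains a 'tensors' folder."""
    root = Path(dataset_path).expanduser()
    if not root.is_dir():
        raise FileNotFoundError(f"dataset path does not exist: {root}")
    if not (root / "tensors").is_dir():
        raise FileNotFoundError(
            f"expected a 'tensors' folder inside {root}; point the config at "
            "the directory that contains it, not at 'tensors' itself"
        )
    return root
```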

🧠 Grasp Models

GQ-CNNs (Grasp Quality Convolutional Neural Networks) are models that use a CNN backbone to predict grasp confidence scores. In Dex-Net 5.0, models labeled GQ-CNN take 32x32 images as input and output a single grasp confidence value associated with the center of the image.

FC-GQ-CNNs (Fully Convolutional Grasp Quality Neural Networks) are fully convolutional models. In Dex-Net 5.0, they can process images larger than 32x32 and output a heatmap of grasp confidences in a single pass. A fully convolutional structure allows faster inference than running many forward passes of a standard GQ-CNN. See the Performance Analysis section for more details.
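For a sliding 32x32 window over valid (unpadded) positions, the heatmap size along each axis follows the usual formula out = floor((in - 32) / stride) + 1. The hypothetical helper below computes it. The stride is an assumption: stride 1 corresponds to scoring every 32x32 crop, while pooling layers inside a fully convolutional network generally increase the effective stride and shrink the heatmap.

```python
from typing import Tuple


def heatmap_shape(height: int, width: int, window: int = 32,
                  stride: int = 1) -> Tuple[int, int]:
    """Number of window positions along each axis for valid sliding."""
    if height < window or width < window:
        raise ValueError("image must be at least as large as the window")
    return ((height - window) // stride + 1, (width - window) // stride + 1)


if __name__ == "__main__":
    print(heatmap_shape(70, 70))            # (39, 39): every 32x32 crop of a 70x70 image
    print(heatmap_shape(70, 70, stride=2))  # (20, 20)
```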

DexNetBase follows the model described in Dex-Net 2.0. However, unlike the original implementation, it doesn't take the gripper z distance as input because this was not found to impact training (see Performance Analysis for more details). It takes only a batch of 32x32 normalized depth images as input.

EfficientNet uses PyTorch's efficientnet_b0 implementation with an additional linear layer and softmax. It slightly outperforms "DexNetGQCNN" on suction (see Performance Analysis section).

BaseFCGQCNN is a fully convolutional network that takes a batch of normalized depth images which may be larger than 32x32 and returns a grasp confidence heatmap. Dex-Net GQ-CNN weights can be converted to Dex-Net FC-GQ-CNN weights using convert_weights.py. This can be done for both suction and parallel jaw grasp models.

HighResFCGQCNN is a fully convolutional network that outputs higher-resolution grasp maps compared to BaseFCGQCNN. It uses an offsetting trick to prevent dimension reduction from the max-pool layer.
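A standard way to recover the resolution lost to a stride-2 max-pool is shift-and-stitch: apply the pooled computation to copies of the input shifted by each offset below the stride, then interleave the outputs. We assume the offsetting trick in HighResFCGQCNN is similar in spirit, but this 1D sketch only shows the general idea and is not its implementation. In a real network, layers after the pool also need to account for the interleaving.

```python
from typing import List


def max_pool_1d(x: List[float], size: int = 2, stride: int = 2) -> List[float]:
    """Valid (unpadded) 1D max-pool."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]


def shift_and_stitch(x: List[float], size: int = 2, stride: int = 2) -> List[float]:
    """Pool at every offset in range(stride) and interleave the results,
    giving the same output as pooling with stride 1."""
    shifted = [max_pool_1d(x[offset:], size, stride) for offset in range(stride)]
    out = []
    for k in range(max(len(s) for s in shifted)):
        # Output k of offset o covers input position k * stride + o.
        for s in shifted:
            if k < len(s):
                out.append(s[k])
    return out


if __name__ == "__main__":
    x = [1.0, 5.0, 2.0, 4.0, 3.0, 0.0]
    print(max_pool_1d(x))       # [5.0, 4.0, 3.0]: half resolution
    print(shift_and_stitch(x))  # [5.0, 5.0, 4.0, 4.0, 3.0]: full resolution
```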

fakeFCGQCNN runs a provided GQ-CNN across each 32x32 crop of an image to return a grasp confidence heatmap. This model is inefficient and is intended for testing and benchmarking purposes.
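The naive approach amounts to the loop below: crop every 32x32 window, score it with a GQ-CNN, and write the score into the heatmap. Here score_fn is a hypothetical stand-in for one GQ-CNN forward pass on a single crop; batching the crops would speed this up without changing the result.

```python
from typing import Callable, List

Image = List[List[float]]


def naive_heatmap(image: Image, score_fn: Callable[[Image], float],
                  window: int = 32, stride: int = 1) -> List[List[float]]:
    """Score every window x window crop with score_fn (one call per crop)."""
    h, w = len(image), len(image[0])
    heatmap = []
    for top in range(0, h - window + 1, stride):
        row = []
        for left in range(0, w - window + 1, stride):
            crop = [r[left:left + window] for r in image[top:top + window]]
            row.append(score_fn(crop))
        heatmap.append(row)
    return heatmap
```

For the 70x70 images used in the inference-speed benchmark, stride 1 means 39 x 39 = 1521 GQ-CNN evaluations per image, whereas an FC-GQ-CNN produces the whole heatmap in one pass.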

🔍 Performance Analysis

🪠 Suction

Training with the original architecture (Dex-Net Base) matches the performance documented in Dex-Net 3.0. EfficientNet GQ-CNN outperforms both Dex-Net Base and the reported Dex-Net 3.0 results on the Dex-Net 3.0 dataset. Precision-recall curves are computed on a validation set whose objects do not appear in the training set.
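A precision-recall curve is obtained by sweeping a threshold over the predicted grasp confidences and comparing the resulting predictions with binary ground-truth labels. The plain-Python sketch below shows that standard computation; it is not the code in analyze.py, and libraries such as scikit-learn provide the same thing as sklearn.metrics.precision_recall_curve.

```python
from typing import List, Tuple


def precision_recall_points(scores: List[float],
                            labels: List[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for each distinct score,
    treating score >= threshold as a predicted positive grasp."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = []
    tp = fp = 0
    for k, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last item sharing this score value.
        if k + 1 == len(ranked) or ranked[k + 1][0] != score:
            points.append((score, tp / (tp + fp), tp / total_pos))
    return points
```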

Dex-Net Base Suction

18 million parameters
10240 (batch size 512) and 24100 (batch size 8192) inferences per second on a single 2080 Ti
6 hours of training on a single 2080 Ti
Trained with a batch size of 256
Trained with SGD and 0.9 momentum
FC-GQ-CNN version available

EfficientNet GQ-CNN

5.3 million parameters
2000 (batch size 512) inferences per second on a single 2080 Ti
30 hours of training on a single 2080 Ti
Trained with a batch size of 64
Trained with Adam optimizer
Achieves 95.4% accuracy on validation set

Note that while EfficientNet is a smaller model, it resizes input images to (B, 3, 224, 224), which limits the batch size that fits in memory.
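The cost of that resize is easy to quantify: each 1x32x32 input becomes 3x224x224, which is 147 times as many values. The arithmetic below assumes float32 (4-byte) inputs and counts only the input tensor; intermediate activations, not counted here, add more.

```python
def input_bytes(batch: int, channels: int, height: int, width: int,
                bytes_per_value: int = 4) -> int:
    """Size in bytes of a dense input tensor of shape (batch, channels, height, width)."""
    return batch * channels * height * width * bytes_per_value


if __name__ == "__main__":
    native = input_bytes(512, 1, 32, 32)     # original 32x32 depth crops
    resized = input_bytes(512, 3, 224, 224)  # EfficientNet-sized input
    print(resized // native)                 # 147
    print(native / 2**20, resized / 2**20)   # 2.0 294.0 (MiB)
```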

🦈 Parallel Jaw

Precision-recall curve on validation data for the Dex-Net 5.0 parallel jaw grasp model (DexNetBase) trained on an 80-20 split of the Dex-Net 2.0 dataset. There isn't a comparable precision-recall curve from the original paper, but both achieve ~85% accuracy on the validation set.
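When the validation set must contain objects that never appear in training, as described for the suction validation set above, an 80-20 split is made over object IDs rather than over individual images. The sketch below assumes each sample carries an object ID and that the parallel jaw split is also object-wise, which the text does not state; object_split is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple


def object_split(object_ids: Sequence[int], train_frac: float = 0.8,
                 seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices so that no object appears in both train and val."""
    unique = sorted(set(object_ids))
    random.Random(seed).shuffle(unique)
    n_train = int(round(train_frac * len(unique)))
    train_objects = set(unique[:n_train])
    train_idx = [i for i, obj in enumerate(object_ids) if obj in train_objects]
    val_idx = [i for i, obj in enumerate(object_ids) if obj not in train_objects]
    return train_idx, val_idx
```

Because the 80-20 ratio applies to objects, the image-level ratio can drift from 80-20 when objects contribute different numbers of images.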

Dex-Net Base Parallel Jaw

18 million parameters
10240 (batch size 512) and 24100 (batch size 8192) inferences per second on a single 2080 Ti
1 hour of training on a single V100
Trained with a batch size of 256
Trained with SGD and 0.9 momentum
FC-GQ-CNN version available

Dex-Net Base Parallel Jaw and Dex-Net Base Suction use the same model architecture.

🕙 FC-GQ-CNN Inference Speed

The FC-GQ-CNN demonstrates significant empirical efficiency improvements over naively running a GQ-CNN over each crop of the image.

The naive method (called fakeFCGQCNN in code) achieves 13.5 inferences per second, while FC-GQ-CNN achieves 540 inferences per second, a 40x speedup (batch size 128, 70x70 images).

Note that at larger batch sizes, FC-GQ-CNN may experience a significant slowdown due to memory limitations.

📐 Angle Analysis

Models trained on the Dex-Net 3.0 dataset with or without the gripper approach angle and gripper z distance as inputs show no clear difference from our baseline (dex3_newaug), which receives both as input.

Models trained on the Dex-Net 2.0 dataset with and without gripper z distance as an input also perform similarly.

🧪 Limitations

This repository focuses on training GQ-CNN models on the original Dex-Net 2.0 and Dex-Net 3.0 datasets. It does not implement the mesh rendering, analytic grasp generation, or dataset generation discussed in the Dex-Net series.

We avoid direct comparisons with older Dex-Net versions on training, inference, and dataloading speed because those numbers are affected by hardware differences, so any gains could not be fully attributed to improved code.

All images in the Dex-Net 2.0 and Dex-Net 3.0 datasets are singulated objects on a flat surface. Please refer to Dex-Net 2.1 and Dex-Net 4.0 for multi-object bin picking work.

This repository is a standalone code release and is not part of a published paper.

📝 Citation

Please cite this repo when using its code.

@misc{Dex-Net-5.0,
  author = {Andrew Goldberg and Ryan Hoque and Chung Min Kim},
  title = {Dex-Net 5.0},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/Apgoldberg1/Dex-Net-5.0}},
}
