Text to Action: Mapping Text Instructions and Multi-modality Data to Robotic Actions using LLMs
Based on the paper *Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model* by Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, and Hongsheng Li.
Currently, four representative VIMABench tabletop manipulation meta-tasks are supported (see the notes below).

Please prepare the SAM and CLIP model checkpoints in advance. You can download them from the SAM and OpenCLIP links in the setup steps below, then set their paths in `engine_robotic.py`.
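For example, the relevant lines in `engine_robotic.py` would look roughly like the sketch below; the variable names `SAM_CKPT_PATH` and `CLIP_CKPT_PATH` are placeholders for illustration, not necessarily the names used in the file.

```python
# engine_robotic.py -- illustrative sketch only; the variable names below are
# placeholders, check the actual names used in the file.
SAM_CKPT_PATH = "./sam_vit_h_4b8939.pth"          # SAM ViT-H checkpoint (download link below)
CLIP_CKPT_PATH = "./open_clip_pytorch_model.bin"  # OpenCLIP checkpoint (download link below)
```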
These installation instructions were tested on Ubuntu with the conda package manager.
- Clone this repository (or download and extract the zip) into a folder named `text-to-action`:

  ```bash
  git clone https://github.com/bhanu-pm/text-to-action.git text-to-action
  ```
- Install dependencies into a conda environment:

  ```bash
  cd text-to-action
  conda env create -f environment.yaml
  ```
- Activate the environment and install PyTorch (CUDA 11.3 builds):

  ```bash
  conda activate project-10
  pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
  ```
- Install the remaining requirements from `requirements.txt`:

  ```bash
  pip install -r requirements.txt
  ```
- Install VIMABench:

  ```bash
  cd ..
  git clone https://github.com/vimalabs/VimaBench VIMABench
  cd VIMABench
  pip install -e .
  ```
- Install Segment Anything from my fork of Facebook's repository. The fork adds the ability to run SAM on an NVIDIA GPU when one is present:

  ```bash
  cd ..
  git clone https://github.com/bhanu-pm/segment-anything.git SAM
  cd SAM
  pip install -e .
  ```
- Install OpenCLIP:

  ```bash
  cd ..
  git clone https://github.com/mlfoundations/open_clip.git open-clip
  cd open-clip
  pip install -e .
  cd ..
  ```
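Optionally, you can sanity-check the editable installs with a quick import test like the sketch below. This is not part of the required setup; it only assumes the standard package names installed by the repositories above.

```python
# Optional sanity check: confirm the editable installs are importable.
import torch
import open_clip          # from the open-clip checkout
import segment_anything   # from the SAM fork
import vima_bench         # from the VIMABench checkout

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("open_clip, segment_anything, and vima_bench imported successfully")
```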
- Get your OpenAI API key from the OpenAI platform.
- Paste the API key into the `/text-to-action/.env` file.
- Download the `open_clip_pytorch_model.bin` checkpoint from OpenCLIP.
- Download the `sam_vit_h_4b8939.pth` checkpoint from SAM.
- Place both downloaded files in the `/text-to-action` project folder.
- Run `robotic_anything_offline.py` from a terminal after activating the conda environment and navigating into the `/text-to-action` folder, using one of the following commands:

  ```bash
  conda activate project-10
  python3 robotic_anything_offline.py
  # or
  python robotic_anything_offline.py
  ```
- If you encounter memory errors, or the process is killed before the robotic arm runs the tasks, uncomment line 35 in `engine_robotic.py`:

  ```python
  sam_device = "cpu"
  ```
There are two code-generation modes for the robotic manipulation tasks: offline and online. In offline mode, the code is generated in advance (for demo purposes); in online mode, it is generated on the fly using OpenAI's API.
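For reference, the online flow looks roughly like the sketch below. The prompt text, the `OPENAI_API_KEY` variable name, and the use of python-dotenv are assumptions for illustration, and the legacy `openai<1.0` client interface is shown.

```python
# Hedged sketch of the online code-generation flow, not the repo's exact code.
import os

import openai
from dotenv import load_dotenv

load_dotenv()                                    # reads OPENAI_API_KEY from the .env file
openai.api_key = os.environ["OPENAI_API_KEY"]

instruction = "Put the polka dot block into the green container."
response = openai.ChatCompletion.create(         # legacy openai<1.0 client interface
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Generate Python code that calls the provided robotic APIs."},
        {"role": "user", "content": instruction},
    ],
)
generated_code = response["choices"][0]["message"]["content"]
print(generated_code)                            # in online mode, this code is then executed in the VIMABench environment
```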
I selected 4 representative meta-tasks from the 17 available in VIMABench to evaluate the proposed methods in the tabletop manipulation domain.
- To speed up SAM inference, I added a custom CUDA device option to the `build_sam()` function of Facebook's segment-anything module (see the sketch after these notes).
- When using ChatGPT for online code generation, you need a paid OpenAI account with API access to GPT-3.5 Turbo.
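As a rough illustration of that device option: the `device` keyword name below is an assumption about the fork, since the upstream `build_sam()` only accepts a `checkpoint` argument.

```python
# Sketch of device selection with the forked build_sam(); the "device" keyword
# name is assumed, not confirmed against the fork's code.
import torch
from segment_anything import SamAutomaticMaskGenerator, build_sam

sam_device = "cuda" if torch.cuda.is_available() else "cpu"   # or force "cpu", as noted above
sam = build_sam(checkpoint="sam_vit_h_4b8939.pth", device=sam_device)
mask_generator = SamAutomaticMaskGenerator(sam)               # standard segment-anything usage
```

With the upstream segment-anything package, the equivalent is `sam = build_sam(checkpoint=...)` followed by `sam.to(sam_device)`.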
I would like to thank the authors of the great open-source projects this work is built upon: Instruct2Act, VIMABench, Segment Anything, and OpenCLIP.