Text to Action: Mapping Text Instructions and Multi-modality Data to Robotic Actions using LLMs
Based on the paper *Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model* by Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, and Hongsheng Li.
Currently, four representative VIMABench tabletop manipulation meta-tasks are supported (see the notes below).

Please prepare the SAM and CLIP model checkpoints in advance. You can download them from the SAM and OpenCLIP links in the setup steps below, then set their paths in `engine_robotic.py`.
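For example, the relevant lines in `engine_robotic.py` would look roughly like the sketch below; the variable names `SAM_CKPT_PATH` and `CLIP_CKPT_PATH` are placeholders for illustration, not necessarily the names used in the file.

```python
# engine_robotic.py -- illustrative sketch only; the variable names below are
# placeholders, check the actual names used in the file.
SAM_CKPT_PATH = "./sam_vit_h_4b8939.pth"          # SAM ViT-H checkpoint (download link below)
CLIP_CKPT_PATH = "./open_clip_pytorch_model.bin"  # OpenCLIP checkpoint (download link below)
```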
These installation instructions were tested on Ubuntu with the conda package manager.
- Clone this repository (or download and extract the zip) into a folder named `text-to-action`:

  ```bash
  git clone https://github.com/bhanu-pm/text-to-action.git text-to-action
  ```
- Install dependencies into a conda environment:

  ```bash
  cd text-to-action
  conda env create -f environment.yaml
  ```
- Activate the environment and install PyTorch (CUDA 11.3 builds):

  ```bash
  conda activate project-10
  pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
  ```
- Install the remaining requirements from `requirements.txt`:

  ```bash
  pip install -r requirements.txt
  ```
- Install VIMABench:

  ```bash
  cd ..
  git clone https://github.com/vimalabs/VimaBench VIMABench
  cd VIMABench
  pip install -e .
  ```
- Install Segment Anything from my fork of Facebook's repository. The fork adds the ability to run SAM on an NVIDIA GPU when one is present:

  ```bash
  cd ..
  git clone https://github.com/bhanu-pm/segment-anything.git SAM
  cd SAM
  pip install -e .
  ```
- Install OpenCLIP:

  ```bash
  cd ..
  git clone https://github.com/mlfoundations/open_clip.git open-clip
  cd open-clip
  pip install -e .
  cd ..
  ```
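Optionally, you can sanity-check the editable installs with a quick import test like the sketch below. This is not part of the required setup; it only assumes the standard package names installed by the repositories above.

```python
# Optional sanity check: confirm the editable installs are importable.
import torch
import open_clip          # from the open-clip checkout
import segment_anything   # from the SAM fork
import vima_bench         # from the VIMABench checkout

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("open_clip, segment_anything, and vima_bench imported successfully")
```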
- Get your OpenAI API key from the OpenAI platform.
- Paste the API key into the `/text-to-action/.env` file.
- Download the `open_clip_pytorch_model.bin` checkpoint from OpenCLIP.
- Download the `sam_vit_h_4b8939.pth` checkpoint from SAM.
- Place both downloaded files in the `/text-to-action` project folder.
- Run `robotic_anything_offline.py` from a terminal after activating the conda environment and navigating into the `/text-to-action` folder, using one of the following commands:

  ```bash
  conda activate project-10
  python3 robotic_anything_offline.py
  # or
  python robotic_anything_offline.py
  ```
- If you encounter memory errors, or the process is killed before the robotic arm runs the tasks, uncomment line 35 in `engine_robotic.py`:

  ```python
  sam_device = "cpu"
  ```
There are two code-generation modes for the robotic manipulation tasks: offline and online. In offline mode, the code is generated in advance (for demo purposes); in online mode, it is generated on the fly using OpenAI's API.
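For reference, the online flow looks roughly like the sketch below. The prompt text, the `OPENAI_API_KEY` variable name, and the use of python-dotenv are assumptions for illustration, and the legacy `openai<1.0` client interface is shown.

```python
# Hedged sketch of the online code-generation flow, not the repo's exact code.
import os

import openai
from dotenv import load_dotenv

load_dotenv()                                    # reads OPENAI_API_KEY from the .env file
openai.api_key = os.environ["OPENAI_API_KEY"]

instruction = "Put the polka dot block into the green container."
response = openai.ChatCompletion.create(         # legacy openai<1.0 client interface
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Generate Python code that calls the provided robotic APIs."},
        {"role": "user", "content": instruction},
    ],
)
generated_code = response["choices"][0]["message"]["content"]
print(generated_code)                            # in online mode, this code is then executed in the VIMABench environment
```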
I selected 4 representative meta-tasks from the 17 available in VIMABench to evaluate the proposed methods in the tabletop manipulation domain.
- To speed up SAM inference, I added a custom CUDA device option to the `build_sam()` function of Facebook's segment-anything module (see the sketch after these notes).
- When using ChatGPT for online code generation, you need a paid OpenAI account with API access to GPT-3.5 Turbo.
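As a rough illustration of that device option: the `device` keyword name below is an assumption about the fork, since the upstream `build_sam()` only accepts a `checkpoint` argument.

```python
# Sketch of device selection with the forked build_sam(); the "device" keyword
# name is assumed, not confirmed against the fork's code.
import torch
from segment_anything import SamAutomaticMaskGenerator, build_sam

sam_device = "cuda" if torch.cuda.is_available() else "cpu"   # or force "cpu", as noted above
sam = build_sam(checkpoint="sam_vit_h_4b8939.pth", device=sam_device)
mask_generator = SamAutomaticMaskGenerator(sam)               # standard segment-anything usage
```

With the upstream segment-anything package, the equivalent is `sam = build_sam(checkpoint=...)` followed by `sam.to(sam_device)`.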
I would like to thank the authors of the great open-source projects this work is built upon: Instruct2Act, VIMABench, Segment Anything, and OpenCLIP.