
TCC-IRoNL is a novel framework that leverages large language models (LLMs) and multimodal vision-language models (VLMs) to enable ROS-based autonomous robots to interact with humans or other entities through natural language conversation.


LinusNEP/TCC-IRoNL


The Conversation is the Command: Interacting with Real-World Autonomous Robots through Natural Language (TCC-IRoNL)


TCC-IRoNL

TCC-IRoNL is a framework that synergistically exploits the capabilities of pre-trained large language models (LLMs) and a multimodal vision-language model (VLM) to enable humans to interact naturally with autonomous robots through conversational dialogue. It uses the LLMs to decode high-level natural language instructions from humans and abstract them into precise, actionable robot commands or queries. It also uses the VLM to provide a visual and semantic understanding of the robot’s task environment. Refer to the paper for more details.

Citation

If you use this work in your research, please cite it using the following BibTeX entry:

@inproceedings{10.1145/3610978.3640723,
author = {Nwankwo, Linus and Rueckert, Elmar},
title = {The Conversation is the Command: Interacting with Real-World Autonomous Robots Through Natural Language},
year = {2024},
isbn = {979-8-4007-0323-2/24/03},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3610978.3640723},
doi = {10.1145/3610978.3640723},
booktitle = {Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction},
numpages = {5},
keywords = {Human-robot interaction, LLMs, VLMs, ChatGPT, ROS, autonomous robots, natural language interaction},
location = {Boulder, CO, USA},
series = {HRI '24}
}

TCC-IRoNL installation

The following instructions are necessary to set up TCC-IRoNL. Please note that CUDA and Python 3.8 or above are required.

1. Install ROS and the navigation planner:

TCC-IRoNL can work with any ROS-based mobile robot that publishes standard ROS topics. The framework is implemented on ROS Noetic and was also tested on ROS Melodic in a Docker environment. For ROS 2, you will need a ROS bridge to bridge the ROS 2 topics. To install ROS, follow the instructions on the ROS Wiki. The ROS navigation planner and its dependencies must also be installed; run the ./planner_dependencies.sh script to install them. After a successful installation, continue with the steps below to install TCC-IRoNL.
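Before continuing, it can help to confirm that the prerequisites named above are in place. The following is a minimal sketch, not part of the repository: it only checks the Python version and the ROS_DISTRO environment variable that a sourced ROS setup.bash normally sets. The function name preflight is hypothetical, and CUDA is not checked here.

```python
import os
import sys

# ROS 1 distributions named in this README (Noetic: implemented, Melodic: tested in Docker).
SUPPORTED_DISTROS = ("noetic", "melodic")
MIN_PYTHON = (3, 8)


def preflight(env=None, python_version=None):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    env = os.environ if env is None else env
    if python_version is None:
        python_version = sys.version_info[:2]
    python_version = tuple(python_version[:2])
    problems = []

    # The README requires Python 3.8 or above.
    if python_version < MIN_PYTHON:
        problems.append("Python %d.%d found, 3.8 or above is required" % python_version)

    # ROS_DISTRO is exported when a ROS setup.bash has been sourced.
    distro = env.get("ROS_DISTRO")
    if distro is None:
        problems.append("ROS_DISTRO is not set; source your ROS setup.bash first")
    elif distro not in SUPPORTED_DISTROS:
        problems.append(
            "ROS_DISTRO=%s is not one of the distributions this README mentions (%s)"
            % (distro, ", ".join(SUPPORTED_DISTROS))
        )
    return problems


if __name__ == "__main__":
    for line in preflight() or ["environment looks usable"]:
        print(line)
```

An empty result only means these two checks passed; it says nothing about the navigation planner or GPU drivers.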

Create a ROS workspace:

mkdir -p ~/catkin_ws/src
cd ~/catkin_ws/src

Clone the TCC-IRoNL repository to your workspace:

git clone https://github.com/LinusNEP/TCC-IRoNL.git

2. Install TCC-IRoNL dependencies:

mv install_TCC-IRoNL_deps.sh ~/catkin_ws/
cd ~/catkin_ws
bash install_TCC-IRoNL_deps.sh

Build the workspace:

catkin_make
echo "source ~/catkin_ws/devel/setup.bash" >> ~/.bashrc
source ~/.bashrc
source devel/setup.bash

Run TCC-IRoNL Example Demos

Simulation

Open six terminal windows (T1-T6) in your workspace directory and run the following:

T1 - T3 (if quadruped robot):

First, source the workspace in every open terminal by running source devel/setup.bash.

roslaunch unitree_gazebo sim_bringup.launch rname:=go1 wname:=cps_world rviz:=false
roslaunch unitree_navigation navigation.launch rname:=go1 rviz:=true
rosrun unitree_guide main_ctrl

After running T1 - T3 above, the robot will lie on the floor of the Gazebo world. In the terminal where you ran rosrun unitree_guide main_ctrl, press the '2' key to switch the robot's state from Passive (the initial state) to FixedStand. Then press the '5' key to switch from FixedStand to MoveBase. At this point, the robot is ready to receive navigation commands.
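The key sequence above can be read as a small state machine. The sketch below models only the two transitions mentioned in this README; the real unitree_guide controller may accept other keys and states. Treating every other key press as having no effect is an assumption made for illustration, and all names in the code are hypothetical.

```python
# Transitions described in the README: '2' Passive -> FixedStand, '5' FixedStand -> MoveBase.
TRANSITIONS = {
    ("Passive", "2"): "FixedStand",
    ("FixedStand", "5"): "MoveBase",
}


def press(state, key):
    """Return the next state; keys not listed for this state are assumed to do nothing."""
    return TRANSITIONS.get((state, key), state)


def run(keys, state="Passive"):
    """Apply a sequence of key presses starting from the initial Passive state."""
    for key in keys:
        state = press(state, key)
    return state


def ready_for_navigation(state):
    """The robot accepts navigation commands only once it is in MoveBase."""
    return state == "MoveBase"


if __name__ == "__main__":
    print(run("25"))  # press '2', then '5'
```

In this model, pressing '5' before '2' leaves the robot in Passive, which is why the order of the two key presses matters.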

T1 - T3 (if wheeled robot):

source devel/setup.bash
roslaunch romr_ros romr_navigation.launch

For the wheeled robot, you do not need to switch states. After launching roslaunch romr_ros romr_navigation.launch, execute T4 - T6, and start interacting with the robot.

T4 - T6:

In each of T4 - T6, activate the virtual environment that was created when you installed TCC-IRoNL and its dependencies (source TCC-IRoNLEnv/bin/activate). Make the scripts executable (bash set_permission.sh). When you run roslaunch tcc-ironl llm_node.launch and roslaunch tcc-ironl vlm_node.launch, a menu appears that lets you select an LLM (such as OpenAI GPT-2, Google BERT, or Facebook RoBERTa) and a VLM (such as CLIP or GLIP). To exit, press Ctrl + C and select 0 to terminate the program.

roslaunch tcc-ironl llm_node.launch
roslaunch tcc-ironl vlm_node.launch
rosrun tcc-ironl chatGUI.py

Interact with the simulated robot in natural language through the chatGUI window that opens after you run rosrun tcc-ironl chatGUI.py. For example, you can send the robot to a goal location ("go to the Secretary's office"), request a motion ("move in a circular pattern"), or ask a question ("where are you now?").

Real-World Robot

Launch your robot and ensure that the ROS topics and parameter configurations in the table below are available. Custom movement commands and queries such as "move forward", "move backwards", "move right", "what can you see around you?", and "where are you now?" may not require further configuration. However, goal navigation tasks such as "navigate to xxx's office" require you to update the task dictionary (task_dict.yaml) with the approximate x, y, z coordinates of the task environment. You can obtain these coordinates from LiDAR or point-cloud data.

  • Configurations:

    | Topics | Publisher | Subscribers | Description | Msg Type |
    | --- | --- | --- | --- | --- |
    | /odom | REM | MoveBase, LLMNode | Robot's odometry data | nav_msgs/Odometry |
    | /cmd_vel | MoveBase, LLMNode | REM | Robot's command velocity data | geometry_msgs/Twist |
    | /clip_node/recognized_objects | CLIPNode | LLMNode | CLIPNode objects descriptions | std_msgs/String |
    | /llm_input | ChatGUI | LLMNode | User's input commands, queries and tasks | std_msgs/String |
    | /llm_output | LLMNode | ChatGUI | LLMNode's interpretation of the input command | std_msgs/String |
    | /depth/image, /rgb/image* | Observation Source | CLIPNode, LLMNode, YOLO V8* | Image stream from RGB-D camera | sensor_msgs/Image |
    | /depth/points | Observation Source | LLMNode | Point cloud from 3D LiDAR or RGB-D camera | sensor_msgs/PointCloud2 |
    | /detection_result | YOLO V8 | CLIPNode | 2D bounding box from YOLO V8 detected objects | vision_msgs/Detection2DArray |
  • Observation sources:

    • Ouster 3D LiDAR - sensor_msgs/PointCloud2, sensor_msgs/LaserScan
    • Intel Realsense D435i Camera - sensor_msgs/Image, sensor_msgs/PointCloud2
  • Robot's base frame: base_link

  • Map frame: map
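One way to confirm that a robot matches the configuration listed above is to compare its topics against the required names and message types. The sketch below is an illustration, not part of the repository. It takes (topic, type) pairs as input, which on a live system could come from rospy.get_published_topics() once all nodes are running. The selection of topics in REQUIRED_TOPICS and the function name check_topics are assumptions.

```python
# Topic names and message types copied from the configuration table in this README.
REQUIRED_TOPICS = {
    "/odom": "nav_msgs/Odometry",
    "/cmd_vel": "geometry_msgs/Twist",
    "/llm_input": "std_msgs/String",
    "/llm_output": "std_msgs/String",
    "/depth/points": "sensor_msgs/PointCloud2",
}


def check_topics(published, required=REQUIRED_TOPICS):
    """Compare (topic, type) pairs against the required set.

    Returns (missing, mismatched): missing is a sorted list of topic names,
    mismatched is a sorted list of (topic, found_type, expected_type) tuples.
    """
    seen = {topic: msg_type for topic, msg_type in published}
    missing = sorted(t for t in required if t not in seen)
    mismatched = sorted(
        (t, seen[t], required[t])
        for t in required
        if t in seen and seen[t] != required[t]
    )
    return missing, mismatched


if __name__ == "__main__":
    # Hand-written sample input; replace with the pairs reported by your ROS master.
    sample = [["/odom", "nav_msgs/Odometry"], ["/cmd_vel", "geometry_msgs/Twist"]]
    print(check_topics(sample))
```

Topics such as /llm_input only appear after the corresponding node (here ChatGUI) has started, so a check like this is most useful after T4 - T6 are running.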

Run TCC-IRoNL on your Own Robot

Configure your robot using the ROS topic configurations described in the table above. Then follow the instructions for T4 - T6 above and begin interacting with the robot. Keep in mind that navigation tasks such as "navigate to xxx's office" require you to update the task dictionary (task_dict.yaml) with the approximate x, y, z coordinates of the task environment. You can extract these coordinates from LiDAR or point-cloud data. Custom commands (such as "move forward" or "turn right") and queries need no additional configuration.
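To illustrate why goal navigation needs the task dictionary, the sketch below resolves a location name in a command to x, y, z coordinates. It is a simplified stand-in, not the framework's LLM-based interpretation. The dictionary layout, the location names, and the coordinate values are all made up for illustration; the real task_dict.yaml schema may differ.

```python
# Hypothetical task dictionary contents; the real task_dict.yaml schema may differ.
# Coordinates are placeholder values in the map frame, not measured data.
TASK_DICT = {
    "secretary's office": {"x": 4.2, "y": -1.5, "z": 0.0},
    "kitchen": {"x": 10.0, "y": 3.0, "z": 0.0},
}


def find_goal(command, task_dict):
    """Return (name, (x, y, z)) for the longest location name found in the command.

    Returns None when the command mentions no known location, which is the case
    for plain movement commands and queries.
    """
    text = command.lower()
    matches = [name for name in task_dict if name.lower() in text]
    if not matches:
        return None
    name = max(matches, key=len)
    entry = task_dict[name]
    return name, (float(entry["x"]), float(entry["y"]), float(entry["z"]))


if __name__ == "__main__":
    print(find_goal("Go to the Secretary's office", TASK_DICT))
    print(find_goal("where are you now?", TASK_DICT))
```

A command that names a location absent from the dictionary resolves to nothing, which mirrors the requirement to add each destination before asking the robot to navigate there.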

License


This work is licensed under a Creative Commons Attribution 4.0 International License.

Acknowledgement

This work is still in progress, so expect some bugs. We would appreciate your contributions, or an issue report for any bug you find.

Thanks to the following repository:
