Just Another Robot Grabbing Objects
J.A.R.G.O is a group project for CSCI-513 (Autonomous Cyber-Physical Systems).
Commanding the Toyota HSR to pick up objects using text commands.
Helping people with limited motor function has long been a target use case in robotics. However, traditional control methods usually require professional training. In this project, we used the Toyota Human Support Robot (HSR) as the platform and implemented a way to command the robot with textual instructions for simple tasks such as fetching an object.
Natural language commands usually contain ambiguities and cannot be understood directly by computers. In cyber-physical systems, Signal Temporal Logic (STL) is a way to formalize control properties with time constraints. To convert natural language commands into STL formulas, we used DialogueSTL.
DialogueSTL takes a text command, processes the natural language, and generates Parametric STL (PSTL) candidates. It then asks the user for a demonstration and, together with clarifying questions that resolve ambiguities, uses robustness as the benchmark to select the final STL (the "best STL") as output.
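As an illustration (the atomic predicates and time bounds below are hypothetical, not taken from our experiments), a command like "reach the bottle within 20 seconds and then hold it for 5 seconds" could yield a PSTL candidate and, after the dialogue and demonstration, a concrete STL formula:

```latex
% Hypothetical example only.
% PSTL candidate with unknown time parameters \tau_1, \tau_2:
\varphi_{\mathrm{PSTL}} = F_{[0,\tau_1]}\big(\mathrm{reach\_bottle} \wedge G_{[0,\tau_2]}\,\mathrm{holding\_bottle}\big)
% Instantiated STL after the dialogue and demonstration:
\varphi_{\mathrm{STL}} = F_{[0,20]}\big(\mathrm{reach\_bottle} \wedge G_{[0,5]}\,\mathrm{holding\_bottle}\big)
```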
We used neural networks with Q-learning to generate the optimal policy. The reward function of our deep RL model is based on the robustness of an STL formula, i.e., the signed distance of a given trajectory from satisfying or violating the formula.
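A minimal sketch of this idea, assuming a hypothetical grid-world `env` with `reset()`/`step(action)` methods and using a toy distance-based robustness function in place of a real STL monitor (the actual project used a neural network rather than the Q-table shown here):

```python
import random
from collections import defaultdict

def robustness(trajectory, goal):
    """Toy stand-in for an STL robustness monitor: 0 when the trajectory
    reaches the goal cell, increasingly negative the farther away it stays."""
    return -min(abs(x - goal[0]) + abs(y - goal[1]) for x, y in trajectory)

def q_learning(env, goal, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    Q = defaultdict(float)                      # Q[(state, action)] -> value
    actions = ["up", "down", "left", "right"]
    for _ in range(episodes):
        state, trajectory, done = env.reset(), [], False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, done = env.step(action)   # assumed env interface
            trajectory.append(next_state)
            # robustness of the partial trajectory serves as the reward signal
            reward = robustness(trajectory, goal)
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q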
To reduce computation and complexity, we projected the Gazebo world used by the HSR simulator onto a 2D grid world similar to the one used in the DialogueSTL demo.
We generate the optimal policy in the 2D grid world and then map it back to the Gazebo world to command the HSR to move from the start state to the goal state.
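A minimal sketch of the mapping between the two coordinate frames; the grid origin and cell size below are made-up parameters for illustration, not the values used in our maps:

```python
# Hypothetical grid parameters: Gazebo/map coordinates of grid cell (0, 0)
# and the side length of one grid cell in meters.
GRID_ORIGIN_X, GRID_ORIGIN_Y = -2.0, -2.0
CELL_SIZE = 0.5

def gazebo_to_grid(x, y):
    """Project a continuous Gazebo (x, y) position onto a 2D grid cell."""
    col = int((x - GRID_ORIGIN_X) / CELL_SIZE)
    row = int((y - GRID_ORIGIN_Y) / CELL_SIZE)
    return row, col

def grid_to_gazebo(row, col):
    """Map a grid cell back to the Gazebo position at the cell's center."""
    x = GRID_ORIGIN_X + (col + 0.5) * CELL_SIZE
    y = GRID_ORIGIN_Y + (row + 0.5) * CELL_SIZE
    return x, y
```

Each move of the resulting policy then becomes a drive-to-cell-center command in the Gazebo/map frame (for example through the HSR Python interface's `omni_base.go_abs`).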
To detect the bottle and move the end effector to grab it, we attached an AR marker to the bottle.
The marker's frame position can be calculated when the marker is visible to both head cameras. Once the AR marker frame position is acquired, we can move the end effector around the bottle and close the gripper.
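A hedged sketch of this step using the HSR Python interface (`hsrb_interface`); the marker frame name, the pre-grasp offset, and the grasp force below are illustrative assumptions, not the exact values from our code:

```python
from hsrb_interface import Robot, geometry

robot = Robot()
whole_body = robot.get('whole_body')
gripper = robot.get('gripper')

# Frame published by the AR-marker tracker for the marker on the bottle
# (the marker id / frame name is an assumption).
MARKER_FRAME = 'ar_marker/201'

# Open the gripper, then move the hand to a pose defined relative to the
# marker frame: slightly in front of the bottle, hand oriented toward it.
gripper.command(1.0)
whole_body.move_end_effector_pose(
    geometry.pose(z=-0.05, ek=-1.57),   # offset/orientation are illustrative
    ref_frame_id=MARKER_FRAME)

# Close the gripper with a fixed force to grasp the bottle.
gripper.apply_force(0.5)
```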
- We can use well-trained vision models (such as YOLOv5) in addition to AR markers for object detection. This would enable our robot to recognize not only known/predefined objects but also objects that were not previously specified (see the sketch after this list).
- Since DialogueSTL uses natural language both for input and for asking questions, our work can also be extended to a voice-based system by adding speech-to-text and text-to-speech.
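A sketch of the first extension, loading YOLOv5 through PyTorch Hub as in the ultralytics examples; the model size and the way the camera image reaches the function are assumptions:

```python
import torch

# Load a pretrained YOLOv5 model from PyTorch Hub (downloads on first use).
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

def detect_objects(image):
    """Run YOLOv5 on an RGB image (numpy array or file path) and return
    a list of (label, confidence, (xmin, ymin, xmax, ymax)) tuples."""
    results = model(image)
    detections = []
    for *box, conf, cls in results.xyxy[0].tolist():
        detections.append((model.names[int(cls)], conf, tuple(box)))
    return detections

# Example: look for a bottle in a frame grabbed from the head camera
# (reading the camera topic into `frame` is omitted here).
# bottles = [d for d in detect_objects(frame) if d[0] == 'bottle']
```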