This repository is the code base for the (rejected) IROS 2024 submission, *Temporal and Semantic Evaluation Metrics for Foundation Models in Post-Hoc Analysis of Robotic Sub-tasks*, and a revised RA-L submission (not yet published) by Jonathan Salfity, Selma Wanna, Minkyu Choi, and Mitch Pryor. The corresponding author is Jonathan Salfity (j [dot] salfity [at] utexas [dot] edu).
The code base is divided into the following sections:
- Data generation through Robosuite simulations and a finite state machine (FSM) implementation is in `scripts/`. The data is stored in `data/` as `.txt` and/or `.mp4` files upon generation, depending on the config file. For the data used in the original paper submission, contact j [dot] salfity [at] utexas [dot] edu for access.
- Querying a Foundation Model (FM) for sub-task decomposition is in `analysis/query_LLM.py`.
- Analysis of the FM output, comparison with ground-truth data, comparison with human annotations, and plot and table generation are in `analysis/main_metrics_calculations.ipynb`.
- Human annotation data is in `output/`.
- The main metric (temporal and semantic) calculations are in `analysis/comparisons.py`, specifically the `get_subtask_similarity` function (a conceptual sketch of this kind of comparison follows below).

Supporting functions, including the API call, prompt building, in-context learning examples, and the random-baseline implementation, are in `utils/`.
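The details of the metric computation live in `analysis/comparisons.py`; the snippet below is only a minimal conceptual sketch of one way a predicted sub-task segmentation could be scored against ground truth on the temporal axis (interval IoU), assuming sub-tasks are represented as hypothetical `(label, start_frame, end_frame)` tuples. It is not the repository's `get_subtask_similarity` implementation.

```python
# Conceptual sketch only -- NOT the repository's get_subtask_similarity implementation.
# Scores a predicted sub-task segmentation against ground truth via temporal IoU,
# assuming each sub-task is a hypothetical (label, start_frame, end_frame) tuple.

def interval_iou(a, b):
    """Intersection-over-union of two (start, end) intervals."""
    start_a, end_a = a
    start_b, end_b = b
    intersection = max(0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - intersection
    return intersection / union if union > 0 else 0.0


def temporal_similarity(predicted, ground_truth):
    """Average best-matching IoU over ground-truth sub-tasks (hypothetical scoring rule)."""
    scores = []
    for _, gt_start, gt_end in ground_truth:
        best = max(
            (interval_iou((gt_start, gt_end), (p_start, p_end)) for _, p_start, p_end in predicted),
            default=0.0,
        )
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0


# Example with hypothetical segmentations of a pick-and-lift demo:
gt = [("reach", 0, 40), ("grasp", 40, 60), ("lift", 60, 100)]
pred = [("reach", 0, 35), ("grasp", 35, 65), ("lift", 65, 100)]
print(temporal_similarity(pred, gt))
```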
Install this package:

```
pip install -e .
```
Download the MuJoCo binaries from the official MuJoCo releases page and place them in the `~/.mujoco/mujoco<>/` folder. Install MuJoCo via pip:

```
pip install mujoco
```

Install `robosuite`:

```
pip install robosuite
```
Configure which environment to run in `scripts/demo_config.yaml`. Four environments are currently supported: "Stack", "Lift", "Door", and "PickPlace". Follow the uncommented lines in `scripts/demo_config.yaml` to set the correct fields.
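As a quick sanity check, the selected configuration can be loaded and inspected with PyYAML (a minimal sketch, assuming PyYAML is installed and the command is run from the repository root):

```python
# Sanity check of the demo configuration (assumes PyYAML is installed and this is
# run from the repository root).
import yaml

with open("scripts/demo_config.yaml") as f:
    demo_config = yaml.safe_load(f)

# Print the parsed config to confirm the selected environment and output options.
print(demo_config)
```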
Run the data generation script:

```
python scripts/run_demo.py
```

(This data is not used in the IROS paper.)
Go to the robomimic site to download the data: https://robomimic.github.io/docs/datasets/robomimic_v0.1.html (note that the download currently only seems to work in the Safari browser). Place the downloaded HDF5 files in the respective `data/robomimic` folder.
Run the data generation script `scripts/record_robomimic_data.py` with command-line arguments that specify the path to the `demo_v141.hdf5` file, the number of demos to run, and whether to `save_txt` and/or `save_video`. The script automatically extracts the `env_name` and places the text and videos in the respective `data/txt` or `data/video` folders.
Example:

```
python scripts/record_robomimic_data.py --dataset path/to/robomimic/demo_v141.hdf5 --num_demos 1 --save_txt 1 --save_video 1
```
Querying the FM assumes you have installed the `openai` and `generativeai` Python packages and set the API keys as environment variables, i.e. `OPENAI_API_KEY` and `GOOGLE_API_KEY`. The configuration file for the LLM is `config/query_LLM_config.yaml`.
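Before querying, it can help to confirm that the keys are actually visible to Python (a hypothetical check, not part of the repo):

```python
# Hypothetical sanity check (not part of the repo): confirm the API keys are set
# in the environment before running the query script.
import os

for key in ("OPENAI_API_KEY", "GOOGLE_API_KEY"):
    if not os.environ.get(key):
        print(f"Warning: {key} is not set")
```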
The following are options for the FM model:

- `gpt-4-vision-preview`
- `gpt-4-1106-preview`
- `gemini-pro`
- `gemini-pro-vision` (not in this repo; called via the Google Cloud Vertex AI API)
All states in each environment are in R^3 and represent the x-y-z position of an object in the environment. All actions are in R^7, using robosuite's `OSC_POSE` controller. (A minimal usage sketch follows the environment list below.)
The following are options for the environment:

- `Door`
  - States: `robot0_eef_pos`, `door_pos`, `handle_pos`, `door_to_eef_pos`, `handle_to_eef_pos`
- `Lift`
  - States: `robot0_eef_pos`, `cube_pos`, `gripper_to_cube`
- `PickPlace`
  - States: `robot0_eef_pos`, `Can_pos`, `Can_to_robot0_eef_pos`
- `Stack`
  - States: `robot0_eef_pos`, `cubeA_pos`, `cubeB_pos`, `gripper_to_cubeA`, `gripper_to_cubeB`, `cubeA_to_cubeB`
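For reference, the sketch below shows how one of these environments can be instantiated with the `OSC_POSE` controller and how the listed state keys appear in the observation dictionary. It assumes a robosuite v1.4-style API and a Panda robot, and is not part of the data-generation pipeline in `scripts/`.

```python
# Minimal sketch (assumes a robosuite v1.4-style API; not part of the scripts/ pipeline):
# build the Door environment with the OSC_POSE controller and read a few of the
# R^3 state keys listed above from the observation dictionary.
import numpy as np
import robosuite as suite
from robosuite.controllers import load_controller_config

controller_config = load_controller_config(default_controller="OSC_POSE")
env = suite.make(
    env_name="Door",
    robots="Panda",
    controller_configs=controller_config,
    has_renderer=False,
    use_camera_obs=False,
)

obs = env.reset()
print(obs["robot0_eef_pos"])  # end-effector x-y-z position (R^3)
print(obs["handle_pos"])      # door handle x-y-z position (R^3)

# Actions are 7-dimensional with OSC_POSE: 6 pose deltas + 1 gripper command.
action = np.zeros(env.action_dim)
obs, reward, done, info = env.step(action)
env.close()
```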
The following options control the input modalities and in-context learning examples included in the LLM prompt query, and can be used in combination with each other (an example combination is sketched below):

- `textual_input`: (True or False)
- `video_input`: (True or False)
- `in_context`: (True or False)
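For example, a text-only query with in-context examples corresponds to the following combination (hypothetical values shown as a Python dict; the real fields are set in `config/query_LLM_config.yaml`):

```python
# Hypothetical flag combination mirroring the options above; the actual values are
# set in config/query_LLM_config.yaml, not in Python.
prompt_options = {
    "textual_input": True,   # include the textual state/action trajectory
    "video_input": False,    # do not include video frames
    "in_context": True,      # include in-context learning examples
}
```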
To query the LLM, run the following command:

```
python analysis/query_LLM.py
```
See `analysis/main_metrics_calculations.ipynb` to generate the plots shown in the paper.