When Learning Is Out of Reach, Reset: Generalization in Autonomous Visuomotor Reinforcement Learning
Zichen "Charles" Zhang, Luca Weihs
PRIOR @ Allen Institute for AI
If you find this project useful in your research, please consider citing:
@article{zhang2023when,
title = {When Learning Is Out of Reach, Reset: Generalization in Autonomous Visuomotor Reinforcement Learning},
author = {Zichen Zhang and Luca Weihs},
year = {2023},
journal = {arXiv preprint arXiv: Arxiv-2303.17600},
}
- Follow the instructions for installing mujoco-py, and install the following apt packages if using Ubuntu:
$ sudo apt install -y libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf
- Clone this repository locally
git clone https://github.com/zcczhang/rmrl.git && cd rmrl
pip install -e .
- Alternatively, create a conda environment named rmrl:
conda env create --file ./conda/environment-base.yml --name rmrl
conda activate rmrl
The Stretch-P&P and RoboTHOR ObjectNav environments are built on AI2-THOR. After installing the requirements, you can start the X server by running
python scripts/startx.py
Example usage can be found in the Jupyter notebook here. The APIs follow those of iTHOR and ManipulaTHOR. Controller parameters and other constants (e.g. object partitions, action scales, etc.) can be found here. To modify the scene and objects (in Unity), see the instructions here. The details of the benchmark are as follows:
The types of sensors provided for this task include:
- RGB image (`RGBSensorStretch`) - 224x224x3 egocentric camera mounted at the agent wrist.
- egomotion (`StretchPolarSensor`) - 2+2=4 dimensional agent gripper position and target goal position relative to the agent base.
- prompt (`StretchPromptSensor`) - language prompt including the picking object and the target object/egocentric point goal, e.g. "Put red apple to stripe plate."
A total of 10 actions are available to our agents, these include (x, y, z are relative to robot base):
Action | Description | Scale |
---|---|---|
MoveAhead | Move robot base in | 5 cm |
MoveBack | Move robot base in | 5 cm |
MoveArmHeightP | Increase the arm height ( | 5 cm |
MoveArmHeightM | Decrease the arm height ( | 5 cm |
MoveArmP | Extend the arm horizontally ( | 5 cm |
MoveArmM | Retract the arm horizontally ( | 5 cm |
MoveWristP | Rotate the gripper in | |
MoveWristM | Rotate the gripper in | |
PickUp(object_id) | Pick up object with specified unique object_id if object within the sphere with radius | |
Release | Release object with simulation steps until object is relatively stable | -- |
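The discrete action space above can be sketched as a simple lookup from a policy's integer output to an action name and scale. This is only an illustration: the index order and the `decode_action` helper are hypothetical, and scales that the table does not state are left as `None` rather than guessed.

```python
from typing import Dict, Optional, Tuple

# Action names and scales (in metres) copied from the table above.
# None marks a scale that the table leaves unstated or not applicable.
ACTION_SCALES: Dict[str, Optional[float]] = {
    "MoveAhead": 0.05,
    "MoveBack": 0.05,
    "MoveArmHeightP": 0.05,
    "MoveArmHeightM": 0.05,
    "MoveArmP": 0.05,
    "MoveArmM": 0.05,
    "MoveWristP": None,
    "MoveWristM": None,
    "PickUp": None,
    "Release": None,
}

# Hypothetical index order: a discrete policy outputs an integer in [0, 10).
ACTION_NAMES: Tuple[str, ...] = tuple(ACTION_SCALES)


def decode_action(index: int) -> Tuple[str, Optional[float]]:
    """Map a discrete policy output to (action name, scale)."""
    name = ACTION_NAMES[index]
    return name, ACTION_SCALES[name]


name, scale = decode_action(2)
print(name, scale)
```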
In order to define a new task or change any component of the training procedure, it suffices to look into/change the following files and classes.
The `StretchManipulaTHOREnvironment` defined in allenact_plugins/stretch_manipulathor_plugin/stretch_arm_environment.py is a wrapper around the AI2-THOR environment. It discretizes the action space for the pick-and-place tasks and wraps the functions needed for working with the low-level manipulation features.
Important Features
- `step(action)` translates the `action` generated by the model to its corresponding AI2-THOR API commands.
- `is_object_at_low_level_hand(object_id)` checks whether the object with unique id `object_id` is at hand or not.
- `get_absolute_hand_state` gets the position, rotation, and hand radius for the current state.
- `get_object_by_id(object_id)` gets the metadata of a specified object.
- `teleport_object(target_object_id, target_position, rotation, fixed)` helper function that teleports the specified object to a target position and rotation, and sets it stationary if `fixed`.
- `randomize_texture_lightning` deterministic texture and lighting randomization function that randomly samples, or sets to specified values, the cosmetic augmentations of the table, table legs, walls, floor, and/or lights.
- `scale_object(object_id, scale)` wrapped function that scales the selected object.
- `hand_reachable_space_on_countertop(countertop_id)` gets approximate xyz limits that the gripper can reach, considering the specified countertop.
The `StretchPickPlace` task class can be found in allenact_plugins/stretch_manipulathor_plugin/stretch_tasks/strech_pick_place.py. This class includes the possible actions, the reward definition, metric calculation and recording, and calls to the appropriate API functions on the environment.
Important Features
- `success_criteria` the picking object's bounding box should intersect the receptacle trigger box (which differs from the bounding box and only includes the area of the receptacle, e.g. the internal rectangular area of a pan without the handle) when both objects are static. Secondly, the distance between the picking object and the center of the receptacle trigger box must be within a threshold, to guard against edge cases and large receptacles. In the case of random targets or point goals, only the second criterion is used.
- `metrics` calculates and logs the value of each evaluation metric per episode.
- `current_state_metadata` useful state information at each step, containing agent, hand, picking object, and goal state metadata.
- `task_name` prompt parsed from the current task.
- `setup_from_task_data` sets up the initial configuration from the input `TaskData` (including scene name, picking object metadata, receptacle metadata, etc.).
- `is_obj_off_table` checks whether the picking object is off its parent receptacle.
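The two success criteria described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the `Box` class, the `is_success` signature, the use of axis-aligned boxes, and the threshold value are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box:
    """Axis-aligned box given by its min and max corners (an assumption)."""

    lo: Vec3
    hi: Vec3

    @property
    def center(self) -> Vec3:
        return tuple((a + b) / 2 for a, b in zip(self.lo, self.hi))

    def intersects(self, other: "Box") -> bool:
        # Two boxes overlap only if their extents overlap on every axis.
        return all(
            a_lo <= b_hi and b_lo <= a_hi
            for a_lo, a_hi, b_lo, b_hi in zip(self.lo, self.hi, other.lo, other.hi)
        )


def is_success(
    obj_box: Box,
    obj_pos: Vec3,
    trigger_box: Box,
    threshold: float,
    both_static: bool = True,
    check_intersection: bool = True,
) -> bool:
    """Combine the intersection and distance criteria.

    Set check_intersection=False for random targets or point goals,
    where only the distance criterion applies.
    """
    close = math.dist(obj_pos, trigger_box.center) <= threshold
    if not check_intersection:
        return close
    return both_static and close and obj_box.intersects(trigger_box)


# Hypothetical geometry in metres: an apple resting on a plate.
plate = Box((0.0, 0.0, 0.0), (0.2, 0.02, 0.2))
apple = Box((0.05, 0.0, 0.05), (0.15, 0.1, 0.15))
result = is_success(apple, (0.1, 0.05, 0.1), plate, threshold=0.1)
print(result)
```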
`StretchExpRoomPickPlaceTaskSampler` and `StretchExpRoomPickPlaceResetFreeTaskSampler`, for episodic and RF/RM RL respectively, can be found in allenact_plugins/stretch_manipulathor_plugin/strech_task_sampler_exproom.py. These classes are in charge of initializing all possible locations for the object and agent, and of randomly sampling a data point for each episode (episodic RL) or phase (RM-RL).
Important Features
- `need_hard_reset` reset criteria checked every phase/episode. Also related to `dispersion_measure_check` and `distance_measure_check`, which implement our methods.
- `sample_new_combo` samples the picking and placing objects or point goals for the next task, considering the RM-RL algorithm (random targets or two-phase forward-backward) and budgets (possible objects in the current distributed process).
- `next_task` creates the next task instance for interaction. If an intervention is determined by the method (e.g. episodic, measurement-led, periodic), it gets the source and target locations, initializes the agent, and transports the object to its initial state. If using random targets for RM-RL, a reasonable point goal is sampled here.
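To make the idea of a measurement-determined reset more concrete, here is one plausible form of a standard-deviation dispersion check: if recently observed states barely vary, the agent may be stuck and a reset is requested. The `StdDispersionCheck` class, the window size, the threshold, and the choice of state vector are hypothetical; the repository's `dispersion_measure_check` may differ in all of these.

```python
from collections import deque
from statistics import pstdev
from typing import Deque, Sequence, Tuple


class StdDispersionCheck:
    """Request a reset when recent states show too little variation."""

    def __init__(self, window: int, threshold: float) -> None:
        self.window = window
        self.threshold = threshold
        self.history: Deque[Tuple[float, ...]] = deque(maxlen=window)

    def update(self, state: Sequence[float]) -> None:
        """Record the latest state; the oldest one drops out of the window."""
        self.history.append(tuple(state))

    def dispersion(self) -> float:
        """Mean over dimensions of the population standard deviation."""
        if not self.history:
            return float("inf")
        dims = list(zip(*self.history))
        return sum(pstdev(d) for d in dims) / len(dims)

    def need_hard_reset(self) -> bool:
        # Decide only once the window is full, to avoid resetting on too few samples.
        return len(self.history) == self.window and self.dispersion() < self.threshold


check = StdDispersionCheck(window=5, threshold=0.01)
for _ in range(5):
    check.update((0.50, 0.20, 0.90))  # the tracked state has stopped changing
print(check.need_hard_reset())
```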
Some pre-registered gym environments can be found here. For example, an RM-RL environment with Std measurement-determined resets and random goals can be created with:
import gym
from allenact_plugins.sawyer_peg_plugin import *
# Examples of initializing a `Std` measure-determined intervention with random goals for training
env = gym.make("RFSawyerPegRandomStdMeasureVisual-v1")
# or
env = parse_sawyer_peg_env("visual_std_measure")
An episodic environment with random peg box and hole positions for evaluation can be created with:
# Examples of initializing an episodic evaluation environment with random peg box and hole positions
env_eval = gym.make("SawyerPegVisual-v1")
# or
env_eval = parse_sawyer_peg_env("visual_eval")
To get the ObjectNav scenes dataset for RoboTHOR, run the following command:
bash datasets/download_navigation_datasets.sh robothor-objectnav
This will download the dataset into datasets/robothor-objectnav. Full documentation can be found here. The reset-free/reset-minimized task sampler can be found here. For example, a task sampler for an RM-RL agent with std measurement-led resets and random targets:
import os
import glob
import numpy as np
from allenact_plugins.robothor_plugin.robothor_task_samplers import ObjectNavDatasetTaskSampler
# See experiments wrapper for distributed partitions
dataset_dir = "datasets/robothor-objectnav/train"
scenes_path = os.path.join(dataset_dir, "episode", "*.json.gz")
scenes = [
scene.split("/")[-1].split(".")[0]
for scene in glob.glob(scenes_path)
]
task_sampler = ObjectNavDatasetTaskSampler(
scenes=scenes,
scene_directory=dataset_dir,
max_steps=300,
# False then episodic
reset_free=True,
# False for two-phase FB-RL
measurement_lead_reset=True,
# std, entropy, euclidean, dtw
measure_method="std",
# other periodic resets
num_steps_for_reset=np.inf,
)
Experiment files are under the projects directory. In general, run the command below for training:
python main.py \
{experiment_file} \
-b projects/{environment_name}/experiments/{base_folder} \
-o {output_path}
where {experiment_file} is the Python file name of the experiment, {environment_name} is one of objectnav, sawyer_peg, and strech_manipulation, {base_folder} is the base folder for {experiment_file}, and {output_path} is the output path for saving experiment checkpoints and configurations. See the possible files and directories for each experiment below.
Optional:
- `--callbacks allenact_plugins/callbacks/wandb_callback.py` use wandb logging; specify your wandb entity and project in each yaml config file under the `cfg` directory (e.g. here for Stretch-P&P).
- `--valid_on_initial_weights` run validation on the initial weights before training.
- `-a` disable saving the used config in the output directory.
- `-i` disable tensorboard logging (which by default saves to the output directory).
- `-d` set CuDNN to deterministic mode.
- `-s {seed}` set the random seed to `{seed}`; a random seed is used if unset.
- `-m {n}` set the maximal number of sampler processes to spawn for each worker to `n`; the default is set in the experiment config.
- `-l debug` set the logger level to `debug`.
For evaluation, run:
python main.py \
{test_file} \
-b projects/{environment_name}/experiments/tests/{test_base_folder} \
-o {output_path} \
-e \
-c {path/to/checkpoints} \
--eval
where `-e` enables deterministic testing, `-c` specifies the checkpoints folder or a single `*.pt` file for evaluation, and `--eval` explicitly runs the inference pipeline. The same optional arguments as above apply.
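When launching many runs, the training and evaluation command templates above can be assembled programmatically. The helper below is a small sketch that only uses the flags shown in this README; `build_command` itself, the output path `out`, and the checkpoint path `ckpts` are hypothetical names chosen for illustration.

```python
import shlex
from typing import List, Optional


def build_command(
    experiment_file: str,
    environment_name: str,
    base_folder: str,
    output_path: str,
    checkpoint: Optional[str] = None,
) -> List[str]:
    """Build the argument list for main.py.

    If `checkpoint` is given, the evaluation flags (-e, -c, --eval) are appended.
    For evaluation, pass a base folder such as "tests/{test_base_folder}".
    """
    cmd = [
        "python",
        "main.py",
        experiment_file,
        "-b",
        f"projects/{environment_name}/experiments/{base_folder}",
        "-o",
        output_path,
    ]
    if checkpoint is not None:
        cmd += ["-e", "-c", checkpoint, "--eval"]
    return cmd


train_cmd = build_command(
    "random_irr_single_combo", "strech_manipulation", "budget_measure_ablation", "out"
)
eval_cmd = build_command(
    "measure_random_novel_pos",
    "strech_manipulation",
    "tests/budget_and_measurement",
    "out",
    checkpoint="ckpts",
)
print(shlex.join(train_cmd))
print(shlex.join(eval_cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.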
To reproduce the training with our measurement-determined reset with random goals and a budget, run
python main.py \
random_irr_single_combo \
-b projects/strech_manipulation/experiments/budget_measure_ablation \
-o {output_path}
To run evaluation for Pos-OoD, run
python main.py \
measure_random_novel_pos \
-b projects/strech_manipulation/experiments/tests/budget_and_measurement \
-o {output_path} \
-e \
-c {path/to/checkpoints} \
--eval
The {test_file} can also be measure_random_novel_texture_lightning for Vis-OoD, measure_random_novel_objects for Obj-OoD, or measure_random_novel_scene for All-OoD. For ablations, you can change the global variables N_COMBO and MEASURE (the measure name) here.
To reproduce the training with our measurement-determined reset with Std metric, run
python main.py \
visual_std \
-b projects/sawyer_peg/experiments/measure_variations \
-o {output_path}
Replace std with dtw, entropy, or euclidean in {experiment_file} for the other measurements we proposed.
To run evaluation for novel peg box and hole positions, run
python main.py \
visual_test_novel_box \
-b projects/sawyer_peg/experiments/tests/std \
-o {output_path} \
-e \
-c {path/to/checkpoints} \
--eval
Replace novel_box with in_domain for in-domain evaluation, or with small_table for evaluation with a narrower table.
To reproduce the training with our measurement-determined reset with Std metric, run
python main.py \
std \
-b projects/objectnav/experiments/rmrl \
-o {output_path}
For evaluation on the RoboTHOR validation scene dataset, run
python main.py \
std \
-b projects/objectnav/experiments/rmrl \
-o {output_path} \
-e \
-c {path/to/checkpoints} \
--eval
The codebase is built on the AllenAct framework. The Stretch-P&P and RoboTHOR ObjectNav simulated environments are built on AI2-THOR. The Sawyer Peg simulation is modified from MetaWorld.