Training (hopefully) safe agents in gridworlds.
Emphasizing extensibility, modularity, and accessibility.
- `safe_grid_agents/common`: Core codebase. Includes abstract base classes for a variety of agents, their associated warmup/learn/eval functions, and a utilities file.
- `main.py`: Python executable for composing training jobs.
- `safe_grid_agents/parsing`: Helpers that construct a flexible CLI for `main.py`.
- `safe_grid_agents/ssrl`: Agents that implement semi-supervised reinforcement learning and their associated warmup functions.
When installing with pip, make sure to use the `--process-dependency-links` flag:

```bash
pip install . --process-dependency-links
```
URL-based dependencies are available for audit at the following repositories and forks:

- safe-grid-gym
- ai-safety-gridworlds
If you plan on developing this library, make sure to add an `-e` flag to the above pip install command.
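For example, a development install combining both of the flags above would look something like:

```bash
pip install -e . --process-dependency-links
```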
This repo requires tensorboardX for monitoring and visualizing agent learning, as well as PyTorch for implementing certain agents. Currently, tensorboardX does not function properly without TensorFlow installed. Since the installation process for these packages can vary from system to system, we exclude them from our build process. There are multiple tutorials online for installing both. For example, on OS X without CUDA support I'd go with:
```bash
# Replace `tensorflow` with `tensorflow-gpu` if you have a GPU.
pip install torch torchvision tensorflow
```
You can use the CLI for `main.py` to modularly drop agents into arbitrary safety gridworlds. For example, `python main.py boat tabular-q --lr .5` will train a `TabularQAgent` on the `BoatRaceEnvironment` with a learning rate of 0.5.
There are a number of customizable parameters to modify training runs. These parameters are split into three groups:

- Core arguments: args that are shared across all agents/environments. Found in `parsing/core_parser_configs.yaml`.
- Environment arguments: args specific to environments but shared across agents. Currently empty, but could be useful for specific environments, depending on the agent. Found in `parsing/env_parser_configs.yaml`.
- Agent arguments: args specific to agents. Most hyperparameters live here. Found in `parsing/agent_parser_configs.yaml`.
The generalized form for the CLI is:

```bash
python main.py <core_args> env <env_args> agent <agent_args>
```
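For instance, the earlier boat-race example fits this form as follows (it happens to use no core or environment arguments):

```bash
# <core_args>: (none)   env: boat   <env_args>: (none)   agent: tabular-q   <agent_args>: --lr .5
python main.py boat tabular-q --lr .5
```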
We support using Ray Tune to configure hyperparameters. Look at `TUNE_DEFAULT_CONFIG` in `main.py` to see which are currently supported. If you specify a tunable parameter on the CLI with the `-t` or `--tune` flag, it will be set automatically. For example, the following run lets Tune set the learning rate `lr` and discount rate `discount` automatically:

```bash
# `-t` and `--tune` are equivalent, and can be used interchangeably.
python3 main.py -t lr --tune discount boat tabular-q
```
You can use the `--log-dir`/`-L` flag to the `main.py` script to specify a directory for saving training and evaluation metrics across runs. I suggest a pattern similar to

```bash
logs/sokoban/deep-q/lr5e-4
# that is, <logdir>/<env_alias>/<agent_alias>/<uniqueid_or_hparams>
```

If no `--log-dir` is specified for `main.py`, logging defaults to the `runs/` directory, which can be helpful for separating debugging runs from training runs.
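Putting this together with the earlier example, a run that logs to the suggested layout might look something like the following (the path is just the naming pattern suggested above, and `--log-dir` is assumed to be passed as a core argument before the environment alias):

```bash
python main.py --log-dir logs/boat/tabular-q/lr5e-1 boat tabular-q --lr .5
```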
Given a log directory `<logs>`, simply run `tensorboard --logdir <logs>` to visualize an agent's learning.
We use black for auto-formatting according to a consistent style guide. To auto-format, run `black .` from inside the repo folder. To make this more convenient, you can install plugins for your preferred text editor that auto-format on every save.
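If black isn't already installed in your environment, something like the following should cover the basics (a plain pip install; adjust to your own setup):

```bash
pip install black  # one-time setup
black .            # auto-format the whole repo
```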
Steps to take when adding a new agent:

- Determine where the agent should live; for example, if you're testing a new baseline from standard RL, include it in `common`, but if you're adding a new SSRL agent, add it to `ssrl`. We'll refer to this folder as `<top>`.
- (optional) If your agent doesn't fall into these categories, create a new top-level subdirectory `<top>` for it (using an informative abbreviation). You should also create an abstract base class establishing the distinguishing functionality of your agent class in `<top>/base.py` (see the sketch after this list). For example:
  - SSRL requires a stronger agent H to learn from, so we require a `query_H` method for each agent.
  - Additionally, following Everitt et al., we require a `learn_C` method to learn the probability of the state being corrupt.
- (optional) Implement a warmup function in `<top>/warmup.py`, and make sure it's importable from `common/warmup.py`. The `noop` default warmup function works for agents that don't require any special functionality.
- Implement a function describing the agent's learning feedback loop in `<top>/learn.py`. See `common/learn.py` for an example distinguishing DQN from a tabular Q-learning agent.
- (optional) Implement a function in `<top>/eval.py` describing the evaluation feedback loop. The `default_eval` function in `common/eval.py` should cover most cases, so you may not need to add anything for evaluation.
- Add a new entry for the agent's CLI arguments in `parsing/agent_parser_configs.yaml`. Follow the existing pattern and check for previously implemented YAML anchors that cover the arguments you need (e.g. `learnrate`, `epsilon-anneal`, etc.). These configs should be organized by where they appear in the folder structure of the repository.
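As a rough illustration of the kind of abstract base class described above: the method names `query_H` and `learn_C` come from the steps above, but the class name, signatures, and docstrings below are hypothetical sketches rather than the repo's actual interface.

```python
# <top>/base.py -- illustrative sketch only; mirror the repo's existing base classes instead.
from abc import ABC, abstractmethod


class BaseSSRLAgent(ABC):
    """Hypothetical interface an SSRL agent would be expected to satisfy."""

    @abstractmethod
    def query_H(self, state):
        """Ask the stronger agent H for feedback on `state`."""
        ...

    @abstractmethod
    def learn_C(self, state):
        """Estimate the probability that `state` is corrupt (following Everitt et al.)."""
        ...
```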