This document describes how `pg_methods` is organized:
Contains implementations of common algorithms. Right now the following are implemented:

- `VanillaPolicyGradient`: contains the implementation of the REINFORCE (vanilla) policy gradient. Baselines are optional and supported.
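To give a rough idea of what a vanilla policy gradient update does, here is a minimal plain-PyTorch sketch. This is generic illustration code, not `pg_methods`' actual API; the network shape and the data are placeholders.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

# Illustrative REINFORCE-style update in plain PyTorch (not pg_methods' API).
policy_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-2)

states = torch.randn(10, 4)   # placeholder for states collected in a rollout
returns = torch.randn(10)     # placeholder for (baseline-corrected) returns

dist = Categorical(logits=policy_net(states))
actions = dist.sample()
loss = -(dist.log_prob(actions) * returns).mean()   # REINFORCE surrogate loss

optimizer.zero_grad()
loss.backward()
optimizer.step()
```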
Contains various utilities to handle data collection and storage from environments. This should be the future home of experience replay and similar tools.

- `obtain_trajectories`: conducts a rollout in the environment.
- `MultiTrajectory`: stores rollouts from the environment. Has a `.torchify()` method to quickly convert the stored data for use with PyTorch (see the sketch below).
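Conceptually, a `.torchify()`-style conversion just stacks the per-step records of a rollout into tensors. A hypothetical sketch of that idea, not the actual `MultiTrajectory` implementation:

```python
import torch

# Hypothetical sketch: stack per-step rollout records into PyTorch tensors.
rollout = {"states": [], "actions": [], "rewards": []}
for _ in range(5):                      # stand-in for environment interaction
    rollout["states"].append([0.0, 1.0])
    rollout["actions"].append(1)
    rollout["rewards"].append(0.5)

states = torch.tensor(rollout["states"])    # shape (T, state_dim)
actions = torch.tensor(rollout["actions"])  # shape (T,)
rewards = torch.tensor(rollout["rewards"])  # shape (T,)
```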
This contains some interfaces to go between PyTorch and OpenAI Gym. Gym has a few data objects (`Box`, `Discrete`, etc.), and there are some utilities to automatically convert between these types and PyTorch tensors. They contain functions like `gym2pytorch` and `pytorch2gym` that allow them to work with the `PyTorchWrap` object.
- `ContinuousProcessor`: converts between the `Box` datatype and PyTorch tensors.
- `SimpleDiscreteProcessor`: converts a sample from `Discrete` into a float that can be fed into PyTorch.
- `OneHotProcessor`: converts a sample from `Discrete` into a one-hot vector.
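The conversions themselves are simple. Here is a generic sketch using standard NumPy/PyTorch calls (not the processors' exact API) of turning a `Box` sample into a float tensor and a `Discrete` sample into either a float or a one-hot vector:

```python
import numpy as np
import torch

# Generic sketch of the kinds of conversions the processors perform.
box_sample = np.array([0.1, -0.3, 0.7], dtype=np.float32)  # e.g. drawn from a Box space
box_tensor = torch.from_numpy(box_sample)                  # float tensor for PyTorch

discrete_sample, n_actions = 2, 4                          # e.g. drawn from Discrete(4)
as_float = torch.tensor([float(discrete_sample)])          # scalar float representation
one_hot = torch.zeros(n_actions)                           # one-hot representation
one_hot[discrete_sample] = 1.0
```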
There are some wrappers and parallelized Gym interfaces:

- `PyTorchWrap`: interface between a single Gym instance and PyTorch.
- `make_parallelized_gym_env`: interface to multiple Gym environments running in parallel in PyTorch.
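What the parallel interface buys you, conceptually, is a batched view of several environments: observations from all workers are stacked into one tensor so a single forward pass of the policy serves every environment. A generic sketch of that batching, not the wrapper's actual code:

```python
import numpy as np
import torch

# Generic sketch: observations from several environments batched into one tensor.
n_envs, obs_dim = 4, 3
per_env_obs = [np.random.randn(obs_dim).astype(np.float32) for _ in range(n_envs)]
obs_batch = torch.from_numpy(np.stack(per_env_obs))   # shape (n_envs, obs_dim)
# A single policy forward pass on obs_batch now produces actions for all envs.
```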
This contains some common neural networks often used as function approximators for policies. Examples are:

- `MLP_factory`: creates a simple MLP.
- `MLP_factory_two_heads`: used to create networks with a shared body and two heads with different parameters (sketched below).
- `SharedActorCritic`: (WIP) used for creating actor-critic algorithms with shared heads and bodies.
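As an illustration of the shared-body / two-head pattern that `MLP_factory_two_heads` is meant to produce, here is a plain-PyTorch sketch (not the factory's actual output):

```python
import torch
import torch.nn as nn

# Plain-PyTorch sketch of a shared body feeding two separate heads.
class TwoHeadMLP(nn.Module):
    def __init__(self, in_dim=4, hidden=32, out_dim=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head_a = nn.Linear(hidden, out_dim)  # e.g. policy logits or mu
        self.head_b = nn.Linear(hidden, out_dim)  # e.g. sigma or a value estimate

    def forward(self, x):
        features = self.body(x)                   # shared computation
        return self.head_a(features), self.head_b(features)

out_a, out_b = TwoHeadMLP()(torch.randn(1, 4))
```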
Contains `PolicyGradientObjective`, which is really the REINFORCE objective (maybe we should consider renaming it in a future release?), and `NaturalPolicyGradientObjective`, which is not yet implemented.
Right now this contains two baseline functions: `MovingAverageBaseline` and `FunctionApproximatorBaseline`.
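For intuition, a moving-average baseline keeps a running estimate of the average return and subtracts it from new returns to reduce the variance of the gradient estimate. A generic sketch of that idea, not the actual class:

```python
# Generic sketch of a moving-average baseline (not the actual implementation).
beta = 0.9          # how much of the old estimate to keep
baseline = 0.0
for episode_return in [1.0, 0.5, 2.0]:      # stand-in episode returns
    advantage = episode_return - baseline   # multiplies the log-probabilities
    baseline = beta * baseline + (1 - beta) * episode_return
```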
Functions to help calculate gradients for the policy gradient objectives. These are all found in `PolicyGradientObjective`, but a few things that are useful to play with are:

- `calculate_returns(rewards, discount, masks)`: calculates returns given rewards, a discount factor, and masks (see the sketch below). The arguments are usually obtained by using `MultiTrajectory`.
- `calculate_policy_gradient_terms(log_probs, advantage)`: calculates the policy gradient terms `log_prob * advantage` (no averaging happens here).
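To make the return calculation concrete, here is a generic sketch of masked, discounted returns; the masks zero out the recursion across episode boundaries. This illustrates the idea only, not the exact implementation or argument conventions:

```python
import torch

# Generic sketch of masked discounted returns: R_t = r_t + discount * mask_t * R_{t+1}.
def discounted_returns(rewards, discount, masks):
    returns = torch.zeros_like(rewards)
    running = torch.zeros(rewards.size(1))   # one running return per trajectory
    for t in reversed(range(rewards.size(0))):
        running = rewards[t] + discount * masks[t] * running
        returns[t] = running
    return returns

rewards = torch.ones(3, 2)   # (time_steps, n_trajectories)
masks = torch.ones(3, 2)     # 1 while an episode is still running, 0 after it ends
print(discounted_returns(rewards, 0.99, masks))
```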
Includes common policies used in reinforcement learning. All policies take a function approximator as the first argument; this is a torch module such as a neural network.

- Categorical policy: agent that acts randomly by sampling actions from a categorical distribution.
- Gaussian policy: actions are picked according to a Gaussian distribution parameterized by `mu` and `sigma`. Note that in this case the function approximator should return two outputs, corresponding to `mu` and `sigma` (see the sketch below).
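A generic sketch of how such a Gaussian policy can be put together in plain PyTorch (not the library's actual class; the two-headed network here is a stand-in for the function approximator):

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

# Generic sketch: the approximator returns mu and log(sigma); the policy samples.
class GaussianHead(nn.Module):
    def __init__(self, in_dim=4, action_dim=1):
        super().__init__()
        self.mu = nn.Linear(in_dim, action_dim)
        self.log_sigma = nn.Linear(in_dim, action_dim)

    def forward(self, x):
        return self.mu(x), self.log_sigma(x)

mu, log_sigma = GaussianHead()(torch.randn(1, 4))
dist = Normal(mu, log_sigma.exp())
action = dist.sample()
log_prob = dist.log_prob(action)   # used in the policy gradient term
```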
- `pg_methods.utils.experiment`: contains some tools to handle experiments and set up policies quickly.
- `pg_methods.utils.logger`: should contain things to log data. Will be the future home of the TensorBoard logger, etc.
- `pg_methods.utils.plotting`: tools for plotting the results of a run.