Implementation Details
The project is structured around 3 main components:
- An environment (which is abstracted by the `gym.Environment` class). The environment receives actions, and outputs states and rewards.
- An agent (which is abstracted by our own `ReinforcementLearning` class). The agent receives states and outputs actions.
- A manager (which is abstracted by our own `Manager` class). The manager coordinates the sending and receiving of actions and states. Managers help with training agents, hyper-parameter optimization, executing episodes, and printing overviews of the environment. A sketch of this interaction loop follows the list.
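To make the division of responsibilities concrete, here is a minimal sketch of that interaction loop using the classic `gym` step API. `RandomAgent` and `run_episode` are illustrative placeholders, not classes from this project:

```python
# Minimal sketch of the environment/agent/manager interaction loop.
# RandomAgent and run_episode are illustrative placeholders, not project classes.
import gym


class RandomAgent:
    """Stands in for a ReinforcementLearning subclass: receives states, outputs actions."""

    def __init__(self, action_space):
        self.action_space = action_space

    def action_for_state(self, state):
        # a real agent would use the state; this placeholder acts randomly
        return self.action_space.sample()


def run_episode(env, agent):
    """Stands in for a Manager's job: shuttle actions, states, and rewards around."""
    state = env.reset()
    done, total_reward = False, 0.0
    while not done:
        action = agent.action_for_state(state)
        state, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward


env = gym.make("CartPole-v1")
print(run_episode(env, RandomAgent(env.action_space)))
```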
Besides these, the project uses a few other important abstractions:
- `LearningStatistics` collects the different metrics that agents may output during training. It provides convenient ways to retrieve the metrics and plot them, and it allows aggregation on many levels, like model, episode, time-step, environment, and metric.
- `BaseNetwork` is a base class for PyTorch `nn.Module` subclasses that provides functionality for saving/loading models, enabling the GPU, soft parameter updates, freezing weights, plotting the architecture diagram, and running backpropagation.
- `ExperienceBuffer` is an abstraction for agents that use the experience replay technique. It allows experiences to be stored and sampled, and lets the buffer report whether it contains enough experiences to start making predictions. Sketches of the soft parameter update and of a replay buffer follow this list.
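For intuition, here is a minimal sketch of the soft parameter update mentioned for `BaseNetwork`: target parameters are nudged a small step `tau` towards the online network's parameters (illustrative code, not the library's implementation):

```python
import torch


def soft_update(target: torch.nn.Module, online: torch.nn.Module, tau: float = 0.005) -> None:
    """Blend online parameters into the target network: theta_t <- tau*theta_o + (1-tau)*theta_t."""
    for t_param, o_param in zip(target.parameters(), online.parameters()):
        t_param.data.copy_(tau * o_param.data + (1.0 - tau) * t_param.data)
```

And here is a sketch of the experience replay idea behind `ExperienceBuffer` (the class and method names are placeholders, not the project's API):

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", "state action reward next_state done")


class SimpleReplayBuffer:
    """Stores transitions and samples random mini-batches for training."""

    def __init__(self, capacity: int = 10_000, min_experiences: int = 100):
        self.buffer = deque(maxlen=capacity)  # old experiences are evicted automatically
        self.min_experiences = min_experiences

    def store(self, state, action, reward, next_state, done) -> None:
        self.buffer.append(Experience(state, action, reward, next_state, done))

    def sample(self, batch_size: int = 32):
        return random.sample(self.buffer, batch_size)

    def ready(self) -> bool:
        # whether enough experiences were collected to start learning from the buffer
        return len(self.buffer) >= self.min_experiences
```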
The implemented managers are:
- `pydeeprecsys.rl.manager.MovieLensFairnessManager`

The implemented agents are:
- `pydeeprecsys.rl.agent.dqn.DQNAgent`
- `pydeeprecsys.rl.agent.rainbow.RainbowDQNAgent` [WIP]
- `pydeeprecsys.rl.agent.reinforce.ReinforceAgent`
- `pydeeprecsys.rl.agent.actor_critic.ActorCriticAgent`
- `pydeeprecsys.rl.agent.soft_actor_critic.SoftActorCritic` [WIP]
The implemented networks are:
- `pydeeprecsys.rl.networks.dueling.DuelingDDQN`
- `pydeeprecsys.rl.networks.value_estimator.ValueEstimator`
- `pydeeprecsys.rl.networks.policy_estimator.PolicyEstimator`
- `pydeeprecsys.rl.networks.deep_q_network.DeepQNetwork` [WIP]
- `pydeeprecsys.rl.networks.gaussian_actor.GaussianActor` [WIP]
- `pydeeprecsys.rl.networks.q_value_estimator.TwinnedQValueEstimator` [WIP]
The implemented experience replay buffers are:
- `pydeeprecsys.rl.experience_replay.experience_buffer.ExperienceReplayBuffer`
- `pydeeprecsys.rl.experience_replay.priority_replay_buffer.PrioritizedExperienceReplayBuffer`

Note: buffer parameters are abstracted in separate classes, to improve code readability.
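As an illustration of that note, grouping buffer settings in a small dedicated class might look like the following hypothetical sketch (`BufferParameters` and its fields are examples, not the library's actual names):

```python
from dataclasses import dataclass


@dataclass
class BufferParameters:
    """Groups replay-buffer settings so a buffer constructor takes one object instead of many arguments."""

    max_experiences: int = 10_000
    minimum_experiences_to_start: int = 100
    batch_size: int = 32
    # prioritized-replay settings (hypothetical field names)
    alpha: float = 0.6  # how strongly prioritization is applied
    beta: float = 0.4   # importance-sampling correction strength
```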
The implemented exploration methods are:
- `pydeeprecsys.rl.agents.epsilon_greedy.DecayingEpsilonGreedy`
- `pydeeprecsys.rl.neural_networks.noisy_layer.NoisyLayer`

Note: `DecayingEpsilonGreedy` can be parametrized to behave as a standard ϵ-greedy method, by setting `decay_rate=1`.
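To make that note concrete, a minimal decaying ϵ-greedy rule looks like the sketch below (illustrative, not the project's implementation). With `decay_rate=1`, ϵ never shrinks, which is exactly the standard ϵ-greedy behaviour:

```python
import random


class DecayingEpsilonGreedySketch:
    """Illustrative decaying epsilon-greedy exploration; not the library's implementation."""

    def __init__(self, epsilon: float = 1.0, decay_rate: float = 0.99, minimum_epsilon: float = 0.05):
        self.epsilon = epsilon
        self.decay_rate = decay_rate
        self.minimum_epsilon = minimum_epsilon

    def choose(self, greedy_action, random_action):
        # explore with probability epsilon, otherwise exploit the greedy action
        action = random_action if random.random() < self.epsilon else greedy_action
        # decay epsilon after every decision; decay_rate=1 keeps epsilon constant,
        # reducing this to the standard epsilon-greedy method
        self.epsilon = max(self.epsilon * self.decay_rate, self.minimum_epsilon)
        return action
```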