This repository contains implementations of classic model-free deep reinforcement learning algorithms using PyTorch and OpenAI Gym environments. Currently, only environments with vector states, such as LunarLander-v2, are supported.
The repository is structured as follows:

- `executor.py`: Training execution script
- `utils.py`: Helper functions for execution
- `config.py`: Helper functions for configuration
- `defaults/configs/`: Default JSON configuration files for each algorithm
- `drl/`: Main package directory
  - `models.py`: PyTorch models
  - `trainers.py`: Training execution classes
  - `off_policy/`: Off-policy algorithms
    - `agents.py`: Off-policy agents
    - `replay_buffer.py`: Replay buffer implementation (sketched after this list)
  - `on_policy/`: On-policy algorithms
    - `agents.py`: On-policy agents
    - `envs.py`: Multiprocessing environments
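As background, a replay buffer stores past transitions so that off-policy agents can learn from decorrelated minibatches. Below is a minimal sketch of such a buffer, assuming a standard uniform-sampling design; it illustrates the technique and is not the actual code in `drl/off_policy/replay_buffer.py`.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # Oldest transitions are discarded automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation between transitions.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```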
Currently implemented algorithms are:

- Deep Q-Learning: `drl.off_policy.agents.DQLAgent` [paper]
- Deep Q-Network: `drl.off_policy.agents.DQNAgent` [paper]
- Double Deep Q-Network: `drl.off_policy.agents.DDQNAgent` [paper] (target computation sketched after this list)
- Reinforce: `drl.on_policy.agents.ReinforceAgent` [paper]
- Actor Critic: `drl.on_policy.agents.ActorCriticAgent` [book]
- Advantage Actor Critic (A2C) with parallel environments: `drl.on_policy.agents.A2CAgent` [paper]
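To illustrate how Double DQN differs from DQN, the sketch below computes the double-Q target: the online network selects the greedy next action while the target network evaluates it, which reduces the overestimation bias of plain Q-learning. This is a generic PyTorch sketch, not the code of `DDQNAgent`; the function name and arguments are illustrative.

```python
import torch

def double_q_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Compute Double DQN targets for a batch of transitions (illustrative)."""
    with torch.no_grad():
        # The online network picks the greedy action in the next state...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ...while the target network evaluates that action's value.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        # Bootstrap only on non-terminal transitions.
        return rewards + gamma * next_q * (1.0 - dones)
```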
todo: requirements
Execution is handled by `executor.py`:
```
usage: executor.py [-h] -c CFG

optional arguments:
  -h, --help         show this help message and exit
  -c CFG, --cfg CFG  path to config file
```
The agent, trainer and environments are created from a JSON configuration file passed as an argument. For reference, see `defaults/configs/`. For instance, training an A2C agent using `defaults/configs/lunarlander-v2.json` is achieved by:
```
python executor.py -c defaults/configs/lunarlander-v2.json
```
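Conceptually, a trainer drives a standard Gym interaction loop. The sketch below shows such a loop for LunarLander-v2 with a hypothetical `agent` exposing `act`, `store` and `learn` methods; the actual interfaces are defined in `drl/trainers.py` and the agent classes, and may differ.

```python
import gym

def train(agent, episodes=500):
    # Hypothetical training loop; the real ones live in drl/trainers.py.
    env = gym.make("LunarLander-v2")
    for episode in range(episodes):
        state, done, total_reward = env.reset(), False, 0.0
        while not done:
            action = agent.act(state)                       # e.g. epsilon-greedy or policy sample
            next_state, reward, done, _ = env.step(action)  # classic 4-tuple Gym API
            agent.store(state, action, reward, next_state, done)
            agent.learn()                                   # one update step
            state = next_state
            total_reward += reward
        print(f"episode {episode}: return {total_reward:.1f}")
    env.close()
```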
- Fix Dueling architectures
- Continuous action space algorithms: DDPG, PPO, SAC, ...
- Streamline API