RL in 200 Lines

PyTorch implementations of Reinforcement Learning algorithms in fewer than 200 lines.

Algorithms:

  1. Deep Reinforcement Learning

    • DQN
    • Soft Actor-Critic (SAC) [Results]
    • Vanilla Policy Gradient (Actor-Critic) [Results]
    • Proximal Policy Optimization (PPO) [Results]
    • Deep Deterministic Policy Gradient (DDPG) [Results]
  2. Bandits

    • Epsilon Greedy
    • Softmax action selection
    • UCB-1
    • REINFORCE
  3. Classical MDP Control

    • SARSA
    • Q-learning
    • SARSA(lambda)
    • Vanilla Policy Gradient
  4. Additional Resources

    • Report on Bandit algorithms
    • Report on Classical MDP control algorithms
    • Contour environment - gym-contour
    • Puddle world - gym-puddle

Dependencies

  • PyTorch
  • TensorBoard
  • OpenAI Gym
  • NumPy
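A quick way to check that these dependencies are installed before running anything is sketched below. The import names (`torch`, `tensorboard`, `gym`, `numpy`) are the usual ones for these packages but are an assumption here, since the README does not state them or pin versions; `missing_dependencies` is a hypothetical helper, not part of the repository.

```python
import importlib.util

# Assumed import names for the dependencies listed above.
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "TensorBoard": "tensorboard",
    "OpenAI Gym": "gym",
    "NumPy": "numpy",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the labels of modules that cannot be found, without importing them."""
    return [
        label
        for label, name in modules.items()
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```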

Usage

  • Clone the repository.
  • Run an experiment by running either the algorithm's .py file or main.py within its directory.
  • TensorBoard logs of my experiments can be viewed through the [Results] links given above.

References

  • Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, (2018) [bib] by Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel and Sergey Levine

  • Proximal Policy Optimization Algorithms, (2017) [bib] by John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford and Oleg Klimov

  • Benchmarking Deep Reinforcement Learning for Continuous Control, (2016) [bib] by Yan Duan, Xi Chen, Rein Houthooft, John Schulman and Pieter Abbeel

  • Playing Atari with Deep Reinforcement Learning, (2013) [bib] by Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra and Martin A. Riedmiller

  • Using Confidence Bounds for Exploitation-Exploration Trade-offs, (2002) [bib] by Peter Auer

  • Eligibility Traces for Off-Policy Policy Evaluation, (2000) [bib] by Doina Precup, Richard S. Sutton and Satinder P. Singh

  • Policy Gradient Methods for Reinforcement Learning with Function Approximation, (1999) [bib] by Richard S. Sutton, David A. McAllester, Satinder P. Singh and Yishay Mansour

  • Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, (1992) [bib] by Ronald J. Williams

  • Q-learning, (1992) [bib] by Chris Watkins and Peter Dayan

  • Deterministic Policy Gradient Algorithms, (2014) [bib] by David Silver, Guy Lever, Nicolas Manfred Otto Heess, Thomas Degris, Daan Wierstra and Martin A. Riedmiller