Bayesian Reward Shaping Ensemble Framework for Deep Reinforcement Learning.
This small and fairly self-contained package (see prerequisites below) accompanies the article "Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach" [1], published in Advances in Neural Information Processing Systems (NeurIPS) in December 2018.
This package provides an efficient online Bayesian ensemble algorithm for potential-based reward shaping [4] that combines multiple experts (potential functions).
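As a rough illustration of what combining experts means here, the sketch below shapes a reward with a weighted average of several potential functions, using the standard potential-based form `gamma * Phi(s') - Phi(s)` from [4]. The names `combined_potential` and `shaped_reward`, the toy experts, and the fixed weights are assumptions made for this example and are not this package's API; in the framework the weights would come from the posterior over experts rather than being fixed by hand.

```python
import numpy as np


def combined_potential(state, experts, weights):
    """Weighted average of the experts' potentials at `state` (hypothetical helper)."""
    values = np.array([phi(state) for phi in experts], dtype=float)
    return float(np.dot(weights, values))


def shaped_reward(reward, state, next_state, experts, weights, gamma=0.99):
    """Add the potential-based shaping term gamma * Phi(s') - Phi(s) to `reward`."""
    phi_s = combined_potential(state, experts, weights)
    phi_next = combined_potential(next_state, experts, weights)
    return reward + gamma * phi_next - phi_s


# Two toy experts on a 1-D state: one prefers states near zero, one is uninformative.
experts = [lambda s: -abs(s), lambda s: 0.0]
weights = np.array([0.5, 0.5])  # assumed stand-in for the posterior mean over experts

bonus = shaped_reward(0.0, state=2.0, next_state=1.0,
                      experts=experts, weights=weights, gamma=1.0)
```

Moving from state 2.0 to state 1.0 raises the combined potential, so the shaped reward is positive even though the environment reward is zero.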
Tested on Python 3.5 with standard packages (e.g. numpy) and the following additional packages:
- Keras with tensorflow backend
- OpenAI Gym for the Cartpole implementation
- pyprind for tracking progress of convergence
The new version includes the following bug fixes and changes:
- Fixed a critical error in the training loop (the Monte Carlo estimate for updating the posterior over experts was computed in the incorrect order)
- Added Double DQN agent
- Added arguments for learning rate decay for tabular methods
- Removed Jupyter notebooks containing outdated experiments
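For readers unfamiliar with the Double DQN agent mentioned above, the sketch below computes the bootstrap targets described in [2]: the online network selects the next action and the target network evaluates it. This is a minimal NumPy sketch, not the agent shipped in this package; the function name `double_dqn_targets` and its arguments are assumptions for illustration, and the Q-values are passed in as plain arrays instead of being produced by Keras models.

```python
import numpy as np


def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Compute Double DQN targets for a batch of transitions (hypothetical helper).

    q_next_online and q_next_target have shape (batch, n_actions) and hold the
    online and target networks' Q-values for the next states.
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=bool)
    q_next_online = np.asarray(q_next_online, dtype=float)
    q_next_target = np.asarray(q_next_target, dtype=float)

    # Action selection uses the online network ...
    best_actions = np.argmax(q_next_online, axis=1)
    # ... while action evaluation uses the target network.
    next_values = q_next_target[np.arange(len(rewards)), best_actions]

    # Terminal transitions do not bootstrap from the next state.
    return rewards + gamma * np.where(dones, 0.0, next_values)


targets = double_dqn_targets(
    rewards=[1.0, 2.0],
    dones=[False, True],
    q_next_online=[[0.1, 0.9], [0.5, 0.2]],
    q_next_target=[[3.0, 4.0], [5.0, 6.0]],
    gamma=0.5,
)
```

Decoupling selection from evaluation in this way is what distinguishes Double DQN from the original DQN target [3], which takes the maximum over the target network's own Q-values.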
To cite the framework:
@inproceedings{gimelfarb2018reinforcement,
title={Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach},
author={Gimelfarb, Michael and Sanner, Scott and Lee, Chi-Guhn},
booktitle={Advances in Neural Information Processing Systems},
year={2018} }
- [1] Gimelfarb, Michael, Scott Sanner, and Chi-Guhn Lee. "Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach." Advances in Neural Information Processing Systems. 2018.
- [2] Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep Reinforcement Learning with Double Q-Learning." AAAI. Vol. 2. 2016.
- [3] Mnih, Volodymyr, et al. "Playing Atari with Deep Reinforcement Learning." arXiv preprint arXiv:1312.5602 (2013).
- [4] Ng, Andrew Y., Daishi Harada, and Stuart Russell. "Policy invariance under reward transformations: Theory and application to reward shaping." ICML. Vol. 99. 1999.
- [5] Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.