Papers:
- Offline RL: A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems, R. F. Prudencio et al 2022.
- DQN: Playing Atari with Deep Reinforcement Learning, V. Mnih et al 2013.
The original DQN paper.
- DQN: Human-level control through deep reinforcement learning, V. Mnih et al 2015, Nature.
Nature DQN. Compared to the original DQN paper, it adds a periodically updated target Q-network to address training instabilities; this is the variant in common use today (see the sketch below).
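A minimal sketch of that periodic target-network update, assuming a PyTorch setup; the toy network sizes and the `TARGET_UPDATE_EVERY` value are illustrative assumptions, not values from the paper:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

GAMMA = 0.99
TARGET_UPDATE_EVERY = 10_000  # gradient steps between target syncs (illustrative)

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(q_net)  # frozen copy that supplies TD targets

def td_loss(s, a, r, s2, done):
    # Q-values of the actions actually taken.
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():  # bootstrap from the slowly-moving target network
        target = r + GAMMA * (1 - done) * target_net(s2).max(dim=1).values
    return F.smooth_l1_loss(q, target)

# In the training loop, sync the target network only periodically:
# if step % TARGET_UPDATE_EVERY == 0:
#     target_net.load_state_dict(q_net.state_dict())
```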
- Double DQN: Deep Reinforcement Learning with Double Q-Learning, H. van Hasselt et al 2016, AAAI.
- Dueling DQN: Dueling Network Architectures for Deep Reinforcement Learning, Z. Wang et al 2015.
- PER: Prioritized Experience Replay, T. Schaul et al 2015, ICLR.
- Rainbow DQN: Rainbow: Combining Improvements in Deep Reinforcement Learning, M. Hessel et al 2017, AAAI.
- DRQN: Deep Recurrent Q-Learning for Partially Observable MDPs, M. Hausknecht et al 2015, AAAI.
- Noisy DQN: Noisy Networks for Exploration, M. Fortunato et al 2017, ICLR.
- Averaged-DQN: Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning, O. Anschel et al 2016, ICML.
- Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning, A. Nagabandi et al 2017, ICRA.
- Deep Reinforcement Learning and the Deadly Triad, H. van Hasselt et al 2018.
- Policy Gradient Theorem: Policy Gradient Methods for Reinforcement Learning with Function Approximation, R. Sutton et al 1999, NIPS.
- A3C: Asynchronous Methods for Deep Reinforcement Learning, V. Mnih et al 2016, ICML.
- TRPO: Trust Region Policy Optimization, J. Schulman et al 2015, ICML.
- GAE: High-Dimensional Continuous Control Using Generalized Advantage Estimation, J. Schulman et al 2015, ICLR.
- PPO: Proximal Policy Optimization Algorithms, J. Schulman et al 2017.
Updates the policy in small minibatch epochs, which addresses the difficulty of choosing a step size in policy-gradient methods, and swaps TRPO's constrained optimization for a clipped surrogate objective or an adaptive KL penalty, both of which are much easier to optimize (see the sketch below).
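A minimal sketch of PPO's clipped surrogate loss, assuming PyTorch; the function name and tensor names are illustrative (`clip_eps=0.2` is the paper's default, everything else is an assumption):

```python
import torch

def ppo_clip_loss(logp_new, logp_old, adv, clip_eps=0.2):
    """Clipped surrogate: limits how far the policy can move per update.

    logp_new / logp_old: log-probs of the taken actions under the new and
    old policies; adv: advantage estimates (e.g. from GAE).
    """
    ratio = torch.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    # Pessimistic bound: take the elementwise min of the two objectives,
    # then negate because optimizers minimize.
    return -torch.min(ratio * adv, clipped * adv).mean()
```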
- Distributed PPO: Emergence of Locomotion Behaviours in Rich Environments, N. Heess et al 2017.
- ACKTR: Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, Y. Wu et al 2017, NIPS.
- ACER: Sample Efficient Actor-Critic with Experience Replay, Z. Wang et al 2016, ICLR.
- DPG: Deterministic Policy Gradient Algorithms, D. Silver et al 2014, ICML.
- DDPG: Continuous control with deep reinforcement learning, T. P. Lillicrap et al 2016, ICLR.
- TD3: Addressing Function Approximation Error in Actor-Critic Methods, S. Fujimoto et al 2018, ICML.
- C51: A Distributional Perspective on Reinforcement Learning, M. G. Bellemare et al 2017, ICML.
- Q-Prop: Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic, S. Gu et al 2016, ICLR.
- Action-dependent Control Variates for Policy Optimization via Stein’s Identity, H. Liu et al 2017, ICLR.
- The Mirage of Action-Dependent Baselines in Reinforcement Learning, G. Tucker et al 2018, ICML.
- PCL: Bridging the Gap Between Value and Policy Based Reinforcement Learning, O. Nachum et al 2017, NIPS.
- Trust-PCL: Trust-PCL: An Off-Policy Trust Region Method for Continuous Control, O. Nachum et al 2017, CoRR.
- PGQL: Combining Policy Gradient and Q-learning, B. O'Donoghue et al 2016, ICLR.
- The Reactor: A Fast and Sample-Efficient Actor-Critic Agent for Reinforcement Learning, A. Gruslys et al 2017, ICLR.
- IPG: Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning, S. Gu et al 2017, NIPS.
- Equivalence Between Policy Gradients and Soft Q-Learning, J. Schulman et al 2017.
- IQN: Implicit Quantile Networks for Distributional Reinforcement Learning, W. Dabney et al 2018, ICML.
- Dopamine: A Research Framework for Deep Reinforcement Learning, P. S. Castro et al 2018.
- VIME: VIME: Variational Information Maximizing Exploration, R. Houthooft et al 2017, NIPS.
- Unifying Count-Based Exploration and Intrinsic Motivation, M. G. Bellemare et al 2016, NIPS.
- Count-Based Exploration with Neural Density Models, G. Ostrovski et al 2017, ICML.
- #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning, H. Tang et al 2016, NIPS.
- EX2: EX2: Exploration with Exemplar Models for Deep Reinforcement Learning, J. Fu et al 2017, NIPS.
- ICM: Curiosity-driven Exploration by Self-supervised Prediction, D. Pathak et al 2017, ICML.
- Large-Scale Study of Curiosity-Driven Learning, Y. Burda et al 2018, ICLR.
- RND: Exploration by Random Network Distillation, Y. Burda et al 2018, ICLR.
- SAC_V: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, T. Haarnoja et al 2018, ICML.
- SAC: Soft Actor-Critic Algorithms and Applications, T. Haarnoja et al 2018, CoRR.
SAC_V is brittle with respect to the temperature hyperparameter; SAC fixes this by learning the temperature automatically via gradient descent (see the sketch below).
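A minimal sketch of that automatic temperature update, assuming PyTorch; the `action_dim` value, learning rate, and the conventional `-|A|` target entropy are assumptions here, not details from the entries above:

```python
import torch

action_dim = 6                                   # illustrative action dimension
target_entropy = -float(action_dim)              # common heuristic: -|A|
log_alpha = torch.zeros(1, requires_grad=True)   # learn log(alpha) for positivity
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)

def temperature_update(logp_actions):
    # logp_actions: log pi(a|s) for actions sampled from the current policy.
    # Gradient pushes alpha up when policy entropy falls below the target,
    # and down when the policy is more random than required.
    alpha_loss = -(log_alpha * (logp_actions + target_entropy).detach()).mean()
    alpha_opt.zero_grad()
    alpha_loss.backward()
    alpha_opt.step()
    return log_alpha.exp().item()  # current temperature alpha
```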
- Distributed DQN: Massively Parallel Methods for Deep Reinforcement Learning, A. Nair et al 2015.
- Distributed Prioritized Experience Replay, D. Horgan et al 2018, ICLR.
- QR-DQN: Distributional Reinforcement Learning with Quantile Regression, W. Dabney et al 2017, AAAI.
- REM: An Optimistic Perspective on Offline Reinforcement Learning, R. Agarwal et al 2020, ICML.
- AWR: Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning, X. B. Peng et al 2019, CoRR.
- AWAC: AWAC: Accelerating Online Reinforcement Learning with Offline Datasets, A. Nair et al 2020, CoRR.
- TD3+BC: A Minimalist Approach to Offline Reinforcement Learning, S. Fujimoto et al 2021, NeurIPS.
- CQL: Conservative Q-Learning for Offline Reinforcement Learning, A. Kumar et al 2020, CoRR.
- IQL: Offline Reinforcement Learning with Implicit Q-Learning, I. Kostrikov et al 2021.
*** IRL
- App: Apprenticeship Learning via Inverse Reinforcement Learning, P. Abbeel et al 2004, ICML.
- Maximum Entropy Inverse Reinforcement Learning, B. D. Ziebart et al 2008, AAAI.
- Relative Entropy Inverse Reinforcement Learning, A. Boularias et al 2011, AISTATS.