Commit

update README.md
kygguo committed Oct 11, 2022
1 parent 9e1f95e commit cce61a5
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -171,7 +171,7 @@ Safe exploration is a challenging and important problem in model-free reinforcem
## [Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief](./PMDB)

Code associated with: [Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief](https://nips.cc/Conferences/2022/Schedule?showEvent=54842) accepted
-at **NeurIPS22** conference..
+at **NeurIPS22** conference.

#### Abstract
Model-based offline reinforcement learning (RL) aims to find highly rewarding policy, by leveraging a previously
@@ -183,7 +183,7 @@ through reward penalty may incur unexpected tradeoff between model utilization a
instead maintain a belief distribution over dynamics, and evaluate/optimize policy through biased sampling from the
belief. The sampling procedure, biased towards pessimism, is derived based on an alternating Markov game formulation
of offline RL. We formally show that the biased sampling naturally induces an updated dynamics belief with
-policy-dependent reweighting factor, termed \emph{Pessimism-Modulated Dynamics Belief}. To improve policy, we devise an
+policy-dependent reweighting factor, termed *Pessimism-Modulated Dynamics Belief*. To improve policy, we devise an
iterative regularized policy optimization algorithm for the game, with guarantee of monotonous improvement under certain
condition. To make practical, we further devise an offline RL algorithm to approximately find the solution. Empirical
results show that the proposed approach achieves state-of-the-art performance on a wide range of benchmark tasks.