Planning to Explore via Self-Supervised World Models, Sekar et al., 2020

Paper, Website, Tags: #reinforcement-learning

Plan2Explore: a self-supervised RL agent that explores visual environments efficiently without rewards and then adapts quickly to new tasks. During exploration, the agent acts efficiently by leveraging planning to seek out expected future novelty. After exploration, it adapts to multiple downstream tasks in a zero-shot or few-shot manner, achieving state-of-the-art zero-shot and adaptation performance.

Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.

The dominant approach in sensorimotor control is to train the agent on one or more pre-specified tasks with rewards. But learning each task from scratch is inefficient, requiring many task-specific environment interactions per task. How can an agent quickly generalize to tasks it has never experienced before, in a zero-shot or few-shot manner?

This work studies task-agnostic RL: the agent first explores the environment once, without any reward, to collect a diverse dataset. Only later, when solving downstream tasks, is the agent given the downstream reward functions.
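A rough sketch of this two-phase protocol is below. All names (`env`, `world_model`, `explore_policy`, `reward_fn`, and the methods called on them) are hypothetical placeholders for illustration, not the paper's actual interfaces.

```python
# Sketch of the two-phase, task-agnostic protocol described above.
# Every object and method here is a hypothetical placeholder, not the paper's API.

def explore(env, world_model, explore_policy, num_steps):
    """Phase 1: reward-free exploration driven purely by intrinsic novelty."""
    dataset = []
    obs = env.reset()
    for _ in range(num_steps):
        action = explore_policy.act(obs)        # seeks expected future novelty
        next_obs, done = env.step(action)       # no task reward is used
        dataset.append((obs, action, next_obs))
        world_model.train_step(dataset)         # self-supervised: predict the future
        obs = env.reset() if done else next_obs
    return world_model, dataset

def adapt(world_model, reward_fn):
    """Phase 2: given a downstream reward function, derive a task policy from
    the already-learned model (zero-shot) or with few extra interactions."""
    return world_model.plan(reward_fn)
```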

The challenges for planning to explore are to train an accurate world model from high-dimensional inputs, and to define an effective exploration objective. An ideal exploration objective should seek out inputs that the agent can learn the most from (epistemic uncertainty), while being robust to stochastic parts of the environment that can't be learnt accurately (aleatoric uncertainty).
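In the paper, epistemic uncertainty is estimated as the disagreement of an ensemble of one-step latent predictors. Below is a minimal sketch of that idea; `model.predict` is a hypothetical interface returning the predicted mean of the next latent state.

```python
import numpy as np

# Minimal sketch of disagreement-based novelty. `ensemble` is a list of
# hypothetical one-step predictors; `model.predict` is an assumed interface.

def intrinsic_reward(ensemble, latent_state, action):
    # Each ensemble member predicts the mean of the next latent state.
    preds = np.stack([model.predict(latent_state, action) for model in ensemble])
    # Disagreement = variance across members, averaged over feature dimensions.
    # Epistemic uncertainty shows up as disagreement; purely aleatoric noise
    # drives every member toward the same mean, so disagreement stays low.
    return preds.var(axis=0).mean()
```

During exploration, the agent plans in imagination to maximize this expected disagreement, rather than waiting to stumble on novel states.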