Skip to content

Latest commit

 

History

History
24 lines (17 loc) · 847 Bytes

README.md

File metadata and controls

24 lines (17 loc) · 847 Bytes

Meta Reinforcement Learning on bandits task.

Simple demo implementation of meta-RL in pytorch

Repo materials

  • utils.py defines the agent and task environment
  • main.ipynb has a simple training loop, and an evaluation of the learnt agent

Task

  • Two variants of meta-learning on bandits are implemented:
    • easy: bandits switch between episodes. i.e. for a given episode, the bandit with highest probability is fixed
    • medium: bandit switches once at given episode step
    • difficult: bandit switches within an episode with probability p.

Agent

  • policy gradient method
    • target can be REINFORCE (MC) or ActorCritic (TD)

Optimizations

  • forward layers computed in parallel by folding time into batch dimension

Resources

Wang et al., 2018 paper