Meta-reinforcement learning on bandit tasks.

A simple demo implementation of meta-RL in PyTorch.

Three variants of meta-learning on bandits are implemented:
- easy: the bandits switch only between episodes, i.e. within a given episode the bandit with the highest reward probability is fixed.
- medium: the bandit switches once, at a given episode step.
- difficult: the bandit switches within an episode with probability p.
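
The three variants above can be sketched as a single environment class. This is a minimal illustration, not the code in `main.ipynb` or `utils.py`: the class name `SwitchingBandit`, its parameters (`p_high`, `switch_step`, `switch_prob`), the choice of two arms, and the reading of "probability p" as a per-step switch probability are all assumptions made for the example.

```python
import random


class SwitchingBandit:
    """Hypothetical two-armed Bernoulli bandit whose best arm can change."""

    VARIANTS = ("easy", "medium", "difficult")

    def __init__(self, variant="easy", p_high=0.9, switch_step=50,
                 switch_prob=0.05, seed=None):
        if variant not in self.VARIANTS:
            raise ValueError(f"unknown variant: {variant!r}")
        self.variant = variant
        self.p_high = p_high            # reward probability of the best arm (assumed)
        self.switch_step = switch_step  # used by "medium" only (assumed)
        self.switch_prob = switch_prob  # used by "difficult" only; the "p" above (assumed per step)
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Start a new episode: pick the best arm uniformly at random."""
        self.best_arm = self.rng.randrange(2)
        self.t = 0
        return self.best_arm

    def step(self, action):
        """Pull arm `action` (0 or 1) and return a reward of 0 or 1."""
        # Apply any within-episode switch before the pull.
        if self.variant == "medium" and self.t == self.switch_step:
            self.best_arm = 1 - self.best_arm
        elif self.variant == "difficult" and self.rng.random() < self.switch_prob:
            self.best_arm = 1 - self.best_arm
        # The best arm pays off with probability p_high, the other with 1 - p_high.
        p = self.p_high if action == self.best_arm else 1.0 - self.p_high
        self.t += 1
        return 1 if self.rng.random() < p else 0
```

In the easy variant the best arm only changes when `reset()` is called, so an agent has to re-identify it at the start of each episode. In the medium and difficult variants it also has to notice a change partway through the episode.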
