easyrl

Thoroughly commented, clear implementation.

Proximal Policy Optimization

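An RL algorithm whose objective, for each state-action pair, is the advantage multiplied by the ratio of the action's probability under the current policy to its probability under the old policy. The ratio is clipped so that a single update cannot move the policy too far from the old one (paper).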
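The objective described above can be sketched for a single sample as follows. This is a minimal illustration, not the code in this repository: the function names, the use of log-probabilities, and the default `clip_eps=0.2` are assumptions made for the example.

```python
import math


def clipped_surrogate(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """Per-sample clipped surrogate objective (a quantity to maximize).

    All argument names and the clip_eps default are illustrative assumptions.
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    ratio = math.exp(new_log_prob - old_log_prob)
    # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


def mean_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Average the per-sample objective over a batch of state-action pairs."""
    values = [
        clipped_surrogate(n, o, a, clip_eps)
        for n, o, a in zip(new_log_probs, old_log_probs, advantages)
    ]
    return sum(values) / len(values)
```

Taking the minimum of the two terms means the clip only removes the incentive to push the ratio further in the direction the advantage favors. When the ratio has moved the wrong way for a given advantage, the unclipped term still applies.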

Works with any environment that has discrete actions, and can run multiple environments in parallel. Tested on OpenAI Retro's Sonic environment.