easyrl

Thoroughly commented, clear implementation.

Proximal Policy Optimization

An RL algorithm whose maximization objective for a state-action pair is the advantage multiplied by the ratio of the action's probability under the current policy to its probability under the old policy, with that ratio clipped (paper).
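As a rough illustration of that objective, here is a minimal NumPy sketch. It follows the form given in the PPO paper, which takes the minimum of the unclipped and clipped terms. The function name `ppo_clip_objective`, its arguments, and the default `clip_eps=0.2` are assumptions made for this example and are not taken from this repo's code.

```python
import numpy as np

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    All names here are hypothetical, chosen for illustration:
    new_logp / old_logp are log-probabilities of the taken actions under
    the current and old policies; advantages are the advantage estimates.
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = np.exp(new_logp - old_logp)
    # Unclipped term: advantage times ratio.
    unclipped = ratio * advantages
    # Clipped term: the ratio is restricted to [1 - eps, 1 + eps].
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the elementwise minimum makes the objective pessimistic, so
    # moving the ratio far from 1 never increases it.
    return np.minimum(unclipped, clipped).mean()

# Two sampled actions: one became more likely, one less likely.
new_logp = np.log(np.array([0.5, 0.2]))
old_logp = np.log(np.array([0.25, 0.4]))
advantages = np.array([1.0, -1.0])
objective = ppo_clip_objective(new_logp, old_logp, advantages)
```

In a training loop the negative of this value would typically be used as the policy loss, since most optimizers minimize.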

Works with any environment that has a discrete action space, and supports running multiple environments in parallel. Tested on OpenAI Retro's Sonic environment.
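To make the discrete-action, multi-environment setting concrete, the sketch below samples one action per environment from a batch of policy logits. It is only an illustration under stated assumptions: the helper `sample_actions` and the `(num_envs, num_actions)` array layout are hypothetical and do not describe this repo's actual interface.

```python
import numpy as np

def sample_actions(logits, rng):
    """Sample one discrete action per environment.

    Hypothetical helper: `logits` has shape (num_envs, num_actions).
    Returns the sampled action indices and their log-probabilities, which
    a PPO update would later use as the "old" log-probabilities.
    """
    # Numerically stable softmax over the action axis.
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=1, keepdims=True)
    # Inverse-CDF sampling with one uniform draw per environment.
    cdf = probs.cumsum(axis=1)
    u = rng.random((logits.shape[0], 1))
    actions = (u > cdf).sum(axis=1)
    # Guard against rounding leaving the last CDF entry slightly below 1.
    actions = np.minimum(actions, logits.shape[1] - 1)
    log_probs = np.log(probs[np.arange(len(actions)), actions])
    return actions, log_probs

rng = np.random.default_rng(0)
# Two parallel environments, three discrete actions each.
logits = np.array([[0.0, 100.0, 0.0],
                   [100.0, 0.0, 0.0]])
actions, log_probs = sample_actions(logits, rng)
```

Batching the sampling this way lets a single forward pass of the policy serve every parallel environment at each step.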
