Thoroughly commented, clear implementation.
RL algorithm whose objective to maximize, for a given state-action pair, is the advantage multiplied by the ratio of the action's probability under the current policy to its probability under the old policy, with that ratio clipped (paper).
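
As a rough illustration of that objective, here is a minimal per-sample sketch in plain Python. It assumes the usual clipped-surrogate form, which takes the minimum of the unclipped and clipped terms. The function names, the use of log-probabilities as inputs, and the clip range of 0.2 are assumptions for this example and are not taken from this repo's code.

```python
import math


def clipped_surrogate(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """Clipped objective for one state-action pair (a value to maximize).

    clip_eps is an assumed default, not a value taken from this repo.
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
    ratio = math.exp(new_log_prob - old_log_prob)
    # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum makes the objective a pessimistic bound.
    return min(ratio * advantage, clipped_ratio * advantage)


def mean_clipped_surrogate(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Average the per-sample objective over a batch of transitions."""
    values = [
        clipped_surrogate(n, o, a, clip_eps)
        for n, o, a in zip(new_log_probs, old_log_probs, advantages)
    ]
    return sum(values) / len(values)
```

With a positive advantage the gain stops growing once the ratio exceeds 1 + clip_eps, and with a negative advantage it stops once the ratio falls below 1 - clip_eps, so a single update has no incentive to move the policy far from the old one. In training, the negative of the batch mean would typically serve as the loss.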
Works with any environment that has discrete actions, and can run multiple environments in parallel. Tested on OpenAI Retro's Sonic environment.