As a bare minimum for believing a new RL algorithm has plausibly been implemented correctly, it is tested on the N-armed bandit problem. This environment is about as simple as RL environments get, so every algorithm should be able to "solve" it without trouble. This is currently not the case: some algorithms (I think just CrossEntropy) do not consistently pass. More care needs to be taken in choosing hyperparameters here so the tests aren't flaky.
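For illustration, here is a minimal sketch of what a non-flaky bandit test could look like. It is not this repository's test or its CrossEntropy implementation: the epsilon-greedy agent is a stand-in, and `run_bandit`, `test_bandit_is_solved`, and every hyperparameter value are hypothetical. The idea is to seed the RNG so each run is reproducible, and to keep the gap between arm means large relative to the reward noise so the pass/fail outcome does not sit on a knife edge.

```python
import numpy as np


def run_bandit(seed, n_arms=5, steps=2000, epsilon=0.1, noise=0.1):
    """Run a stand-in epsilon-greedy agent on an N-armed bandit.

    Returns (arm the agent believes is best, index of the truly best arm).
    """
    rng = np.random.default_rng(seed)  # seeded, so every run is reproducible
    # Evenly spaced arm means: the last arm is best, with a gap of 0.25
    # to the runner-up, which is large compared to the reward noise.
    true_means = np.linspace(0.0, 1.0, n_arms)
    counts = np.zeros(n_arms)
    estimates = np.zeros(n_arms)

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))  # explore uniformly at random
        else:
            arm = int(np.argmax(estimates))  # exploit the current best guess
        reward = rng.normal(true_means[arm], noise)
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return int(np.argmax(estimates)), n_arms - 1


def test_bandit_is_solved():
    # Check a few fixed seeds instead of relying on one lucky or unlucky draw.
    for seed in range(10):
        chosen, best = run_bandit(seed)
        assert chosen == best
```

Whether a fixed seed is acceptable here, or whether the test should instead pass over a wide range of seeds, is a design choice; the second is a stronger check that the hyperparameters are robust and not merely tuned to one run.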