The bandit tests are flaky #5

NivenT · 2017-08-03T08:41:10Z

As a bare-minimum check that a new RL algorithm is plausibly implemented correctly, it is tested on the N-armed bandit problem. This environment is about as simple as RL environments get, so every algorithm should be able to "solve" it without a problem. This is currently not the case, as some algorithms (I think just CrossEntropy) do not consistently pass. More care needs to be taken in choosing hyperparameters here so the tests aren't flaky.
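One way to reduce this kind of flakiness, sketched below in Python purely for illustration, is to give each test run its own seeded RNG and to check the pass rate over several fixed seeds rather than a single unseeded run. Everything here is an assumption and not this project's API: the function names (`run_cross_entropy`, `solve_rate`), the arm means, the Gaussian reward noise, and the hyperparameter values are all hypothetical, and the cross-entropy update shown is a simple categorical-policy variant rather than the implementation under test.

```python
import random


def run_cross_entropy(arm_means, seed, iterations=20, batch_size=200,
                      elite_frac=0.2, smoothing=0.7, noise_std=0.1):
    """Return the arm the learned policy prefers after training.

    Hypothetical sketch: a categorical policy over arms, updated toward
    the arm frequencies observed in the highest-reward samples.
    """
    rng = random.Random(seed)  # local, seeded RNG so each run is reproducible
    n_arms = len(arm_means)
    probs = [1.0 / n_arms] * n_arms  # start from a uniform policy

    for _ in range(iterations):
        # Sample a batch of (reward, arm) pairs under the current policy.
        arms = rng.choices(range(n_arms), weights=probs, k=batch_size)
        samples = [(rng.gauss(arm_means[a], noise_std), a) for a in arms]

        # Keep the highest-reward fraction of the batch as the elite set.
        samples.sort(reverse=True)
        elite = samples[:max(1, int(batch_size * elite_frac))]

        # Move the policy toward the elite arm frequencies (with smoothing).
        for a in range(n_arms):
            freq = sum(1 for _, arm in elite if arm == a) / len(elite)
            probs[a] = smoothing * freq + (1 - smoothing) * probs[a]

    return max(range(n_arms), key=lambda a: probs[a])


def solve_rate(arm_means, seeds):
    """Fraction of seeded runs that end up preferring the best arm."""
    best = max(range(len(arm_means)), key=lambda a: arm_means[a])
    hits = sum(run_cross_entropy(arm_means, s) == best for s in seeds)
    return hits / len(seeds)
```

A test built this way would assert something like `solve_rate(means, range(10)) >= 0.9` with hyperparameters chosen so that the margin is comfortable. Because the seeds are fixed, any failure is reproducible instead of intermittent.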

NivenT added the help wanted label Aug 3, 2017
