Hi all,
Has anyone solved the MiniGrid-DoorKey-8x8-v0 environment with the PPO algorithm? If so, which hyperparameters did you use, and for how many environment steps/frames did you run it?
Thanks! :)
Kind regards,
Erik
I changed the default value of max_steps (in doorkey.py) from 10 x size x size to 100 x size x size, and it works fine. I also increased the number of frames to 800000, but it clearly learns long before it gets that far. The problem with the default settings is that in a room this large, the agent usually never reaches the goal within the step limit, so there is no reward and nothing is learned. The main thing is to let each episode run long enough that the agent is rewarded frequently enough to learn.
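In case it helps, here is a rough sketch of that change made without editing doorkey.py. It assumes the old Gym API (4-tuple step return) and that gym_minigrid's MiniGridEnv keeps the episode limit in a plain `max_steps` attribute, as doorkey.py suggests; the `env.unwrapped.max_steps` access is my assumption, not something from this thread.

```python
import gym
import gym_minigrid  # noqa: F401 -- importing registers the MiniGrid environments

# Raise the episode limit on the created environment instead of editing the library.
env = gym.make("MiniGrid-DoorKey-8x8-v0")
size = 8
env.unwrapped.max_steps = 100 * size * size  # library default is 10 * size * size

# Quick sanity check: episodes can now run much longer before timing out.
obs = env.reset()
done = False
steps = 0
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    steps += 1
print(f"episode length: {steps}, final reward: {reward}")
```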