Anyone who can share model details? #61
Actually, I'd like to know if and how you were able to visualize these graphs. Did you point your TensorBoard logdir to a specific directory? Or are you using the images from the README here?
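(In general, TensorBoard only needs the directory where the TensorFlow summary writer saves its event files. I haven't confirmed which path this repo writes to, so the directory below is just a guess for illustration:

tensorboard --logdir=./logs

and then open http://localhost:6006, which is TensorBoard's default port.)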
I remember that some time ago I was trying to figure out exactly where the logdir is that I need to point TensorBoard to when using this repo.

Regarding the performance of your model: I really think that after a few million iterations you can't squeeze much more performance out of DQN. If you look at the more recent papers, such as Rainbow [1], you'll see that many more complex improvements had to be made to DQN for it to show significantly better performance. Distributed implementations also look very promising, such as A3C [2], Distributed Prioritized Experience Replay [3] and the new R2D2 [4].

The action repeat parameter equals 4 because DQN uses stacks of the four most recent frames to compose its state representation. While the new state representation is being composed, the last action it chose is repeated. However, this fact by itself should not explain why both of your experiments behave so similarly. My guess is that the frames don't change that quickly in Atari, which theoretically runs at 60 FPS, so selecting a new action every frame is not that crucial.

One parameter you could explore is the history length (how many frames compose a state for the neural network). In the DRQN paper [5], DQN was tested with history lengths of 4 and 10 frames and the results differed. They also tested a history of 1, but added an LSTM layer to the network so that it could model the history in its hidden state, and that gave better results.
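To make the action-repeat/frame-stacking mechanics concrete, here is a minimal sketch (my own illustration, not this repo's code), assuming a classic Gym-style reset/step environment; the wrapper name and parameters are just placeholders:

import collections
import numpy as np

class RepeatAndStack:
    """Illustrative wrapper: hold each chosen action for `action_repeat`
    emulator frames and keep the last `history_length` frames as the state."""
    def __init__(self, env, action_repeat=4, history_length=4):
        self.env = env
        self.action_repeat = action_repeat
        self.frames = collections.deque(maxlen=history_length)

    def reset(self):
        frame = self.env.reset()
        # Fill the history with the first frame so the state always has `history_length` frames.
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return np.stack(self.frames, axis=-1)  # e.g. shape (84, 84, 4) after preprocessing

    def step(self, action):
        total_reward, done, info = 0.0, False, {}
        # The same action is applied for `action_repeat` consecutive emulator frames.
        for _ in range(self.action_repeat):
            frame, reward, done, info = self.env.step(action)
            total_reward += reward
            self.frames.append(frame)
            if done:
                break
        return np.stack(self.frames, axis=-1), total_reward, done, info

With action_repeat = 1 the agent picks a new action every emulator frame, while with action_repeat = 4 it picks one every fourth frame, so for the same number of emulator frames the two configs see very different numbers of decision points.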
class M1(DQNConfig):
    backend = 'tf'
    env_type = 'detail'
    action_repeat = 1   # select a new action every frame

class M2(DQNConfig):
    backend = 'tf'
    env_type = 'detail'
    action_repeat = 4   # repeat each selected action for 4 frames
I use
python main.py --env_name=Breakout-v0 --is_train=True --display=False --use_gpu=True --model=m2
and
python main.py --env_name=Breakout-v0 --is_train=True --display=False --use_gpu=True --model=m1
The "avg_ep_r" in both models reaches 2.1 - 2.3 at around 5 million iterations. But when it comes to even 15 million iterations, the "avg_ep_r" still fluctuates between 2.1 and 2.3.
This is just like the result they have shown (I guess that is the result for an action repeat (frame skip) of 1, without learning rate decay). I didn't change any parameters.
The strange thing is that even when I use model m2 (action repeat / frame skip of 4), my result is similar to model m1:
The "avg_ep_r" fluctuates between 2.1 and 2.3 from around 5 million to 15 million iterations.
The "max_ep_r" fluctuates between 10 and 18 from around 5 million to 15 million iterations.
Do I need to change some parameters to reach the best result they have shown?
Thank you very much.