Any one who can share model details? #61

Open
Richardxxxxxxx opened this issue Oct 2, 2018 · 3 comments

@Richardxxxxxxx

class M1(DQNConfig):
    backend = 'tf'
    env_type = 'detail'
    action_repeat = 1

class M2(DQNConfig):
    backend = 'tf'
    env_type = 'detail'
    action_repeat = 4

I use
python main.py --env_name=Breakout-v0 --is_train=True --display=False --use_gpu=True --model=m2
and
python main.py --env_name=Breakout-v0 --is_train=True --display=False --use_gpu=True --model=m1

With both models, "avg_ep_r" reaches 2.1 to 2.3 at around 5 million iterations. Even at 15 million iterations, it still fluctuates between 2.1 and 2.3.

This matches the result they have shown (which I guess is for an action repeat (frame skip) of 1, without learning rate decay). I didn't change any parameters.

image

The strange thing is that even with model m2 (action repeat (frame skip) of 4), my result is similar to that of model m1.
From around 5 million to 15 million iterations, "avg_ep_r" fluctuates between 2.1 and 2.3.
Over the same range, max_ep_r fluctuates between 10 and 18.

class M2(DQNConfig):
    backend = 'tf'
    env_type = 'detail'
    action_repeat = 4

Do I need to change some parameters to reach the best result they have shown?

Thank you very much.

@douglasrizzo

Actually, I'd like to know if and how you were able to visualize these graphs. Did you point your TensorBoard logdir to a specific directory? Or are you using the images from the README here?

@Richardxxxxxxx

Richardxxxxxxx commented Oct 3, 2018

I am using the images from the README. But if you want to combine multiple graphs into one plot, TensorBoard can do that with:

tensorboard --logdir name1:/path/to/logs/1,name2:/path/to/logs/2
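When there are more than a couple of runs, the comma-separated `name:path` argument above can be assembled with a small script. This is only a sketch: the helper name `build_logdir_spec` is hypothetical, and the run names and paths are the placeholders from the command above, not real directories in this repo. Newer TensorBoard releases may expect this form under a different flag, so check `tensorboard --help` for your version.

```python
def build_logdir_spec(runs):
    """Join {name: path} pairs into the 'name1:path1,name2:path2' form."""
    return ",".join(f"{name}:{path}" for name, path in runs.items())


# Placeholder names and paths, copied from the command above.
spec = build_logdir_spec({
    "name1": "/path/to/logs/1",
    "name2": "/path/to/logs/2",
})
command = "tensorboard --logdir " + spec
print(command)
```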

By the way, the image below shows my combined graphs:

no_action_repeat(red), 4_action_repeat(blue):
board

Neither run seems to improve significantly after 4 million iterations.

Do you have any idea why?

@douglasrizzo

douglasrizzo commented Oct 3, 2018

I remember that some time ago I was trying to figure out exactly where the logdir is that I need to point TensorBoard to when using this repo.

Regarding the performance of your model, I really think that after a few million iterations you can't squeeze much more performance out of DQN. If you look at more recent papers, such as Rainbow [1], you'll see that many more complex improvements had to be made to DQN for it to show significantly better performance. Distributed approaches also look very promising, such as A3C [2], Distributed Prioritized Experience Replay [3] and the new R2D2 [4].

The action repeat parameter equals 4 because DQN uses stacks of the four most recent frames to compose its state representation. While the new state representation is being composed, the last action it chose is repeated. However, this by itself does not explain why both of your experiments behave so similarly. My guess is that frames don't change that quickly in Atari, which nominally runs at 60 FPS, so selecting a new action every frame is not that crucial.

One parameter that you could explore is the history length (how many frames compose a state for the neural network). In the DRQN paper [5], DQN was tested with history lengths of 4 and 10 frames, and the results differed. They also tested a history length of 1 with an added LSTM layer, so that the network can model the history through its hidden state, and got better results.
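To make the action repeat mechanism concrete, here is a minimal sketch of what repeating an action over several frames looks like. It is not this repo's implementation: `CountingEnv` is a made-up toy environment that returns a reward of 1.0 per frame, and `repeat_action` is a hypothetical helper that assumes `action_repeat` is at least 1.

```python
class CountingEnv:
    """Toy stand-in for an emulator: reward 1.0 per frame, ends after max_frames."""

    def __init__(self, max_frames=10):
        self.max_frames = max_frames
        self.frame = 0

    def step(self, action):
        # The action is ignored; the observation is just the frame counter.
        self.frame += 1
        done = self.frame >= self.max_frames
        return self.frame, 1.0, done


def repeat_action(env, action, action_repeat):
    """Apply the same action for up to action_repeat frames, summing rewards."""
    total_reward = 0.0
    obs, done = None, False
    for _ in range(action_repeat):
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break  # stop early when the episode ends mid-repeat
    return obs, total_reward, done


env = CountingEnv(max_frames=10)
obs, reward, done = repeat_action(env, action=0, action_repeat=4)
```

With `action_repeat = 1` the agent picks an action on every frame; with `action_repeat = 4` it picks one action per four frames and only sees the last observation and the summed reward.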

  1. M. Hessel et al., “Rainbow: Combining Improvements in Deep Reinforcement Learning,” CoRR, vol. abs/1710.02298, 2017.
  2. V. Mnih et al., “Asynchronous methods for deep reinforcement learning,” in International Conference on Machine Learning, 2016.
  3. D. Horgan et al., “Distributed prioritized experience replay,” in International Conference on Learning Representations, 2018, p. 19.
  4. “Recurrent Experience Replay in Distributed Reinforcement Learning,” in International Conference on Learning Representations (under review), 2019, p. 15.
  5. M. Hausknecht and P. Stone, “Deep Recurrent Q-Learning for Partially Observable MDPs,” in 2015 AAAI Fall Symposium Series, 2015.
