-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pure Continuous version of ALE on CPP #550
Pure Continuous version of ALE on CPP #550
Conversation
@pseudo-rnd-thoughts Ready for review. @psc-g Would be great it we can get your review here too. AFAICT, this is the same as your implementation so shouldn't affect the reproducibility of your paper/branch. :) |
@jjshoots could you update the pr description to explain what you have done. There seem to be some core functions that have changed. Could you add example code for interfacing with continuous at cpp and python levels. Also could you add some more continuous specific tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very clean and more efficient than my original implementation, thanks for doing this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me as well
for env_id, spec in gymnasium.registry.items() | ||
if spec.entry_point == "ale_py.env:AtariEnv" | ||
], | ||
"env_id,continuous", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the future this can be two separate @pytest.mark.parameterize, the first for the env id and the second continuous
This re-implements #539 but without introducing two separate sets of functionality on the CPP end.
In ALE, only the paddles actually accept continuous actions, while games using the joystick are actually pure discrete in the emulator. Therefore, we cannot implement true continuous for those environments.
What this PR does is implement the clipping on the Python end for continuous environments while using the same ActionEnum backbone of the original ALE, plus an argument for paddle strength in paddle-based games.
What do we want?
The changes in code are motivated by one thing: we want continuous action space for paddles only, keeping the ActionEnum-style action inputs for joysticks since the joystick for Atari is still a discrete action (continuous joystick does not exist).
What did the original ALE interface do?
In the original ALE interface, the actions are only joystick ActionEnum inputs. Then, for games that use a paddle instead of a joystick, joystick controls are mapped into discrete actions applied to paddles, ie:
LEFTDOWN
,LEFTUP
,LEFT...
) -> paddle left maxRIGHTDOWN
,RIGHTUP
,RIGHT...
) -> paddle right maxThis results in loss of continuous action for paddles.
How do we allow continuous paddles?
The original interface uses joystick actions for all games to maintain a similar interface for all games, making it easier for RL.
Ideally, we want to maintain this functionality, but allow for continuous action inputs for games that allow paddle usage.
To do that, we modify the interface on the CPP end:
Old Discrete ALE interface
New Mixed Discrete-Continuous ALE interface
Then, for games that utilize paddles, if the paddle strength parameter is set (the default value is 1.0), we pass the paddle action to the underlying game via this change:
This maintains backwards compatibility (it performs exactly the same if
paddle_x_strength
is not applied).For games where the paddle is not used, the
paddle_x_strength
parameter is just ignored. This mirrors the real world scenario where you have a paddle connected, but the game doesn't react to it when the paddle is turned.Python side interface
Old Discrete ALE Python Interface
New Mixed Discrete-Continuous ALE Python Interface
ALE Gymnasium Interface
The main change this PR applies over the original CALE implementation is that the discretization is now handled at the Python level. More specifically, when continuous action space is used, we do this.
More specifically,
self.map_action_idx
is anlru_cache
-ed function that takes the continuous action direction and maps it into an ActionEnum.This implementation is actually slightly faster than the original CALE implementation because don't compute the action repeatedly for every step (the original implementation placed this within the
for _ in range(frame_skip)
loop). We also don't need to worrry thatself.map_action_idx
may be an expensive call because the whole function is cached, so it's basically anO(1)
dictionary lookup.Limitations
One small caveat with this PR (and the original one) is that there are actually games that utilize a paddle and a joystick for control (e.g.: Star Wars and Night Driver). This functionality is not available with this PR, not in #539, and also not in the original ALE. I think it'd be pretty cool to add this in but we probably need to think about how to pass the action space from Stella's end into Python-land for that.