Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pure Continuous version of ALE on CPP #550

Merged
merged 33 commits into from
Aug 8, 2024

Conversation

jjshoots
Copy link
Member

@jjshoots jjshoots commented Aug 1, 2024

This re-implements #539 but without introducing two separate sets of functionality on the CPP end.

In ALE, only the paddles actually accept continuous actions, while games using the joystick are actually pure discrete in the emulator. Therefore, we cannot implement true continuous for those environments.

What this PR does is implement the clipping on the Python end for continuous environments while using the same ActionEnum backbone of the original ALE, plus an argument for paddle strength in paddle-based games.

What do we want?

The changes in code are motivated by one thing: we want continuous action space for paddles only, keeping the ActionEnum-style action inputs for joysticks since the joystick for Atari is still a discrete action (continuous joystick does not exist).

What did the original ALE interface do?

In the original ALE interface, the actions are only joystick ActionEnum inputs. Then, for games that use a paddle instead of a joystick, joystick controls are mapped into discrete actions applied to paddles, ie:

  • All left actions (LEFTDOWN, LEFTUP, LEFT...) -> paddle left max
  • All right actions (RIGHTDOWN, RIGHTUP, RIGHT...) -> paddle right max
  • Up... etc.
  • Down... etc.

This results in loss of continuous action for paddles.

How do we allow continuous paddles?

The original interface uses joystick actions for all games to maintain a similar interface for all games, making it easier for RL.
Ideally, we want to maintain this functionality, but allow for continuous action inputs for games that allow paddle usage.
To do that, we modify the interface on the CPP end:

Old Discrete ALE interface

reward_t ALEInterface::act(Action action)

New Mixed Discrete-Continuous ALE interface

reward_t ALEInterface::act(Action action, float paddle_strength = 1.0)

Then, for games that utilize paddles, if the paddle strength parameter is set (the default value is 1.0), we pass the paddle action to the underlying game via this change:

delta_a = static_cast<int>(-PADDLE_DELTA * fabs(paddle_a_strength));

This maintains backwards compatibility (it performs exactly the same if paddle_x_strength is not applied).
For games where the paddle is not used, the paddle_x_strength parameter is just ignored. This mirrors the real world scenario where you have a paddle connected, but the game doesn't react to it when the paddle is turned.

Python side interface

Old Discrete ALE Python Interface

ale.act(action: int)

New Mixed Discrete-Continuous ALE Python Interface

ale.act(action: int, strength: float = 1.0)

ALE Gymnasium Interface

The main change this PR applies over the original CALE implementation is that the discretization is now handled at the Python level. More specifically, when continuous action space is used, we do this.

            x, y = action[0] * np.cos(action[1]), action[0] * np.sin(action[1])
            action_idx = self.map_action_idx(
                left_center_right=(
                    -int(x < self.continuous_action_threshold)
                    + int(x > self.continuous_action_threshold)
                ),
                down_center_up=(
                    -int(y < self.continuous_action_threshold)
                    + int(y > self.continuous_action_threshold)
                ),
                fire=(action[-1] > self.continuous_action_threshold),
            )

More specifically, self.map_action_idx is an lru_cache-ed function that takes the continuous action direction and maps it into an ActionEnum.

This implementation is actually slightly faster than the original CALE implementation because don't compute the action repeatedly for every step (the original implementation placed this within the for _ in range(frame_skip) loop). We also don't need to worrry that self.map_action_idx may be an expensive call because the whole function is cached, so it's basically an O(1) dictionary lookup.

Limitations

One small caveat with this PR (and the original one) is that there are actually games that utilize a paddle and a joystick for control (e.g.: Star Wars and Night Driver). This functionality is not available with this PR, not in #539, and also not in the original ALE. I think it'd be pretty cool to add this in but we probably need to think about how to pass the action space from Stella's end into Python-land for that.

@jjshoots jjshoots marked this pull request as draft August 1, 2024 05:44
@jjshoots
Copy link
Member Author

jjshoots commented Aug 3, 2024

@pseudo-rnd-thoughts Ready for review.

@psc-g Would be great it we can get your review here too. AFAICT, this is the same as your implementation so shouldn't affect the reproducibility of your paper/branch. :)

@jjshoots jjshoots marked this pull request as ready for review August 3, 2024 17:32
@pseudo-rnd-thoughts
Copy link
Member

@jjshoots could you update the pr description to explain what you have done. There seem to be some core functions that have changed.

Could you add example code for interfacing with continuous at cpp and python levels.

Also could you add some more continuous specific tests

Copy link
Contributor

@psc-g psc-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very clean and more efficient than my original implementation, thanks for doing this!

src/environment/stella_environment.cpp Outdated Show resolved Hide resolved
Copy link
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me as well

for env_id, spec in gymnasium.registry.items()
if spec.entry_point == "ale_py.env:AtariEnv"
],
"env_id,continuous",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the future this can be two separate @pytest.mark.parameterize, the first for the env id and the second continuous

@pseudo-rnd-thoughts pseudo-rnd-thoughts merged commit 11bbfdb into Farama-Foundation:CALE Aug 8, 2024
28 checks passed
@jjshoots jjshoots deleted the jet/pure_cale branch August 9, 2024 03:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants