Pure Continuous version of ALE on CPP #550

jjshoots · 2024-08-01T05:44:23Z

This re-implements #539 but without introducing two separate sets of functionality on the CPP end.

In ALE, only the paddles accept continuous actions; games that use the joystick are purely discrete in the emulator. Therefore, we cannot implement true continuous control for those environments.

What this PR does is implement the clipping on the Python end for continuous environments while using the same ActionEnum backbone of the original ALE, plus an argument for paddle strength in paddle-based games.

What do we want?

The changes in code are motivated by one thing: we want continuous action space for paddles only, keeping the ActionEnum-style action inputs for joysticks since the joystick for Atari is still a discrete action (continuous joystick does not exist).

What did the original ALE interface do?

In the original ALE interface, the only actions are joystick ActionEnum inputs. For games that use a paddle instead of a joystick, the joystick controls are mapped to discrete actions applied to the paddles, i.e.:

All left actions (LEFTDOWN, LEFTUP, LEFT...) -> paddle left max
All right actions (RIGHTDOWN, RIGHTUP, RIGHT...) -> paddle right max
Up... etc.
Down... etc.

This results in loss of continuous action for paddles.
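As a rough illustration of that mapping, here is a hypothetical Python sketch (not the emulator code). The function name and the -1/0/+1 encoding are made up for this example, and only the left/right cases are covered:

```python
def paddle_direction(action_name: str) -> int:
    """Collapse a joystick action name to a paddle direction.

    Returns -1 for any left-type action, +1 for any right-type action,
    and 0 otherwise. The magnitude is always "max", so every left action
    is indistinguishable from every other left action.
    """
    if "LEFT" in action_name:
        return -1
    if "RIGHT" in action_name:
        return 1
    return 0
```

Whatever the joystick action, the paddle only ever sees one of three values, which is where the continuous information is lost.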

How do we allow continuous paddles?

The original interface uses joystick actions for all games to maintain a similar interface for all games, making it easier for RL.
Ideally, we want to maintain this functionality, but allow for continuous action inputs for games that allow paddle usage.
To do that, we modify the interface on the CPP end:

Old Discrete ALE interface

reward_t ALEInterface::act(Action action)

New Mixed Discrete-Continuous ALE interface

reward_t ALEInterface::act(Action action, float paddle_strength = 1.0)

Then, for games that utilize paddles, if the paddle strength parameter is set (the default value is 1.0), we pass the paddle action to the underlying game via this change:

delta_a = static_cast<int>(-PADDLE_DELTA * fabs(paddle_a_strength));

This maintains backwards compatibility (it performs exactly the same if paddle_x_strength is not applied).
For games where the paddle is not used, the paddle_x_strength parameter is just ignored. This mirrors the real world scenario where you have a paddle connected, but the game doesn't react to it when the paddle is turned.
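To make the backwards-compatibility claim concrete, here is a minimal Python sketch of the scaling above. It is an illustration only: the PADDLE_DELTA value is a placeholder (the real constant lives in the C++ source), and the function name and the -1/0/+1 direction encoding are assumptions for this example.

```python
PADDLE_DELTA = 23000  # placeholder magnitude, assumed for illustration


def paddle_delta(direction: int, strength: float = 1.0) -> int:
    """Scale the paddle movement by the magnitude of `strength`.

    `direction` is -1, 0 or +1 (hypothetical encoding). int() truncates
    toward zero, mirroring static_cast<int> in the C++ line above.
    """
    return int(direction * PADDLE_DELTA * abs(strength))
```

With the default strength of 1.0 the result is the full delta, which matches the old discrete behaviour; any smaller magnitude gives a proportionally smaller movement.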

Python side interface

Old Discrete ALE Python Interface

ale.act(action: int)

New Mixed Discrete-Continuous ALE Python Interface

ale.act(action: int, strength: float = 1.0)

ALE Gymnasium Interface

The main change this PR applies over the original CALE implementation is that the discretization is now handled at the Python level. More specifically, when a continuous action space is used, we do this:

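            # Convert the polar joystick action (radius, angle) to Cartesian x, y.
            x, y = action[0] * np.cos(action[1]), action[0] * np.sin(action[1])
            action_idx = self.map_action_idx(
                # -1 = left, 0 = center, +1 = right (left uses the negated threshold)
                left_center_right=(
                    -int(x < -self.continuous_action_threshold)
                    + int(x > self.continuous_action_threshold)
                ),
                # -1 = down, 0 = center, +1 = up (down uses the negated threshold)
                down_center_up=(
                    -int(y < -self.continuous_action_threshold)
                    + int(y > self.continuous_action_threshold)
                ),
                fire=(action[-1] > self.continuous_action_threshold),
            )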

More specifically, self.map_action_idx is an lru_cache-ed function that takes the continuous action direction and maps it into an ActionEnum.
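A self-contained sketch of this idea is below. It is not the code in this PR: it uses math instead of NumPy, returns action names rather than ActionEnum values, and assumes a symmetric dead zone (left when x < -threshold, right when x > threshold). The names map_action_name and discretize are hypothetical.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=18)  # 3 horizontal x 3 vertical x 2 fire states
def map_action_name(left_center_right: int, down_center_up: int, fire: bool) -> str:
    """Map a discretized joystick state to an action name (hypothetical helper)."""
    vertical = {-1: "DOWN", 0: "", 1: "UP"}[down_center_up]
    horizontal = {-1: "LEFT", 0: "", 1: "RIGHT"}[left_center_right]
    name = vertical + horizontal + ("FIRE" if fire else "")
    return name or "NOOP"


def discretize(radius: float, theta: float, fire_value: float, threshold: float = 0.5) -> str:
    """Threshold a polar (radius, theta, fire) action into a discrete action name."""
    x, y = radius * math.cos(theta), radius * math.sin(theta)
    return map_action_name(
        left_center_right=-int(x < -threshold) + int(x > threshold),
        down_center_up=-int(y < -threshold) + int(y > threshold),
        fire=fire_value > threshold,
    )
```

For example, discretize(1.0, math.pi / 4, 0.0) lands in the upper-right region, and a zero-radius action with no fire maps to NOOP. Repeated calls with the same discretized state are served from the cache.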

This implementation is slightly faster than the original CALE implementation because we don't recompute the action on every frame (the original implementation placed this computation inside the for _ in range(frame_skip) loop). We also don't need to worry that self.map_action_idx may be an expensive call, because the whole function is cached, so it's basically an O(1) dictionary lookup.
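The difference can be sketched as follows. This is a toy illustration, not the environment code: RecordingALE is a hypothetical stand-in for the emulator, and step and its parameters are made-up names.

```python
class RecordingALE:
    """Hypothetical stand-in for the emulator; it only records act() calls."""

    def __init__(self):
        self.calls = []

    def act(self, action, strength=1.0):
        self.calls.append((action, strength))
        return 1.0  # dummy reward


def step(ale, action, discretize, strength=1.0, frame_skip=4):
    """Apply one agent action for `frame_skip` emulator frames."""
    # Discretize once, before the frame-skip loop, instead of once per frame.
    action_idx = discretize(action)
    reward = 0.0
    for _ in range(frame_skip):
        reward += ale.act(action_idx, strength)
    return reward
```

Here discretize runs once per agent step, while ale.act runs frame_skip times with the same discrete action.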

Limitations

One small caveat with this PR (and the original one) is that some games use both a paddle and a joystick for control (e.g. Star Wars and Night Driver). This functionality is not available with this PR, nor in #539, nor in the original ALE. I think it'd be pretty cool to add this, but we probably need to think about how to pass the action space from Stella's end into Python-land for that.

jjshoots · 2024-08-03T17:32:52Z

@pseudo-rnd-thoughts Ready for review.

@psc-g Would be great if we can get your review here too. AFAICT, this is the same as your implementation, so it shouldn't affect the reproducibility of your paper/branch. :)

pseudo-rnd-thoughts · 2024-08-03T17:46:23Z

@jjshoots could you update the PR description to explain what you have done? There seem to be some core functions that have changed.

Could you add example code for interfacing with continuous at the cpp and python levels?

Also, could you add some more continuous-specific tests?

psc-g

very clean and more efficient than my original implementation, thanks for doing this!

src/environment/stella_environment.cpp

pseudo-rnd-thoughts

Looks good to me as well

pseudo-rnd-thoughts · 2024-08-08T16:19:30Z

tests/python/test_atari_env.py

-        for env_id, spec in gymnasium.registry.items()
-        if spec.entry_point == "ale_py.env:AtariEnv"
-    ],
+    "env_id,continuous",


For the future, this can be two separate @pytest.mark.parametrize decorators: the first for the env id and the second for continuous.
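Something along these lines, as a minimal sketch. The ENV_IDS list and the test body are placeholders for illustration; the real test collects the ids from the Gymnasium registry.

```python
import pytest

# Hypothetical ids for illustration only.
ENV_IDS = ["ALE/Pong-v5", "ALE/Breakout-v5"]


@pytest.mark.parametrize("env_id", ENV_IDS)
@pytest.mark.parametrize("continuous", [False, True])
def test_env(env_id, continuous):
    # Placeholder body: the real test would build the env and step it.
    assert isinstance(env_id, str)
    assert isinstance(continuous, bool)
```

Stacking the two decorators makes pytest run every combination of env id and continuous flag, so the combined "env_id,continuous" list does not have to be built by hand.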

step by step

9d3605d

jjshoots marked this pull request as draft August 1, 2024 05:44

jet-sony and others added 26 commits August 1, 2024 14:54

stash, gotta get back to work

78ad5b4

remove discrete implementation and use only continuous

5ae27d9

split the thresholds

b2af342

add thresholds

229f1ed

use true-to-game actions

0355c91

I think... I'm happy with this interface for now

0bc1ea3

amend stella env

998e3e3

remove redundant params

c3a9164

make default parameter

f25a114

amend interface to have default parameter at top level

10903c5

swap parameter order and implement continuous for wrappers

84ca055

maybe stella shouldn't use default params

9aef3d7

move discretization to Python

3a971e8

fix some bugs

cef5892

fix another bug

b9ac273

stash

4189fb8

stash

9dfe314

fix some more bugs

3c9c8aa

ALWAYS the rogue curlies you gotta watch out for

86722e9

streamline

fae19c1

fixing tests

2e3d879

make int

d7f37b7

fix argument

ea133ae

passing tests

444dde3

fix bug

2d86927

use full action space in continuous mode

b58e9b2

jjshoots marked this pull request as ready for review August 3, 2024 17:32

jjshoots added 2 commits August 4, 2024 02:33

precommit

a5df888

fix bug

ea2303f

jjshoots added 3 commits August 4, 2024 02:48

additional warning

394c515

precommit

be3e869

update interface signature

c38033e

psc-g approved these changes Aug 5, 2024

change to default emulate strength 1.0

140f95e

pseudo-rnd-thoughts approved these changes Aug 8, 2024

pseudo-rnd-thoughts merged commit 11bbfdb into Farama-Foundation:CALE Aug 8, 2024
28 checks passed

jjshoots deleted the jet/pure_cale branch August 9, 2024 03:52
