Continuous ALE #549

Merged: 4 commits, Aug 13, 2024
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -309,7 +309,7 @@ jobs:

- name: Build
# wildcarding doesn't work for some reason, therefore, update the project version here
-run: python -m pip install ale_py-0.9.1-${{ matrix.wheel-name }}.whl
+run: python -m pip install ale_py-0.10.0-${{ matrix.wheel-name }}.whl

- name: Install Gymnasium and pytest
run: python -m pip install "gymnasium>=1.0.0a2" pytest
65 changes: 65 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,71 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 0.10.0 -

In the original ALE interface, actions are restricted to the joystick `ActionEnum` inputs.
For games that use a paddle instead of a joystick, these joystick controls are mapped onto discrete paddle actions, i.e.:
- All left actions (`LEFTDOWN`, `LEFTUP`, `LEFT...`) -> paddle left max
- All right actions (`RIGHTDOWN`, `RIGHTUP`, `RIGHT...`) -> paddle right max
- Up... etc.
- Down... etc.

As a result, paddles lose their continuous range of motion.
This change keeps the existing functionality and interface, but adds continuous action inputs for games that support paddles.

To do that, the C++ interface has been modified.

_Old Discrete ALE interface_
```cpp
reward_t ALEInterface::act(Action action)
```

_New Mixed Discrete-Continuous ALE interface_
```cpp
reward_t ALEInterface::act(Action action, float paddle_strength = 1.0)
```

Games that do not use the paddle simply ignore the `paddle_strength` parameter.
This mirrors the real-world scenario in which a paddle is connected but the game does not react when it is turned.
Because the parameter defaults to `1.0`, existing calls to `act(action)` behave as before, which maintains backwards compatibility.
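
As a rough illustration of how `paddle_strength` scales the paddle movement, the sketch below mirrors in Python the arithmetic used by the C++ paddle handling in this change (`PADDLE_DELTA * fabs(paddle_strength)`, negative for right actions and positive for left actions). The `paddle_delta` helper, the action-name sets, and the value chosen for `PADDLE_DELTA` are assumptions made for illustration; they are not part of the ALE API.

```py
# Assumed placeholder value; the real constant is defined in the C++ sources.
PADDLE_DELTA = 23000

# Hypothetical name sets standing in for the PLAYER_A_* enum cases.
LEFT_ACTIONS = {"LEFT", "LEFTFIRE", "UPLEFT", "DOWNLEFT", "UPLEFTFIRE", "DOWNLEFTFIRE"}
RIGHT_ACTIONS = {"RIGHT", "RIGHTFIRE", "UPRIGHT", "DOWNRIGHT", "UPRIGHTFIRE", "DOWNRIGHTFIRE"}


def paddle_delta(action_name: str, strength: float = 1.0) -> int:
    """Return the signed paddle delta for one step.

    Only the magnitude of `strength` is used; the sign comes from the action.
    """
    magnitude = int(PADDLE_DELTA * abs(strength))  # truncates like static_cast<int>
    if action_name in RIGHT_ACTIONS:
        return -magnitude
    if action_name in LEFT_ACTIONS:
        return magnitude
    return 0  # any other action leaves the paddle where it is
```

With the default `strength` of `1.0`, the helper returns the full `PADDLE_DELTA`, which matches the old discrete behaviour.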

The Python interface has also been updated.

_Old Discrete ALE Python Interface_
```py
ale.act(action: int)
```

_New Mixed Discrete-Continuous ALE Python Interface_
```py
ale.act(action: int, strength: float = 1.0)
```

When a continuous action space is used within an ALE Gymnasium environment, the discretization happens at the Python level.
```py
if continuous:
    # action is expected to be a [3,] array of floats: (radius, theta, fire)
    x, y = action[0] * np.cos(action[1]), action[0] * np.sin(action[1])
    action_idx = self.map_action_idx(
        left_center_right=(
            -int(x < -self.continuous_action_threshold)
            + int(x > self.continuous_action_threshold)
        ),
        down_center_up=(
            -int(y < -self.continuous_action_threshold)
            + int(y > self.continuous_action_threshold)
        ),
        fire=(action[-1] > self.continuous_action_threshold),
    )
    # the radius (action[0]) is passed on as the paddle strength
    ale.act(action_idx, action[0])
```
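
The discretization step can be tried in isolation with the minimal sketch below. It assumes the action is a `(radius, theta, fire)` triple, that the threshold is applied symmetrically (negative for left and down), and that the default threshold is `0.5`; the function name `discretize_polar` is hypothetical and not part of `ale-py`.

```py
import math


def discretize_polar(radius: float, theta: float, fire: float,
                     threshold: float = 0.5) -> tuple[int, int, bool]:
    """Map a polar joystick action to (left_center_right, down_center_up, fire).

    Each direction component is -1, 0 or 1.
    """
    # Convert the polar action to Cartesian joystick coordinates.
    x = radius * math.cos(theta)
    y = radius * math.sin(theta)
    # A component counts only once it moves past the threshold on either side.
    left_center_right = -int(x < -threshold) + int(x > threshold)
    down_center_up = -int(y < -threshold) + int(y > threshold)
    return left_center_right, down_center_up, fire > threshold
```

For example, `discretize_polar(1.0, 0.0, 0.0)` yields `(1, 0, False)` (full push to the right, no fire), while a radius below the threshold yields `(0, 0, False)` regardless of the angle.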

Here, [`self.map_action_idx`](https://github.com/Farama-Foundation/Arcade-Learning-Environment/pull/550/files#diff-057906329e72d689f1d4d9d9e3f80df11ffe74da581b29b3838a436e90841b5cR388-R447) is an `lru_cache`-ed function that maps the continuous action direction to an `ActionEnum` value.
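
One way such a cached mapping could look is sketched below. This is not the implementation linked above: the `Action` enum is a local stand-in whose numbering is assumed to follow ALE's usual 18-action ordering, and the name-composition trick is just one possible approach.

```py
from enum import IntEnum
from functools import lru_cache


class Action(IntEnum):
    """Local stand-in for the ALE action enum (numbering is an assumption)."""
    NOOP = 0
    FIRE = 1
    UP = 2
    RIGHT = 3
    LEFT = 4
    DOWN = 5
    UPRIGHT = 6
    UPLEFT = 7
    DOWNRIGHT = 8
    DOWNLEFT = 9
    UPFIRE = 10
    RIGHTFIRE = 11
    LEFTFIRE = 12
    DOWNFIRE = 13
    UPRIGHTFIRE = 14
    UPLEFTFIRE = 15
    DOWNRIGHTFIRE = 16
    DOWNLEFTFIRE = 17


@lru_cache(maxsize=18)  # 3 x 3 x 2 possible inputs
def map_action_idx(left_center_right: int, down_center_up: int, fire: bool) -> Action:
    """Compose the enum member name from the three discrete components."""
    vertical = {-1: "DOWN", 0: "", 1: "UP"}[down_center_up]
    horizontal = {-1: "LEFT", 0: "", 1: "RIGHT"}[left_center_right]
    name = vertical + horizontal + ("FIRE" if fire else "")
    return Action[name or "NOOP"]  # an empty name means no input at all
```

Because there are only 18 distinct inputs, the cache fills quickly and every later call becomes a dictionary lookup.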

## 0.9.1 -

Added support for NumPy 2.0.

## [0.9.0] - 2024-05-10

Previously, ALE implemented only a [Gym](https://github.com/openai/gym)-based environment. However, as Gym is no longer maintained (last commit was 18 months ago), we have updated `ale-py` to use [Gymnasium](http://github.com/farama-Foundation/gymnasium) (a maintained fork of Gym) as the sole backend environment implementation. For more information on Gymnasium’s API, see their [introduction page](https://gymnasium.farama.org/main/introduction/basic_usage/).
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -20,7 +20,7 @@ authors = [
{name = "Michael Bowling"},
]
maintainers = [
-{ name = "Farama Foundation", email = "[email protected]" },
+{name = "Farama Foundation", email = "[email protected]"},
{name = "Jesse Farebrother", email = "[email protected]"},
]
classifiers = [
5 changes: 3 additions & 2 deletions src/ale_interface.cpp
@@ -259,8 +259,9 @@ int ALEInterface::lives() {
// user's responsibility to check if the game has ended and reset
// when necessary - this method will keep pressing buttons on the
// game over screen.
-reward_t ALEInterface::act(Action action) {
-return environment->act(action, PLAYER_B_NOOP);
+// Intentionally set player B actions to 0 since we are in single player mode
+reward_t ALEInterface::act(Action action, float paddle_strength) {
+return environment->act(action, PLAYER_B_NOOP, paddle_strength, 0.0);
}

// Returns the vector of modes available for the current game.
2 changes: 1 addition & 1 deletion src/ale_interface.hpp
@@ -86,7 +86,7 @@ class ALEInterface {
// user's responsibility to check if the game has ended and reset
// when necessary - this method will keep pressing buttons on the
// game over screen.
-reward_t act(Action action);
+reward_t act(Action action, float paddle_strength = 1.0);

// Indicates if the game has ended.
bool game_over(bool with_truncation = true) const;
68 changes: 16 additions & 52 deletions src/environment/ale_state.cpp
@@ -13,6 +13,7 @@
#include "environment/ale_state.hpp"

#include <cassert>
+#include <cmath>
#include <sstream>
#include <stdexcept>
#include <string>
@@ -188,24 +189,22 @@ void ALEState::updatePaddlePositions(Event* event, int delta_left,
setPaddles(event, m_left_paddle, m_right_paddle);
}

-void ALEState::applyActionPaddles(Event* event, int player_a_action,
-int player_b_action) {
+void ALEState::applyActionPaddles(Event* event,
+int player_a_action, float paddle_a_strength,
+int player_b_action, float paddle_b_strength) {
// Reset keys
resetKeys(event);

// First compute whether we should increase or decrease the paddle position
// (for both left and right players)
-int delta_left;
-int delta_right;
-
+int delta_a = 0;
+int delta_b = 0;
switch (player_a_action) {
case PLAYER_A_RIGHT:
case PLAYER_A_RIGHTFIRE:
case PLAYER_A_UPRIGHT:
case PLAYER_A_DOWNRIGHT:
case PLAYER_A_UPRIGHTFIRE:
case PLAYER_A_DOWNRIGHTFIRE:
-delta_left = -PADDLE_DELTA;
+delta_a = static_cast<int>(-PADDLE_DELTA * fabs(paddle_a_strength));
break;

case PLAYER_A_LEFT:
@@ -214,10 +213,10 @@ void ALEState::applyActionPaddles(Event* event, int player_a_action,
case PLAYER_A_DOWNLEFT:
case PLAYER_A_UPLEFTFIRE:
case PLAYER_A_DOWNLEFTFIRE:
-delta_left = PADDLE_DELTA;
+delta_a = static_cast<int>(PADDLE_DELTA * fabs(paddle_a_strength));
break;

default:
-delta_left = 0;
break;
}

@@ -228,7 +227,7 @@ void ALEState::applyActionPaddles(Event* event, int player_a_action,
case PLAYER_B_DOWNRIGHT:
case PLAYER_B_UPRIGHTFIRE:
case PLAYER_B_DOWNRIGHTFIRE:
-delta_right = -PADDLE_DELTA;
+delta_b = static_cast<int>(-PADDLE_DELTA * fabs(paddle_b_strength));
break;

case PLAYER_B_LEFT:
@@ -237,15 +236,15 @@ void ALEState::applyActionPaddles(Event* event, int player_a_action,
case PLAYER_B_DOWNLEFT:
case PLAYER_B_UPLEFTFIRE:
case PLAYER_B_DOWNLEFTFIRE:
-delta_right = PADDLE_DELTA;
+delta_b = static_cast<int>(PADDLE_DELTA * fabs(paddle_b_strength));
break;

default:
-delta_right = 0;
break;
}

// Now update the paddle positions
-updatePaddlePositions(event, delta_left, delta_right);
+updatePaddlePositions(event, delta_a, delta_b);

// Handle reset
if (player_a_action == RESET || player_b_action == RESET)
@@ -301,188 +300,153 @@ void ALEState::setDifficultySwitches(Event* event, unsigned int value) {
event->set(Event::ConsoleRightDifficultyB, !((value & 2) >> 1));
}

-void ALEState::setActionJoysticks(Event* event, int player_a_action,
-int player_b_action) {
+void ALEState::applyActionJoysticks(Event* event,
+int player_a_action, int player_b_action) {
// Reset keys
resetKeys(event);

switch (player_a_action) {
case PLAYER_A_NOOP:
break;

case PLAYER_A_FIRE:
event->set(Event::JoystickZeroFire, 1);
break;

case PLAYER_A_UP:
event->set(Event::JoystickZeroUp, 1);
break;

case PLAYER_A_RIGHT:
event->set(Event::JoystickZeroRight, 1);
break;

case PLAYER_A_LEFT:
event->set(Event::JoystickZeroLeft, 1);
break;

case PLAYER_A_DOWN:
event->set(Event::JoystickZeroDown, 1);
break;

case PLAYER_A_UPRIGHT:
event->set(Event::JoystickZeroUp, 1);
event->set(Event::JoystickZeroRight, 1);
break;

case PLAYER_A_UPLEFT:
event->set(Event::JoystickZeroUp, 1);
event->set(Event::JoystickZeroLeft, 1);
break;

case PLAYER_A_DOWNRIGHT:
event->set(Event::JoystickZeroDown, 1);
event->set(Event::JoystickZeroRight, 1);
break;

case PLAYER_A_DOWNLEFT:
event->set(Event::JoystickZeroDown, 1);
event->set(Event::JoystickZeroLeft, 1);
break;

case PLAYER_A_UPFIRE:
event->set(Event::JoystickZeroUp, 1);
event->set(Event::JoystickZeroFire, 1);
break;

case PLAYER_A_RIGHTFIRE:
event->set(Event::JoystickZeroRight, 1);
event->set(Event::JoystickZeroFire, 1);
break;

case PLAYER_A_LEFTFIRE:
event->set(Event::JoystickZeroLeft, 1);
event->set(Event::JoystickZeroFire, 1);
break;

case PLAYER_A_DOWNFIRE:
event->set(Event::JoystickZeroDown, 1);
event->set(Event::JoystickZeroFire, 1);
break;

case PLAYER_A_UPRIGHTFIRE:
event->set(Event::JoystickZeroUp, 1);
event->set(Event::JoystickZeroRight, 1);
event->set(Event::JoystickZeroFire, 1);
break;

case PLAYER_A_UPLEFTFIRE:
event->set(Event::JoystickZeroUp, 1);
event->set(Event::JoystickZeroLeft, 1);
event->set(Event::JoystickZeroFire, 1);
break;

case PLAYER_A_DOWNRIGHTFIRE:
event->set(Event::JoystickZeroDown, 1);
event->set(Event::JoystickZeroRight, 1);
event->set(Event::JoystickZeroFire, 1);
break;

case PLAYER_A_DOWNLEFTFIRE:
event->set(Event::JoystickZeroDown, 1);
event->set(Event::JoystickZeroLeft, 1);
event->set(Event::JoystickZeroFire, 1);
break;
case RESET:
event->set(Event::ConsoleReset, 1);
Logger::Info << "Sending Reset...\n";
break;
default:
Logger::Error << "Invalid Player A Action: " << player_a_action << "\n";
std::exit(-1);
}

switch (player_b_action) {
case PLAYER_B_NOOP:
break;

case PLAYER_B_FIRE:
event->set(Event::JoystickOneFire, 1);
break;

case PLAYER_B_UP:
event->set(Event::JoystickOneUp, 1);
break;

case PLAYER_B_RIGHT:
event->set(Event::JoystickOneRight, 1);
break;

case PLAYER_B_LEFT:
event->set(Event::JoystickOneLeft, 1);
break;

case PLAYER_B_DOWN:
event->set(Event::JoystickOneDown, 1);
break;

case PLAYER_B_UPRIGHT:
event->set(Event::JoystickOneUp, 1);
event->set(Event::JoystickOneRight, 1);
break;

case PLAYER_B_UPLEFT:
event->set(Event::JoystickOneUp, 1);
event->set(Event::JoystickOneLeft, 1);
break;

case PLAYER_B_DOWNRIGHT:
event->set(Event::JoystickOneDown, 1);
event->set(Event::JoystickOneRight, 1);
break;

case PLAYER_B_DOWNLEFT:
event->set(Event::JoystickOneDown, 1);
event->set(Event::JoystickOneLeft, 1);
break;

case PLAYER_B_UPFIRE:
event->set(Event::JoystickOneUp, 1);
event->set(Event::JoystickOneFire, 1);
break;

case PLAYER_B_RIGHTFIRE:
event->set(Event::JoystickOneRight, 1);
event->set(Event::JoystickOneFire, 1);
break;

case PLAYER_B_LEFTFIRE:
event->set(Event::JoystickOneLeft, 1);
event->set(Event::JoystickOneFire, 1);
break;

case PLAYER_B_DOWNFIRE:
event->set(Event::JoystickOneDown, 1);
event->set(Event::JoystickOneFire, 1);
break;

case PLAYER_B_UPRIGHTFIRE:
event->set(Event::JoystickOneUp, 1);
event->set(Event::JoystickOneRight, 1);
event->set(Event::JoystickOneFire, 1);
break;

case PLAYER_B_UPLEFTFIRE:
event->set(Event::JoystickOneUp, 1);
event->set(Event::JoystickOneLeft, 1);
event->set(Event::JoystickOneFire, 1);
break;

case PLAYER_B_DOWNRIGHTFIRE:
event->set(Event::JoystickOneDown, 1);
event->set(Event::JoystickOneRight, 1);
event->set(Event::JoystickOneFire, 1);
break;

case PLAYER_B_DOWNLEFTFIRE:
event->set(Event::JoystickOneDown, 1);
event->set(Event::JoystickOneLeft, 1);