Merge branch 'develop' into toni/pre-commit
Toni-SM committed Nov 4, 2024
2 parents 6662e70 + eff7295 commit 12bcd22
Showing 127 changed files with 2,364 additions and 1,793 deletions.
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yaml
@@ -30,6 +30,7 @@ body:
description: The skrl version can be obtained with the command `pip show skrl`.
options:
- ---
- 1.3.0
- 1.2.0
- 1.1.0
- 1.0.0
26 changes: 23 additions & 3 deletions CHANGELOG.md
@@ -2,6 +2,26 @@

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [1.4.0] - Unreleased
### Added
- Utilities to operate on Gymnasium spaces (`Box`, `Discrete`, `MultiDiscrete`, `Tuple` and `Dict`)
- `parse_device` static method in ML framework configuration for JAX

### Changed
- Call agent's `pre_interaction` method during evaluation
- Use spaces utilities to process states, observations and actions for all the library components
- Update model instantiators definitions to process supported fundamental and composite Gymnasium spaces
- Make flattened tensor storage in memory the default option (revert the change introduced in version 1.3.0)
- Drop support for PyTorch versions prior to 1.10 (the previous supported version was 1.9)

### Fixed
- Moved the batch sampling inside gradient step loop for DQN, DDQN, DDPG (RNN), TD3 (RNN), SAC and SAC (RNN)

### Removed
- Remove OpenAI Gym (`gym`) from dependencies and source code. **skrl** continues to support gym environments,
but `gym` is no longer installed as part of the library. If it is needed, install it manually.
Any gym-based environment wrapper must use the `convert_gym_space` space utility to operate

## [1.3.0] - 2024-09-11
### Added
- Distributed multi-GPU and multi-node learning (JAX implementation)
@@ -70,7 +90,7 @@ Summary of the most relevant features:
## [1.0.0-rc.2] - 2023-08-11
### Added
- Get truncation from `time_outs` info in Isaac Gym, Isaac Orbit and Omniverse Isaac Gym environments
- Time-limit (truncation) boostrapping in on-policy actor-critic agents
- Time-limit (truncation) bootstrapping in on-policy actor-critic agents
- Model instantiators `initial_log_std` parameter to set the log standard deviation's initial value

### Changed (breaking changes)
@@ -84,7 +104,7 @@ Summary of the most relevant features:
- `from skrl.envs.loaders.jax import load_omniverse_isaacgym_env`

### Changed
- Drop support for versions prior to PyTorch 1.9 (1.8.0 and 1.8.1)
- Drop support for PyTorch versions prior to 1.9 (the previous supported version was 1.8)

## [1.0.0-rc.1] - 2023-07-25
### Added
@@ -177,7 +197,7 @@ to allow storing samples in memories during evaluation
- Parameter `role` to model methods
- Wrapper compatibility with the new OpenAI Gym environment API
- Internal library colored logger
- Migrate checkpoints/models from other RL libraries to skrl models/agents
- Migrate checkpoints/models from other RL libraries to **skrl** models/agents
- Configuration parameter `store_separately` to agent configuration dict
- Save/load agent modules (models, optimizers, preprocessors)
- Set random seed and configure deterministic behavior for reproducibility
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -54,7 +54,7 @@ Read the code a little bit and you will understand it at first glance... Also
```ini
function annotation (e.g. typing)
# insert an empty line
python libraries and other libraries (e.g. gym, numpy, time, etc.)
python libraries and other libraries (e.g. gymnasium, numpy, time, etc.)
# insert an empty line
machine learning framework modules (e.g. torch, torch.nn)
# insert an empty line
4 changes: 2 additions & 2 deletions docs/source/api/agents/ddqn.rst
@@ -40,10 +40,10 @@ Learning algorithm

|
| :literal:`_update(...)`
| :green:`# sample a batch from memory`
| [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
| :green:`# gradient steps`
| **FOR** each gradient step up to :guilabel:`gradient_steps` **DO**
| :green:`# sample a batch from memory`
| [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
| :green:`# compute target values`
| :math:`Q' \leftarrow Q_{\phi_{target}}(s')`
| :math:`Q_{_{target}} \leftarrow Q'[\underset{a}{\arg\max} \; Q_\phi(s')] \qquad` :gray:`# the only difference with DQN`
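
Review note: the hunk above moves the batch sampling inside the gradient-step loop, so each step trains on a newly drawn batch instead of reusing one batch for all steps. A minimal sketch of that control flow follows. It is not skrl's implementation: `update`, `memory` and `rng` are hypothetical names, and the memory is a plain Python list sampled uniformly.

```python
import random


def update(memory, batch_size, gradient_steps, rng):
    """Hypothetical update loop: draw a new batch for every gradient step."""
    batches = []
    for _ in range(gradient_steps):
        # sample a batch from memory (now inside the loop)
        batch = rng.sample(memory, batch_size)
        # ... compute target values and take an optimizer step here ...
        batches.append(batch)
    return batches


# toy memory of 100 transition ids (assumption for illustration)
memory = list(range(100))
batches = update(memory, batch_size=4, gradient_steps=3, rng=random.Random(0))
```

With `gradient_steps` equal to 1 the old and new orderings behave the same; they only diverge when several gradient steps are taken per update.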
4 changes: 2 additions & 2 deletions docs/source/api/agents/dqn.rst
@@ -40,10 +40,10 @@ Learning algorithm

|
| :literal:`_update(...)`
| :green:`# sample a batch from memory`
| [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
| :green:`# gradient steps`
| **FOR** each gradient step up to :guilabel:`gradient_steps` **DO**
| :green:`# sample a batch from memory`
| [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
| :green:`# compute target values`
| :math:`Q' \leftarrow Q_{\phi_{target}}(s')`
| :math:`Q_{_{target}} \leftarrow \underset{a}{\max} \; Q' \qquad` :gray:`# the only difference with DDQN`
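
Review note: the DQN and DDQN pseudocode differ only in how the target Q-value is picked from the target network's output. The sketch below shows just that selection step on plain Python lists. The function names and the toy values are assumptions for illustration, and the rest of the target computation (reward, discount, termination) is left out because the hunk does not show it.

```python
def dqn_target(q_target_next):
    """DQN: take the maximum target-network Q-value over actions."""
    return [max(row) for row in q_target_next]


def ddqn_target(q_target_next, q_online_next):
    """DDQN: the online network picks the action, the target network scores it."""
    selected = []
    for target_row, online_row in zip(q_target_next, q_online_next):
        best_action = max(range(len(online_row)), key=online_row.__getitem__)
        selected.append(target_row[best_action])
    return selected


# toy Q-values for a batch of 2 next states and 3 actions (made-up numbers)
q_target_next = [[1.0, 5.0, 2.0], [0.5, 0.1, 0.3]]
q_online_next = [[3.0, 1.0, 2.0], [0.2, 0.1, 0.9]]

dqn_values = dqn_target(q_target_next)                   # [5.0, 0.5]
ddqn_values = ddqn_target(q_target_next, q_online_next)  # [1.0, 0.3]
```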
4 changes: 2 additions & 2 deletions docs/source/api/agents/sac.rst
@@ -34,10 +34,10 @@ Learning algorithm

|
| :literal:`_update(...)`
| :green:`# sample a batch from memory`
| [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
| :green:`# gradient steps`
| **FOR** each gradient step up to :guilabel:`gradient_steps` **DO**
| :green:`# sample a batch from memory`
| [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
| :green:`# compute target values`
| :math:`a',\; logp' \leftarrow \pi_\theta(s')`
| :math:`Q_{1_{target}} \leftarrow Q_{{\phi 1}_{target}}(s', a')`
2 changes: 2 additions & 0 deletions docs/source/api/config/frameworks.rst
@@ -86,6 +86,8 @@ API

The default device, unless specified, is ``cuda:0`` (or ``cuda:JAX_LOCAL_RANK`` in a distributed environment) if CUDA is available, ``cpu`` otherwise
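
Review note: the default-device rule stated above can be written as a small pure function. This is only a sketch of the rule as worded here, not the library's code: `default_device_name` is a hypothetical helper, and the local rank is passed in as an argument rather than read from the environment.

```python
from typing import Optional


def default_device_name(cuda_available: bool, local_rank: Optional[int] = None) -> str:
    """Hypothetical helper mirroring the documented default-device rule."""
    if not cuda_available:
        return "cpu"
    # outside a distributed run there is no local rank, so fall back to device 0
    index = 0 if local_rank is None else local_rank
    return f"cuda:{index}"


single_gpu = default_device_name(cuda_available=True)
distributed = default_device_name(cuda_available=True, local_rank=2)
cpu_only = default_device_name(cuda_available=False)
```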

.. autofunction:: skrl.config.jax.parse_device

.. py:data:: skrl.config.jax.backend
:type: str
:value: "numpy"
4 changes: 4 additions & 0 deletions docs/source/api/utils.rst
@@ -6,6 +6,7 @@ Utils and configurations

ML frameworks configuration <config/frameworks>
Random seed <utils/seed>
Spaces <utils/spaces>
Model instantiators <utils/model_instantiators>
Runner <utils/runner>
Distributed runs <utils/distributed>
@@ -39,6 +40,9 @@ A set of utilities and configurations for managing an RL setup is provided as pa
* - :doc:`Random seed <utils/seed>`
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\blacksquare`
* - :doc:`Spaces <utils/spaces>`
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\blacksquare`
* - :doc:`Model instantiators <utils/model_instantiators>`
- .. centered:: :math:`\blacksquare`
- .. centered:: :math:`\blacksquare`
86 changes: 86 additions & 0 deletions docs/source/api/utils/spaces.rst
@@ -0,0 +1,86 @@
Spaces
======

Utilities to operate on Gymnasium `spaces <https://gymnasium.farama.org/api/spaces>`_.

.. raw:: html

<br><hr>

Overview
--------

The utilities described in this section support the following Gymnasium spaces:

.. list-table::
:header-rows: 1

* - Type
- Supported spaces
* - Fundamental
- :py:class:`~gymnasium.spaces.Box`, :py:class:`~gymnasium.spaces.Discrete`, and :py:class:`~gymnasium.spaces.MultiDiscrete`
* - Composite
- :py:class:`~gymnasium.spaces.Dict` and :py:class:`~gymnasium.spaces.Tuple`

The following table provides a snapshot of the space sample conversion functions:

.. list-table::
:header-rows: 1

* - Input
- Function
- Output
* - Space (NumPy / int)
- :py:func:`~skrl.utils.spaces.torch.tensorize_space`
- Space (PyTorch / JAX)
* - Space (PyTorch / JAX)
- :py:func:`~skrl.utils.spaces.torch.untensorize_space`
- Space (NumPy / int)
* - Space (PyTorch / JAX)
- :py:func:`~skrl.utils.spaces.torch.flatten_tensorized_space`
- PyTorch tensor / JAX array
* - PyTorch tensor / JAX array
- :py:func:`~skrl.utils.spaces.torch.unflatten_tensorized_space`
- Space (PyTorch / JAX)
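
Review note: the flatten/unflatten pair in the table is a round trip between a structured sample and a single flat vector. The sketch below shows the idea on a dictionary of plain Python lists. It does not call or reproduce the skrl functions: `flatten_sample`, `unflatten_sample` and the `layout` mapping (key to number of elements) are hypothetical names introduced for illustration.

```python
from typing import Dict, List


def flatten_sample(sample: Dict[str, List[float]], layout: Dict[str, int]) -> List[float]:
    """Concatenate the entries of a dict sample in the order given by layout."""
    flat: List[float] = []
    for key, size in layout.items():
        values = sample[key]
        if len(values) != size:
            raise ValueError(f"expected {size} values for '{key}', got {len(values)}")
        flat.extend(values)
    return flat


def unflatten_sample(flat: List[float], layout: Dict[str, int]) -> Dict[str, List[float]]:
    """Split a flat vector back into a dict sample using the same layout."""
    sample: Dict[str, List[float]] = {}
    start = 0
    for key, size in layout.items():
        sample[key] = flat[start:start + size]
        start += size
    return sample


# toy layout and sample (assumptions for illustration)
layout = {"position": 3, "velocity": 2}
sample = {"position": [0.1, 0.2, 0.3], "velocity": [1.0, -1.0]}

flat = flatten_sample(sample, layout)      # [0.1, 0.2, 0.3, 1.0, -1.0]
restored = unflatten_sample(flat, layout)  # equal to sample
```

The round trip only works when both directions agree on the key order and sizes, which is why a space (here, the `layout`) has to accompany the flat vector.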

.. raw:: html

<br>

API (PyTorch)
-------------

.. autofunction:: skrl.utils.spaces.torch.compute_space_size

.. autofunction:: skrl.utils.spaces.torch.convert_gym_space

.. autofunction:: skrl.utils.spaces.torch.flatten_tensorized_space

.. autofunction:: skrl.utils.spaces.torch.sample_space

.. autofunction:: skrl.utils.spaces.torch.tensorize_space

.. autofunction:: skrl.utils.spaces.torch.unflatten_tensorized_space

.. autofunction:: skrl.utils.spaces.torch.untensorize_space

.. raw:: html

<br>

API (JAX)
---------

.. autofunction:: skrl.utils.spaces.jax.compute_space_size

.. autofunction:: skrl.utils.spaces.jax.convert_gym_space

.. autofunction:: skrl.utils.spaces.jax.flatten_tensorized_space

.. autofunction:: skrl.utils.spaces.jax.sample_space

.. autofunction:: skrl.utils.spaces.jax.tensorize_space

.. autofunction:: skrl.utils.spaces.jax.unflatten_tensorized_space

.. autofunction:: skrl.utils.spaces.jax.untensorize_space
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -197,6 +197,7 @@ Utils and configurations

* :doc:`ML frameworks <api/config/frameworks>` configuration
* :doc:`Random seed <api/utils/seed>`
* :doc:`Spaces <api/utils/spaces>`
* :doc:`Model instantiators <api/utils/model_instantiators>`
* :doc:`Runner <api/utils/runner>`
* :doc:`Distributed runs <api/utils/distributed>`
6 changes: 3 additions & 3 deletions docs/source/intro/installation.rst
@@ -12,10 +12,10 @@ In this section, you will find the steps to install the library, troubleshoot kn

**skrl** requires Python 3.6 or higher and the following libraries (they will be installed automatically):

* `gym <https://www.gymlibrary.dev>`_ / `gymnasium <https://gymnasium.farama.org/>`_
* `tqdm <https://tqdm.github.io>`_
* `gymnasium <https://gymnasium.farama.org/>`_
* `packaging <https://packaging.pypa.io>`_
* `tensorboard <https://www.tensorflow.org/tensorboard>`_
* `tqdm <https://tqdm.github.io>`_

Machine learning (ML) framework
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -25,7 +25,7 @@ According to the specific ML frameworks, the following libraries are required:
PyTorch
"""""""

* `torch <https://pytorch.org>`_ 1.9.0 or higher
* `torch <https://pytorch.org>`_ 1.10.0 or higher

JAX
"""
20 changes: 10 additions & 10 deletions docs/source/snippets/agent.py
@@ -1,7 +1,7 @@
# [start-agent-base-class-torch]
from typing import Union, Tuple, Dict, Any, Optional

import gym, gymnasium
import gymnasium
import copy

import torch
@@ -33,8 +33,8 @@ class CUSTOM(Agent):
def __init__(self,
models: Dict[str, Model],
memory: Optional[Union[Memory, Tuple[Memory]]] = None,
observation_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]] = None,
action_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]] = None,
observation_space: Optional[Union[int, Tuple[int], gymnasium.Space]] = None,
action_space: Optional[Union[int, Tuple[int], gymnasium.Space]] = None,
device: Optional[Union[str, torch.device]] = None,
cfg: Optional[dict] = None) -> None:
"""Custom agent
@@ -46,9 +46,9 @@ def __init__(self,
for the rest only the environment transitions will be added
:type memory: skrl.memory.torch.Memory, list of skrl.memory.torch.Memory or None
:param observation_space: Observation/state space or shape (default: None)
:type observation_space: int, tuple or list of integers, gym.Space, gymnasium.Space or None, optional
:type observation_space: int, tuple or list of integers, gymnasium.Space or None, optional
:param action_space: Action space or shape (default: None)
:type action_space: int, tuple or list of integers, gym.Space, gymnasium.Space or None, optional
:type action_space: int, tuple or list of integers, gymnasium.Space or None, optional
:param device: Device on which a torch tensor is or will be allocated (default: ``None``).
If None, the device will be either ``"cuda:0"`` if available or ``"cpu"``
:type device: str or torch.device, optional
@@ -179,7 +179,7 @@ def _update(self, timestep: int, timesteps: int) -> None:
# [start-agent-base-class-jax]
from typing import Union, Tuple, Dict, Any, Optional

import gym, gymnasium
import gymnasium
import copy

import jaxlib
@@ -213,8 +213,8 @@ class CUSTOM(Agent):
def __init__(self,
models: Dict[str, Model],
memory: Optional[Union[Memory, Tuple[Memory]]] = None,
observation_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]] = None,
action_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]] = None,
observation_space: Optional[Union[int, Tuple[int], gymnasium.Space]] = None,
action_space: Optional[Union[int, Tuple[int], gymnasium.Space]] = None,
device: Optional[Union[str, jaxlib.xla_extension.Device]] = None,
cfg: Optional[dict] = None) -> None:
"""Custom agent
@@ -226,9 +226,9 @@ def __init__(self,
for the rest only the environment transitions will be added
:type memory: skrl.memory.jax.Memory, list of skrl.memory.jax.Memory or None
:param observation_space: Observation/state space or shape (default: None)
:type observation_space: int, tuple or list of integers, gym.Space, gymnasium.Space or None, optional
:type observation_space: int, tuple or list of integers, gymnasium.Space or None, optional
:param action_space: Action space or shape (default: None)
:type action_space: int, tuple or list of integers, gym.Space, gymnasium.Space or None, optional
:type action_space: int, tuple or list of integers, gymnasium.Space or None, optional
:param device: Device on which a jax array is or will be allocated (default: ``None``).
If None, the device will be either ``"cuda:0"`` if available or ``"cpu"``
:type device: str or jaxlib.xla_extension.Device, optional
20 changes: 10 additions & 10 deletions docs/source/snippets/model_mixin.py
@@ -1,7 +1,7 @@
# [start-model-torch]
from typing import Optional, Union, Mapping, Sequence, Tuple, Any

import gym, gymnasium
import gymnasium

import torch

@@ -10,17 +10,17 @@

class CustomModel(Model):
def __init__(self,
observation_space: Union[int, Sequence[int], gym.Space, gymnasium.Space],
action_space: Union[int, Sequence[int], gym.Space, gymnasium.Space],
observation_space: Union[int, Sequence[int], gymnasium.Space],
action_space: Union[int, Sequence[int], gymnasium.Space],
device: Optional[Union[str, torch.device]] = None) -> None:
"""Custom model
:param observation_space: Observation/state space or shape.
The ``num_observations`` property will contain the size of that space
:type observation_space: int, sequence of int, gym.Space, gymnasium.Space
:type observation_space: int, sequence of int, gymnasium.Space
:param action_space: Action space or shape.
The ``num_actions`` property will contain the size of that space
:type action_space: int, sequence of int, gym.Space, gymnasium.Space
:type action_space: int, sequence of int, gymnasium.Space
:param device: Device on which a torch tensor is or will be allocated (default: ``None``).
If None, the device will be either ``"cuda:0"`` if available or ``"cpu"``
:type device: str or torch.device, optional
@@ -58,7 +58,7 @@ def act(self,
# [start-model-jax]
from typing import Optional, Union, Mapping, Sequence, Tuple, Any

import gym, gymnasium
import gymnasium

import flax
import jaxlib
@@ -69,19 +69,19 @@ def act(self,

class CustomModel(Model):
def __init__(self,
observation_space: Union[int, Sequence[int], gym.Space, gymnasium.Space],
action_space: Union[int, Sequence[int], gym.Space, gymnasium.Space],
observation_space: Union[int, Sequence[int], gymnasium.Space],
action_space: Union[int, Sequence[int], gymnasium.Space],
device: Optional[Union[str, jaxlib.xla_extension.Device]] = None,
parent: Optional[Any] = None,
name: Optional[str] = None) -> None:
"""Custom model
:param observation_space: Observation/state space or shape.
The ``num_observations`` property will contain the size of that space
:type observation_space: int, sequence of int, gym.Space, gymnasium.Space
:type observation_space: int, sequence of int, gymnasium.Space
:param action_space: Action space or shape.
The ``num_actions`` property will contain the size of that space
:type action_space: int, sequence of int, gym.Space, gymnasium.Space
:type action_space: int, sequence of int, gymnasium.Space
:param device: Device on which a jax array is or will be allocated (default: ``None``).
If None, the device will be either ``"cuda:0"`` if available or ``"cpu"``
:type device: str or jaxlib.xla_extension.Device, optional