Merge branch 'develop' into toni/pre-commit

Toni-SM · Nov 4, 2024 · 12bcd22
2 parents 6662e70 + eff7295

Showing 127 changed files with 2,364 additions and 1,793 deletions.
diff --git a/.github/ISSUE_TEMPLATE/bug_report.yaml b/.github/ISSUE_TEMPLATE/bug_report.yaml
@@ -30,6 +30,7 @@ body:
     description: The skrl version can be obtained with the command `pip show skrl`.
     options:
       - ---
+      - 1.3.0
       - 1.2.0
       - 1.1.0
       - 1.0.0

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,26 @@
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
+## [1.4.0] - Unreleased
+### Added
+- Utilities to operate on Gymnasium spaces (`Box`, `Discrete`, `MultiDiscrete`, `Tuple` and `Dict`)
+- `parse_device` static method in ML framework configuration for JAX
+
+### Changed
+- Call agent's `pre_interaction` method during evaluation
+- Use spaces utilities to process states, observations and actions for all the library components
+- Update model instantiator definitions to process supported fundamental and composite Gymnasium spaces
+- Make flattened tensor storage in memory the default option (revert change introduced in version 1.3.0)
+- Drop support for PyTorch versions prior to 1.10 (the previously supported minimum version was 1.9)
+
+### Fixed
+- Move batch sampling inside the gradient step loop for DQN, DDQN, DDPG (RNN), TD3 (RNN), SAC and SAC (RNN)
+
+### Removed
+- Remove OpenAI Gym (`gym`) from dependencies and source code. **skrl** continues to support gym environments,
+  but `gym` is no longer installed as part of the library and must be installed manually if needed.
+  Any gym-based environment wrapper must use the `convert_gym_space` space utility to operate.
+
 ## [1.3.0] - 2024-09-11
 ### Added
 - Distributed multi-GPU and multi-node learning (JAX implementation)
@@ -70,7 +90,7 @@ Summary of the most relevant features:
 ## [1.0.0-rc.2] - 2023-08-11
 ### Added
 - Get truncation from `time_outs` info in Isaac Gym, Isaac Orbit and Omniverse Isaac Gym environments
-- Time-limit (truncation) boostrapping in on-policy actor-critic agents
+- Time-limit (truncation) bootstrapping in on-policy actor-critic agents
 - Model instantiators `initial_log_std` parameter to set the log standard deviation's initial value
 
 ### Changed (breaking changes)
@@ -84,7 +104,7 @@ Summary of the most relevant features:
     - `from skrl.envs.loaders.jax import load_omniverse_isaacgym_env`
 
 ### Changed
-- Drop support for versions prior to PyTorch 1.9 (1.8.0 and 1.8.1)
+- Drop support for PyTorch versions prior to 1.9 (the previously supported minimum version was 1.8)
 
 ## [1.0.0-rc.1] - 2023-07-25
 ### Added
@@ -177,7 +197,7 @@ to allow storing samples in memories during evaluation
 - Parameter `role` to model methods
 - Wrapper compatibility with the new OpenAI Gym environment API
 - Internal library colored logger
-- Migrate checkpoints/models from other RL libraries to skrl models/agents
+- Migrate checkpoints/models from other RL libraries to **skrl** models/agents
 - Configuration parameter `store_separately` to agent configuration dict
 - Save/load agent modules (models, optimizers, preprocessors)
 - Set random seed and configure deterministic behavior for reproducibility

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -54,7 +54,7 @@ Read the code a little bit and you will understand it at first glance... Also
   ```ini
   function annotation (e.g. typing)
   # insert an empty line
-  python libraries and other libraries (e.g. gym, numpy, time, etc.)
+  python libraries and other libraries (e.g. gymnasium, numpy, time, etc.)
   # insert an empty line
   machine learning framework modules (e.g. torch, torch.nn)
   # insert an empty line

diff --git a/docs/source/api/agents/ddqn.rst b/docs/source/api/agents/ddqn.rst
@@ -40,10 +40,10 @@ Learning algorithm
 
 |
 | :literal:`_update(...)`
-| :green:`# sample a batch from memory`
-| [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
 | :green:`# gradient steps`
 | **FOR** each gradient step up to :guilabel:`gradient_steps` **DO**
+|     :green:`# sample a batch from memory`
+|     [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
 |     :green:`# compute target values`
 |     :math:`Q' \leftarrow Q_{\phi_{target}}(s')`
 |     :math:`Q_{_{target}} \leftarrow Q'[\underset{a}{\arg\max} \; Q_\phi(s')] \qquad` :gray:`# the only difference with DQN`

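The DDQN target line above selects the next action with the online network Q_phi(s') and evaluates it with the target network output Q'. A minimal pure-Python sketch for a single transition, with Q-values given as plain lists; the final bootstrapped form r + gamma * (1 - d) * Q_target is the usual one but is not shown in this hunk, so it is an assumption here, and `ddqn_target` is a hypothetical helper, not an skrl function:

```python
from typing import List


def ddqn_target(reward: float, done: bool, gamma: float,
                q_online_next: List[float], q_target_next: List[float]) -> float:
    """Double DQN target for one transition (illustrative sketch)."""
    # select the greedy next action with the online network Q_phi(s')
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # evaluate that action with the target network values Q'(s')
    q_next = q_target_next[best_action]
    # assumed bootstrapped target: r + gamma * (1 - d) * Q'[argmax_a Q_phi(s')]
    return reward + gamma * (1.0 - float(done)) * q_next


print(ddqn_target(1.0, False, 0.99, [0.2, 0.8, 0.5], [0.4, 0.3, 0.9]))
```

Decoupling action selection (online network) from evaluation (target network) is the only difference from DQN noted in the pseudocode.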
diff --git a/docs/source/api/agents/dqn.rst b/docs/source/api/agents/dqn.rst
@@ -40,10 +40,10 @@ Learning algorithm
 
 |
 | :literal:`_update(...)`
-| :green:`# sample a batch from memory`
-| [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
 | :green:`# gradient steps`
 | **FOR** each gradient step up to :guilabel:`gradient_steps` **DO**
+|     :green:`# sample a batch from memory`
+|     [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
 |     :green:`# compute target values`
 |     :math:`Q' \leftarrow Q_{\phi_{target}}(s')`
 |     :math:`Q_{_{target}} \leftarrow \underset{a}{\max} \; Q' \qquad` :gray:`# the only difference with DDQN`

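For comparison, the DQN target line above takes the maximum of the target network output Q' over all next actions. A minimal sketch under the same assumptions (plain lists for Q-values, the usual r + gamma * (1 - d) * Q_target form, hypothetical helper name):

```python
from typing import List


def dqn_target(reward: float, done: bool, gamma: float,
               q_target_next: List[float]) -> float:
    """DQN target for one transition (illustrative sketch)."""
    # DQN: maximum target-network value over the next actions, max_a Q'(s', a)
    q_next = max(q_target_next)
    # assumed bootstrapped target: r + gamma * (1 - d) * max_a Q'
    return reward + gamma * (1.0 - float(done)) * q_next


print(dqn_target(1.0, False, 0.99, [0.4, 0.3, 0.9]))
```

Because the same network both selects and scores the action, this max tends to overestimate values, which is what the DDQN variant addresses.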
diff --git a/docs/source/api/agents/sac.rst b/docs/source/api/agents/sac.rst
@@ -34,10 +34,10 @@ Learning algorithm
 
 |
 | :literal:`_update(...)`
-| :green:`# sample a batch from memory`
-| [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
 | :green:`# gradient steps`
 | **FOR** each gradient step up to :guilabel:`gradient_steps` **DO**
+|     :green:`# sample a batch from memory`
+|     [:math:`s, a, r, s', d`] :math:`\leftarrow` states, actions, rewards, next_states, dones of size :guilabel:`batch_size`
 |     :green:`# compute target values`
 |     :math:`a',\; logp' \leftarrow \pi_\theta(s')`
 |     :math:`Q_{1_{target}} \leftarrow Q_{{\phi 1}_{target}}(s', a')`

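The DQN, DDQN and SAC hunks above move the batch sampling inside the gradient step loop, so every gradient step now draws its own batch instead of reusing a single batch for all steps. A stdlib-only sketch of both loop structures; `ToyMemory` and the `gradient_step` callback are hypothetical placeholders, not skrl classes:

```python
import random
from typing import Callable, List, Tuple

Transition = Tuple[int, int, float, int, bool]  # (s, a, r, s', d)


class ToyMemory:
    """Hypothetical replay memory used only for this illustration."""

    def __init__(self, transitions: List[Transition], seed: int = 0) -> None:
        self.transitions = transitions
        self.rng = random.Random(seed)

    def sample(self, batch_size: int) -> List[Transition]:
        # uniform sampling without replacement within a batch
        return self.rng.sample(self.transitions, batch_size)


def update_before(memory: ToyMemory, batch_size: int, gradient_steps: int,
                  gradient_step: Callable[[List[Transition]], None]) -> None:
    # previous structure: one batch sampled once and reused by every gradient step
    batch = memory.sample(batch_size)
    for _ in range(gradient_steps):
        gradient_step(batch)


def update_after(memory: ToyMemory, batch_size: int, gradient_steps: int,
                 gradient_step: Callable[[List[Transition]], None]) -> None:
    # new structure: a fresh batch is sampled at every gradient step
    for _ in range(gradient_steps):
        batch = memory.sample(batch_size)
        gradient_step(batch)
```

With `gradient_steps > 1`, the new structure exposes each update to more distinct transitions, while `gradient_steps == 1` behaves the same in both versions.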
diff --git a/docs/source/api/config/frameworks.rst b/docs/source/api/config/frameworks.rst
@@ -86,6 +86,8 @@ API
 
     The default device, unless specified, is ``cuda:0`` (or ``cuda:JAX_LOCAL_RANK`` in a distributed environment) if CUDA is available, ``cpu`` otherwise
 
+.. autofunction:: skrl.config.jax.parse_device
+
 .. py:data:: skrl.config.jax.backend
     :type: str
     :value: "numpy"

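The default-device rule documented above (``cuda:0``, or ``cuda:JAX_LOCAL_RANK`` in a distributed environment, when CUDA is available, and ``cpu`` otherwise) can be sketched as a small function. This is not the implementation of `skrl.config.jax.parse_device`: CUDA availability is passed in explicitly, and treating an unset `JAX_LOCAL_RANK` as rank 0 is an assumption:

```python
import os
from typing import Mapping, Optional


def default_device(cuda_available: bool, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a device string following the documented default rule (sketch)."""
    env = os.environ if env is None else env
    if not cuda_available:
        return "cpu"
    # assumption: outside a distributed run JAX_LOCAL_RANK is unset, i.e. rank 0
    local_rank = int(env.get("JAX_LOCAL_RANK", "0"))
    return f"cuda:{local_rank}"


print(default_device(True, {"JAX_LOCAL_RANK": "2"}))
```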
diff --git a/docs/source/api/utils.rst b/docs/source/api/utils.rst
@@ -6,6 +6,7 @@ Utils and configurations
 
     ML frameworks configuration <config/frameworks>
     Random seed <utils/seed>
+    Spaces <utils/spaces>
     Model instantiators <utils/model_instantiators>
     Runner <utils/runner>
     Distributed runs <utils/distributed>
@@ -39,6 +40,9 @@ A set of utilities and configurations for managing an RL setup is provided as pa
     * - :doc:`Random seed <utils/seed>`
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\blacksquare`
+    * - :doc:`Spaces <utils/spaces>`
+      - .. centered:: :math:`\blacksquare`
+      - .. centered:: :math:`\blacksquare`
     * - :doc:`Model instantiators <utils/model_instantiators>`
       - .. centered:: :math:`\blacksquare`
       - .. centered:: :math:`\blacksquare`

diff --git a/docs/source/api/utils/spaces.rst b/docs/source/api/utils/spaces.rst
@@ -0,0 +1,86 @@
+Spaces
+======
+
+Utilities to operate on Gymnasium `spaces <https://gymnasium.farama.org/api/spaces>`_.
+
+.. raw:: html
+
+    <br><hr>
+
+Overview
+--------
+
+The utilities described in this section support the following Gymnasium spaces:
+
+.. list-table::
+    :header-rows: 1
+
+    * - Type
+      - Supported spaces
+    * - Fundamental
+      - :py:class:`~gymnasium.spaces.Box`, :py:class:`~gymnasium.spaces.Discrete`, and :py:class:`~gymnasium.spaces.MultiDiscrete`
+    * - Composite
+      - :py:class:`~gymnasium.spaces.Dict` and :py:class:`~gymnasium.spaces.Tuple`
+
+The following table summarizes the functions that convert space samples between representations:
+
+.. list-table::
+    :header-rows: 1
+
+    * - Input
+      - Function
+      - Output
+    * - Space (NumPy / int)
+      - :py:func:`~skrl.utils.spaces.torch.tensorize_space`
+      - Space (PyTorch / JAX)
+    * - Space (PyTorch / JAX)
+      - :py:func:`~skrl.utils.spaces.torch.untensorize_space`
+      - Space (NumPy / int)
+    * - Space (PyTorch / JAX)
+      - :py:func:`~skrl.utils.spaces.torch.flatten_tensorized_space`
+      - PyTorch tensor / JAX array
+    * - PyTorch tensor / JAX array
+      - :py:func:`~skrl.utils.spaces.torch.unflatten_tensorized_space`
+      - Space (PyTorch / JAX)
+
+.. raw:: html
+
+    <br>
+
+API (PyTorch)
+-------------
+
+.. autofunction:: skrl.utils.spaces.torch.compute_space_size
+
+.. autofunction:: skrl.utils.spaces.torch.convert_gym_space
+
+.. autofunction:: skrl.utils.spaces.torch.flatten_tensorized_space
+
+.. autofunction:: skrl.utils.spaces.torch.sample_space
+
+.. autofunction:: skrl.utils.spaces.torch.tensorize_space
+
+.. autofunction:: skrl.utils.spaces.torch.unflatten_tensorized_space
+
+.. autofunction:: skrl.utils.spaces.torch.untensorize_space
+
+.. raw:: html
+
+    <br>
+
+API (JAX)
+---------
+
+.. autofunction:: skrl.utils.spaces.jax.compute_space_size
+
+.. autofunction:: skrl.utils.spaces.jax.convert_gym_space
+
+.. autofunction:: skrl.utils.spaces.jax.flatten_tensorized_space
+
+.. autofunction:: skrl.utils.spaces.jax.sample_space
+
+.. autofunction:: skrl.utils.spaces.jax.tensorize_space
+
+.. autofunction:: skrl.utils.spaces.jax.unflatten_tensorized_space
+
+.. autofunction:: skrl.utils.spaces.jax.untensorize_space
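The flatten/unflatten pair in the conversion table turns a composite sample into one flat tensor and back. The idea can be sketched in pure Python with lists in place of tensors. This is not skrl's implementation: `Box` and `Discrete` below are hypothetical stand-ins for the Gymnasium classes, a plain `dict` plays the role of a `Dict` space, Box samples are assumed to be given as flat lists, and keys are visited in sorted order (an assumption made so that both directions agree):

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Box:
    """Hypothetical stand-in for a Box space: only its shape matters here."""
    shape: Tuple[int, ...]


@dataclass
class Discrete:
    """Hypothetical stand-in for a Discrete space with ``n`` choices."""
    n: int


def space_size(space: Any) -> int:
    """Number of flat elements one sample of ``space`` occupies."""
    if isinstance(space, Box):
        size = 1
        for dim in space.shape:
            size *= dim
        return size
    if isinstance(space, Discrete):
        return 1  # one integer index per sample
    if isinstance(space, dict):
        return sum(space_size(sub) for sub in space.values())
    raise TypeError(f"unsupported space: {space!r}")


def flatten(space: Any, sample: Any) -> List[float]:
    """Concatenate a (possibly nested) sample into one flat list."""
    if isinstance(space, Box):
        values = [float(v) for v in sample]  # Box samples given as flat lists here
        assert len(values) == space_size(space)
        return values
    if isinstance(space, Discrete):
        return [float(sample)]
    if isinstance(space, dict):
        flat: List[float] = []
        for key in sorted(space):  # fixed key order so flatten/unflatten agree
            flat.extend(flatten(space[key], sample[key]))
        return flat
    raise TypeError(f"unsupported space: {space!r}")


def _unflatten(space: Any, flat: List[float]) -> Tuple[Any, List[float]]:
    # consume the leading elements that belong to ``space`` and return the rest
    if isinstance(space, Box):
        size = space_size(space)
        return list(flat[:size]), flat[size:]
    if isinstance(space, Discrete):
        return int(flat[0]), flat[1:]
    if isinstance(space, dict):
        value: Dict[str, Any] = {}
        for key in sorted(space):
            value[key], flat = _unflatten(space[key], flat)
        return value, flat
    raise TypeError(f"unsupported space: {space!r}")


def unflatten(space: Any, flat: List[float]) -> Any:
    """Rebuild the nested sample from a flat list produced by ``flatten``."""
    value, rest = _unflatten(space, flat)
    if rest:
        raise ValueError("flat input is longer than the space size")
    return value


space = {"pos": Box((3,)), "mode": Discrete(4)}
flat = flatten(space, {"pos": [0.1, 0.2, 0.3], "mode": 2})
print(flat, unflatten(space, flat))
```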
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -197,6 +197,7 @@ Utils and configurations
 
     * :doc:`ML frameworks <api/config/frameworks>` configuration
     * :doc:`Random seed <api/utils/seed>`
+    * :doc:`Spaces <api/utils/spaces>`
     * :doc:`Model instantiators <api/utils/model_instantiators>`
     * :doc:`Runner <api/utils/runner>`
     * :doc:`Distributed runs <api/utils/distributed>`

diff --git a/docs/source/intro/installation.rst b/docs/source/intro/installation.rst
@@ -12,10 +12,10 @@ In this section, you will find the steps to install the library, troubleshoot kn
 
 **skrl** requires Python 3.6 or higher and the following libraries (they will be installed automatically):
 
-    * `gym <https://www.gymlibrary.dev>`_ / `gymnasium <https://gymnasium.farama.org/>`_
-    * `tqdm <https://tqdm.github.io>`_
+    * `gymnasium <https://gymnasium.farama.org/>`_
     * `packaging <https://packaging.pypa.io>`_
     * `tensorboard <https://www.tensorflow.org/tensorboard>`_
+    * `tqdm <https://tqdm.github.io>`_
 
 Machine learning (ML) framework
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -25,7 +25,7 @@ According to the specific ML frameworks, the following libraries are required:
 PyTorch
 """""""
 
-    * `torch <https://pytorch.org>`_ 1.9.0 or higher
+    * `torch <https://pytorch.org>`_ 1.10.0 or higher
 
 JAX
 """

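Since the minimum PyTorch version is now 1.10.0, an installed version string can be checked against it with the standard library only. A minimal sketch with hypothetical helper names; local suffixes such as ``+cu118`` are simply dropped, and for strict PEP 440 comparisons the ``packaging`` dependency listed above provides ``packaging.version.Version``:

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    # keep only the leading "X.Y.Z" numbers, dropping suffixes such as "+cu118" or "rc1"
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version: str, minimum: Tuple[int, ...] = (1, 10, 0)) -> bool:
    parts = version_tuple(version)
    # pad with zeros so that "2.1" compares like "2.1.0"
    parts = parts + (0,) * (len(minimum) - len(parts))
    return parts >= minimum


print(meets_minimum("1.9.1"), meets_minimum("2.1.0+cu118"))
```

Note that numeric tuples compare "1.10.0" above "1.9.1", which a plain string comparison would get wrong.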
diff --git a/docs/source/snippets/agent.py b/docs/source/snippets/agent.py
@@ -1,7 +1,7 @@
 # [start-agent-base-class-torch]
 from typing import Union, Tuple, Dict, Any, Optional
 
-import gym, gymnasium
+import gymnasium
 import copy
 
 import torch
@@ -33,8 +33,8 @@ class CUSTOM(Agent):
     def __init__(self,
                  models: Dict[str, Model],
                  memory: Optional[Union[Memory, Tuple[Memory]]] = None,
-                 observation_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]] = None,
-                 action_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]] = None,
+                 observation_space: Optional[Union[int, Tuple[int], gymnasium.Space]] = None,
+                 action_space: Optional[Union[int, Tuple[int], gymnasium.Space]] = None,
                  device: Optional[Union[str, torch.device]] = None,
                  cfg: Optional[dict] = None) -> None:
         """Custom agent
@@ -46,9 +46,9 @@ def __init__(self,
                        for the rest only the environment transitions will be added
         :type memory: skrl.memory.torch.Memory, list of skrl.memory.torch.Memory or None
         :param observation_space: Observation/state space or shape (default: None)
-        :type observation_space: int, tuple or list of integers, gym.Space, gymnasium.Space or None, optional
+        :type observation_space: int, tuple or list of integers, gymnasium.Space or None, optional
         :param action_space: Action space or shape (default: None)
-        :type action_space: int, tuple or list of integers, gym.Space, gymnasium.Space or None, optional
+        :type action_space: int, tuple or list of integers, gymnasium.Space or None, optional
         :param device: Device on which a torch tensor is or will be allocated (default: ``None``).
                        If None, the device will be either ``"cuda:0"`` if available or ``"cpu"``
         :type device: str or torch.device, optional
@@ -179,7 +179,7 @@ def _update(self, timestep: int, timesteps: int) -> None:
 # [start-agent-base-class-jax]
 from typing import Union, Tuple, Dict, Any, Optional
 
-import gym, gymnasium
+import gymnasium
 import copy
 
 import jaxlib
@@ -213,8 +213,8 @@ class CUSTOM(Agent):
     def __init__(self,
                  models: Dict[str, Model],
                  memory: Optional[Union[Memory, Tuple[Memory]]] = None,
-                 observation_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]] = None,
-                 action_space: Optional[Union[int, Tuple[int], gym.Space, gymnasium.Space]] = None,
+                 observation_space: Optional[Union[int, Tuple[int], gymnasium.Space]] = None,
+                 action_space: Optional[Union[int, Tuple[int], gymnasium.Space]] = None,
                  device: Optional[Union[str, jaxlib.xla_extension.Device]] = None,
                  cfg: Optional[dict] = None) -> None:
         """Custom agent
@@ -226,9 +226,9 @@ def __init__(self,
                        for the rest only the environment transitions will be added
         :type memory: skrl.memory.jax.Memory, list of skrl.memory.jax.Memory or None
         :param observation_space: Observation/state space or shape (default: None)
-        :type observation_space: int, tuple or list of integers, gym.Space, gymnasium.Space or None, optional
+        :type observation_space: int, tuple or list of integers, gymnasium.Space or None, optional
         :param action_space: Action space or shape (default: None)
-        :type action_space: int, tuple or list of integers, gym.Space, gymnasium.Space or None, optional
+        :type action_space: int, tuple or list of integers, gymnasium.Space or None, optional
         :param device: Device on which a jax array is or will be allocated (default: ``None``).
                        If None, the device will be either ``"cuda:0"`` if available or ``"cpu"``
         :type device: str or jaxlib.xla_extension.Device, optional

diff --git a/docs/source/snippets/model_mixin.py b/docs/source/snippets/model_mixin.py
@@ -1,7 +1,7 @@
 # [start-model-torch]
 from typing import Optional, Union, Mapping, Sequence, Tuple, Any
 
-import gym, gymnasium
+import gymnasium
 
 import torch
 
@@ -10,17 +10,17 @@
 
 class CustomModel(Model):
     def __init__(self,
-                 observation_space: Union[int, Sequence[int], gym.Space, gymnasium.Space],
-                 action_space: Union[int, Sequence[int], gym.Space, gymnasium.Space],
+                 observation_space: Union[int, Sequence[int], gymnasium.Space],
+                 action_space: Union[int, Sequence[int], gymnasium.Space],
                  device: Optional[Union[str, torch.device]] = None) -> None:
         """Custom model
 
         :param observation_space: Observation/state space or shape.
                                   The ``num_observations`` property will contain the size of that space
-        :type observation_space: int, sequence of int, gym.Space, gymnasium.Space
+        :type observation_space: int, sequence of int, gymnasium.Space
         :param action_space: Action space or shape.
                              The ``num_actions`` property will contain the size of that space
-        :type action_space: int, sequence of int, gym.Space, gymnasium.Space
+        :type action_space: int, sequence of int, gymnasium.Space
         :param device: Device on which a torch tensor is or will be allocated (default: ``None``).
                        If None, the device will be either ``"cuda:0"`` if available or ``"cpu"``
         :type device: str or torch.device, optional
@@ -58,7 +58,7 @@ def act(self,
 # [start-model-jax]
 from typing import Optional, Union, Mapping, Tuple, Any
 
-import gym, gymnasium
+import gymnasium
 
 import flax
 import jaxlib
@@ -69,19 +69,19 @@ def act(self,
 
 class CustomModel(Model):
     def __init__(self,
-                 observation_space: Union[int, Sequence[int], gym.Space, gymnasium.Space],
-                 action_space: Union[int, Sequence[int], gym.Space, gymnasium.Space],
+                 observation_space: Union[int, Sequence[int], gymnasium.Space],
+                 action_space: Union[int, Sequence[int], gymnasium.Space],
                  device: Optional[Union[str, jaxlib.xla_extension.Device]] = None,
                  parent: Optional[Any] = None,
                  name: Optional[str] = None) -> None:
         """Custom model
 
         :param observation_space: Observation/state space or shape.
                                   The ``num_observations`` property will contain the size of that space
-        :type observation_space: int, sequence of int, gym.Space, gymnasium.Space
+        :type observation_space: int, sequence of int, gymnasium.Space
         :param action_space: Action space or shape.
                              The ``num_actions`` property will contain the size of that space
-        :type action_space: int, sequence of int, gym.Space, gymnasium.Space
+        :type action_space: int, sequence of int, gymnasium.Space
         :param device: Device on which a jax array is or will be allocated (default: ``None``).
                        If None, the device will be either ``"cuda:0"`` if available or ``"cpu"``
         :type device: str or jaxlib.xla_extension.Device, optional
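For the ``int`` and ``sequence of int`` inputs accepted above, the size held by ``num_observations`` / ``num_actions`` is presumably the number of elements in that shape. A hedged sketch of that computation for plain shapes only (``gymnasium.Space`` inputs are out of scope, and `shape_size` is a hypothetical helper, not an skrl function):

```python
import math
from typing import Sequence, Union


def shape_size(space: Union[int, Sequence[int]]) -> int:
    """Number of elements for an int or sequence-of-int shape (sketch)."""
    if isinstance(space, int):
        return space  # an int is treated as a flat size
    return math.prod(space)  # e.g. (3, 4) -> 12


print(shape_size(5), shape_size((3, 4)))
```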