Releases: tspooner/rsrl
0.3.0
This includes major changes to the agent interface. The `ControlAgent` and `PredictionAgent` traits have been unified into a single trait, `Agent`, with an associated type that distinguishes between control and prediction tasks. The methods associated with choosing actions, etc., are then defined in `Controller`, for example.
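A minimal sketch of this split, assuming illustrative names and signatures (the associated `Sample` type and the `handle_sample`/`pi` methods below are stand-ins, not necessarily rsrl's exact definitions):

```rust
/// A single trait covering both control and prediction agents; the associated
/// `Sample` type (e.g. a full state-action transition vs. a state-only
/// transition) is what distinguishes the two settings.
pub trait Agent {
    type Sample;

    /// Update the agent from a single observed sample.
    fn handle_sample(&mut self, sample: &Self::Sample);
}

/// Action-selection methods live in a separate trait, implemented only by
/// agents that actually control the environment.
pub trait Controller<S, A>: Agent {
    /// Sample an action from the agent's policy for the given state.
    fn pi(&mut self, state: &S) -> A;
}
```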
This makes it much easier to build a library of agents that have some, but not all, guarantees about behaviour. It will also make it easier to add support for continuous action spaces and to improve the composability of the various agents (say, when using a predictor inside an actor-critic method).
In addition, this push comes with some new agent implementations based on "true online" gradient methods.
0.2.3
In this release we include some internal naming changes, additional features for the geometry module (in particular `geometry::spaces`), and a new domain that interacts with the OpenAI gym via a Python interpreter instance. At the moment some changes are still needed to better handle infinite dimensions (which are common in the OpenAI gym - see #24), but one can already run experiments and upload results to the OpenAI leaderboard.
0.2.2
This version includes an implementation of the `Polynomial` basis for linear function approximation, an associated usage example, and some refactors to the codebase, including:
- Dedicated type aliases for `Array1` and `Array2`: `Vector` and `Matrix`, respectively; these have a default type argument of `f64`, as sketched below.
- Cleanup of the `cartesian_product` utility function.
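A minimal sketch of the alias pattern described above, assuming the aliases wrap `ndarray`'s `Array1`/`Array2`; the helper function is purely illustrative:

```rust
use ndarray::{Array1, Array2};

// Dedicated aliases with a default scalar type of `f64`.
pub type Vector<T = f64> = Array1<T>;
pub type Matrix<T = f64> = Array2<T>;

// With the default in place, `Vector` reads as `Array1<f64>`.
fn zero_weights(n: usize) -> Vector {
    Vector::zeros(n)
}
```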
0.2.1
Along with some minor additions and changes, this release includes a fix for the eligibility trace algorithms `QLambda` and `SARSALambda`, which were observed to diverge during testing. As explained in 5919e80, inconsistencies in how normalisation was applied between the update and evaluation methods of the `Linear` function approximator led to instability.
These have been corrected, and an example was added to show convergence of the `SARSALambda` algorithm on the `MountainCar` domain.
0.2.0
This breaking update changes the method used to optimise for dense/sparse basis projectors. Specifically, we replace `SparseLinear` and `DenseLinear` with a single unified struct, `Linear`, which handles both dense and sparse feature vectors.
The new `Projection` enum is returned by a `Projector` (a trait originally called `Projection` itself) and requires explicit match statements to handle either a dense or a sparse vector. A `Projector` must now also expose an `expand_projection` method which converts a `Projection` into a dense feature vector (of type `Array1<f64>`).
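A rough sketch of how such an enum and trait might fit together; the variant names, the sparse representation, and the method signatures below are assumptions for illustration rather than rsrl's exact API:

```rust
use ndarray::Array1;

/// A projection is either a dense activation vector or a set of active indices.
pub enum Projection {
    Dense(Array1<f64>),
    Sparse(Vec<usize>),
}

/// A basis projector maps raw inputs into feature vectors.
pub trait Projector {
    /// Project an input into either a dense or a sparse feature vector.
    fn project(&self, input: &[f64]) -> Projection;

    /// Expand a (possibly sparse) projection into a dense feature vector.
    fn expand_projection(&self, phi: Projection, dim: usize) -> Array1<f64> {
        match phi {
            Projection::Dense(activations) => activations,
            Projection::Sparse(indices) => {
                let mut out = Array1::zeros(dim);
                for i in indices {
                    out[i] = 1.0;
                }
                out
            }
        }
    }
}

/// Example of the explicit matching now required when evaluating a linear
/// function approximator against a projection.
fn evaluate(weights: &Array1<f64>, phi: &Projection) -> f64 {
    match phi {
        // Dense features: ordinary dot product.
        Projection::Dense(activations) => weights.dot(activations),
        // Sparse binary features: sum the weights at the active indices.
        Projection::Sparse(indices) => indices.iter().map(|&i| weights[i]).sum(),
    }
}
```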
All code has been refactored and optimised with respect to these changes and further tests added.