This directory includes a number of working examples of Acme agents. They are not meant to be comprehensive; rather, they illustrate a number of common use cases to which Acme agents can be applied.
Our quickstart guide is the fastest way to get up and running: the notebook shows how to instantiate a simple agent and run it on an environment. You can also take a look at our tutorial, which takes a more in-depth look at the construction of the D4PG agent and highlights the general structure shared by most agents implemented in Acme.
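In outline, the pattern the notebook walks through looks roughly like the sketch below. This is not the notebook verbatim: the task, network sizes and hyperparameters are illustrative choices, and constructor arguments may differ slightly between Acme versions.

```python
import acme
import numpy as np
import sonnet as snt
from acme import specs
from acme import wrappers
from acme.agents.tf import d4pg
from acme.tf import networks
from dm_control import suite

# Load a control-suite task and cast observations to single precision.
environment = wrappers.SinglePrecisionWrapper(suite.load('cartpole', 'balance'))
environment_spec = specs.make_environment_spec(environment)
num_dimensions = np.prod(environment_spec.actions.shape, dtype=int)

# D4PG pairs a deterministic policy with a distributional critic.
policy_network = snt.Sequential([
    networks.LayerNormMLP((256, 256, 256), activate_final=True),
    networks.NearZeroInitializedLinear(num_dimensions),
    networks.TanhToSpec(environment_spec.actions),
])
critic_network = snt.Sequential([
    networks.CriticMultiplexer(),
    networks.LayerNormMLP((512, 512, 256), activate_final=True),
    networks.DiscreteValuedHead(vmin=-150., vmax=150., num_atoms=51),
])

agent = d4pg.D4PG(
    environment_spec=environment_spec,
    policy_network=policy_network,
    critic_network=critic_network,
)

# The environment loop alternates environment steps with agent updates.
loop = acme.EnvironmentLoop(environment, agent)
loop.run(num_episodes=100)
```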
We include a number of agents running on continuous control tasks. These are representative examples, but any continuous control algorithm implemented in Acme can be swapped in; a sketch of such a swap follows the list below.
Note that many of the examples, particularly those based on the DeepMind Control Suite, require a MuJoCo license in order to run. See our tutorial for more details, or refer to the dm_control repository for further information.
- D4PG: a distributed distributional deep deterministic policy gradient (D4PG) agent, which combines a deterministic policy with a distributional critic, running on the DeepMind Control Suite.
- D4PG (gym): the same algorithm running on a number of tasks defined in the OpenAI Gym. By default this runs the continuous "mountain car" domain, which does not require a MuJoCo license.
- DMPO: a distributional maximum a posteriori policy optimization (MPO) agent, which combines a distributional critic with a stochastic policy.
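Because the agents above share the same construction pattern, swapping one in mostly amounts to changing the networks and the agent class. A rough sketch, assuming the TF DMPO agent on a Gym task (exact constructor arguments may differ between versions):

```python
import gym
import numpy as np
import sonnet as snt
from acme import specs
from acme import wrappers
from acme.agents.tf import dmpo
from acme.tf import networks

# The continuous mountain-car task does not require a MuJoCo license.
environment = wrappers.SinglePrecisionWrapper(
    wrappers.GymWrapper(gym.make('MountainCarContinuous-v0')))
environment_spec = specs.make_environment_spec(environment)
num_dimensions = np.prod(environment_spec.actions.shape, dtype=int)

# DMPO replaces D4PG's deterministic policy with a stochastic (Gaussian) one.
policy_network = snt.Sequential([
    networks.LayerNormMLP((256, 256, 256)),
    networks.MultivariateNormalDiagHead(num_dimensions),
])
critic_network = snt.Sequential([
    networks.CriticMultiplexer(),
    networks.LayerNormMLP((512, 512, 256), activate_final=True),
    networks.DiscreteValuedHead(vmin=-150., vmax=150., num_atoms=51),
])

agent = dmpo.DistributionalMPO(
    environment_spec=environment_spec,
    policy_network=policy_network,
    critic_network=critic_network,
)
```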
The development of the Arcade Learning Environment, and the coinciding adoption of Atari games as a benchmark, has played a prominent role in the modern testing of reinforcement learning algorithms. As a result we have also included direct examples of prominent discrete-action algorithms implemented in Acme and running on this environment:
- DQN: a "classic" benchmark agent for Atari; and
Acme includes examples of offline agents, i.e. agents trained entirely from external data generated by another agent; a schematic of the underlying objective follows the list:
- BC: a behaviour cloning agent.
- BC (JAX): a behaviour cloning agent implemented in JAX.
- BCQ: an implementation of the batch-constrained Q-learning (BCQ) algorithm.
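At its core, behaviour cloning is supervised learning on logged (observation, action) pairs. The schematic below expresses that objective in plain TensorFlow and Sonnet rather than through Acme's learner classes, with randomly generated stand-in data in place of real demonstrations:

```python
import sonnet as snt
import tensorflow as tf

num_actions = 4  # illustrative discrete action space

# Stand-in for a dataset of demonstrations logged by another agent.
demonstrations = tf.data.Dataset.from_tensor_slices((
    tf.random.normal([1024, 8]),                                    # observations
    tf.random.uniform([1024], maxval=num_actions, dtype=tf.int64),  # actions
)).batch(64)

policy_network = snt.nets.MLP([256, 256, num_actions])
optimizer = snt.optimizers.Adam(learning_rate=1e-4)

for observations, actions in demonstrations:
  with tf.GradientTape() as tape:
    logits = policy_network(observations)
    # Maximise the log-probability of the demonstrator's actions.
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=actions, logits=logits))
  gradients = tape.gradient(loss, policy_network.trainable_variables)
  optimizer.apply(gradients, policy_network.trainable_variables)
```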
Similarly, we also include so-called "from demonstrations" agents which mix offline and online data; a toy sketch of this mixing follows the list:
- DQfD: a deep Q-learning from demonstrations (DQfD) agent running on hard-exploration tasks within bsuite (e.g. deep sea) using demonstration data.
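The distinguishing detail in these agents is that learner batches draw from both sources at once. A toy sketch of that mixing, using a fixed and purely illustrative 3:1 ratio:

```python
import tensorflow as tf

# Stand-ins for the two transition sources; in the real agents these would be
# a replay buffer of online experience and a logged demonstration set.
online = tf.data.Dataset.range(1000).map(lambda i: {'demo': 0, 'step': i})
demos = tf.data.Dataset.range(1000).map(lambda i: {'demo': 1, 'step': i})

# Sample each learner batch 3:1 from online versus demonstration data.
mixed = tf.data.experimental.sample_from_datasets(
    [online.repeat(), demos.repeat()], weights=[0.75, 0.25])

for batch in mixed.batch(16).take(1):
  print(batch['demo'])  # a mixture of 0s (online) and 1s (demonstration)
```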
The Behaviour Suite for Reinforcement Learning (bsuite) defines a collection of tasks and environments which collectively investigate the core capabilities of RL algorithms along a number of different axes. The examples we include show how to run Acme agents on this suite:
- DQN: an off-policy DQN example;
- IMPALA: an on-policy IMPALA agent; and
- MCTS: a model-based agent running on the task suite using either a simulator of the environment or a learned model.
For more information see https://github.com/deepmind/bsuite.
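A minimal sketch of running one of these agents on a single bsuite task, assuming the TF DQN agent (the network here is an illustrative MLP):

```python
import acme
import bsuite
import sonnet as snt
from acme import specs
from acme import wrappers
from acme.agents.tf import dqn

# Load a single hard-exploration task and record results to CSV.
raw_environment = bsuite.load_and_record_to_csv(
    bsuite_id='deep_sea/0', results_dir='/tmp/bsuite')
environment = wrappers.SinglePrecisionWrapper(raw_environment)
environment_spec = specs.make_environment_spec(environment)

# A small fully connected Q-network suffices for bsuite's tasks.
network = snt.Sequential([
    snt.Flatten(),
    snt.nets.MLP([64, 64, environment_spec.actions.num_values]),
])

agent = dqn.DQN(environment_spec=environment_spec, network=network)

# Each bsuite environment declares how many episodes its experiment needs.
acme.EnvironmentLoop(environment, agent).run(
    num_episodes=raw_environment.bsuite_num_episodes)
```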