
# A distributed system for Reinforcement Learning / Training

Multiple environments are supported; each produces trajectories into a shared replay buffer.
The base model is a DQN.
Environments and policies are stored in object storage, while the replay buffer lives in Bigtable.
DQN is an off-policy algorithm, which matters here: in a distributed setup the collection policy is often stale compared to the training policy, so the trajectories being learned from were not generated by the current policy. In other words, we are training off-policy.
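
To make the collection path concrete, here is a minimal sketch of how one collected step could be written to Bigtable. The project, instance, and table IDs, the `trajectory` column family, the row-key scheme, and the pickle serialization are all assumptions for illustration, not the repo's actual schema:

```python
# Minimal sketch of writing one collected step to Bigtable.
# All IDs, the column family, and the serialization are assumptions,
# not the repo's actual schema.
import pickle
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=False)
table = client.instance("rl-instance").table("replay-buffer")

def write_step(episode_id, step_id, observation, action, reward):
    # Row key groups steps by episode so a training job can read
    # one episode's trajectory with a single prefix scan.
    row = table.direct_row(f"{episode_id:08d}_{step_id:06d}".encode())
    row.set_cell("trajectory", b"observation", pickle.dumps(observation))
    row.set_cell("trajectory", b"action", pickle.dumps(action))
    row.set_cell("trajectory", b"reward", pickle.dumps(reward))
    row.commit()
```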

The code provided requires certain configuration and resources in order to run.
We used Google Cloud Platform, but other providers may also work:

- A cloud service with:
  - Object storage (e.g., Google Cloud Storage)
  - A query-based database (Cloud Bigtable)
  - Docker orchestration (Kubernetes)
  - TPU / GPU allocation
  - Service authentication
- An environment that outputs (state observations, available actions, reward for the previous state); see the interface sketch after this list.
- Two open source environments, /cartpole and /breakout, are included for testing.
- /crane contains the ML training code but lacks its environment, which is currently closed source.
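
The repo does not spell out the environment API here, so the sketch below only shows the shape of the step contract described above; the class and method names are illustrative:

```python
# Illustrative shape of the environment contract described above;
# the repo's actual interface may differ.
from typing import NamedTuple, Sequence
import numpy as np

class StepResult(NamedTuple):
    observation: np.ndarray           # state observation
    available_actions: Sequence[int]  # actions legal in this state
    reward: float                     # reward for the previous step

class Environment:
    def reset(self) -> StepResult:
        """Start a new episode and return the initial step."""
        raise NotImplementedError

    def step(self, action: int) -> StepResult:
        """Apply an action; the returned reward is for the previous state."""
        raise NotImplementedError
```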

## Multi-Environment Deployment

Deploy with Docker orchestration using the provided Dockerfile:
https://github.com/jun-bun/rab-tf-agents/blob/master/deploy/Dockerfile

Note that we use a specific Docker build that supports Unity on Linux. The base container is hosted here: https://hub.docker.com/r/tenserflow/gpu-unity-ubuntu-xfce-novnc

## Single Environment Test

```
python3 -m breakout.collect_to_bigtable
```

## Training

```
python3 -m breakout.train_from_bigtable
```
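
For orientation, here is a minimal sketch of the training side using TF-Agents, assuming a batch of trajectories has already been read back from Bigtable into `experience`; the network shape and hyperparameters are illustrative, not the repo's actual settings:

```python
# Minimal sketch of the training side. Assumes `experience` is a
# batched tf_agents Trajectory already read back from Bigtable;
# hyperparameters are illustrative.
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.networks import q_network

def make_agent(time_step_spec, action_spec):
    q_net = q_network.QNetwork(
        time_step_spec.observation, action_spec, fc_layer_params=(100,))
    return dqn_agent.DqnAgent(
        time_step_spec, action_spec,
        q_network=q_net,
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))

# agent = make_agent(time_step_spec, action_spec)
# agent.initialize()
# loss_info = agent.train(experience)  # off-policy: stale data is fine
```

Because DQN trains off-policy, this loop tolerates trajectories collected by an older policy, which is exactly the staleness described above.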

## Demo