Sample Efficient Microsatellite Attitude Control using Deep Reinforcement Learning with Unity and OpenAI Gym
The following are simulations of a trained agent using the methods listed below. The training session ran for 17K episodes (5.1M timesteps) over about two full days.
- Create a virtual environment (or use the PyTorch Docker image):

  ```
  virtualenv venv -p python3.6
  source venv/bin/activate
  ```
- Install dependencies:

  ```
  python setup.py develop
  ```
- Create the directories needed for training:
  - `tmp`: trained models
  - `unity_environments`: the Unity executable environment
  - `wandb`: wandb logs written during training

  ```
  mkdir tmp unity_environments wandb
  ```
- Download the Unity executable from the source and extract it into the `unity_environments` folder.
- Change the permissions of the folder containing the Unity executable so it can be run (e.g., `chmod -R 755 unity_environments`).
- Depending on the selected DRL algorithm (e.g. TD3, SAC, SACv2, TD3-PER), change the hyperparameters and environment config in the corresponding YAML file inside `config/train`.
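For reference, these configs are plain YAML and can be inspected programmatically; a minimal sketch, where the file name and any keys are assumptions rather than this repo's actual schema:

```python
import yaml  # PyYAML

# Illustrative only: load a training config. The file name below is a
# placeholder; check config/train for the actual files and their keys.
with open("config/train/td3.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg)  # e.g. hyperparameters such as learning rate, batch size, buffer size
```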
- Train the model:

  ```
  python bin/train/train_<DRL_ALGO>.py
  ```
  The available DRL algorithms are the following (only working ones are included; a background sketch of the TD3 target computation follows this list):
  - `sac`: Soft Actor-Critic V1
  - `sacv2`: Soft Actor-Critic V2
  - `td3`: Twin Delayed DDPG
  - `td3_per`: TD3 with Prioritized Experience Replay
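As background (this is a generic sketch, not this repository's code), the TD3-based variants above compute a clipped double-Q target: a smoothed target action is scored by two target critics and the smaller estimate is used, which damps overestimation. A minimal PyTorch sketch, with all networks and tensor arguments assumed:

```python
import torch

def td3_target(reward, next_state, done, actor_tgt, critic1_tgt, critic2_tgt,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Clipped double-Q target from TD3 (Fujimoto et al., 2018). Illustrative only."""
    with torch.no_grad():
        # Target policy smoothing: perturb the target action with clipped noise.
        next_action = actor_tgt(next_state)
        noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-max_action, max_action)
        # Clipped double-Q: take the minimum of the two target critics.
        target_q = torch.min(critic1_tgt(next_state, next_action),
                             critic2_tgt(next_state, next_action))
        return reward + gamma * (1.0 - done) * target_q
```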
- Once the model is trained, update the test config in the YAML file inside `config/test`.
- Test the simulation using the command below. It is better if this runs a graphical build of the previously downloaded Unity executable:

  ```
  python bin/test/test_<DRL_ALGO>.py
  ```
- (Optional) You can change the number of episodes inside `bin/test/test_<DRL_ALGO>.py`.
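For context, Unity builds like this one are commonly exposed to Gym-based agents through the ML-Agents Gym wrapper; a minimal rollout sketch, assuming the `mlagents_envs` and `gym_unity` packages and a placeholder executable path (random actions stand in for the trained policy):

```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper

# Illustrative only: wrap a Unity build as a Gym environment and run one episode.
# The executable path is a placeholder for the downloaded build.
unity_env = UnityEnvironment(file_name="unity_environments/<executable>")
env = UnityToGymWrapper(unity_env)

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # stand-in for the trained agent's action
    obs, reward, done, info = env.step(action)
env.close()
```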
Results of the training can be found in this wandb project:
https://wandb.ai/jamesandrewsarmiento/microsat_17K/overview?workspace=user-jamesandrewsarmiento
The accompanying research paper will be available soon.