Sample Efficient Microsatellite Attitude Control using Deep Reinforcement Learning with Unity and OpenAI Gym
The following simulations show a trained agent using the methods listed below. The training session ran for 17K episodes (5.1M timesteps) and took about 2 full days.
Create a virtual environment (or use the PyTorch Docker image)
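As a rough sanity check on these figures, the short calculation below derives the average episode length and the simulation throughput. It assumes that "about 2 full days" means 48 hours of wall-clock time, which is an approximation and not a reported measurement.

```python
# Episode and timestep counts are the figures quoted above.
episodes = 17_000
timesteps = 5_100_000
wall_clock_hours = 48  # assumption standing in for "about 2 full days"

steps_per_episode = timesteps / episodes
steps_per_second = timesteps / (wall_clock_hours * 3600)

print(f"average timesteps per episode: {steps_per_episode:.0f}")
print(f"approximate timesteps per second: {steps_per_second:.1f}")
```

Under that assumption, an episode averages 300 timesteps and the simulation advances at roughly 30 timesteps per second.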
virtualenv venv -p python3.6
source venv/bin/activate
Install dependencies
python setup.py develop
Create directories needed for training
- tmp: directory of trained models
- unity_environments: directory for the Unity executable environment
- wandb: for wandb logs when training
mkdir tmp
mkdir unity_environments
mkdir wandb
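For setups where a script is more convenient than typing the commands by hand, the same directories could be created from Python. This is a minimal sketch; the function name `create_training_dirs` and its `root` parameter are hypothetical and not part of the repository.

```python
from pathlib import Path

# Directory names come from the list above.
REQUIRED_DIRS = ("tmp", "unity_environments", "wandb")


def create_training_dirs(root="."):
    """Create the training directories under `root`, skipping any that exist."""
    created = []
    for name in REQUIRED_DIRS:
        path = Path(root) / name
        # exist_ok=True makes the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for path in create_training_dirs():
        print(path)
```

Running it twice is harmless, since directories that already exist are left as they are.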
Download the Unity executable from source and extract it into the folder
Change the permissions of the folder containing the Unity executable
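The exact permissions required are not stated here. A minimal sketch, assuming the goal is to make the extracted files executable on a POSIX system, might look like the following. The helper names and the folder argument are hypothetical.

```python
import stat
from pathlib import Path


def make_executable(path):
    """Add execute permission for user, group and others, keeping other mode bits."""
    path = Path(path)
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def make_tree_executable(folder):
    """Apply make_executable to the folder and to everything inside it."""
    make_executable(folder)
    for entry in Path(folder).rglob("*"):
        make_executable(entry)
```

Calling `make_tree_executable` on the folder that holds the extracted executable has roughly the effect of a recursive `chmod a+x`. Tighter permissions may be preferable on shared machines.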
Depending on the selected DRL algorithm (e.g. TD3, SAC, SACv2, TD3-PER), change the hyperparameters and environment config in the YAML file located inside
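When several experiments share most settings, it can help to keep a base config and apply small overrides to it instead of editing every value by hand. The sketch below illustrates the idea on plain dictionaries, shaped like what a parsed YAML file might contain. All keys and values are hypothetical and do not reflect the repository's actual config schema, and reading the YAML file itself (for example with PyYAML's `yaml.safe_load`) is left out.

```python
import copy


def merge_config(base, overrides):
    """Return a copy of `base` with `overrides` applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested sections are merged key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars and new keys replace or extend the base.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults; the real YAML file may use different keys.
base_config = {
    "algorithm": "td3",
    "hyperparameters": {"learning_rate": 3e-4, "batch_size": 256, "gamma": 0.99},
    "environment": {"max_episode_steps": 300},
}

# Hypothetical experiment-specific change.
overrides = {"hyperparameters": {"batch_size": 128}}

config = merge_config(base_config, overrides)
print(config["hyperparameters"])
```

Only `batch_size` changes in the merged result; the other hyperparameters and the environment section keep their base values, and `base_config` itself is not modified.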
Train the model
python bin/train/train_<DRL_ALGO>.py
The following DRL algorithms are available (only working ones are listed):
- sac: Soft Actor-Critic V1
- sacv2: Soft-Actor Critic V2
- td3: Twin Delayed DDPG
- td3_per: TD3 with Prioritized Experience Replay
Once the model is trained, change the test settings in the YAML config inside
Test the simulation using the command below. It is best to use a graphical version of the Unity executable used earlier
python bin/test/test_<DRL_ALGO>.py
(Optional) You can change the number of episodes inside
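The train and test commands follow the same naming pattern, so a small helper can build them and reject unsupported algorithm names. This is a sketch that assumes the `<DRL_ALGO>` placeholder maps directly to the lowercase names in the list of working algorithms above; the function `script_command` is hypothetical and not part of the repository.

```python
import sys

# Algorithm names taken from the list of working algorithms above.
WORKING_ALGOS = ("sac", "sacv2", "td3", "td3_per")


def script_command(phase, algo):
    """Build the argv list for `python bin/<phase>/<phase>_<algo>.py`."""
    if phase not in ("train", "test"):
        raise ValueError(f"unknown phase: {phase!r}")
    if algo not in WORKING_ALGOS:
        raise ValueError(f"unknown algorithm: {algo!r}")
    # sys.executable keeps the command inside the active virtual environment.
    return [sys.executable, f"bin/{phase}/{phase}_{algo}.py"]


print(script_command("train", "td3_per")[1])
print(script_command("test", "sac")[1])
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root to launch the corresponding script.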
Results of the training can be found in this wandb repository
The written research paper will be available soon.