For this Udacity project, I used a single DDPG agent to solve the Tennis multi-agent collaboration environment.
In this environment, two agents control rackets to bounce a ball over a net. If an agent hits the ball over the net, it receives a reward of +0.1. If an agent lets a ball hit the ground or hits the ball out of bounds, it receives a reward of -0.01. Thus, the goal of each agent is to keep the ball in play.
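After each episode, the undiscounted rewards of each agent are summed, and the larger of the two sums is taken as that episode's score. The environment is considered solved when the average of those scores over 100 consecutive episodes is at least +0.5.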
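As a rough illustration of the solving criterion, the sketch below assumes (following the Udacity project description) that an episode's score is the maximum of the two agents' summed rewards, and that "solved" means the mean of the last 100 episode scores reaches +0.5. The function names `episode_score` and `first_solved_episode` are made up for this example and are not taken from the repository.

```python
from collections import deque


def episode_score(agent_returns):
    """Score of one episode: the larger of the agents' summed rewards (assumed rule)."""
    return max(agent_returns)


def first_solved_episode(episode_scores, window=100, target=0.5):
    """Return the 1-based episode at which the rolling mean over `window`
    episodes first reaches `target`, or None if it never does."""
    recent = deque(maxlen=window)  # keeps only the latest `window` scores
    for episode, score in enumerate(episode_scores, start=1):
        recent.append(score)
        # Only check once a full window of episodes is available.
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None
```

A training loop could call `episode_score` once per episode with the two agents' returns, append the result to a list, and stop once `first_solved_episode` returns a value.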
The code was written for Python 3.6.2. To install the required packages, run:
pip install -r requirements.txt
The Unity environment Tennis.app only runs on macOS. Windows users should download the Windows 32-bit version here.
Please follow the Udacity DRLND instructions for setting up the environment.
- Run all the cells in Report.ipynb to initialize the environment and go through the training.
- Run evaluate.py to watch a trained agent play the game against itself.
- best_actor.pth and best_critic.pth are the weights of the best model's actor and critic networks.
- network.py defines the DDPG agent's neural networks.
- ddpy_agent.py defines the DDPG agent class.
- Continuous control with deep reinforcement learning
- Udacity Learning Materials