Project - MountainCar with DQN

Environment

Solving the environment requires an average total reward of over -110 across 100 consecutive episodes.
MountainCar is trained with the Deep Q-Network (DQN) algorithm; see
the original paper, Human-level control through deep reinforcement learning.
With DQN, we solve the MountainCar environment in 1835 episodes, in 1.75 hours.
For comparison, with the Q-learning algorithm the environment is solved in 283600 episodes, in 22 minutes!
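
The solve criterion stated above (average total reward over -110 on 100 consecutive episodes) can be checked with a fixed-size window of recent scores. The following is a minimal sketch, not the code used in this project: the names `is_solved`, `SOLVED_THRESHOLD`, and `WINDOW` are hypothetical, and the episode scores in the usage loop are made-up values for illustration only.

```python
from collections import deque

SOLVED_THRESHOLD = -110.0  # from the text: the average must be over -110
WINDOW = 100               # from the text: 100 consecutive episodes


def is_solved(scores_window, threshold=SOLVED_THRESHOLD, window=WINDOW):
    """Return True once the window is full and its mean exceeds the threshold."""
    if len(scores_window) < window:
        # Fewer than 100 episodes seen so far: the criterion cannot be met yet.
        return False
    return sum(scores_window) / len(scores_window) > threshold


# Usage with made-up scores: the deque keeps only the latest 100 episodes,
# so older (worse) episodes drop out of the average automatically.
scores_window = deque(maxlen=WINDOW)
for episode_score in [-200.0] * 50 + [-105.0] * 100:
    scores_window.append(episode_score)

print(is_solved(scores_window))
```

A `deque` with `maxlen` is used here so that the average always covers the most recent 100 episodes without any manual bookkeeping.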

Training Score

The last few lines from the training log:

...
Episode: 1720 Score: -104.0 Avg.Score: -115.19, eps-greedy: 0.010 Time: 01:38:00
Episode: 1730 Score: -112.0 Avg.Score: -115.08, eps-greedy: 0.010 Time: 01:38:17
Episode: 1740 Score: -109.0 Avg.Score: -114.50, eps-greedy: 0.010 Time: 01:38:34
Episode: 1750 Score: -104.0 Avg.Score: -112.74, eps-greedy: 0.010 Time: 01:38:50
Episode: 1760 Score: -91.0 Avg.Score: -111.35, eps-greedy: 0.010 Time: 01:39:06
Episode: 1770 Score: -104.0 Avg.Score: -110.76, eps-greedy: 0.010 Time: 01:39:22
Episode: 1780 Score: -107.0 Avg.Score: -112.90, eps-greedy: 0.010 Time: 01:39:44
Episode: 1790 Score: -104.0 Avg.Score: -112.66, eps-greedy: 0.010 Time: 01:40:01
Episode: 1800 Score: -105.0 Avg.Score: -111.67, eps-greedy: 0.010 Time: 01:40:19
Episode: 1810 Score: -164.0 Avg.Score: -111.84, eps-greedy: 0.010 Time: 01:40:35
Episode: 1820 Score: -107.0 Avg.Score: -110.90, eps-greedy: 0.010 Time: 01:40:50
Episode: 1830 Score: -86.0 Avg.Score: -110.42, eps-greedy: 0.010 Time: 01:41:07

Environment solved in 1834 episodes! Average Score: -109.95
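
The `eps-greedy` column in the log stays at 0.010, which is consistent with an exploration rate that has decayed to a lower bound. The sketch below shows one common way to implement epsilon-greedy action selection with such a floor. It is an illustration under assumptions, not this project's code: the function names, the multiplicative decay factor of 0.995, and the starting value of 1.0 are all hypothetical; only the floor of 0.01 is taken from the log.

```python
import random


def epsilon_greedy(q_values, eps, rng=random):
    """Pick a random action with probability eps, else the highest-valued one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest Q-value (first one wins on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(eps, eps_min=0.01, decay=0.995):
    """Multiplicative decay with a floor (decay factor is an assumed value)."""
    return max(eps_min, eps * decay)


# Usage: after enough episodes, eps settles at the floor seen in the log.
eps = 1.0  # assumed starting value
for _ in range(2000):
    eps = decay_epsilon(eps)

# MountainCar has three discrete actions; these Q-values are made up.
action = epsilon_greedy([0.1, 0.9, 0.3], eps=0.0)
```

With `eps=0.0` the choice is purely greedy, so the call above selects the action with the largest made-up Q-value; with `eps=0.01`, roughly one action in a hundred would be random.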