cartpolebalancing

Using Policy Iteration to solve the Cart Pole Balancing problem. A simple fully connected neural network with one hidden layer is used to evaluate the best action for a given state. Suppose a training episode lasts for k steps. The reward for each step is collected, and the discounted return for each step is calculated after the episode ends. (state, discounted return) pairs are stored for each episode. Backpropagation is done for a batch of episodes, and the process is repeated for a number of batches.
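The discounted-return step described above can be sketched as follows. This is a minimal illustration rather than the code in this repository: the function name `discounted_returns` and the discount factor `gamma` are assumptions, since the README does not state which value was used.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode.

    `gamma` is an assumed discount factor; the README does not give one.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A three-step episode with a reward of 1 per step.
episode_rewards = [1.0, 1.0, 1.0]
print(discounted_returns(episode_rewards, gamma=0.5))  # [1.75, 1.5, 1.0]
```

Computing the returns backwards keeps the work linear in the episode length k, which is why it is done once after the episode ends rather than at every step.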

A GIF of the trained AI is in screen.gif.

Simulation environment: OpenAI Gym CartPole-v0

The forward pass and backpropagation are done in Theano. Here are good tutorials for getting started with Theano and for implementing a simple ANN.
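For readers who have not used Theano, the forward pass of a one-hidden-layer network can be sketched in plain Python. This is not the Theano code from this repository: the hidden width `HIDDEN`, the tanh activation, the softmax output, the weight initialisation, and all function names are assumptions made for illustration. Only the input size (4 state variables) and the number of actions (2) come from the CartPole environment itself.

```python
import math
import random

STATE_DIM = 4   # cart position, cart velocity, pole angle, pole angular velocity
N_ACTIONS = 2   # push left or push right
HIDDEN = 8      # assumed hidden-layer width


def init_layer(n_in, n_out, rng):
    """Small random weights (one row per output unit) and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases):
    """Fully connected layer: W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def action_probabilities(state, params):
    """Forward pass: state -> tanh hidden layer -> softmax over actions."""
    (w1, b1), (w2, b2) = params
    hidden = [math.tanh(h) for h in dense(state, w1, b1)]
    return softmax(dense(hidden, w2, b2))


rng = random.Random(0)
params = (init_layer(STATE_DIM, HIDDEN, rng), init_layer(HIDDEN, N_ACTIONS, rng))
probs = action_probabilities([0.0, 0.1, -0.02, 0.0], params)
action = probs.index(max(probs))  # greedy choice; sampling from probs is another option
print(probs, action)
```

In Theano the same computation would be expressed symbolically, which lets the library derive the gradients for backpropagation automatically.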

I used the CPU for this. The Nvidia drivers are a bit tricky to install on Ubuntu 16.04 if you have Intel's Skylake. Here's my Theano .theanorc config for CPU:

[global]
floatX = float32
device = cpu
force_device=True
pycuda.init = False

[lib]
cnmem = 1

[blas]
ldflags=-L/usr/lib/ -lblas



litesaber15/cartpolebalancing
