When we know exactly how the world/environment works (i.e. the transition dynamics and rewards are given), we can use simpler planning algorithms such as Policy Iteration (PI) and Value Iteration (VI). The idea is to compute the true value of each state, i.e. its expected cumulative reward, and then make decisions greedily with respect to these state/action values.
Problem Statement: the Frozen Lake environment. The agent controls the movement of a character in a grid world. Some tiles of the grid are walkable, while others drop the agent into the water. The agent is rewarded for finding a walkable path to the goal tile.
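To make the setup concrete, here is a minimal sketch, assuming the `FrozenLake-v1` environment from the `gymnasium` package (the older `gym` API is very similar). The toy-text environments expose their full transition model via `env.unwrapped.P`, which is exactly the "known model" that planning algorithms rely on.

```python
import gymnasium as gym

# FrozenLake-v1: 4x4 grid, 16 discrete states, 4 discrete actions
# (left, down, right, up). is_slippery=True makes transitions stochastic.
env = gym.make("FrozenLake-v1", is_slippery=True)

n_states = env.observation_space.n   # 16
n_actions = env.action_space.n       # 4

# Toy-text environments expose the full MDP model:
# P[s][a] is a list of (probability, next_state, reward, terminated) tuples.
P = env.unwrapped.P
print(P[0][0])  # stochastic outcomes of taking "left" in the start state
```

The two planning algorithms applied to this model: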
- Policy Iteration
- Value Iteration
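As an illustration of the second, here is a minimal value-iteration sketch; it assumes the `gymnasium` FrozenLake-v1 environment and its `env.unwrapped.P` model, and the function and variable names are my own. Policy Iteration differs in that it alternates a full policy-evaluation sweep with a greedy policy-improvement step, but on this environment both converge to the same optimal values and policy.

```python
import numpy as np
import gymnasium as gym

def value_iteration(P, n_states, n_actions, gamma=0.99, tol=1e-8):
    """Sweep the Bellman optimality backup over a known model until the
    value function stops changing, then read off the greedy policy."""
    V = np.zeros(n_states)
    while True:
        # One-step lookahead: Q[s, a] = sum_s' p(s'|s,a) * (r + gamma * V[s'])
        Q = np.zeros((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                for prob, s_next, reward, terminated in P[s][a]:
                    Q[s, a] += prob * (reward + gamma * V[s_next] * (not terminated))
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

env = gym.make("FrozenLake-v1", is_slippery=True)
V, policy = value_iteration(env.unwrapped.P,
                            env.observation_space.n,
                            env.action_space.n)
print("State values:\n", V.reshape(4, 4).round(3))
print("Greedy policy (0=left, 1=down, 2=right, 3=up):\n", policy.reshape(4, 4))
```

For more background on MDPs, policy iteration, and value iteration, these references cover the theory in depth: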
- Chapters 2 and 3 from Sutton and Barto
- David Silver's course - lectures 1, 2 and 3
- Stanford CS234 - lectures 1 and 2