Snake learns to survive using reinforcement learning.
Q-learning is a model-free reinforcement learning algorithm that learns the value of an action in a particular state. This is done by means of a look-up table, called the Q-table, which stores a quality value Q for every pair of state S and action a that can be taken in that state. In a given state S, the best move a is therefore the one associated with the highest Q value.
At the start, the Q-table is initialized with zeros and its entries are updated iteratively during the game using the Bellman equation. To fill up the Q-table quickly, a strategy called ε-greedy is implemented to encourage exploration of game states by choosing random moves with probability ε or optimal moves (using the Q-table values) with probability (1-ε). At the start of the game, ε is set to 1.0 (exploration phase) and is slowly decreased throughout the training to 0.0 (exploitation phase).
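As a rough sketch of what ε-greedy action selection can look like in C (the function and variable names below are illustrative, not the actual snake-rl code):

```c
#include <stdlib.h>

#define N_STATES  256
#define N_ACTIONS 5

/* Pick an action with the ε-greedy policy: a random action with
 * probability epsilon (exploration), otherwise the action with the
 * highest Q value for the current state (exploitation). */
int epsilon_greedy(float q_table[N_STATES][N_ACTIONS], int state, float epsilon)
{
    if ((float)rand() / (float)RAND_MAX < epsilon)
    {
        return rand() % N_ACTIONS;
    }

    int best = 0;
    for (int a = 1; a < N_ACTIONS; a++)
    {
        if (q_table[state][a] > q_table[state][best])
        {
            best = a;
        }
    }
    return best;
}
```

ε itself can then be decayed step by step (for example, multiplied by a factor slightly below 1 after every episode) until it reaches 0.0.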
The Q-learning algorithm, despite strategies to encourage exploration, falters as the number of states and actions grows, since the likelihood of the agent visiting a particular state and performing a particular action there becomes increasingly small.
However, since Snake is a really simple game, we can engineer a state representation S that takes up only 8 bits of information, meaning we only have to deal with 256 states! Here's how:
- Bit0: Apple Y coordinate is greater than Snake's head Y coordinate (apple is above)
- Bit1: Apple Y coordinate is smaller than Snake's head Y coordinate (apple is below)
- Bit2: Apple X coordinate is smaller than Snake's head X coordinate (apple is on the left)
- Bit3: Apple X coordinate is greater than Snake's head X coordinate (apple is on the right)
- Bit4: Snake's body or wall just above Snake's head
- Bit5: Snake's body or wall just below Snake's head
- Bit6: Snake's body or wall just left of Snake's head
- Bit7: Snake's body or wall just right of Snake's head
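As an illustrative sketch (the flag names below are hypothetical, not the actual snake-rl identifiers), packing these observations into a byte looks like this:

```c
#include <stdint.h>
#include <stdbool.h>

/* Pack the 8 boolean observations into a single byte (0-255).
 * Each flag is the result of a simple coordinate comparison between
 * the apple, the Snake's head, and the surrounding cells. */
uint8_t encode_state(bool apple_above, bool apple_below,
                     bool apple_left, bool apple_right,
                     bool danger_up, bool danger_down,
                     bool danger_left, bool danger_right)
{
    uint8_t state = 0;
    state |= (uint8_t)apple_above  << 0; /* Bit0 */
    state |= (uint8_t)apple_below  << 1; /* Bit1 */
    state |= (uint8_t)apple_left   << 2; /* Bit2 */
    state |= (uint8_t)apple_right  << 3; /* Bit3 */
    state |= (uint8_t)danger_up    << 4; /* Bit4 */
    state |= (uint8_t)danger_down  << 5; /* Bit5 */
    state |= (uint8_t)danger_left  << 6; /* Bit6 */
    state |= (uint8_t)danger_right << 7; /* Bit7 */
    return state;
}
```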
In this way, we can compress all the redundant information given by the full game matrix into a single byte, also making the state independent of the matrix dimensions. Given that there are 5 actions { IDLE, UP, DOWN, LEFT, RIGHT }, the shape of the Q-table will be: (256, 5).
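In C, such a Q-table can be nothing more than a plain zero-initialized 2D array (a sketch; the actual snake-rl types may differ):

```c
/* Five possible actions and a 256 x 5 Q-table initialized with zeros. */
typedef enum { IDLE, UP, DOWN, LEFT, RIGHT, N_ACTIONS } action_t;

static float q_table[256][N_ACTIONS] = {0};
```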
Finally, the reward given to the Snake during training is the following:

- +10: if the Snake eats an apple
- -10: if the Snake dies
- 0: otherwise
This value is used in the Bellman equation as the immediate reward R for taking action a in state S.
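For reference, a single tabular Q-learning update using this reward looks roughly like the following, where alpha is the learning rate and gamma the discount factor (again, illustrative names, not the exact snake-rl implementation):

```c
/* One Q-learning (Bellman) update step:
 * Q(S,a) <- Q(S,a) + alpha * (R + gamma * max_a' Q(S',a') - Q(S,a))
 * On a terminal state (Snake dies), the max term is typically dropped. */
void q_update(float q_table[256][5], int state, int action, float reward,
              int next_state, float alpha, float gamma)
{
    float max_next = q_table[next_state][0];
    for (int a = 1; a < 5; a++)
    {
        if (q_table[next_state][a] > max_next)
        {
            max_next = q_table[next_state][a];
        }
    }

    q_table[state][action] += alpha * (reward + gamma * max_next - q_table[state][action]);
}
```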
The Snake typically learns how to play in about 15 to 20 minutes and, if given more time, it can even grow to 90 blocks in length! However, given its short-sightedness due to the simplistic state-space representation, this is the maximum it has been able to achieve.
snake-rl is written in the C language and uses raylib, a simple cross-platform graphics library built on top of OpenGL. You can follow the installation steps of raylib for your operating system at the raylib GitHub repository.
Disclaimer: Although raylib is also supported on Windows and macOS, the game has been compiled and tested only on GNU/Linux.
You can build snake-rl using make with the following:

```bash
git clone https://github.com/Gualor/snake-rl.git
cd snake-rl
make
```
To run the application, just use the following:

```bash
./build/snake
```
This will create a new game window and the Snake will immediately start learning how to play at lightning speed!
Training sessions can take some time before the Snake starts surviving and growing longer and longer. You can save the Q-table values to a file and load them the next time you want to resume training:

```bash
./build/snake -s .
```
This will create a snake.qtable file which contains the Q-table values in CSV format. If the file already exists, it will be overwritten.
To load the snake.qtable file and resume training, run the following:

```bash
./build/snake -l ./snake.qtable -s .
```
This will load the Q-table and save further progress onto the same file.
NB: the ε parameter is not stored inside the snake.qtable file; therefore, when loading the file, ε is automatically set to 0.0, since it is assumed that the exploration phase has already ended.