Executing the code
To run the game, execute main.py script as
python main.py --Train 'F'
The position of the array is to be entered as the input where we want to place the 'X'. The format to be entered is 1,2
.
Postion on the board to be entered are as follows:
0,0 | 0,1 | 0,2 |
---|---|---|
1,0 | 1,1 | 1,2 |
2,0 | 2,1 | 2,2 |
Training of the game can be executed by
python main.py --Train 'T'
Implementation
The system learns the game by playing against the other Q learning player. The player who makes the first move is selected at random. There will be two training sessions, one for the bot playing as 'X' and the other for the bot playing as 'O' against the human. So when the Q table is made for the bot playing as 'X', the other player in the training session will have a higher exploring rate, which makes it more probable to make moves at random. This is done so that the other bot learns all the states possible. If both players in the training session have same exploring rate then the bot doesnot learn few moves.
The plots of the number of wins, loses, ties and invalid moves over the number of episodes is given in the below figures. From the graphs we can observe that as the number of episodes increases, loses and ties saturate. On the other hand wins and ties keep increasing proportionally. The plots are in the order of wins, loses, ties and invalid moves.