Ah, I see.
But if I wanted to apply a ready-made algorithm, I suppose I should make the winner's reward positive and just assign the negative reward to the loser.
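To make that concrete, here is a rough sketch of the conversion I have in mind. It assumes the environment reports one shared reward from a fixed point of view: -1 when X wins (as in the linked line), and, as my own assumption, +1 when O wins and 0 otherwise. The name `reward_for` is a hypothetical helper, not part of gym-tictactoe.

```python
# Assumed convention (not confirmed by the repository): the environment returns
# one shared reward: +1 if O wins, -1 if X wins, 0 for a draw or unfinished game.
O_WIN_REWARD = 1
X_WIN_REWARD = -1


def reward_for(mark: str, env_reward: int) -> int:
    """Convert the shared reward into the reward seen by the agent playing `mark`.

    Hypothetical helper: the winner always gets +1 and the loser -1,
    regardless of which mark the agent plays.
    """
    if mark not in ("O", "X"):
        raise ValueError(f"unknown mark: {mark!r}")
    # O already sees its own outcome; X sees the sign flipped.
    return env_reward if mark == "O" else -env_reward


if __name__ == "__main__":
    # X wins: the shared reward is -1, but the X agent should see +1.
    print(reward_for("X", X_WIN_REWARD))
    # The O agent, having lost that game, sees -1.
    print(reward_for("O", X_WIN_REWARD))
```

With a wrapper like this, each agent can simply maximize its own reward, which is what most off-the-shelf algorithms expect.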
Sorry if this is a dumb question, but here you set different rewards depending on whether X wins or O wins:
https://github.com/haje01/gym-tictactoe/blob/master/gym_tictactoe/env.py#L8
Why is that?
If X's reward for winning is -1, wouldn't that encourage an agent playing as X to always lose?
Thanks.