Different rewards for O and X? #5

drozzy · 2020-03-06T22:02:52Z

Sorry if this is a dumb question, but here you set different rewards depending on whether X wins or O wins:
https://github.com/haje01/gym-tictactoe/blob/master/gym_tictactoe/env.py#L8

Why is that?

If X's reward for winning is -1, wouldn't that encourage an agent playing X to always lose?

Thanks.

haje01 · 2020-03-07T01:31:11Z

Please consider the code below:

gym-tictactoe/examples/td_agent.py, lines 111 to 115 in 84e22fc:

    # select most right action for 'O' or 'X'
    if self.mark == 'O':
        indices = best_val_indices(ava_values, max)
    else:
        indices = best_val_indices(ava_values, min)
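A minimal, self-contained sketch of the same idea, assuming every value and reward is expressed from O's point of view (+1 for an O win, -1 for an X win, 0 otherwise). Under that convention O picks the highest-valued move and X the lowest, so X is never rewarded for losing. `best_val_indices` here is a hypothetical stand-in for the helper used in td_agent.py (its real implementation is not shown above), and `select_action` is an illustrative name, not part of the repo.

```python
import random


def best_val_indices(values, fn):
    """Return the indices of all entries equal to fn(values), where fn is max or min."""
    best = fn(values)
    return [i for i, v in enumerate(values) if v == best]


def select_action(mark, ava_actions, ava_values):
    """Greedy move choice over state values scored from O's perspective.

    O wants the largest value (+1 means O wins); X wants the smallest (-1 means X wins).
    """
    if mark == 'O':
        indices = best_val_indices(ava_values, max)
    else:
        indices = best_val_indices(ava_values, min)
    # Break ties randomly among equally good actions.
    return ava_actions[random.choice(indices)]


# Values of the states reached by each available action, from O's viewpoint.
actions = [0, 4, 8]
values = [0.2, -0.9, 0.5]
print(select_action('O', actions, values))  # 8
print(select_action('X', actions, values))  # 4
```

A single O-centric value table can thus serve both players; only the direction of the greedy choice changes.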

drozzy · 2020-03-07T02:32:26Z

Ah, I see.
But if I wanted to apply a ready-made algorithm, I suppose I should make the winning reward positive for both players and just assign the negative reward to the loser.
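Assuming the environment reports one reward from O's perspective, as the question describes (an X win scored as -1), that conversion can be sketched as a sign flip for the X player. Each agent then sees +1 when it wins and -1 when it loses, which is what an off-the-shelf maximizing algorithm expects. `player_reward` is a hypothetical helper, not part of gym-tictactoe.

```python
def player_reward(env_reward, mark):
    """Convert an O-perspective reward into the reward for the given player.

    Assumes env_reward is +1 when O wins, -1 when X wins, and 0 otherwise.
    """
    if mark not in ('O', 'X'):
        raise ValueError(f"unknown mark: {mark!r}")
    return env_reward if mark == 'O' else -env_reward


# X wins, so the environment reports -1:
print(player_reward(-1, 'X'))  # 1  (winner rewarded)
print(player_reward(-1, 'O'))  # -1 (loser penalized)
```

Because tic-tac-toe is zero-sum, flipping the sign loses no information, and the same maximizing learner can train either side unchanged.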
