Soft transition from Average to Max backup operator #940
Comments
This is the part of the code where the average happens: line 267 in db42c60.
1+n/100 is about right based on some playing around with numbers.
On Discord, @fersbery (@DanielUranga) pointed out that this idea is very similar to this previous PR (though the transition implementation is not): #243
@Naphthalin Do you happen to know if there is a cheap way to compute or approximate Lp norms?
If we only use powers of 2 for p, which shouldn't cause problems, it's not too expensive: only 1 pow call.
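A rough sketch of that trick (the function name and the restriction to `p = 2**k` are my own illustration, not code from the repo): with `p` a power of two, each `x**p` can be computed by repeated squaring, leaving a single `pow` call for the outer `1/p` root.

```python
import math

def lp_norm_avg_pow2(xs, k):
    """Lp norm average with p = 2**k, assuming all xs lie in [0, 1].

    Each x**p is obtained by squaring x k times, so the only pow call
    in the whole computation is the outer (1/p)-th root.
    """
    p = 2 ** k
    total = 0.0
    for x in xs:
        y = x
        for _ in range(k):
            y *= y              # after step j, y == x ** (2 ** (j + 1))
        total += y
    return math.pow(total / len(xs), 1.0 / p)
```

With `k = 0` (`p = 1`) this is the plain mean; as `k` grows it approaches the max of the inputs.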
@AlexisOlson did you ever follow up on that idea? It seems to be very close to the Power-UCT of https://arxiv.org/pdf/1911.00384.pdf, which accelerates the convergence to "newly discovered evals" a little bit, but adds complication. Also, a continuous recalculation of Q values is one of the major slowdowns in the current implementation of #963 where it is side-stepped by doing the full recalculation only every N visits to a node. Definitely an interesting suggestion, as the "correct" calculation of evals is one of the 2 big shortcomings of the PUCT algorithm, the other being the long-term dependency on the policies discussed in #1231 |
@Naphthalin When I was thinking about this further, I realized I was missing half of the min-max idea and just transitioning to max with what I was describing. I'll have to read the paper you linked for a more proper idea. |
IMO don't waste your time with that paper, as the idea is pretty unsound and still doesn't address our core problem(s). I just wanted to document that at least there is a paper carrying out a similar idea :) |
Just linking this issue to #1734 so we find it again. Once we have the Q recalculation loop as a separate class/template, things like this could easily be tried.
For node `a` with children `b_1, b_2, ..., b_n` with evals `q_1, q_2, ..., q_n`, let `b_max` be the best child with eval `q_max`. Instead of averaging the `q_i` to update the parent, use an Lp norm average, where

`LpNormAvg({x_i}) = ( 1/n * Sum_i (x_i)^p )^(1/p)`

For `p = 1`, this is identical to the mean, and as `p -> infinity`, `LpNormAvg({x_i}) -> Max({x_i})`.

Since `Q` is in `[-1, 1]`, we want to shift to a non-negative interval before taking an Lp average. Let's also rescale by `q_max` so that all shifted and scaled `q_i` are in `[0, 1]` and values don't blow up for large `p`. Define

`t_i = (q_i + 1) / (q_max + 1)`

where `t_i` is in `[0, 1]` as explained above.

Then the final average value to use for the parent node would be

`LpNormAvg({t_i}) * (q_max + 1) - 1`

to rescale and shift back to the `[-1, 1]` range.

The `p` to use for the Lp norm would be an increasing function of visits to node `a`. Simply `p = visits` is likely too fast of a transition.
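The scheme above can be sketched as follows. Function names are hypothetical, and the `p = 1 + visits/100` schedule is the one suggested earlier in the thread, not a tuned value.

```python
def lp_norm_avg(xs, p):
    """Lp norm average (mean of x**p, then 1/p root) for xs in [0, 1]."""
    return (sum(x ** p for x in xs) / len(xs)) ** (1.0 / p)

def soft_backup(qs, visits):
    """Backup value for a parent node given child evals qs in [-1, 1].

    Shift and rescale by q_max so the inputs to the Lp average lie in
    [0, 1], take the Lp norm average, then undo the transform. p = 1
    recovers the plain mean; p -> infinity recovers max(qs).
    """
    q_max = max(qs)
    if q_max <= -1.0:           # all children at the losing bound
        return -1.0
    ts = [(q + 1.0) / (q_max + 1.0) for q in qs]   # t_i in [0, 1]
    p = 1.0 + visits / 100.0    # schedule suggested in the thread
    return lp_norm_avg(ts, p) * (q_max + 1.0) - 1.0
```

At zero visits (`p = 1`) this reproduces the ordinary average of the child Q values, and with many visits the result approaches `q_max` from below.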