Videodr0me edited this page Jun 11, 2018 · 55 revisions

Single-legal-move extension (--auto-extend=1)

Nodes (or chains of nodes) that have only one legal move are game-theoretically completely linked. For example: node A -> only one legal move -> node B -> only one legal move -> node C -> many legal moves. Here A, B and C, while different nodes, are effectively one game state, since from A you will inevitably land in C; they should have identical evaluations. Currently MCTS first visits A and propagates the score, then maybe visits B and propagates the score, and only on the third visit reaches C - when it could have visited C directly and backed that score up to B and A. Not only does this save visits, but it is also probable that a deeper NN eval is more accurate than a shallow one (beware: this is not always the case!). I called this auto-extending, because if there is only one legal move, we really want to evaluate C first and then just back up the score.

In chess these situations most often occur after a check, where the opponent's king has only one move. In other games like Othello or Checkers they are even more prevalent. One example position with Leela's normal MCTS:

position fen 5k2/6pp/p1qN4/1p1p4/3P4/2PKP2Q/PP3r2/3R4 b - - 0 1
go nodes 30000
.
.
info depth 2 seldepth 28 time 5179 nodes 19476 score cp -3 hashfull 40 nps 3760 pv c6d6 h3c8 f8f7 c8b7 f7g8 d1g1 f2f7 b7a8 f7f8 a8b7 f8f7 b7a8 f7f8 a8b7 f8f7
info depth 2 seldepth 28 time 5192 nodes 20254 score cp 9003 hashfull 40 nps 3901 pv c6c4 d6c4 b5c4

It takes normal MCTS 20254 nodes to find the correct move Qc4.

With auto-extending (--auto-extend=1) it takes only 10340 nodes, and the time went down from 5192 ms to 2923 ms:

info depth 2 seldepth 27 time 2919 nodes 10215 score cp 1 hashfull 23 nps 3499 pv c6d6 h3c8 f8f7 c8b7 f7g8 d1g1 f2f7 b7a8 f7f8 a8b7 f8f7 b7a8 f7f8 a8b7 f8f7
info depth 2 seldepth 27 time 2923 nodes 10340 score cp 7304 hashfull 23 nps 3537 pv c6c4 d6c4 b5c4

Well, one example does not prove much, so I played a 10,000-game self-play match. I did not expect this change to be measurable, because even in chess such only-move cases are rare, and even when they occur it was not clear whether they really matter. The result was within those expectations - probably 100,000 games would be necessary to finally get out of the noise:

lc0-cudnn selfplay --parallelism=8 --backend=multiplexing "--backend-opts=cudnn(threads=2)" --games=10000 --visits=100 --temperature=1 --tempdecay-moves=10 player1: --certainty-prop=0 --auto-extend=1 --backpropagate-mode=0 --optimal-select=0 player2: --auto-extend=0 --certainty-prop=0 --optimal-select=0 --backpropagate-mode=0

tournamentstatus final P1: +2974 -2925 =4101 Win: 50.24% Elo:  1.70 LOS: 73.82% P1-W: +1705 -1267 =2028 P1-B: +1269 -1658 =2073

Certainty propagation (--certainty-prop=1)

Certainty propagation is known in the literature as MCTS-Solver or Proof-Number Search (Winands et al.). If we reach terminal nodes (which are "certain"), we can propagate the "certain" results from these nodes efficiently up the search tree. Standard Leela MCTS treats these "certain" results just like normal NN evaluations and does not use this information to prune branches. For example, if one side can play a move that leads to a terminal win node, we can make the parent node a "certain" win. Further, nodes whose children are all certain become certain themselves, with the max q among the certain children being propagated up the tree. With this, even MCTS can solve some shallow mates, steer tree exploration more efficiently, and avoid branches that are "proven" to need no further exploration. During move selection I always select decisive (winning) moves and always try to avoid anti-decisive (losing) moves.

A nice property is that if we have a certain win at the root we can play it immediately, regardless of the visits that move received.

One example position where current Leela needs 445439 nodes to find the key move Qf8+:

position fen 7k/pp4np/2p3p1/3pN1q1/3P4/Q7/1r3rPP/2R2RK1 w - - 0 1
go nodes 500000
.
.
info depth 2 seldepth 22 time 11827 nodes 409237 score cp 0 certain 0 hashfull 14 nps 34601 pv f1f2 g5c1 f2f1 b2g2 g1g2 c1a3 e5f7 h8g8 f7h6 g8h8 h6f7 h8g8 f7h6 g8h8 h6f7
info depth 2 seldepth 22 time 12265 nodes 445439 score cp 11730 certain 0 hashfull 14 nps 36317 pv a3f8 f2f8 f1f8

With certainty propagation Leela finds Qf8 in 223093 nodes:

info depth 2 seldepth 22 time 7526 nodes 198820 score cp 0 certain 76 hashfull 14 nps 26417 pv f1f2 g5c1 f2f1 b2g2 g1g2 c1a3 e5f7 h8g8 f7h6 g8h8 h6f7 h8g8 f7h6 g8h8 h6f7
info depth 2 seldepth 22 time 8300 nodes 223093 score cp 12800 certain 88 hashfull 14 nps 26878 pv a3f8 f2f8 f1f8

Again, the general effect is probably small and hard to measure, as proof-search needs a high number of visited nodes to be really effective. It should scale well with more nodes searched, but in order to generate 10,000 games I had to limit the search to 100 visits.

lc0-cudnn selfplay --parallelism=8 --backend=multiplexing "--backend-opts=cudnn(threads=2)" --games=10000 --visits=100 --temperature=1 --tempdecay-moves=10 player1: --certainty-prop=1 --auto-extend=0 --backpropagate-mode=0 --optimal-select=0 player2: --auto-extend=0 --certainty-prop=0 --optimal-select=0 --backpropagate-mode=0
tournamentstatus final P1: +2937 -2849 =4214 Win: 50.44% Elo:  3.06 LOS: 87.63% P1-W: +1718 -1140 =2142 P1-B: +1219 -1709 =2072

Update: The certainty propagation was updated to (1) test for terminal nodes immediately when creating child nodes, (2) only check siblings if the propagated v is certain, and (3) offer an additional mode for expansion selection.

--certainty-prop=1 doesn't avoid selecting anti-decisive moves and is slightly stronger; --certainty-prop=2 avoids selecting anti-decisive moves.

Match between --certainty-prop=1 and --certainty-prop=2:

tournamentstatus final P1: +297 -279 =424 Win: 50.90% Elo:  6.25 LOS: 77.34% P1-W: +167 -130 =203 P1-B: +130 -149 =221

The updated certainty propagation is more effective than the old one. A very dramatic example is this position, where standard Leela does not find the combination within 1 million visits:

position fen r1bq2rk/pp3pbp/2p1p1pQ/7P/3P4/2PB1N2/PP3PPR/2KR4 w - - 0 1
go nodes 1000000
.
info depth 2 seldepth 49 time 768914 nodes 1000000 score cp 612 certain 0 hashfull 1000 nps 1300 pv h6e3 g6g5 f3g5 d8c7 h5h6 g7f6 h2h5 a7a5 e3f3 c7e7 g5h7 f6h4 f3f4 f7f5 f4h4 e7h4 h5h4 h8h7 g2g4 g8g6 d1g1 c8d7 g4g5 c6c5 d4c5 d7c6 f2f4 a8d8 h4h3 d8d5 g1e1 d5c5 h3e3 c6d7

This is --certainty-prop=1 --auto-extend=1 on the same position:

info depth 1 seldepth 18 time 1068 nodes 1214 score cp 12800 certain 34 hashfull 8 nps 1136 pv h6h7 h8h7 h5g6

The new scheme should gain a couple of Elo in self-play, and maybe more against other engines, over the old scheme. Updated executable and tests will follow soon.

Moving-Average-Q (--backpropagate-gamma=0.75)

This is a variation on MA (Gudmundsson and Björnsson 2011) and partly on TO-MAST strategies (see Finnsson and Björnsson 2008). While TO-MAST mainly uses MA for policy choices, Gudmundsson and Björnsson apply it to the propagation of Q. MCTS in its Leela flavour often takes very long to revise initial optimistic (or pessimistic) evaluations that draw visits to suboptimal parts of the tree. The main idea is to make lc0's update of q more adaptive, with less "memory". I tried:

A decay-power function

q_ += (v - q_) / (std::pow((float)n_, gamma) + 1);
n_++;

Gamma=0.75 yielded these results at 10000 games with 100 visits per move:

tournamentstatus final P1: +3138 -2431 =4431 Win: 53.54% Elo: 24.60 LOS: 100.00% P1-W: +1805 -1143 =2052 P1-B: +1333 -1288 =2379

but these results at 1000 games with 800 visits per move:

tournamentstatus final P1: +180 -193 =627 Win: 49.35% Elo: -4.52 LOS: 25.04% P1-W: +95 -102 =303 P1-B: +85 -91 =324

results for Gamma 0.955 at 800 visits per move are even worse:

tournamentstatus final P1: +182 -218 =600 Win: 48.20% Elo: -12.51 LOS:  3.59% P1-W: +104 -111 =285 P1-B: +78 -107 =315

Strange that this does so well at 100 visits - maybe it is just an initial bias that makes it work with fewer nodes, and if more nodes are searched that bias becomes insignificant? I am now moving to a linear formula.

A linear decay function

q_ += (v - q_) / ((float)n_ * Beta + 1);
n_++;

results for Beta=0.49 at 100 visits per move in 10000 games:

tournamentstatus final P1: +3775 -2486 =3739 Win: 56.45% Elo: 45.04 LOS: 100.00% P1-W: +1954 -1046 =2000 P1-B: +1821 -1440 =1739

results for Beta=0.49 at 800 visits per move in 1000 games:

tournamentstatus final P1: +210 -198 =592 Win: 50.60% Elo:  4.17 LOS: 72.38% P1-W: +122 -98 =280 P1-B: +88 -100 =312

10000 visits, 500 games:

tournamentstatus final P1: +39 -55 =406 Win: 48.40% Elo: -11.12 LOS:  4.94% P1-W: +31 -15 =203 P1-B: +8 -40 =203

10,000 games are needed to get out of the noise. More results pending...

Optimal-Selection (--optimal-select=1)

This is not yet final - it is just a playground for various selection strategies. Currently it does well at tactics (170/200 on the WAC Silver test suite) but suffers somewhat in self-play, even though results against different (non-Leela) opponents are better. Needs more work. It might be useful for long analysis, as it restores MCTS convergence properties (under some circumstances Leela would never find certain moves, no matter how many nodes were visited). Not recommended at the moment.

Note: Test parameters were the old defaults cpuct=1.2, fpu-reduction=0.0, and NN 5d46d9c438a6901e7cd8ebfde15ec00117119cabfcd528d4ce5f568243ded5ee.

For test positions, threads=1 and batchsize=1 were used for reproducibility.
