-
Notifications
You must be signed in to change notification settings - Fork 0
MCTS Solver Certainty Propagation and Autoextending
Certainty propagation is known in literature as MCTS-Solver or Proof-Number-Search (Winands et. al). If we reach terminal (which are "certain") nodes, we can propagate the "certain" results from these nodes efficiently up the search-tree. Standard leela MCTS treats these "certain" results just like normal evaluations by the NN and does not use this information to prune branches. For example if one side can play a move thats leads to a terminal win node, we can make the parent node a "certain" node scored as a loss from the opponents view. Further, nodes whose children are all certain, become certain themselves with the -max q among the certain childs. With this even MCTS can solve some shallow mates, and steer exploring the tree more efficiently and avoid branches that are "proven" to no longer need exploration.
A nice property is that if we have a certain win at root we can play it immediatly, regardless of the visits that move received.
One example position where current leela needs 446439 nodes to find the key move Qf8+:
position fen 7k/pp4np/2p3p1/3pN1q1/3P4/Q7/1r3rPP/2R2RK1 w - - 0 1
go nodes 500000
.
.
info depth 2 seldepth 22 time 11827 nodes 409237 score cp 0 certain 0 hashfull 14 nps 34601 pv f1f2 g5c1 f2f1 b2g2 g1g2 c1a3 e5f7 h8g8 f7h6 g8h8 h6f7 h8g8 f7h6 g8h8 h6f7
info depth 2 seldepth 22 time 12265 nodes 445439 score cp 11730 certain 0 hashfull 14 nps 36317 pv a3f8 f2f8 f1f8
With certainty propagation Leela finds Qf8 in 223093 nodes:
info depth 2 seldepth 22 time 7526 nodes 198820 score cp 0 certain 76 hashfull 14 nps 26417 pv f1f2 g5c1 f2f1 b2g2 g1g2 c1a3 e5f7 h8g8 f7h6 g8h8 h6f7 h8g8 f7h6 g8h8 h6f7
info depth 2 seldepth 22 time 8300 nodes 223093 score cp 12800 certain 88 hashfull 14 nps 26878 pv a3f8 f2f8 f1f8
Update
The certainty propagation was updated to: (1) test for terminal nodes immediatly when creating child nodes (2) only check for siblings if the propagated v is certain and (3) and offer an additonal mode for expansion selection.
--certainty-prop=1 is currently the strongest option and treats certain nodes just like terminal nodes
--certainty-prop=2 avoids selecting certain nodes that loose
--certainty-prop=3 experimental (sets upper confidence bound of draws to zero and halfs the confidence drecrease per visit for certain nodes)
Avoiding to visit certain moves that loose might lead to overestimation of the current path, while not adjusting the UCB for certainty can lead to underestimation. This trade-off is also present for visting siblings of certain winning nodes, although to a much lesser degree. For more details see Winands et. al..
The updated certainty propagation is more effective than the old one. A very dramatic example is this position, where standard leela does not find the combination within 1 million visits:
position fen r1bq2rk/pp3pbp/2p1p1pQ/7P/3P4/2PB1N2/PP3PPR/2KR4 w - - 0 1
go nodes 1000000
.
info depth 2 seldepth 49 time 768914 nodes 1000000 score cp 612 certain 0 hashfull 1000 nps 1300 pv h6e3 g6g5 f3g5 d8c7 h5h6 g7f6 h2h5 a7a5 e3f3 c7e7 g5h7 f6h4 f3f4 f7f5 f4h4 e7h4 h5h4 h8h7 g2g4 g8g6 d1g1 c8d7 g4g5 c6c5 d4c5 d7c6 f2f4 a8d8 h4h3 d8d5 g1e1 d5c5 h3e3 c6d7
This is --certainty-prop=1 --auto-extend=1 on the same position:
info depth 1 seldepth 18 time 1068 nodes 1214 score cp 12800 certain 34 hashfull 8 nps 1136 pv h6h7 h8h7 h5g6
The new scheme should gain a couple of elo in self-play and maybe more elo against other engines over the old scheme (see match results below)
Nodes (or chains of nodes) that have only one-legal-move are game theoretically completely linked. For example: Node A -> only one legal move -> Node B -> only one legal move -> Node C -> Many legal moves. Here A, B and C while different nodes are actually one game state, as from A you will inevitably land in C. They should have identical evaluations. Currently MCTS first vists A, propagates the score, then maybe visits B propagates the score and only on the third visit reaches C - while actually it could have directly visited C and backed up that score to B and A. Not only would this save visits, but it is also probable that a deeper NN eval is more accurate then a shallow one (Beware: this is not always the case!). I called this auto-extending, because if only one legal move, we really want to first eval C and then just back up the score.
In chess these situations most often occur after a check, where the opponents King has only one move. In other games like Othello or Checkers there are even more prevalent. One example Position with Leelas normal MCTS:
position fen 5k2/6pp/p1qN4/1p1p4/3P4/2PKP2Q/PP3r2/3R4 b - - 0 1
go nodes 30000
.
.
info depth 2 seldepth 28 time 5179 nodes 19476 score cp -3 hashfull 40 nps 3760 pv c6d6 h3c8 f8f7 c8b7 f7g8 d1g1 f2f7 b7a8 f7f8 a8b7 f8f7 b7a8 f7f8 a8b7 f8f7
info depth 2 seldepth 28 time 5192 nodes 20254 score cp 9003 hashfull 40 nps 3901 pv c6c4 d6c4 b5c4
It takes normal MCTS 20254 nodes to find the correct move Qc4.
With auto-extending (--auto-extend=1) it takes only 10340 nodes and time went down from 5192 to 2923ms:
info depth 2 seldepth 27 time 2919 nodes 10215 score cp 1 hashfull 23 nps 3499 pv c6d6 h3c8 f8f7 c8b7 f7g8 d1g1 f2f7 b7a8 f7f8 a8b7 f8f7 b7a8 f7f8 a8b7 f8f7
info depth 2 seldepth 27 time 2923 nodes 10340 score cp 7304 hashfull 23 nps 3537 pv c6c4 d6c4 b5c4
Latest version of (new) certainty propagation & autoextending vs. standard lc0:
P1: +181 -152 =667 Win: 51.45% Elo: 10.08 LOS: 94.40% P1-W: +102 -69 =328 P1-B: +79 -83 =339
Parameters: --certainty-prop=1 --auto-extend=1
Match between (new) --certainty-prop=1 and certainty--prop=2 (1000 Games / 800 Visits):
P1: +297 -279 =424 Win: 50.90% Elo: 6.25 LOS: 77.34% P1-W: +167 -130 =203 P1-B: +130 -149 =221
Match between --auto-extend=1 and standard lc0 (10000 games/100 visits per move):
P1: +2974 -2925 =4101 Win: 50.24% Elo: 1.70 LOS: 73.82% P1-W: +1705 -1267 =2028 P1-B: +1269 -1658 =2073