MonteCarlo wrote:
Thanks for the update, Kai!
On the one hand, it's quite possible that a fundamental change to its MCTS implementation will be required at some point if it wants to compete at the highest level, and the work Daniel Shawul has done with Scorpio could prove quite useful in that case (well, it's fantastic work in any case; it's just in this case that it would benefit LC0).
On the other hand, unless you subscribe to some form of conspiracy theory around the A0 results, we're nowhere near the limits of this sort of approach, so I wouldn't worry too much about that just yet.
Right now there are still a bunch of bugs being worked out, the network is still rather small, and the project is rather young (barely a month old, and it's barely been a week since the last major bug was discovered and fixed).
Some patience is required. It might turn out that switching to a new implementation of MCTS is required; it might also turn out that the NN at some level gives good enough prior probabilities for moves that even MCTS with averaging is good tactically.
We'll just have to give it some time.
First, I replayed the gauntlets (40 games each) against another pool of engines that are even more stable, stronger, and well tested on CCRL: GreKo 6.5 (2336 CCRL Elo) and Cheese 1.2 (2330 CCRL Elo).
LC0 ID83 CPU 4 threads i7 Haswell.
At 1s/move: 10.0/40, performance 2140 CCRL Elo points
At 10s/move: 17.0/40, performance 2280 CCRL Elo points
So, previous results are confirmed, and the scaling seems good.
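As a rough cross-check of the performance figures above, here is a minimal sketch using the standard logistic Elo formula. It assumes the 40 games of each gauntlet were split evenly between GreKo 6.5 and Cheese 1.2, so that the average opponent rating is 2333; the function name `performance_rating` is only illustrative, and the rating tool actually used for the numbers above may compute things differently.

```python
import math

def performance_rating(score, games, avg_opponent_elo):
    """Performance rating under the standard logistic Elo model."""
    fraction = score / games
    if not 0.0 < fraction < 1.0:
        raise ValueError("performance is undefined for 0% or 100% scores")
    # Elo difference implied by the score fraction
    diff = -400.0 * math.log10(1.0 / fraction - 1.0)
    return avg_opponent_elo + diff

# Assumption: equal number of games against each opponent
AVG_OPPONENT = (2336 + 2330) / 2

for label, score in (("1s/move", 10.0), ("10s/move", 17.0)):
    rating = performance_rating(score, 40, AVG_OPPONENT)
    print(f"{label}: {rating:.0f}")
```

Under these assumptions the formula gives roughly 2142 and 2280, in line with the reported 2140 and 2280.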
Then I had a look at the 10s/move games. The first observation is that as soon as the game gets tactical, LC0 blunders more often. When the positions are quiet and progress is slow, LC0 seems to outplay even this pretty strong opposition. In tactical endgames, LC0 might blunder grossly. Here is an example:
[D]3R2rk/4P3/4Q1p1/p6p/6K1/2q5/8/8 w - - 0 136
LC0 is White. GreKo 6.5 just moved 135...h5+, but the position is completely won for LC0. Analysis by Stockfish 9:
Code:
136.Kf4 Qc1+ 137.Qe3 Qc4+ 138.Qd4+ Qxd4+ 139.Rxd4 Kg7 140.Rd8 g5+ 141.Kf5 Re8 142.Rxe8 Kf7 143.Rg8 Kxe7 144.Rxg5 Kd6 145.Kf4 a4 146.Rxh5 Kc7 147.Ra5 Kc6 148.Rxa4 Kc5 149.Ra5+ Kd4 150.Kf3 Kc4 151.Ke4 Kb4 152.Re5 Kc3 153.Ke3 Kc4 154.Rh5 Kc3 155.Rc5+ Kb4 156.Kd4 Kb3 157.Kd3 Kb4 158.Re5 Kb3 159.Rb5+ Ka4 160.Kc4 Ka3 161.Kc3 Ka2 162.Ra5+ Kb1 163.Ra7 Kc1 164.Ra1#
White mates: +- (#29) Depth: 49/58 00:00:22 270MN, tb=5623189
SF9 needs less than 0.01s and fewer than 10,000 nodes to see the correct move, Kf4 (out of 3 legal moves), with a large advantage for White; it then finds the mate for White within a second or so.
LC0 blundered here and quickly lost the game, at 10s/move. It played Kh4, which is... mate in 4 for Black! Out of only 3 legal moves.
I then let LC0 (ID83) analyze the position on 4 threads for 60 seconds:
Code:
lczero.exe -n -w latest.txt -p 0 --noponder --threads 4
Using 4 thread(s).
Generated 1924 moves
Detecting residual layers...v1...64 channels...6 blocks.
BLAS Core: Haswell
position fen 3R2rk/4P3/4Q1p1/p6p/6K1/2q5/8/8 w - - 0 136
go movetime 60000
info depth 7 nodes 3 nps 154 score cp -146 winrate 16.67% time 12 pv g4h4 g8d8
info depth 9 nodes 11 nps 476 score cp -68 winrate 32.01% time 20 pv g4g5 c3g7 d8g8
info depth 10 nodes 23 nps 759 score cp -11 winrate 46.85% time 28 pv g4g5 c3g3 e6g4 h5g4
info depth 11 nodes 31 nps 789 score cp 13 winrate 53.70% time 37 pv g4g5 c3g3 g5f6 g8d8 e7d8q
info depth 12 nodes 55 nps 1000 score cp 31 winrate 58.57% time 53 pv g4g5 c3g3 g5f6 g3f4 e6f5 f4f5
info depth 13 nodes 101 nps 1163 score cp 68 winrate 67.98% time 85 pv g4f4 g8d8 e7d8q h8g7 d8g8 g7h6 e6g6
info depth 14 nodes 185 nps 1203 score cp 110 winrate 77.06% time 152 pv g4f4 g8d8 e7d8q h8g7 d8g8 g7h6 e6g6
info depth 15 nodes 356 nps 1286 score cp 122 winrate 79.41% time 275 pv g4f4 c3c7 f4g5 g8d8 e7d8q c7d8 g5g6 d8g8 g6h5 g8e6
info depth 16 nodes 709 nps 1487 score cp 154 winrate 84.56% time 475 pv g4h4 c3g7 d8g8 g7g8 e6g8 h8g8 e7e8q g8g7 e8e7 g7h6 e7f6 a5a4 f6a6
info depth 17 nodes 1279 nps 1555 score cp 180 winrate 87.95% time 821 pv g4h4 c3g7 d8g8 g7g8 e6g8 h8g8 e7e8q g8g7 e8e7 g7h6 e7f6 h6h7 h4g5
info depth 18 nodes 2428 nps 1580 score cp 193 winrate 89.39% time 1535 pv g4h4 c3b4 h4g5 b4g4 e6g4 h5g4 d8g8 h8g8 e7e8q g8g7 e8e7 g7g8 g5g6 g4g3 e7e8
info depth 19 nodes 5262 nps 1931 score cp 226 winrate 92.32% time 2723 pv g4h4 c3b4 h4g5 b4g4 e6g4 h5g4 d8g8 h8g8 e7e8q g8h7 e8f7 h7h8 g5g6 g4g3 f7h7
info depth 20 nodes 9967 nps 2236 score cp 241 winrate 93.44% time 4456 pv g4h4 c3b4 h4g5 b4g4 e6g4 h5g4 d8g8 h8g8 e7e8q g8h7 e8e7 h7g8 g5g6 g4g3 e7e8
info depth 21 nodes 19755 nps 2652 score cp 261 winrate 94.68% time 7448 pv g4h4 c3g7 d8g8 g7g8 e7e8q g8e8 e6e8 h8g7 h4g5 h5h4 e8e7 g7g8 g5g6 h4h3 e7e8
info depth 22 nodes 38930 nps 3225 score cp 275 winrate 95.37% time 12071 pv g4h4 c3g7 d8g8 g7g8 e7e8r g8e8 e6e8 h8g7 h4g5 h5h4 e8e7 g7g8 g5g6 h4h3 e7e8
info depth 23 nodes 63572 nps 3072 score cp 259 winrate 94.55% time 20693 pv g4h4 c3b4 h4g3 b4c3 g3h4 c3g7 d8g8 g7g8 e7e8q g8e8 e6e8 h8g7 e8e7 g7h6 e7f6 a5a4 f6h8
info depth 24 nodes 107476 nps 3275 score cp 260 winrate 94.60% time 32814 pv g4h4 c3b4 h4g3 b4g4 e6g4 h5g4 d8g8 h8g8 e7e8q g8g7 g3g4 g7f6 g4f4 g6g5 f4g4 a5a4 e8a4 f6e6 a4a6 e6e5
info depth 25 nodes 192280 nps 4112 score cp 261 winrate 94.66% time 46761 pv g4h4 c3b4 h4g3 b4a3 g3h2 a3b2 h2h3 b2g7 d8g8 g7g8 e7e8r g8e8 e6e8 h8g7 e8e7 g7h6 e7a7 h6g5 a7a5 g5f4 a5e1 g6g5 e1f1
g4h4 -> 268306 (V: 94.82%) (N: 17.28%) PV: g4h4 c3b4 h4g3 b4a3 g3h2 a3b2 h2h3 b2g7 d8g8 g7g8 e7e8r g8e8 e6e8 h8g7 e8e7 g7h6 e7a7 h6g5 a7a5 g5f4 a5e1 g6g5 e1f1 f4e3
g4f4 -> 3491 (V: 89.91%) (N: 38.80%) PV: g4f4 c3c7 e6e5 c7e5 f4e5 h5h4 d8g8 h8g8 e5f6 h4h3 e7e8q g8h7 e8g6 h7h8 g6h6 h8g8 h6h3 a5a4 h3a3 g8h7 a3a4 h7h8 a4h4
g4g5 -> 231 (V: 10.77%) (N: 43.91%) PV: g4g5 c3g3 g5f6 g3f4 e6f5 f4f5
info depth 25 nodes 272029 nps 4536 score cp 262 winrate 94.68% time 59967 pv g4h4 c3b4 h4g3 b4a3 g3h2 a3b2 h2h3 b2g7 d8g8 g7g8 e7e8r g8e8 e6e8 h8g7 e8e7 g7h6 e7a7 h6g5 a7a5 g5f4 a5e1 g6g5 e1f1 f4e3
bestmove g4h4
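To make the per-move summary in the log easier to read, here is a small sketch that parses the three summary lines and compares each move's share of visits with its other two figures. It assumes that `V` is the averaged value (winrate) of the move and `N` is the network's prior probability for it; that is my reading of the output format, not something stated in the log itself. The `PV:` part is omitted, and `parse_summary` is a hypothetical helper name.

```python
import re

# The three summary lines from the log above, without the PV part
SUMMARY = """\
g4h4 -> 268306 (V: 94.82%) (N: 17.28%)
g4f4 -> 3491 (V: 89.91%) (N: 38.80%)
g4g5 -> 231 (V: 10.77%) (N: 43.91%)
"""

LINE_RE = re.compile(
    r"^(\w+)\s*->\s*(\d+)\s*\(V:\s*([\d.]+)%\)\s*\(N:\s*([\d.]+)%\)"
)

def parse_summary(text):
    """Return (move, visits, value_pct, prior_pct) for each matching line."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            move, visits, value, prior = match.groups()
            rows.append((move, int(visits), float(value), float(prior)))
    return rows

rows = parse_summary(SUMMARY)
total = sum(visits for _, visits, _, _ in rows)
for move, visits, value, prior in rows:
    share = 100.0 * visits / total
    print(f"{move}: prior {prior:5.2f}%  visits {share:6.2f}%  value {value:5.2f}%")
```

With these numbers, Kf4 (g4f4) receives only about 1.3% of the roughly 272,000 visits even though its prior is more than twice that of Kh4 (g4h4), so the search had committed almost entirely to the losing move by the end of the 60 seconds.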
It switches to the winning Kf4 only after some 1 million playouts. I don't know how these MCTS rollouts work here, but my experience with MCTS Go engines, especially the newer Leela or Crazy Stone, is different. Tactically, this blunder (from a won position to -M4) is equivalent to losing an elementary ladder in Go, something that has not happened with strong MCTS Go engines for years.
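One possible explanation, in line with the "MCTS with averaging" remark quoted above, is that a single refutation gets diluted when a node's value is the visit-weighted mean of its children rather than their minimum. The following toy sketch illustrates only that effect. The visit counts and values are hypothetical, not taken from LC0, and the code does not model LC0's actual search.

```python
def averaged_value(children):
    """Visit-weighted mean of child values (MCTS-style backup)."""
    total = sum(visits for visits, _ in children)
    return sum(visits * value for visits, value in children) / total

def minimax_value(children):
    """Value if the opponent always picks the reply that is worst for us."""
    return min(value for _, value in children)

# Hypothetical (visits, value for White) pairs for Black's replies to a move:
# most visits go to harmless replies, few go to the one refuting reply.
replies = [(9800, 0.95), (200, 0.0)]

print("averaged:", averaged_value(replies))
print("minimax: ", minimax_value(replies))
```

In this toy case the averaged value stays around 0.93 while the minimax value is 0.0. The average only drops once the refuting reply has attracted a large share of the visits, which would be consistent with the move changing only after very many playouts.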