How to get the "usual" Multi PV with MCTS engines?

Ferdy · Post by **Ferdy** » Tue May 21, 2019 6:37 pm

chrisw wrote: ↑Tue May 21, 2019 4:26 pm
Ferdy wrote: ↑Tue May 21, 2019 4:02 pm
Laskos wrote: ↑Tue May 21, 2019 12:13 pm
chrisw wrote: ↑Tue May 21, 2019 11:13 am
Laskos wrote: ↑Tue May 21, 2019 10:39 am I want already to rewrite the basis of the opening theory using Lc0, being an 1700 Elo player .

Now, a bit more seriously, I am confident that late nets of the T30 and T40 runs are VERY strong in the openings (provably MUCH stronger than any standard AB engine). So, I let Lc0 21.2rc1 with a good late net ID42361 to analyze the starting opening position for as long as my RAM allows (in fact in the first run my machine crashed). My OC-ed RTX 2070 quickly (~15 min) fills all available RAM on my 16GB RAM machine. I set MultiPV = 20 for Lc0. Then I compared the output (after some 30 million nodes searched) to the human FIDE Elo above 2200 statistic from the "Chess Tempo" site. The comparison is here:

=
Initial_position_H_Lc0.jpg

=

Many curious and interesting things can be said from these numbers, but for now I want to focus on say last 12 of possible opening moves. The statistic of strong human players becomes weak...but so is the MCTS tree search there even in MultiPV mode with both Lc0 and Komodo MCTS. The search tree is shallow and narrow on these moves and they have few visits. With AB engines and usual MultiPV, I don't have such worries about the weaker moves. More or less, all are explored to a similar degree. So, with MCTS, if I really want to see how say a4 move in the opening shows itself, I do have to play it, right? Is there a way to have a "regular" MultiPV (similar to that of AB engines) with MCTS engines?
Assuming you don’t want to actually attack the source code and recompile ...
Modify the PUCT to give more width, less depth, but that will basically destroy the effectiveness of the search.
Play each move and then search (you already suggested that).
Write a little program in Python to launch several instances of LC0, spreading the available RAM and GPU between them, running in parallel, each one handling one root move.
Write a little Python program that serially runs each root move.

Probably what you would really want, is a recompiled LC0 source that allowed you to remove moves from the root move list. Then the search could concentrate on what remained. That would be relatively easy for Crem or someone to do, you would just pass in a command with a list of skip moves (or a list of do moves).
Thanks, I think I have to content myself with serially running the positions, maybe using Python. I had bad experiences with spreading GPU on several parallel instances of Lc0, they are GPU hungry and seem unstable or slowing down too much when in parallel. Guenther's UCI option would be useful if it will be available.
What PUCT and/or other parameters would give equal distribution of visited moves (equal distribution policy head, right?)? I am not interested in the quality of the chosen best move, but in the quality of several or all possible moves.
Without revising the code, I think your idea of moving the candidate move on the board and then search to a fix time or nodes or depth, get the score, invert the score sign, is a good idea.
That sounds about right.
An automated extension of that idea would be, via a little bit of Python:
Carry out a normal search for N nodes.
Get the searched moves and each visit count and each win rate.
Then, for each move, except the first (most visited), play it on board and search for X nodes, where X is the difference between visits[0 ] and visits[ i]. Get score, invert as above.
Then score(i) = mean score of the two scores weighted by their visits.

Does that sound right?

I just make each legal move on the board from startpos.

Set Lc0 to return win_percent as this is the appropriate target output instead of cp.
setoption name scoretype value win_percentage

move e2e4 on the board.

(Thread-7 ) <UciProtocol (pid=3140)>: << position startpos moves e2e4
(Thread-7 ) <UciProtocol (pid=3140)>: << go movetime 1000
(Thread-7 ) <UciProtocol (pid=3140)>: >> info depth 5 seldepth 10 time 599 nodes 170 score cp 4877 hashfull 30 nps 283 tbhits 0 pv c7c5
(Thread-7 ) <UciProtocol (pid=3140)>: >> bestmove c7c5 ponder b1c3

win_percentage of e2e4 is (10000 - 4877)/100 = 51.23

Result would look like this.

Code: Select all

Engine: Lc0 v0.21.2-rc1 w11258-80x7-se blas
Time(s)/move: 1.0

No.  Move  WinPercent
01.  c2c4  51.62
02.  d2d4  51.59
03.  e2e4  51.22
04.  g2g3  51.17
05.  g1f3  51.10
06.  e2e3  51.03
07.  b1c3  50.16
08.  c2c3  50.11
09.  a2a3  48.99
10.  d2d3  48.71
11.  b2b3  48.66
12.  h2h3  48.58
13.  a2a4  48.40
14.  f2f4  46.92
15.  b2b4  45.86
16.  h2h4  45.69
17.  b1a3  44.24
18.  g1h3  42.88
19.  f2f3  42.76
20.  g2g4  38.82

Another approach is to get the Q value via.
setoption name scoretype value Q

Then get the cp value and win_percent via.

Code: Select all

if (score_type == "centipawn")
    uci_info.score = 111.714640912 * tan(1.5620688421 * edge.GetQ(default_q));

and

Code: Select all

if (score_type == "win_percentage")
    uci_info.score = edge.GetQ(default_q) * 5000 + 5000;

https://github.com/LeelaChessZero/lc0/b ... /search.cc

Laskos · Post by **Laskos** » Thu May 23, 2019 10:07 am

mjlef wrote: ↑Tue May 21, 2019 4:25 pm
Laskos wrote: ↑Tue May 21, 2019 10:39 am I want already to rewrite the basis of the opening theory using Lc0, being an 1700 Elo player .

Now, a bit more seriously, I am confident that late nets of the T30 and T40 runs are VERY strong in the openings (provably MUCH stronger than any standard AB engine). So, I let Lc0 21.2rc1 with a good late net ID42361 to analyze the starting opening position for as long as my RAM allows (in fact in the first run my machine crashed). My OC-ed RTX 2070 quickly (~15 min) fills all available RAM on my 16GB RAM machine. I set MultiPV = 20 for Lc0. Then I compared the output (after some 30 million nodes searched) to the human FIDE Elo above 2200 statistic from the "Chess Tempo" site. The comparison is here:

=
Initial_position_H_Lc0.jpg

=

Many curious and interesting things can be said from these numbers, but for now I want to focus on say last 12 of possible opening moves. The statistic of strong human players becomes weak...but so is the MCTS tree search there even in MultiPV mode with both Lc0 and Komodo MCTS. The search tree is shallow and narrow on these moves and they have few visits. With AB engines and usual MultiPV, I don't have such worries about the weaker moves. More or less, all are explored to a similar degree. So, with MCTS, if I really want to see how say a4 move in the opening shows itself, I do have to play it, right? Is there a way to have a "regular" MultiPV (similar to that of AB engines) with MCTS engines?
neat stuff. I could add a mode to Komodo so that when in MultiPV mode, it would simply visit all root nodes evenly. This would force it to Explore them all equally. But this will greatly reduce the overall depth of the best lines (and lengthen those of the not good lines). I would not expect it to do very well, but could be an interesting experiment.

Yes, as an option, I would like to see this in Komodo MCTS, beside the current MultiPV, which is also very useful. It surely will degrade the strength of the best move, the same as MultiPV is doing in AB engines. But sometimes, especially in quiet positions, I would like to see several or many moves equally well analyzed.

I performed a small experiment to see how the policy and value heads change things in Leela on very quiet initial position, and to see how 10 million nodes each position (analysed from Black POV and inverting the score), so a total of 20 x 10 = 200 million nodes analysis, compares to my initial 30 million nodes MultiPV analysis. Then I analyzed the 20 positions with nodes=1, to see how the naked evaluation of Leela fares. Also, I took regular AB Komodo, and similarly put it analyze to depth=1 the 20 positions (black POV, inverting the score), and then Komodo AB MultiPV to depth 23 (about 2 minutes on 1 core). I set Contempt=0 for Komodo, but the scores seem a bit asymmetric, do you have some asymmetric evaluations? Here is the table (evals are in cp units):

Code: Select all

========================================================================================
           30 million    10 million       1 node        Komodo depth=1    Komodo MultiPV
            MultiPV     each position   each position   each position       depth = 23
d2d4           13            13             20                4                 36
Ng1f3          13            11             19              -17                 19
e2e4           13            15             28                8                 34
c2c4           11             9             21              -58                 15
g2g3            9             5             15              -64                  1
e2e3            9             6             18               11                 22
c2c3            4             4              6              -61                  3
Nb1c3           2             0              6              -29                 15
a2a3            0            -2              5              -72                 17
d2d3            0            -2              2              -17                 -2
b2b3           -1             0              6              -66                -21
h2h3           -4            -3              2              -78                 -1
a2a4           -8           -10             -2              -66                 -5
f2f4          -14           -16            -18             -131                -10
b2b4          -16           -15            -13              -76                -22
h2h4          -25           -25            -14              -62                -21
Ng1h3         -29           -25            -28              -73                -43
Nb1a3         -30           -29            -30              -80                -48
f2f3          -50           -42            -45             -127                -45
g2g4          -94           -80            -75              -71                -76
========================================================================================

It strikes that even 1 node eval of Lc0 shows pretty sensible numbers. The ordering of moves is very close for 30 million nodes MultiPV mode, 10 million nodes each of the 20 positions, and 1 node for each of the 20 positions. OTOH, Komodo depth=1 has some quirks, it prefers for some reason e2e3 and dislikes very much c2c4, and so on. The 2 minutes AB Komodo's search for MultiPV depth=23 gives sensible results, but it still seems to have some quirks --- e2e3 is still above regular openings like c2c4 and Ng1f3, and even a2a3 is preferred to c2c4.

I guess 2 kinds of MultiPV would be useful for MCTS Komodo, as its "naked" eval (having very few visits in current MultiPV mode) might often give some distortions.

But Lc0 too has issues. The preferred move in 30 million nodes MultiPV search is d2d4. In 10 million nodes per position (from Black POV, inverting the score) the preferred move is e2e4. Which one is to trust more for the best move --- 30 million nodes regular search or 20x10 = 200 million nodes uniform search on all possible moves?

Laskos · Post by **Laskos** » Thu May 23, 2019 1:06 pm

Ferdy wrote: ↑Tue May 21, 2019 4:02 pm
Laskos wrote: ↑Tue May 21, 2019 12:13 pm
chrisw wrote: ↑Tue May 21, 2019 11:13 am
Laskos wrote: ↑Tue May 21, 2019 10:39 am I want already to rewrite the basis of the opening theory using Lc0, being an 1700 Elo player .

Now, a bit more seriously, I am confident that late nets of the T30 and T40 runs are VERY strong in the openings (provably MUCH stronger than any standard AB engine). So, I let Lc0 21.2rc1 with a good late net ID42361 to analyze the starting opening position for as long as my RAM allows (in fact in the first run my machine crashed). My OC-ed RTX 2070 quickly (~15 min) fills all available RAM on my 16GB RAM machine. I set MultiPV = 20 for Lc0. Then I compared the output (after some 30 million nodes searched) to the human FIDE Elo above 2200 statistic from the "Chess Tempo" site. The comparison is here:

=
Initial_position_H_Lc0.jpg

=

Many curious and interesting things can be said from these numbers, but for now I want to focus on say last 12 of possible opening moves. The statistic of strong human players becomes weak...but so is the MCTS tree search there even in MultiPV mode with both Lc0 and Komodo MCTS. The search tree is shallow and narrow on these moves and they have few visits. With AB engines and usual MultiPV, I don't have such worries about the weaker moves. More or less, all are explored to a similar degree. So, with MCTS, if I really want to see how say a4 move in the opening shows itself, I do have to play it, right? Is there a way to have a "regular" MultiPV (similar to that of AB engines) with MCTS engines?
Assuming you don’t want to actually attack the source code and recompile ...
Modify the PUCT to give more width, less depth, but that will basically destroy the effectiveness of the search.
Play each move and then search (you already suggested that).
Write a little program in Python to launch several instances of LC0, spreading the available RAM and GPU between them, running in parallel, each one handling one root move.
Write a little Python program that serially runs each root move.

Probably what you would really want, is a recompiled LC0 source that allowed you to remove moves from the root move list. Then the search could concentrate on what remained. That would be relatively easy for Crem or someone to do, you would just pass in a command with a list of skip moves (or a list of do moves).
Thanks, I think I have to content myself with serially running the positions, maybe using Python. I had bad experiences with spreading GPU on several parallel instances of Lc0, they are GPU hungry and seem unstable or slowing down too much when in parallel. Guenther's UCI option would be useful if it will be available.
What PUCT and/or other parameters would give equal distribution of visited moves (equal distribution policy head, right?)? I am not interested in the quality of the chosen best move, but in the quality of several or all possible moves.
Without revising the code, I think your idea of moving the candidate move on the board and then search to a fix time or nodes or depth, get the score, invert the score sign, is a good idea.

Yes, that's what I am doing. I haven't used IDeA for years, but wasn't there, when building a tree, the "usual" MultiPV used heavily? If that is the case, how do you automatically build a tree with such MCTS MultiPV?

How to get the "usual" Multi PV with MCTS engines?

Re: How to get the "usual" Multi PV with MCTS engines?

Re: How to get the "usual" Multi PV with MCTS engines?

Re: How to get the "usual" Multi PV with MCTS engines?