LC0 and MultiPV

gordonr
Posts: 226
Joined: Thu Aug 06, 2009 8:04 pm
Location: UK

LC0 and MultiPV

Post by gordonr »

Hi,

I'm using the recent 0.30.0 release of LC0. Since I believe LC0 suffers no performance impact from using multiPV, I was running with multiPV set to 5 lines. For example, this snippet:

info depth 18 seldepth 46 time 232988 nodes 1784778 score cp 333 nps 7775 tbhits 0 multipv 1 pv c2c4 a3b2 c4c5 b2e5 c5d6 d8d6 f5d6 e5d6 e1d2 f8d8 d1e2 b8c6 d2c3 c7e7 f4d6 e7d6 e2f2 c6b4 c3d4 b4c2 d4d2 c2b4 h4f4 d6e7 g1d1 g8h8 h3h4 g6h5 d1c1 b4c6 d2c3 c6e5 f4f5 e5g4
info depth 18 seldepth 46 time 232988 nodes 1784778 score cp 2720 nps 7775 tbhits 0 multipv 2 pv e1c3 c7c3 f5e7 g8h8 e7g6 f7g6 h4h7 h8h7 g1g4 f8f4 g4f4 c3a1 d1e2 a1f1 e2f1 d8f8 f4f8 b8d7 f8f7 d7f6 g5f6 a3b2 f7g7 h7h6 e4e5 b2e5 d5e4 e5f6 g7g6 h6h7 g6f6 h7g7 f6d6
info depth 18 seldepth 46 time 232988 nodes 1784778 score cp 156 nps 7775 tbhits 0 multipv 3 pv e1d2 g8h8 d2d4 f8g8 d4a7 b8a6 c2c4 d8d7 a7d4 a6b4 d1e2 b7b5 g1d1 b4d5 c4d5 c7c2 e2f3 a3b4 d4d3 c2a2 d3b5 a2b3 d1d4 b3b1 f5g3 d7a7 d4b4
info depth 18 seldepth 46 time 232988 nodes 1784778 score cp 268 nps 7775 tbhits 0 multipv 4 pv h4h7 g8h7 e1h4 h7g8 f5g7 c7c3 g7f5 c3h8 f5h6 g8g7 h4g4 g7h7 h6f7 f8f7 d5f7 d8f8 f7g6 h7g6 g4e6 g6g7 g5g6 h8h5 d1d2 h5h4 g1f1 b8c6 e6d7 c6e7 f4h6 g7h6 f1f8
info depth 18 seldepth 46 time 232988 nodes 1784778 score cp 109 nps 7775 tbhits 0 multipv 5 pv g1g2 g8h8 c2c4 b8c6 e1c3 f8g8 g2g4 d8d7 d1e2 a3b4 c3b2 c7a5 g4g1 b4c3 b2c2 c3e5 c2d2 e5c3
bestmove c2c4 ponder a3b2
So, at this point in time, LC0 has found the key move e1c3 and scores it significantly higher than the other moves. But it is still "multipv 2" in the ordering and when I tell LC0 to stop analysing, it doesn't return e1c3 as the "bestmove". It would play c2c4. If I let it analyse longer, e1c3 will eventually rise to "multipv 1" but it can take some significant time. I'm seeing LC0 score e1c3 highly after 3 mins but will only play it after 6 mins.

I was initially observing this in the Arena GUI but also tested with the command line to rule out possible GUI issues.

To just take a step back to how I arrived here, I did initially test LC0 0.30.0 with some test positions and *no* multiPV. And for the test in question, it was consistently taking about 6 mins to solve. I then put multiPV to 5 to see how the eval for the key move changes during the 6 minutes. And then I see it scoring best consistently after only 3 mins. How can it evaluate it correctly after 3 mins of multiPV but it takes 6 mins without multiPV?! Why does it take another search iteration or so for the higher scoring multipv 2 to become multipv 1?!
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: LC0 and MultiPV

Post by dkappe »

If multipv changes how lc0 selects the next node to be explored, it could distribute visits more broadly. Don’t know if that’s the case now. I’d ask in their discord.
Joerg Oster
Posts: 978
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Re: LC0 and MultiPV

Post by Joerg Oster »

gordonr wrote: Fri Jul 28, 2023 2:40 am [...] Why does it take another search iteration or so for the higher scoring multipv 2 to become multipv 1?!
That's because they consider the move with the most visits the best one. This is called 'robust child'.
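To make the distinction concrete, here is a minimal Python sketch of the two root-selection rules. It is not LC0's code. The `RootChild` type and both function names are made up for illustration, the scores are taken from the multipv output quoted above, and the visit counts are invented, since that output does not show per-move visits.

```python
from dataclasses import dataclass


@dataclass
class RootChild:
    move: str
    visits: int    # hypothetical values, not taken from the log
    score_cp: int  # centipawn scores from the quoted multipv lines


def robust_child(children):
    """Pick the move with the most visits (the 'robust child' rule)."""
    return max(children, key=lambda c: c.visits)


def max_child(children):
    """Pick the move with the highest evaluation, ignoring visits."""
    return max(children, key=lambda c: c.score_cp)


children = [
    RootChild("c2c4", visits=900_000, score_cp=333),
    RootChild("e1c3", visits=600_000, score_cp=2720),
    RootChild("e1d2", visits=150_000, score_cp=156),
]

print("robust child:", robust_child(children).move)
print("max child:   ", max_child(children).move)
```

With these assumed numbers the robust-child rule still returns c2c4 even though e1c3 has the far higher score, which matches the behaviour described in the original post: e1c3 only becomes the bestmove once its visit count overtakes c2c4's.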
Jörg Oster
gordonr

Re: LC0 and MultiPV

Post by gordonr »

Joerg Oster wrote: Fri Jul 28, 2023 11:47 am That's because they consider the move with the most visits the best one. This is called 'robust child'.
Ah, that makes sense. I was only thinking about the evaluation score itself and not how reliable it was. Although with my test set this appeared to cause an unnecessary and long delay - while increasing the number of visits - I can see how in typical play it would be dangerous to promote a move with insufficient visits.

Thanks for enlightening me, and thanks to everyone who replied :)
gordonr

Re: LC0 and MultiPV

Post by gordonr »

I'm once again trying to understand some LC0 output and best move choice. From Nibbler log:

< info string c8c4  (66  ) N:  147974 (+32) (P:  9.13%) (WL:  0.98324) (D: 0.011) (M: 31.3) (Q:  0.98324) (U: 0.00867) (S:  0.99673) (V:  0.9985) 
< info string c8c3  (68  ) N:  330729 (+54) (P: 29.27%) (WL:  0.97918) (D: 0.014) (M: 31.1) (Q:  0.97918) (U: 0.01245) (S:  0.99673) (V:  0.9988) 
So "N" is related to visit count and hence currently c8c3 is the top choice due to the biggest "N" value.

However, the "WL" for c8c4 is higher and I think remains higher while the analysis continues. So why doesn't c8c4 start to catch up and overtake c8c3 in terms of visits (N) if the WL is better?

I see that the policy (P) for c8c3 is better than for c8c4. Ok, so maybe that's the reason?! I know a bit about UCB and MCTS but can anyone explain why the visit count for c8c3 continues to stay above c8c4. Maybe I don't understand "policy" enough or something else.

This wasn't a tactical puzzle that I was analysing. When I've analysed difficult tactical puzzles, I've seen LC0 take some time to find the solution, and then take some time for that key move to increase its visit count enough to be the top choice - fair enough. But with the position I'm using as an example here, in Nibbler I don't see the evals in order from best to worst. Ok, the visit count is the order. But why doesn't it change with enough time... why don't the visit counts become more in line with the evals? It's like LC0 is saying "I'm not evaluating this move as best but I'm continuing to keep its visit count highest".

Thanks for any insights.
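
For reference, the numbers in the two log lines can be checked against the usual PUCT form of the exploration term. This is a sketch under assumptions, not LC0's implementation: it assumes U is proportional to P / (1 + N), with the constant and the parent-visit factor shared by both siblings so that they cancel in a ratio, and it assumes the "(+32)" and "(+54)" values are in-flight visits that count towards N. The `MoveStats` type and `exploration_shape` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MoveStats:
    move: str
    n: int            # N from the log
    n_in_flight: int  # the "(+..)" value, assumed to be in-flight visits
    p: float          # policy prior P, as a fraction
    q: float          # Q from the log
    u: float          # U from the log
    s: float          # S from the log


# Values copied from the two Nibbler log lines above.
c8c4 = MoveStats("c8c4", 147974, 32, 0.0913, 0.98324, 0.00867, 0.99673)
c8c3 = MoveStats("c8c3", 330729, 54, 0.2927, 0.97918, 0.01245, 0.99673)


def exploration_shape(m):
    """P / (1 + N): the part of the assumed PUCT U term that differs per move."""
    return m.p / (1 + m.n + m.n_in_flight)


ratio_logged = c8c4.u / c8c3.u
ratio_predicted = exploration_shape(c8c4) / exploration_shape(c8c3)

print(f"U ratio from the log:    {ratio_logged:.4f}")
print(f"P/(1+N) ratio predicted: {ratio_predicted:.4f}")
print("S equal for both moves: ", c8c4.s == c8c3.s)
```

The two ratios agree closely, so the logged U values look consistent with that form. Note also that S is identical for both moves. If S is the combined score the search uses to pick the next child (an assumption; in this log it is slightly larger than Q + U, so it presumably includes a further term), then the search is keeping the two moves in balance: c8c3's lower Q is offset by a larger U, which comes from its higher policy prior. Under that reading, visits are distributed to equalise the combined score rather than to follow Q alone, so a move with a higher prior can keep more visits despite a slightly lower WL.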
Uri Blass
Posts: 10876
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: LC0 and MultiPV

Post by Uri Blass »

Joerg Oster wrote: Fri Jul 28, 2023 11:47 am That's because they consider the move with the most visits the best one. This is called 'robust child'.
I wonder whether they tested counting visits in a different way, so that visits that happen later carry more weight, and found that it does not improve playing strength. For example, the weight of the visit that happens at time N could be proportional to N, to sqrt(N), to log(N), or to another increasing function of N.
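
A toy Python sketch of that idea, purely illustrative: the search trace is invented, and `weighted_visits` and `best_move` are hypothetical helpers, not anything from LC0. Move A receives the first 600 of 1000 visits and move B the last 400, and the best move is whichever has the largest weighted visit total.

```python
import math


def weighted_visits(visit_times, weight):
    """Sum weight(t) over the (1-based) times at which a move was visited."""
    return sum(weight(t) for t in visit_times)


def best_move(trace, weight):
    """Return the move with the largest weighted visit total."""
    return max(trace, key=lambda move: weighted_visits(trace[move], weight))


# Hypothetical trace: A gets visits 1..600, B gets visits 601..1000.
trace = {"A": range(1, 601), "B": range(601, 1001)}

schemes = {
    "uniform": lambda t: 1.0,  # plain visit count (robust child)
    "linear": lambda t: float(t),
    "sqrt": math.sqrt,
    "log": math.log,
}

for name, weight in schemes.items():
    print(name, "->", best_move(trace, weight))
```

In this made-up trace the plain count and the log weighting still prefer A, while the linear and sqrt weightings switch to B, so the choice of weighting function decides how quickly a late-rising move can overtake. Whether any such scheme helps playing strength is exactly the open question.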