Why all Lc0 runs result in such similarity of quiet moves selection?

Ferdy · Post by **Ferdy** » Fri May 03, 2019 9:06 am

Ferdy wrote: ↑Fri May 03, 2019 6:56 am
Ferdy wrote: ↑Fri May 03, 2019 5:55 am
Laskos wrote: ↑Wed May 01, 2019 6:38 pm I ran Don Dailey's Sim03 tester, slightly modified by Adam Hair (hash cleared before each position). The 8,300 positions tested are quiet positions from real games, each having several close-in-value options for the best move. Engines are on one thread (except Lc0, on 2 threads), and time per position is 100 ms. Here are the cross-percentages for matched moves engine-wise (scroll down the box):
I am currently analyzing the Sim positions with the latest Stockfish dev at MultiPV 2, with the contempt options set to 0 and Off, running at 1s/pos on 1 thread of a 3.4 GHz Intel CPU. I record the difference between bs1 (the score from MultiPV 1) and bs2 (the score from MultiPV 2), counted over some score ranges, to get an idea of how many non-quiet positions the set contains at this movetime.

Results so far:

Pos: 562, FEN: r1b1rbk1/2q2ppp/pp3n2/3pp3/P3PPP1/2N2B2/1PPQ3P/R4RBK w - - 0 17, diff: +79
diff 200cp or more : 10 (1.8%) of 563 so far
diff 100cp to 199cp: 21 (3.7%) of 563 so far
diff 50cp  to 99cp : 65 (11.5%) of 563 so far
diff 25cp  to 49cp : 123 (21.8%) of 563 so far

After 4010 pos.

Pos: 4010, FEN: 2k5/1p1r1ppp/pqbBp3/4P3/4pQ2/8/PPP2PPP/1KR5 w - - 7 22, diff: +32
diff 200cp or more : 67 (1.7%) of 4011 so far
diff 100cp to 199cp: 189 (4.7%) of 4011 so far
diff 50cp  to 99cp : 498 (12.4%) of 4011 so far
diff 25cp  to 49cp : 846 (21.1%) of 4011 so far

Final result:

diff 200cp or more : 146 (1.8%) of 8238
diff 100cp to 199cp: 384 (4.7%) of 8238
diff 50cp  to 99cp : 1109 (13.5%) of 8238
diff 25cp  to 49cp : 1679 (20.4%) of 8238
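
For readers who want to reproduce this kind of tally, here is a minimal sketch of the binning step only. It assumes the MultiPV gaps (bs1 - bs2, in centipawns) have already been collected into a list; the names `BINS`, `bin_label` and `summarize` are hypothetical, and this is not the script actually used for the numbers above.

```python
from collections import Counter

# Score-gap bins (in centipawns) matching the ranges in the tables above.
# Each entry is (label, lower bound inclusive, upper bound inclusive or None).
BINS = [
    ("200cp or more", 200, None),
    ("100cp to 199cp", 100, 199),
    ("50cp to 99cp", 50, 99),
    ("25cp to 49cp", 25, 49),
]

def bin_label(diff_cp):
    """Return the label of the bin containing diff_cp, or None if below 25cp."""
    for label, lo, hi in BINS:
        if diff_cp >= lo and (hi is None or diff_cp <= hi):
            return label
    return None

def summarize(diffs):
    """Count positions per bin and return {label: (count, percent)}."""
    counts = Counter(bin_label(d) for d in diffs)
    total = len(diffs)
    return {label: (counts[label], round(100.0 * counts[label] / total, 1))
            for label, _, _ in BINS}

# Hypothetical gaps for five positions: +79 and +32 are the two example diffs
# printed in the progress lines above, the other three are made up.
example = [79, 32, 250, 10, 120]
for label, (n, pct) in summarize(example).items():
    print(f"diff {label}: {n} ({pct}%) of {len(example)}")
```

Positions with a gap below 25cp fall into no bin, which is why the listed percentages do not add up to 100%.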

Laskos · Post by **Laskos** » Fri May 03, 2019 1:28 pm

Thanks, very helpful. So, about 60% of positions are multi-move according to SF_dev (the ones not listed in your table, with a gap below 25cp), and another 20% (the 25cp to 49cp bin) might be so. These are pretty good numbers for measuring "similarity of move choices". This similarity almost never goes below 30% for normal engines, which is in line with your data, and almost never above 60-65% for unrelated engines, again in line with your numbers. The open issue is perfect chess: on average, how many non-game-spoiling moves can one make from a position? With TBs we have answers for 3-4-5-6 men, but it's not clear how many such moves usually exist in each position throughout a whole game.
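
As a quick check of the "60%" and "20%" figures, here is a small sketch that derives them from Ferdy's final counts. It assumes that every position not listed in one of the four bins had a gap below 25cp, which is a reading of the table rather than something stated in it.

```python
# Final counts from Ferdy's table (Stockfish dev, MultiPV 2, 1s/pos).
TOTAL = 8238
binned = {
    "200cp or more": 146,
    "100cp to 199cp": 384,
    "50cp to 99cp": 1109,
    "25cp to 49cp": 1679,
}

# Assumption: positions not listed in any bin have a gap below 25cp.
below_25 = TOTAL - sum(binned.values())
share_below_25 = 100.0 * below_25 / TOTAL
share_25_to_49 = 100.0 * binned["25cp to 49cp"] / TOTAL

print(f"gap below 25cp  : {below_25} ({share_below_25:.1f}%)")
print(f"gap 25cp to 49cp: {binned['25cp to 49cp']} ({share_25_to_49:.1f}%)")
```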

Laskos · Post by **Laskos** » Fri May 03, 2019 2:10 pm

Uri Blass wrote: ↑Fri May 03, 2019 7:02 am
A position has a unique best move only if there is only one winning move or only one move to save the game.
It means that the position is not quiet based on my understanding.

Sure, "quiet" here means having multiple possible, almost equivalent moves according to, say, Stockfish. Such a position might still have a unique best move according to perfect chess. Still, as (positional) strength goes up, the similarity will increase, although not all the moves are unique according to perfect chess (tablebases).

Say 60% of positions have a unique move according to TBs, and 40% have 2 equivalent moves according to TBs.

Two different perfect players will have
60 + 40/2 = 80% similarity on average.
The similarity between a non-perfect player, say one solving 70% of the unique-move positions and 80% of the 2-move positions, and a perfect player is
0.7*60 + 0.8*40/2 = 58% similarity on average.

So, as an engine gains (positional) strength, its similarity with a perfect player increases, but never gets anywhere close to 100%.
What I see with SF_dev is that its similarity to Lc0 seems to climb with time control, as though Lc0 were some guru pretending to be a perfect player. I am not sure what that means and what Lc0 is up to with its high similarity across the runs.
Observe also from the data that SF_dev distances itself from other "normal" engines when given much more time per position.
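
The toy arithmetic above can be written as a small sketch. It assumes, as the "40/2" term implies, that a player who finds a best move in a position with k equivalent best moves picks one of them uniformly at random; the function names and the parameter `k` are hypothetical.

```python
def perfect_vs_perfect(share_unique, share_multi, k=2):
    """Expected match rate (in %) between two perfect players.

    share_unique: % of positions with a single best move.
    share_multi:  % of positions with k equivalent best moves, where each
                  player is assumed to pick one of them uniformly at random.
    """
    return share_unique + share_multi / k

def imperfect_vs_perfect(share_unique, share_multi, solve_unique, solve_multi, k=2):
    """Expected match rate (in %) between an imperfect and a perfect player.

    solve_unique / solve_multi are the fractions of each position type in
    which the imperfect player finds a best move.
    """
    return solve_unique * share_unique + solve_multi * share_multi / k

# The two cases from the post: 60% unique, 40% two-move positions.
print(perfect_vs_perfect(60, 40))
print(imperfect_vs_perfect(60, 40, 0.7, 0.8))
```

With these inputs the two calls reproduce the 80% and 58% figures. Raising `k` lowers the ceiling that even two perfect players can reach.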

Uri Blass · Post by **Uri Blass** » Fri May 03, 2019 4:20 pm

I believe that many chess positions (and maybe most chess positions that you get in games) have more than 2 equivalent best moves in terms of the theoretical result, so an engine that plays a random perfect move may not have even 50% similarity with Stockfish or Lc0.

Playing random best moves is not a good strategy because your drawing moves may allow even weak players to get a draw easily.

Guenther · Post by **Guenther** » Fri May 03, 2019 4:52 pm

I agree with Uri. I think it would be better to improve the positions instead of trying to interpret the Lc0 stats on the 10(?)-year-old original simtest positions.
There should be bazillions of late opening and middlegame positions with at least 2-5 equally good moves.

Laskos · Post by **Laskos** » Fri May 03, 2019 5:20 pm

Uri Blass wrote: ↑Fri May 03, 2019 4:20 pm
I believe that many chess positions (and maybe most chess positions that you get in games) have more than 2 equivalent best moves in terms of the theoretical result, so an engine that plays a random perfect move may not have even 50% similarity with Stockfish or Lc0.

I am not sure about that. "Maybe most chess positions that you get in games" are also those selected for the Sim tester, and what's more, the tester positions have a higher than usual degree of "variability of options". What is the reason Lc0 gets 40-50% matches with almost all good regular engines, and when I let Stockfish think longer, the match goes beyond 55%, which is high even among regular engines? For regular engines it can be argued that the similarity is unavoidable given the common basic algorithms and heuristics. I am not sure what the reason would be with Lc0, aside from there also being a plethora of unique-move positions. What heuristic is common to them, so that they pick the same move in a non-unique position? Move ordering? All across Lc0 and the other engines?

Uri Blass wrote: ↑Fri May 03, 2019 4:20 pm
Playing random best moves is not a good strategy because your drawing moves may allow even weak players to get a draw easily.

Yes, but that's another topic, and hardly any engine does any swindling or trolling to improve the outcome there, AFAIK. Even TB usage in all (?) engines hardly swindles in any clever way.

Laskos · Post by **Laskos** » Fri May 03, 2019 5:39 pm

Guenther wrote: ↑Fri May 03, 2019 4:52 pm
I agree with Uri. I think it would be better to improve the positions instead of trying to interpret the Lc0 stats on the 10(?)-year-old original simtest positions.
There should be bazillions of late opening and middlegame positions with at least 2-5 equally good moves.

"Equally good moves" according to whom? Several months ago I did the following: from 2moves_v1.epd I selected the positions that SF10, at some 1-2s/position, evaluated within [-0.05, 0.05]. Then I analyzed the selection with a good Lc0 net for a similar time per position. The result was a new Gaussian of evals (for SF it was a narrow rectangle around 0.00) with a standard deviation of about 40cp (0.40), IIRC. Almost as bad as the Gaussian of the untrimmed 2moves_v1.epd itself.

Ferdy has shown that you and Uri are surely partly right: 20-40% (at most) of the positions are unfit to qualify as having multiple "equally good moves", but this is controllable noise. It shows up mostly in that we usually get some 35-45% matches for unrelated top engines instead of the probably more desirable, say, 20%. But the sensitivity of the Sim tester is affected only mildly. I think the issue only slightly modifies the outcomes, and the meaning of the outcomes and conclusions should hold up pretty well.

If someone comes up with a new Sim having some 10,000 better positions, I would be glad to check the speculations I wrote above.

Guenther · Post by **Guenther** » Fri May 03, 2019 6:09 pm

I want to add that it is still possible that you would somehow get similar results regarding the LC0 nets.
(It is just a bit disappointing that the positions in the test are not as quiet as one would expect nowadays, and that so many of them have only one or two best moves.)

OTOH, someone in the LC0 Discord chat said it could also be that the NNs, which are built by a kind of generic process, might always diverge to similar results if enough training steps are done, and that at 100ms it practically plays almost the move the policy head favours.

chrisw · Post by **chrisw** » Fri May 03, 2019 7:37 pm

Laskos wrote: ↑Wed May 01, 2019 6:38 pm I ran Don Dailey's Sim03 tester, slightly modified by Adam Hair (hash cleared before each position). The 8,300 positions tested are quiet positions from real games, each having several close-in-value options for the best move. Engines are on one thread (except Lc0, on 2 threads), and time per position is 100 ms. Here are the cross-percentages for matched moves engine-wise (scroll down the box):
sim

  Key:

  1) Andscacs 0.95 (time: 100 ms  scale: 1.0)
  2) Ethereal 11.25 (time: 100 ms  scale: 1.0)
  3) Fire 7.1 (time: 100 ms  scale: 1.0)
  4) Fruit 2.1 (time: 100 ms  scale: 1.0)
  5) Komodo 12.3 (time: 100 ms  scale: 1.0)
  6) Lc0 11261 (time: 100 ms  scale: 1.0)
  7) Lc0 32930 (time: 100 ms  scale: 1.0)
  8) Lc0 42184 (time: 100 ms  scale: 1.0)
  9) Senpai 1.0 (time: 100 ms  scale: 1.0)
 10) SF 10 (time: 100 ms  scale: 1.0)
 11) SF 8 (time: 100 ms  scale: 1.0)
 12) SF dev (time: 100 ms  scale: 1.0)

         1     2     3     4     5     6     7     8     9    10    11    12
  1.  ----- 49.19 45.69 37.95 48.17 44.90 43.65 44.22 46.88 50.36 52.22 49.93
  2.  49.19 ----- 48.05 39.58 48.57 47.14 45.29 45.76 48.66 52.15 52.48 52.09
  3.  45.69 48.05 ----- 40.17 46.43 43.41 42.23 43.12 45.35 48.36 50.24 47.69
  4.  37.95 39.58 40.17 ----- 39.51 36.34 35.54 35.72 46.55 37.81 39.88 37.50
  5.  48.17 48.57 46.43 39.51 ----- 45.82 44.54 45.21 48.28 50.10 51.18 50.15
  6.  44.90 47.14 43.41 36.34 45.82 ----- 71.28 71.04 42.90 49.92 47.74 49.78
  7.  43.65 45.29 42.23 35.54 44.54 71.28 ----- 74.81 42.11 48.82 46.88 48.09
  8.  44.22 45.76 43.12 35.72 45.21 71.04 74.81 ----- 42.95 49.34 47.44 48.94
  9.  46.88 48.66 45.35 46.55 48.28 42.90 42.11 42.95 ----- 46.42 48.07 46.56
 10.  50.36 52.15 48.36 37.81 50.10 49.92 48.82 49.34 46.42 ----- 58.76 63.17
 11.  52.22 52.48 50.24 39.88 51.18 47.74 46.88 47.44 48.07 58.76 ----- 57.13
 12.  49.93 52.09 47.69 37.50 50.15 49.78 48.09 48.94 46.56 63.17 57.13 -----
And the dendrogram of similarity in moves selection:

Lc0_Dendrogram.jpg

We see from both the matrix and the dendrogram that all 3 different Lc0 runs are so closely related in this Sim test that they are MUCH closer to one another than SF_dev is to SF10. I was expecting quite a different picture. Even inside the same run, there are many "drift areas" in the "optima landscape" for NN values, many local "optima" and many possibilities to reach some run-dependent more general optimum. So I expected to find many dissimilarities between the NN nets even inside the same run, never mind very different runs. But I do not understand this stuff well.
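
The comparison between the Lc0 family and the Stockfish family can be checked directly against the matrix. The sketch below copies only the six relevant entries from the table; the dictionary layout and the helper `within_family` are hypothetical and not part of the Sim tester.

```python
from itertools import combinations

# Match percentages copied from the matrix above (it is symmetric, so one
# entry per unordered pair is enough).
SIM = {
    frozenset(("Lc0 11261", "Lc0 32930")): 71.28,
    frozenset(("Lc0 11261", "Lc0 42184")): 71.04,
    frozenset(("Lc0 32930", "Lc0 42184")): 74.81,
    frozenset(("SF 10", "SF 8")): 58.76,
    frozenset(("SF 10", "SF dev")): 63.17,
    frozenset(("SF 8", "SF dev")): 57.13,
}

def within_family(names):
    """Return the pairwise match percentages inside one engine family."""
    return [SIM[frozenset(pair)] for pair in combinations(names, 2)]

lc0 = within_family(["Lc0 11261", "Lc0 32930", "Lc0 42184"])
sf = within_family(["SF 10", "SF 8", "SF dev"])

print(f"Lc0 pairs: min {min(lc0):.2f}, max {max(lc0):.2f}")
print(f"SF pairs : min {min(sf):.2f}, max {max(sf):.2f}")
print("every Lc0 pair above every SF pair:", min(lc0) > max(sf))
```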

Also, we know that positionally (on these 8,300 quiet Sim positions too) the late Lc0 nets of a particular run are VERY strong. Is there a possibility that the evals of these different NNs converge to some common choices across different runs just because objectively stronger moves are quite unique even in quiet positions? And that the sheer strength makes the nets convergent?
Another possibility is that all three runs are almost identical, varying only in irrelevant details.
And another is that all zero or quasi-zero runs give similar move-selection nets, at least positionally. That would mean that Lc0 must be very similar to Alpha0 positionally.

A note: observe how the Stockfishes cluster together, with some other new engines not far away, and how the 2 engines by Fabien Letouzey, Fruit 2.1 and Senpai 1.0, cluster together separately (although they are not closely related). Also, all the Lc0s are very unrelated to the other engines (but extremely related among themselves).

Not sure if I'm answering your question or what I read into your question, but ...

Once the LC0 algo has decided on some root move because that root move has the highest visit count, it takes a lot of visits for an alternative move to overtake it. If the alternative move is a tactic that only appears with deep search, say the top move has a visit count of 10000 and the potential move had only 100 visits at the point where the MCTS search sniffed something and started increasing visits to the alternative move, then it has a long way to go to overtake.

Contrast AB: it sniffs something and the ensuing fail-high at root is relatively quick.
So, for LC0, apart from taking quite a while for a deep tactical shot to show up, it's not really an algorithm designed for solving test positions. It does good positional play instead.
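
To put rough numbers on the 10000 versus 100 visits example, here is a back-of-the-envelope sketch. It is a deliberately simplified model and not Lc0's actual selection rule: it assumes every further visit goes to one of the two moves, with a fixed fraction `challenger_share` going to the alternative. The function name and parameters are hypothetical.

```python
import math

def visits_to_overtake(leader, challenger, challenger_share):
    """Extra root visits needed before the challenger's count passes the leader's.

    Simplifying assumption (not Lc0's real selection rule): each additional
    visit goes to one of these two moves, with a fixed fraction
    `challenger_share` going to the challenger. Returns None if the
    challenger never catches up under that assumption.
    """
    if challenger > leader:
        return 0  # already ahead, nothing to overtake
    if challenger_share <= 0.5:
        return None
    gap = leader - challenger
    # challenger + s*n > leader + (1 - s)*n  <=>  n > gap / (2*s - 1)
    return math.floor(gap / (2 * challenger_share - 1)) + 1

for share in (1.0, 0.75, 0.6):
    print(share, visits_to_overtake(10000, 100, share))
```

Even if the search switched all of its visits to the alternative move, it would need about as many new visits as the whole search had spent so far, and proportionally more at any smaller share.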

Guenther · Post by **Guenther** » Fri May 03, 2019 8:05 pm

Guenther wrote: ↑Fri May 03, 2019 6:09 pm
I want to add that it is still possible that you would somehow get similar results regarding the LC0 nets.
(It is just a bit disappointing that the positions in the test are not as quiet as one would expect nowadays, and that so many of them have only one or two best moves.)

OTOH, someone in the LC0 Discord chat said it could also be that the NNs, which are built by a kind of generic process, might always converge to similar results if enough training steps are done, and that at 100ms it practically plays almost the move the policy head favours.

Edit:
oops, of course I meant converge instead of diverge.
