Laskos wrote: ↑Fri May 03, 2019 2:10 pm
Sure, "quiet", or having multiple possible almost equivalent moves according to, say, Stockfish. Such a position still might have a unique best move according to perfect chess. Still, as (positional) strength goes up, the similarity will increase, although not all the moves are unique according to perfect chess (tablebases).
Say, 60% of positions have a unique best move according to TBs, and 40% have 2 equivalent moves according to TBs.
Two different perfect players, each picking at random among the equivalent moves, will have
60 + 40/2 = 80% similarity on average.
The similarity with a perfect player of a non-perfect player, say one solving 70% of the unique-move positions and playing one of the two equivalent moves in 80% of the two-move positions, is
0.7*60 + 0.8*40/2 = 58% on average.
So, as an engine gains (positional) strength, its similarity with a perfect player increases, but never comes anywhere close to 100%.
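A minimal sketch of this expected-similarity arithmetic, using the illustrative numbers above:

[code]
# Toy model from above: 60% of positions have a unique best move,
# 40% have two TB-equivalent moves, chosen at random when equivalent.
UNIQUE, TWO_EQ = 0.60, 0.40

# Two perfect players always agree on unique moves, coin-flip on the rest.
print(UNIQUE * 1.0 + TWO_EQ * 0.5)          # 0.80

# A player finding the unique move 70% of the time and one of the two
# equivalent moves 80% of the time, matched against a perfect player:
print(0.70 * UNIQUE + 0.80 * TWO_EQ * 0.5)  # 0.58
[/code]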
What I see with SF_dev is that its similarity to Lc0 seems to climb with time control, as though Lc0 were some guru pretending to be a perfect player. I am not sure what that means, or what Lc0 is up to with its high similarity across the runs.
Observe also from the data that SF_dev distances itself from the other "normal" engines when given much more time per position.
Uri Blass wrote: ↑Fri May 03, 2019 4:20 pm
I believe that many chess positions (and maybe most chess positions that you get in games) have more than 2 equivalent best moves in the sense of the theoretical result, so maybe an engine that plays a random perfect move is not going to have even 50% similarity with Stockfish or Lc0.
Playing random best moves is not a good strategy, because your drawing moves may allow even weak players to get a draw easily.
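In the same toy model as above: if a fraction f_k of positions has k equally good moves, two independent random perfect players agree on sum(f_k / k) of the positions, which quickly drops below 50% once most positions have 3-4 equivalent moves. A quick sketch, with the fractions made up purely for illustration:

[code]
# Two random perfect players agree on a k-equivalent-move position with
# probability 1/k; f_k is the fraction of positions with k such moves.
fractions = {1: 0.10, 2: 0.30, 3: 0.30, 4: 0.30}  # made-up for illustration

print(sum(f / k for k, f in fractions.items()))   # 0.425 -- below 50%
[/code]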
Guenther wrote: ↑Fri May 03, 2019 4:52 pm
I agree with Uri. I think it would be better to improve the positions instead of trying to interpret the lc0 stats on the 10(?) year old original simtest positions.
There should be bazillions of late opening and middlegame positions with at least 2-5 equally good moves.
Laskos wrote: ↑Fri May 03, 2019 5:39 pm
"Equally good moves" according to whom? Several months ago I did the following: from 2moves_v1.epd I selected the positions where SF10, at some 1-2s/position, considered the eval to be within [-0.05, 0.05]. Then I analyzed the selection with a good Lc0 net for a similar time per position. It resulted in a new Gaussian of evals (for SF it was a narrow rectangle around 0.00), with a standard deviation of about 40cp (0.40) IIRC. Almost as bad as the Gaussian of the untrimmed 2moves_v1.epd itself.
Ferdy had shown that you and Uri are surely partly right, with 20-40% (at most) of positions unfit to qualify as having multiple "equally good moves", but this is controllable noise. It shows up mostly as unrelated top engines matching on some 35-45% of moves instead of a probably more desirable 20% or so. The sensitivity of the Sim tester is affected only mildly: the issue slightly shifts the numbers, but the meaning of the outcomes and the conclusions should hold up well.
If someone comes up with a new Sim having some 10,000 better positions, I would be glad to check the speculations I wrote above.
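For illustration, a bare-bones python-chess sketch of the kind of filtering and re-scoring described above; the engine paths, file name and time limit are placeholders, not the exact setup used:

[code]
import statistics

import chess
import chess.engine

def evals(engine_path, epd_file, seconds):
    """Centipawn eval of each EPD position, from the side to move's view."""
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    out = []
    with open(epd_file) as f:
        for line in f:
            board, _ = chess.Board.from_epd(line)
            info = engine.analyse(board, chess.engine.Limit(time=seconds))
            out.append(info["score"].pov(board.turn).score(mate_score=10000))
    engine.quit()
    return out

# Keep only the positions Stockfish calls dead equal (within +/- 5cp)...
sf = evals("./stockfish", "2moves_v1.epd", 1.5)
kept = [i for i, cp in enumerate(sf) if abs(cp) <= 5]

# ...then see how wide the Lc0 eval distribution still is on exactly those.
lc0 = evals("./lc0", "2moves_v1.epd", 1.5)
print(statistics.pstdev(lc0[i] for i in kept))
[/code]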
Guenther wrote: ↑Fri May 03, 2019 6:09 pm
I want to add that it is still possible that you get similar results somehow regarding LC0 nets. (It is just a bit disappointing that the positions in the test are not as quiet as expected nowadays, with so many of them having only one or two best moves.)
OTOH, someone on the LC0 Discord chat said it could also be that the NNs, which are built by a kind of generic process, might always converge to similar results if enough training steps are done, and that at 100ms Lc0 practically plays the move the policy head favours.
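A speculative way to check that last claim, treating lc0 as a plain UCI engine and assuming (which is not established here) that a 1-node search returns essentially the policy-favoured move:

[code]
import chess
import chess.engine

engine = chess.engine.SimpleEngine.popen_uci("./lc0")
agree = total = 0
with open("positions.epd") as f:
    for line in f:
        board, _ = chess.Board.from_epd(line)
        # 1 node: roughly the policy head's favourite (an assumption)
        policy_move = engine.play(board, chess.engine.Limit(nodes=1)).move
        timed_move = engine.play(board, chess.engine.Limit(time=0.1)).move
        agree += policy_move == timed_move
        total += 1
engine.quit()
print(agree / total)  # close to 1.0 would support the Discord claim
[/code]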
I think it's still not that bad: as Ferdy had shown, some 60% of the positions have at least 2 moves close in value by today's SF_dev standards (and another 20% are debatable). The rest can be treated as a baseline consisting mostly of very easy unique solutions found by most engines. The sensitivity of Sim isn't seriously harmed. But if one comes up with a better set of positions, I would be glad to redo the experiment.
Yes, as I don't know much about how the NNs are built, whether there are random drifts in the weights and in the policy and value head landscapes, or whether the optima landscape is a simple one, I don't know if it is normal that all the runs converge to similar move choices. Do the people on Discord agree that all the runs should converge to very similar nets as far as move choices go? A late t35 10b net also converged to the same move choices.
100ms/position was only the first run; afterwards I used 300ms/position or 2000-2500 nodes/position, and the similarity only _increased_ by 1-2%. 2000-2500 nodes per position are enough to exercise both the policy and value heads, and it is a higher number of nodes than during the training games. I will probably also check the similarity of the late t30 and t40 nets at a longer 1000ms/position, or some 15-20,000 nodes per position.
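The similarity number itself is just the fraction of positions on which two engines choose the same move; a minimal sketch of such a measurement at a fixed node budget (engine paths and the file name are placeholders):

[code]
import chess
import chess.engine

def similarity(path_a, path_b, epd_file, nodes=2500):
    """Fraction of EPD positions on which two UCI engines pick the same move."""
    a = chess.engine.SimpleEngine.popen_uci(path_a)
    b = chess.engine.SimpleEngine.popen_uci(path_b)
    limit = chess.engine.Limit(nodes=nodes)
    same = total = 0
    with open(epd_file) as f:
        for line in f:
            board, _ = chess.Board.from_epd(line)
            same += a.play(board, limit).move == b.play(board, limit).move
            total += 1
    a.quit()
    b.quit()
    return same / total

print(similarity("./lc0", "./stockfish", "positions.epd"))
[/code]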