Policy determining quiet early opening preferences of Leela

Laskos · Post by **Laskos** » Wed Sep 16, 2020 7:05 pm

I modified the Sim (similarity) tool so that it tests similarity on 3-mover GM quiet opening positions. I also combined depth=1 tests ("policy value" in the case of Leela) with eval tests at 500 ms per move (many thousands of nodes for Leela on my RTX 2070 GPU). The similarity matrix is here:

Key:

  1) Lc0_LS15 d=1 (time: 100 ms  scale: 5.0)
  2) Lc0_LS15 t=500ms (time: 100 ms  scale: 5.0)
  3) SF_11 d=1 (time: 100 ms  scale: 5.0)
  4) SF_11 t=500ms (time: 100 ms  scale: 5.0)
  5) SF_12 d=1 (time: 100 ms  scale: 5.0)
  6) SF_12 t=500ms (time: 100 ms  scale: 5.0)

         1     2     3     4     5     6
  1.  ----- 80.35 30.80 52.55 48.90 60.50
  2.  80.35 ----- 29.30 52.75 44.30 63.90
  3.  30.80 29.30 ----- 33.35 30.30 28.05
  4.  52.55 52.75 33.35 ----- 41.60 52.75
  5.  48.90 44.30 30.30 41.60 ----- 44.45
  6.  60.50 63.90 28.05 52.75 44.45 -----

The 80% match between Leela policy (depth=1) and Leela at 500 ms is outstanding, and even more impressive when compared to the classical engine's self-similarity in openings (SF11, 33%) and the NNUE engine's self-similarity (SF12, 44%). I also checked self-similarity against a longer 5 s/move time control, and it barely decreased, from 80% to 77%. This is a huge domination of policy over even a long search in sound early openings that were visited heavily during training. I plotted the clustering diagram to show the Leela policy "attractor" that all engines approach.
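A clustering like this can be reproduced roughly from the matrix alone. The following is a minimal sketch, assuming average-linkage agglomerative clustering on the similarity values; the linkage rule and the helper names (`sim`, `linkage`, `cluster`) are assumptions for illustration, not necessarily what was used for the plotted diagram.

```python
from itertools import combinations

NAMES = ["Lc0_LS15 d=1", "Lc0_LS15 t=500ms", "SF_11 d=1",
         "SF_11 t=500ms", "SF_12 d=1", "SF_12 t=500ms"]

# Upper triangle of the first similarity matrix (percent of matching moves).
SIM = {
    (0, 1): 80.35, (0, 2): 30.80, (0, 3): 52.55, (0, 4): 48.90, (0, 5): 60.50,
    (1, 2): 29.30, (1, 3): 52.75, (1, 4): 44.30, (1, 5): 63.90,
    (2, 3): 33.35, (2, 4): 30.30, (2, 5): 28.05,
    (3, 4): 41.60, (3, 5): 52.75,
    (4, 5): 44.45,
}


def sim(i, j):
    """Similarity between two single engines (the matrix is symmetric)."""
    return SIM[(min(i, j), max(i, j))]


def linkage(a, b):
    """Average similarity over all cross-cluster pairs (average linkage)."""
    return sum(sim(i, j) for i in a for j in b) / (len(a) * len(b))


def cluster(n):
    """Repeatedly merge the two most similar clusters; return the merge log."""
    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


MERGES = cluster(len(NAMES))
for a, b, s in MERGES:
    left = " + ".join(NAMES[i] for i in a)
    right = " + ".join(NAMES[i] for i in b)
    print(f"{s:6.2f}  [{left}]  joins  [{right}]")
```

Under these assumptions the two Leela entries merge first at 80.35, the other engines then attach one by one to that cluster, and SF_11 at depth=1 joins last.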

================================
================================

The second point I wanted to check was whether this domination is present only in openings used during training (like the sound early opening positions I used here). I used Chess960 1-mover openings with the Sim tool to check openings Leela never trained on, and the results were completely different:

  Key:

  1) Lc0_LS15_500ms (time: 100 ms  scale: 5.0)
  2) Lc0_LS15_d1 (time: 100 ms  scale: 5.0)
  3) SF11_500ms (time: 100 ms  scale: 5.0)
  4) SF12_500ms (time: 100 ms  scale: 5.0)
  5) SF_11_d1 (time: 100 ms  scale: 5.0)
  6) SF_12_d1 (time: 100 ms  scale: 5.0)

         1     2     3     4     5     6
  1.  ----- 46.95 45.30 43.05 20.90 16.50
  2.  46.95 ----- 41.30 36.40 24.95 17.65
  3.  45.30 41.30 ----- 39.35 23.45 14.95
  4.  43.05 36.40 39.35 ----- 23.25 21.55
  5.  20.90 24.95 23.45 23.25 ----- 23.20
  6.  16.50 17.65 14.95 21.55 23.20 -----

Leela's 47% match rate between policy and eval at 500 ms in Chess960 is more in line with the other two engines and with common sense, and it shows that for positions not trained on, the behavior is drastically different. Chess960 and regular chess are basically the same game after several moves, so this difference in Leela's policy behavior depending on the training set is quite remarkable. Also, against regular engines Leela at depth=1 underperforms in Chess960 by more than 100 Elo points compared to regular chess, so this policy tree building in the openings, based on the training set, is akin to an opening book, at least in strength.
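For readers unfamiliar with the percentages in the two matrices, here is a minimal sketch of a match rate, assuming that each entry is the share of test positions on which two configurations choose the same move. That reading of the Sim tool is an assumption, and the function name and the move lists below are hypothetical.

```python
def match_rate(moves_a, moves_b):
    """Percent of positions on which two engines picked the same move.

    moves_a and moves_b are equal-length lists of moves, one per test position.
    """
    if len(moves_a) != len(moves_b) or not moves_a:
        raise ValueError("need two non-empty move lists of equal length")
    same = sum(a == b for a, b in zip(moves_a, moves_b))
    return 100.0 * same / len(moves_a)


# Hypothetical picks on four positions: the engines agree on three of them.
policy_moves = ["e2e4", "g1f3", "d2d4", "c2c4"]
search_moves = ["e2e4", "g1f3", "d2d4", "b1c3"]
print(match_rate(policy_moves, search_moves))
```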

Tony P. · Post by **Tony P.** » Sat Sep 19, 2020 9:10 am

Wow, I didn't expect SF12 to be so much closer than SF11 to Leela on the first test.

Laskos · Post by **Laskos** » Sat Sep 19, 2020 10:49 am

Tony P. wrote: ↑Sat Sep 19, 2020 9:10 am Wow, I didn't expect SF12 to be so much closer than SF11 to Leela on the first test.

Yes, 64% similarity Leela -- SF12 in the openings. They are almost clones!

Milos · Post by **Milos** » Sun Sep 20, 2020 12:39 am

Laskos wrote: ↑Wed Sep 16, 2020 7:05 pm I modified the Sim (similarity) tool, letting it to test similarity using 3-mover GM quiet opening positions. Also, I combined depth=1 tests ("policy value" in the case of Leela) with eval tests at time per move = 500ms (many thousands of nodes for Leela on my RTX 2070 GPU). The similarity matrix is here:
Key:

  1) Lc0_LS15 d=1 (time: 100 ms  scale: 5.0)
  2) Lc0_LS15 t=500ms (time: 100 ms  scale: 5.0)
  3) SF_11 d=1 (time: 100 ms  scale: 5.0)
  4) SF_11 t=500ms (time: 100 ms  scale: 5.0)
  5) SF_12 d=1 (time: 100 ms  scale: 5.0)
  6) SF_12 t=500ms (time: 100 ms  scale: 5.0)

         1     2     3     4     5     6
  1.  ----- 80.35 30.80 52.55 48.90 60.50
  2.  80.35 ----- 29.30 52.75 44.30 63.90
  3.  30.80 29.30 ----- 33.35 30.30 28.05
  4.  52.55 52.75 33.35 ----- 41.60 52.75
  5.  48.90 44.30 30.30 41.60 ----- 44.45
  6.  60.50 63.90 28.05 52.75 44.45 -----
The 80% match between Leela policy (depth=1) and Leela=500ms is outstanding, and even more impressive compared to classical engine self-similarity in openings and NNUE engine self-similarity (underlined in blue). I checked self-similarity to longer 5s/move time control, and the self-similarity barely decreased from 80% to 77%. This is huge domination of policy over even a long evaluation in very visited during training, early good openings. I plotted the clustering diagram to show the Leela policy "attractor" to which all engines approach.

================================
================================

Second point I wanted to check was whether this domination is present only in openings used during the training (like the sound early opening positions I used here). I used Chess960 1-mover openings with the Sim tool to check for openings Leela never trained on, and the results were completely different:
  Key:

  1) Lc0_LS15_500ms (time: 100 ms  scale: 5.0)
  2) Lc0_LS15_d1 (time: 100 ms  scale: 5.0)
  3) SF11_500ms (time: 100 ms  scale: 5.0)
  4) SF12_500ms (time: 100 ms  scale: 5.0)
  5) SF_11_d1 (time: 100 ms  scale: 5.0)
  6) SF_12_d1 (time: 100 ms  scale: 5.0)

         1     2     3     4     5     6
  1.  ----- 46.95 45.30 43.05 20.90 16.50
  2.  46.95 ----- 41.30 36.40 24.95 17.65
  3.  45.30 41.30 ----- 39.35 23.45 14.95
  4.  43.05 36.40 39.35 ----- 23.25 21.55
  5.  20.90 24.95 23.45 23.25 ----- 23.20
  6.  16.50 17.65 14.95 21.55 23.20 -----
The more in line with the other two engines and with the common sense, Leela's match rate of 47% of policy versus eval at 500ms for Chess960 shows that for positions not trained on, the behavior is drastically different. Chess960 and regular Chess after several moves are basically the same game, so this difference in Leela policy behavior depending on training set is quite remarkable. Also, Leela depth=1 underperforms in Chess960 compared to regular chess by more than 100 Elo points against regular engines, so this policy tree building in the openings based on the training set is akin at least in strength to an opening book.

And then fanboys and trolls come and claim that the Leela network doesn't have memory and is nothing like a book.

jp · Post by jp » Thu Sep 24, 2020 11:11 am

Laskos wrote: ↑Wed Sep 16, 2020 7:05 pm Leela's match rate of 47% of policy versus eval at 500ms for Chess960 shows that for positions not trained on, the behavior is drastically different. Chess960 and regular Chess after several moves are basically the same game, so this difference in Leela policy behavior depending on training set is quite remarkable. Also, Leela depth=1 underperforms in Chess960 compared to regular chess by more than 100 Elo points against regular engines, so this policy tree building in the openings based on the training set is akin at least in strength to an opening book.

Interesting, though not surprising.

This is consistent with what we've seen before, i.e. Leela gets progressively worse the longer the game goes on.

Albert Silver · Post by **Albert Silver** » Thu Sep 24, 2020 7:41 pm

jp wrote: ↑Thu Sep 24, 2020 11:11 am Interesting, though not surprising.

This is consistent with what we've seen before, i.e. Leela gets progressively worse the longer the game goes on.

Yes, by move 30 it is usually dead lost.

jp · Post by jp » Sat Sep 26, 2020 10:17 am

Albert Silver wrote: ↑Thu Sep 24, 2020 7:41 pm Yes, by move 30 it is usually dead lost.

A meaningless comment that ignores the results of Kai's and others' tests.

Clearly Leela is weaker in relative terms in the middlegame than in the opening, and weaker in the endgame than in the middlegame.

Laskos · Post by **Laskos** » Sat Oct 03, 2020 8:00 pm

Albert Silver wrote: ↑Thu Sep 24, 2020 7:41 pm
jp wrote: ↑Thu Sep 24, 2020 11:11 am Interesting, though not surprising.

This is consistent with what we've seen before, i.e. Leela gets progressively worse the longer the game goes on.
Yes, by move 30 it is usually dead lost.

In fact you are more correct than I thought. In endgames Lc0 LS15 (one of the best nets out there) on an RTX 2070 is similar in strength to... Fruit 2.1 on one core, underperforming by at least 1000 Elo points compared to openings, where it is the strongest engine on my PC (and I have no upper bound on its strength in the openings). So you are basically right: if Lc0 is not winning by move 30-40, it will hardly win later.

Endgames:

Score of lc0_LS15 vs Fruit_21: 17 - 17 - 66  [0.500] 100
...      lc0_LS15 playing White: 8 - 10 - 32  [0.480] 50
...      lc0_LS15 playing Black: 9 - 7 - 34  [0.520] 50
...      White vs Black: 15 - 19 - 66  [0.480] 100
Elo difference: 0.0 +/- 39.8, LOS: 50.0 %, DrawRatio: 66.0 %
Finished match

PGN:
https://gofile.io/d/U8aHbN
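The summary line of that match can be checked by hand. Below is a minimal sketch, assuming the usual logistic Elo model, a normal approximation with the per-game trinomial variance for the error margin, and the common erf-based formula for LOS. The function names are hypothetical, and the tool that produced the output may use slightly different approximations.

```python
import math
from statistics import NormalDist


def elo_from_score(score):
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


def match_summary(wins, losses, draws, confidence=0.95):
    """Return (elo, error_margin, los, draw_ratio) for a finished match.

    Assumes at least one decisive game and a score that is neither 0 nor 1.
    """
    n = wins + losses + draws
    mu = (wins + 0.5 * draws) / n
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - mu) ** 2
           + losses * (0.0 - mu) ** 2
           + draws * (0.5 - mu) ** 2) / n
    se = math.sqrt(var / n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    margin = (elo_from_score(mu + z * se) - elo_from_score(mu - z * se)) / 2.0
    # Likelihood of superiority, based on decisive games only.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return elo_from_score(mu), margin, los, draws / n


elo, margin, los, draw_ratio = match_summary(wins=17, losses=17, draws=66)
print(f"Elo {elo:.1f} +/- {margin:.1f}, LOS {100 * los:.1f} %, draws {100 * draw_ratio:.1f} %")
```

For 17 - 17 - 66 this gives an Elo difference of 0.0, an LOS of 50% and a draw ratio of 66%, with a margin of roughly 40 Elo, close to the +/- 39.8 reported above. With only 100 games the interval is wide, so "similar in strength" here means within about 40 Elo either way.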

Albert Silver · Post by **Albert Silver** » Sat Oct 03, 2020 9:11 pm

Laskos wrote: ↑Sat Oct 03, 2020 8:00 pm
Albert Silver wrote: ↑Thu Sep 24, 2020 7:41 pm
jp wrote: ↑Thu Sep 24, 2020 11:11 am Interesting, though not surprising.

This is consistent with what we've seen before, i.e. Leela gets progressively worse the longer the game goes on.
Yes, by move 30 it is usually dead lost.
In fact you are more correct than I thought. In endgames Lc0 LS15 (one of the best nets out there) on RTX 2070 is similar in strength to...Fruit 2.1 on one core, underperforming by at least 1000 Elo points compared to openings, where it is the strongest engine on my PC (and I have no upper limit of its strength in the openings). So, you are basically right, if by move 30-40 Lc0 is not winning, it will hardly win later.
Endgames:

Score of lc0_LS15 vs Fruit_21: 17 - 17 - 66  [0.500] 100
...      lc0_LS15 playing White: 8 - 10 - 32  [0.480] 50
...      lc0_LS15 playing Black: 9 - 7 - 34  [0.520] 50
...      White vs Black: 15 - 19 - 66  [0.480] 100
Elo difference: 0.0 +/- 39.8, LOS: 50.0 %, DrawRatio: 66.0 %
Finished match
PGN:
https://gofile.io/d/U8aHbN

Uh-huh. So I assume that in this exact same test, SF performs 1000 Elo better?

Laskos · Post by **Laskos** » Sat Oct 03, 2020 9:21 pm

Albert Silver wrote: ↑Sat Oct 03, 2020 9:11 pm
Laskos wrote: ↑Sat Oct 03, 2020 8:00 pm
Albert Silver wrote: ↑Thu Sep 24, 2020 7:41 pm
jp wrote: ↑Thu Sep 24, 2020 11:11 am Interesting, though not surprising.

This is consistent with what we've seen before, i.e. Leela gets progressively worse the longer the game goes on.
Yes, by move 30 it is usually dead lost.
In fact you are more correct than I thought. In endgames Lc0 LS15 (one of the best nets out there) on RTX 2070 is similar in strength to...Fruit 2.1 on one core, underperforming by at least 1000 Elo points compared to openings, where it is the strongest engine on my PC (and I have no upper limit of its strength in the openings). So, you are basically right, if by move 30-40 Lc0 is not winning, it will hardly win later.
Endgames:

Score of lc0_LS15 vs Fruit_21: 17 - 17 - 66  [0.500] 100
...      lc0_LS15 playing White: 8 - 10 - 32  [0.480] 50
...      lc0_LS15 playing Black: 9 - 7 - 34  [0.520] 50
...      White vs Black: 15 - 19 - 66  [0.480] 100
Elo difference: 0.0 +/- 39.8, LOS: 50.0 %, DrawRatio: 66.0 %
Finished match
PGN:
https://gofile.io/d/U8aHbN
Uh-huh. So I assume that in this exact same test, SF performs 1000 Elo better?

No, but only because endgames are not that important a part of computer chess Elo-wise. In the past I performed some experiments showing that in engine matches endgames contribute less than 15% of the total Elo difference between engines. But the result that Lc0 in endgames is of similar strength to Fruit 2.1 stands, and it is an abysmal performance compared to openings, where on my PC it is above SF12 on 4 cores (I don't know by how much, but noticeably).
