Policy determining quiet early opening preferences of Leela

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Policy determining quiet early opening preferences of Leela

Post by Laskos »

I modified the Sim (similarity) tool so that it tests similarity on 3-mover quiet GM opening positions. I also combined depth=1 tests (the "policy value" in the case of Leela) with eval tests at 500 ms per move (many thousands of nodes for Leela on my RTX 2070 GPU). The similarity matrix is here:

Key:

  1) Lc0_LS15 d=1 (time: 100 ms  scale: 5.0)
  2) Lc0_LS15 t=500ms (time: 100 ms  scale: 5.0)
  3) SF_11 d=1 (time: 100 ms  scale: 5.0)
  4) SF_11 t=500ms (time: 100 ms  scale: 5.0)
  5) SF_12 d=1 (time: 100 ms  scale: 5.0)
  6) SF_12 t=500ms (time: 100 ms  scale: 5.0)

         1     2     3     4     5     6
  1.  ----- 80.35 30.80 52.55 48.90 60.50
  2.  80.35 ----- 29.30 52.75 44.30 63.90
  3.  30.80 29.30 ----- 33.35 30.30 28.05
  4.  52.55 52.75 33.35 ----- 41.60 52.75
  5.  48.90 44.30 30.30 41.60 ----- 44.45
  6.  60.50 63.90 28.05 52.75 44.45 -----
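
Each number above can be read as the percentage of test positions on which two configurations pick the same move (that is the assumption here; Sim's own scoring may differ in details). A minimal Python sketch of that bookkeeping follows. The function names and the toy move lists are hypothetical and are not part of the Sim tool.

```python
from itertools import combinations


def match_rate(moves_a, moves_b):
    """Percentage of positions on which two engines chose the same move."""
    if len(moves_a) != len(moves_b):
        raise ValueError("both engines must be tested on the same positions")
    if not moves_a:
        raise ValueError("need at least one position")
    same = sum(a == b for a, b in zip(moves_a, moves_b))
    return 100.0 * same / len(moves_a)


def similarity_matrix(choices):
    """Build a symmetric matrix of match rates.

    `choices` maps an engine label to its list of chosen moves,
    one move per test position, in the same position order.
    """
    names = list(choices)
    matrix = {name: {} for name in names}
    for a, b in combinations(names, 2):
        rate = match_rate(choices[a], choices[b])
        matrix[a][b] = rate
        matrix[b][a] = rate
    return matrix


# Hypothetical toy data: four positions, moves in UCI notation.
# These are NOT the moves from the real test.
toy_choices = {
    "engine_d1":    ["e2e4", "g1f3", "d2d4", "c2c4"],
    "engine_500ms": ["e2e4", "g1f3", "d2d4", "g1f3"],
    "other_500ms":  ["d2d4", "g1f3", "c2c4", "g1f3"],
}
sim = similarity_matrix(toy_choices)
for name, row in sim.items():
    print(name, row)
```

In the toy data the first two configurations agree on three of four positions, which is the kind of high self-similarity the Leela rows show above.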

[Image: similarity matrix with the self-similarity entries underlined in blue]

The 80% match between Leela policy (depth=1) and Leela at 500 ms is outstanding, and even more impressive when compared with the self-similarity in the same openings of the classical engine (SF 11, 33%) and of the NNUE engine (SF 12, 44%) (underlined in blue). I also checked self-similarity against a longer time control of 5 s/move, and it barely decreased, from 80% to 77%. This shows a huge domination of policy over even a long search in sound early openings that were visited very often during training. I plotted the clustering diagram to show the Leela policy "attractor" that all engines approach.
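
A crude numeric proxy for the "attractor" picture is the mean similarity of each configuration to the other five. The sketch below uses the values of the first matrix as posted; it is not the clustering method used for the diagram, and the function name is hypothetical.

```python
labels = [
    "Lc0_LS15 d=1", "Lc0_LS15 t=500ms", "SF_11 d=1",
    "SF_11 t=500ms", "SF_12 d=1", "SF_12 t=500ms",
]

# Values copied from the first similarity matrix; None marks the diagonal.
sim = [
    [None, 80.35, 30.80, 52.55, 48.90, 60.50],
    [80.35, None, 29.30, 52.75, 44.30, 63.90],
    [30.80, 29.30, None, 33.35, 30.30, 28.05],
    [52.55, 52.75, 33.35, None, 41.60, 52.75],
    [48.90, 44.30, 30.30, 41.60, None, 44.45],
    [60.50, 63.90, 28.05, 52.75, 44.45, None],
]


def mean_similarity(matrix):
    """Average each row over its off-diagonal entries."""
    means = []
    for row in matrix:
        others = [value for value in row if value is not None]
        means.append(sum(others) / len(others))
    return means


# Sort configurations from most to least similar to the rest of the field.
ranking = sorted(
    zip(labels, mean_similarity(sim)),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, mean in ranking:
    print(f"{name:18s} {mean:6.2f}")
```

On these numbers the Leela policy configuration has the highest mean similarity to the others and SF 11 at depth=1 the lowest, which is consistent with the attractor reading.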

[Image: clustering diagram of the engines]


================================
================================

The second point I wanted to check was whether this domination is present only in openings seen during training (like the sound early opening positions I used here). I used Chess960 1-mover openings with the Sim tool to test openings Leela never trained on, and the results were completely different:

  Key:

  1) Lc0_LS15_500ms (time: 100 ms  scale: 5.0)
  2) Lc0_LS15_d1 (time: 100 ms  scale: 5.0)
  3) SF11_500ms (time: 100 ms  scale: 5.0)
  4) SF12_500ms (time: 100 ms  scale: 5.0)
  5) SF_11_d1 (time: 100 ms  scale: 5.0)
  6) SF_12_d1 (time: 100 ms  scale: 5.0)

         1     2     3     4     5     6
  1.  ----- 46.95 45.30 43.05 20.90 16.50
  2.  46.95 ----- 41.30 36.40 24.95 17.65
  3.  45.30 41.30 ----- 39.35 23.45 14.95
  4.  43.05 36.40 39.35 ----- 23.25 21.55
  5.  20.90 24.95 23.45 23.25 ----- 23.20
  6.  16.50 17.65 14.95 21.55 23.20 -----


Leela's 47% match rate between policy and eval at 500 ms in Chess960 is more in line with the other two engines and with common sense, and it shows that for positions not trained on, the behavior is drastically different. Chess960 and regular chess are basically the same game after several moves, so this dependence of Leela's policy behavior on the training set is quite remarkable. Also, Leela at depth=1 underperforms in Chess960 compared to regular chess by more than 100 Elo points against regular engines, so this policy tree building in the openings, based on the training set, is akin to an opening book, at least in strength.
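
To make the comparison concrete, the self-similarity entries (depth=1 versus 500 ms for the same engine) can be read off the two matrices in this post and subtracted. The snippet is only arithmetic on the posted numbers; the dictionary names are hypothetical.

```python
# Self-similarity (depth=1 vs 500 ms) read off the two matrices in this post.
standard_openings = {"Lc0_LS15": 80.35, "SF_11": 33.35, "SF_12": 44.45}
chess960_openings = {"Lc0_LS15": 46.95, "SF_11": 23.45, "SF_12": 21.55}

# Drop in self-similarity, in percentage points, when moving to Chess960.
drops = {
    engine: round(standard_openings[engine] - chess960_openings[engine], 2)
    for engine in standard_openings
}

for engine, drop in drops.items():
    print(f"{engine:9s} {standard_openings[engine]:6.2f} -> "
          f"{chess960_openings[engine]:6.2f}  (drop {drop:.2f})")
```

Leela loses about 33 points of self-similarity, SF 11 about 10 and SF 12 about 23, so the largest change by far is in Leela's policy.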
Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Re: Policy determining quiet early opening preferences of Leela

Post by Tony P. »

Wow, I didn't expect SF12 to be so much closer than SF11 to Leela on the first test.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Policy determining quiet early opening preferences of Leela

Post by Laskos »

Tony P. wrote: Sat Sep 19, 2020 9:10 am Wow, I didn't expect SF12 to be so much closer than SF11 to Leela on the first test.
Yes, 64% similarity Leela -- SF12 in the openings. They are almost clones! :lol:
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Policy determining quiet early opening preferences of Leela

Post by Milos »

Laskos wrote: Wed Sep 16, 2020 7:05 pm [...] Also, Leela depth=1 underperforms in Chess960 compared to regular chess by more than 100 Elo points against regular engines, so this policy tree building in the openings based on the training set is akin at least in strength to an opening book.
And then fanboys and trolls come and claim that the Leela network doesn't have memory and is nothing like a book. :lol:
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: Policy determining quiet early opening preferences of Leela

Post by jp »

Laskos wrote: Wed Sep 16, 2020 7:05 pm Leela's match rate of 47% of policy versus eval at 500ms for Chess960 shows that for positions not trained on, the behavior is drastically different. Chess960 and regular Chess after several moves are basically the same game, so this difference in Leela policy behavior depending on training set is quite remarkable. Also, Leela depth=1 underperforms in Chess960 compared to regular chess by more than 100 Elo points against regular engines, so this policy tree building in the openings based on the training set is akin at least in strength to an opening book.
Interesting, though not surprising.

This is consistent with what we've seen before, i.e. Leela gets progressively worse the longer the game goes on.
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: Policy determining quiet early opening preferences of Leela

Post by Albert Silver »

jp wrote: Thu Sep 24, 2020 11:11 am Interesting, though not surprising.

This is consistent with what we've seen before, i.e. Leela gets progressively worse the longer the game goes on.
Yes, by move 30 it is usually dead lost.
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: Policy determining quiet early opening preferences of Leela

Post by jp »

Albert Silver wrote: Thu Sep 24, 2020 7:41 pm Yes, by move 30 it is usually dead lost.
A meaningless comment that ignores the results of Kai's and others' tests.

Clearly Leela is weaker in relative terms in the middlegame than in the opening, and weaker in the endgame than in the middlegame.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Policy determining quiet early opening preferences of Leela

Post by Laskos »

Albert Silver wrote: Thu Sep 24, 2020 7:41 pm
jp wrote: Thu Sep 24, 2020 11:11 am Interesting, though not surprising.

This is consistent with what we've seen before, i.e. Leela gets progressively worse the longer the game goes on.
Yes, by move 30 it is usually dead lost.
In fact you are more correct than I thought. In endgames Lc0 LS15 (one of the best nets out there) on RTX 2070 is similar in strength to...Fruit 2.1 on one core, underperforming by at least 1000 Elo points compared to openings, where it is the strongest engine on my PC (and I have no upper limit of its strength in the openings). So, you are basically right, if by move 30-40 Lc0 is not winning, it will hardly win later.

Endgames:

Score of lc0_LS15 vs Fruit_21: 17 - 17 - 66  [0.500] 100
...      lc0_LS15 playing White: 8 - 10 - 32  [0.480] 50
...      lc0_LS15 playing Black: 9 - 7 - 34  [0.520] 50
...      White vs Black: 15 - 19 - 66  [0.480] 100
Elo difference: 0.0 +/- 39.8, LOS: 50.0 %, DrawRatio: 66.0 %
Finished match
PGN:
https://gofile.io/d/U8aHbN
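
As a sanity check, the Elo line above can be roughly reproduced from the win/loss/draw counts. The sketch below assumes the usual logistic Elo model, a normal approximation for the 95% margin, and a common approximation for LOS; it is not necessarily the exact formula the match tool uses, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def elo_from_score(score):
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_with_margin(wins, losses, draws, confidence=0.95):
    """Return (Elo difference, error margin) from a match result."""
    games = wins + losses + draws
    mean = (wins + 0.5 * draws) / games
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    variance = (
        wins * (1.0 - mean) ** 2
        + losses * (0.0 - mean) ** 2
        + draws * (0.5 - mean) ** 2
    ) / games
    std_error = math.sqrt(variance / games)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    low = elo_from_score(mean - z * std_error)
    high = elo_from_score(mean + z * std_error)
    return elo_from_score(mean), (high - low) / 2.0


def likelihood_of_superiority(wins, losses):
    """Common LOS approximation; draws do not enter. Needs wins + losses > 0."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))


# Result of lc0_LS15 vs Fruit_21 from the output above.
diff, margin = elo_with_margin(wins=17, losses=17, draws=66)
los = likelihood_of_superiority(17, 17)
print(f"Elo difference: {diff:.1f} +/- {margin:.1f}, LOS: {100 * los:.1f} %")
```

With 17 wins, 17 losses and 66 draws this gives a difference of 0 Elo with a margin of roughly 40 Elo and an LOS of 50%, in line with the output above. The helper does not handle a 0% or 100% score.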
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: Policy determining quiet early opening preferences of Leela

Post by Albert Silver »

Laskos wrote: Sat Oct 03, 2020 8:00 pm In fact you are more correct than I thought. In endgames Lc0 LS15 (one of the best nets out there) on RTX 2070 is similar in strength to...Fruit 2.1 on one core, underperforming by at least 1000 Elo points compared to openings, where it is the strongest engine on my PC (and I have no upper limit of its strength in the openings). So, you are basically right, if by move 30-40 Lc0 is not winning, it will hardly win later.
Uh-huh. So I assume that in this exact same test, SF performs 1000 Elo better?
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Policy determining quiet early opening preferences of Leela

Post by Laskos »

Albert Silver wrote: Sat Oct 03, 2020 9:11 pm Uh-huh. So I assume that in this exact same test, SF performs 1000 Elo better?
No, but only because endgames are not that important a part of computer chess Elo-wise. In the past I ran some experiments showing that, in engine matches, endgames contribute less than 15% of the engines' total Elo. But the result that Lc0 in endgames is of similar strength to Fruit 2.1 stands, and it is an abysmal performance compared to openings, where on my PC it is above SF12 on 4 cores (I don't know by how much, but quite noticeably).