Policy determining quiet early opening preferences of Leela

Post by **Albert Silver** » Sat Oct 03, 2020 9:56 pm

Laskos wrote: ↑Sat Oct 03, 2020 9:21 pm
Albert Silver wrote: ↑Sat Oct 03, 2020 9:11 pm
Laskos wrote: ↑Sat Oct 03, 2020 8:00 pm
Albert Silver wrote: ↑Thu Sep 24, 2020 7:41 pm
jp wrote: ↑Thu Sep 24, 2020 11:11 am
Interesting, though not surprising.

This is consistent with what we've seen before, i.e. Leela gets progressively worse the longer the game goes on.
Yes, by move 30 it is usually dead lost.
In fact you are more correct than I thought. In endgames, Lc0 with LS15 (one of the best nets out there) on an RTX 2070 is similar in strength to... Fruit 2.1 on one core, underperforming by at least 1000 Elo points compared to openings, where it is the strongest engine on my PC (and I have found no upper limit to its strength in the openings). So you are basically right: if by move 30-40 Lc0 is not winning, it will hardly win later.
Endgames:

Score of lc0_LS15 vs Fruit_21: 17 - 17 - 66  [0.500] 100
...      lc0_LS15 playing White: 8 - 10 - 32  [0.480] 50
...      lc0_LS15 playing Black: 9 - 7 - 34  [0.520] 50
...      White vs Black: 15 - 19 - 66  [0.480] 100
Elo difference: 0.0 +/- 39.8, LOS: 50.0 %, DrawRatio: 66.0 %
Finished match
PGN:
https://gofile.io/d/U8aHbN
Uh-huh. So I assume that in this exact same test, SF performs 1000 Elo better?
No, but only because endgames are not that important a part of computer chess, Elo-wise. In past experiments I found that, in engine matches, endgames contribute less than 15% of an engine's total Elo. But the result stands: Lc0 in endgames is of similar strength to Fruit 2.1, an abysmal performance compared to the openings, where on my PC it is above SF12 on 4 cores (I don't know by how much, but appreciably).

You see, the thing is I completely disagree with your conclusions. Were these technically winning endgames? Is Fruit so bad it is known to regularly flub drawn endgames? Do other engines do so much better in this exact test? Take SF12. How did it do in this test using the same positions and same conditions?
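The derived lines in the cutechess-cli output quoted above (Elo difference, error margin, LOS, DrawRatio and the White vs Black tally) all follow from the raw counts. Below is a minimal Python sketch, not cutechess-cli's own code, assuming the score line is ordered wins - losses - draws, the logistic Elo model, and a normal-approximation 95% interval. cutechess-cli approximates the normal quantile differently, so the margin here may differ from the printed one in the last digit.

```python
import math
from statistics import NormalDist


def elo_from_score(s: float) -> float:
    """Logistic Elo difference implied by an expected score, valid for 0 < s < 1."""
    return 400.0 * math.log10(s / (1.0 - s))


def match_summary(wins: int, losses: int, draws: int) -> dict:
    """Elo difference, 95% error margin, LOS and draw ratio from W-L-D counts.

    No guard against extreme scores: the interval must stay inside (0, 1).
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the results (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + losses * (0.0 - score) ** 2
           + draws * (0.5 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    z = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval
    margin = (elo_from_score(score + z * stderr)
              - elo_from_score(score - z * stderr)) / 2.0
    # Likelihood of superiority uses decisive games only; draws carry no signal.
    decisive = wins + losses
    los = 0.5 if decisive == 0 else 0.5 * (
        1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))
    return {
        "score": score,
        "elo": elo_from_score(score),
        "margin": margin,
        "los": 100.0 * los,
        "draw_ratio": 100.0 * draws / n,
    }


def white_vs_black(as_white, as_black):
    """Rebuild the 'White vs Black' W-L-D line from the engine's per-colour lines."""
    w1, l1, d1 = as_white  # engine had White: its wins are White wins
    w2, l2, d2 = as_black  # engine had Black: its losses are White wins
    return (w1 + l2, l1 + w2, d1 + d2)


# "Score of lc0_LS15 vs Fruit_21: 17 - 17 - 66" is wins - losses - draws.
r = match_summary(17, 17, 66)
print(f"Elo difference: {r['elo']:.1f} +/- {r['margin']:.1f}, "
      f"LOS: {r['los']:.1f} %, DrawRatio: {r['draw_ratio']:.1f} %")
print(white_vs_black((8, 10, 32), (9, 7, 34)))
```

For the Lc0 match this gives 0.0 Elo, LOS 50.0 % and DrawRatio 66.0 %, and the White vs Black tuple (15, 19, 66), matching the posted output. The same functions reproduce the SF12 lines later in the thread (103.7 Elo from 38 - 9 - 53, and 17 - 30 - 53 for White vs Black).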

Post by **Laskos** » Sat Oct 03, 2020 10:04 pm

Albert Silver wrote: ↑Sat Oct 03, 2020 9:56 pm

For many of these endgames, probably nobody knows whether they are won or drawn. You have the PGN file. I can play SF12 against Fruit 2.1 from these endgames, but keep in mind that endgames contribute less than 15% of an engine's Elo. The bottom line is: Lc0 with a good net and a strong GPU is as weak in endgames as Fruit 2.1 on one core, a tremendous underperformance whether you agree or not.

Post by **Laskos** » Sat Oct 03, 2020 10:53 pm

Albert Silver wrote: ↑Sat Oct 03, 2020 9:56 pm

Here is the SF12 performance against Fruit 2.1 in the same conditions:

Score of SF_12 vs Fruit_21: 38 - 9 - 53  [0.645] 100
...      SF_12 playing White: 16 - 8 - 26  [0.580] 50
...      SF_12 playing Black: 22 - 1 - 27  [0.710] 50
...      White vs Black: 17 - 30 - 53  [0.435] 100
Elo difference: 103.7 +/- 46.5, LOS: 100.0 %, DrawRatio: 53.0 %
Finished match

SF12 was playing on 1 core, not even 4. The result is completely different from that of Lc0 against Fruit 2.1, and given that endgames contribute 10%-15% of an engine's Elo, it is again consistent with a 1000 Elo point underperformance of Lc0 compared to its opening or even overall performance. In fact, what is clear is clear: in late endgames, Lc0 is at the level of Fruit 2.1.
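Whether the two endgame results really differ, rather than being noise from 100 games each, can be checked roughly from the printed intervals. A minimal sketch, assuming the two matches are independent and each `+/-` is a symmetric 95% normal interval (the true Elo intervals are slightly asymmetric, so this is an approximation):

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% quantile, about 1.96


def compare_elo(elo_a: float, margin_a: float, elo_b: float, margin_b: float):
    """Difference, z-score and two-sided p-value of two independent
    Elo estimates, each given as value +/- 95% margin."""
    se_a = margin_a / Z95  # convert each 95% margin back to a standard error
    se_b = margin_b / Z95
    diff = elo_b - elo_a
    z = diff / math.sqrt(se_a ** 2 + se_b ** 2)
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return diff, z, p_two_sided


# Endgame matches against Fruit 2.1, as posted in the thread:
# lc0_LS15: 0.0 +/- 39.8    SF_12: 103.7 +/- 46.5
diff, z, p = compare_elo(0.0, 39.8, 103.7, 46.5)
print(f"difference {diff:.1f} Elo, z = {z:.2f}, two-sided p = {p:.4f}")
```

With these inputs the gap is about 3.3 standard errors, so the difference between the two results is very unlikely to be sampling noise. The test only says the results differ; by itself it does not say how large Lc0's endgame deficit is in overall-Elo terms.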

Post by **Albert Silver** » Sat Oct 03, 2020 11:08 pm

Laskos wrote: ↑Sat Oct 03, 2020 10:53 pm

I assume this was without tablebases. In any case, what is clear is clear: in late endgames SF12 is barely 100 Elo stronger than Fruit 2.1, consistent with a 1000 Elo point underperformance compared to its opening or overall performance.

Post by **Laskos** » Sat Oct 03, 2020 11:22 pm

Albert Silver wrote: ↑Sat Oct 03, 2020 11:08 pm

No tablebases.
No, you either didn't follow or are intentionally distorting what I wrote. SF12, which gains 100 Elo points against Fruit 2.1 solely in late endgames, is about 1000 Elo points stronger than Fruit 2.1 overall, from openings to endgames. Lc0 gains 0 Elo points in endgames against Fruit 2.1, yet it is about the same 1000 Elo points stronger overall, and even stronger in openings. Hence Lc0 underperforms by 1000+ Elo points in endgames compared to openings. What is not clear in "in late endgames, Lc0 on a strong GPU is about equal to Fruit 2.1 on one core"? The result is obvious. And in openings Lc0 is stronger than SF12 (I can show this too).

And you were basically right: if by move 30-40 Lc0 is not winning, it will hardly win later.
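Laskos's argument rests on a proportional model: if endgames account for a fraction f of an engine's overall Elo edge, an engine that is D Elo stronger overall should gain roughly f * D in an endgame-only match. A minimal sketch of that arithmetic; the linear scaling and the share values are assumptions taken from the thread's "10%-15%" estimate, not an established law, and the function names are hypothetical:

```python
def expected_endgame_elo(overall_gap: float, endgame_share: float) -> float:
    """Endgame-only Elo edge predicted by a linear share model (an assumption)."""
    return overall_gap * endgame_share


def implied_overall_gap(endgame_elo: float, endgame_share: float) -> float:
    """Invert the same model: overall Elo edge implied by an endgame-only result."""
    return endgame_elo / endgame_share


# Assumed share values, taken from the thread's "10%-15%" estimate.
for share in (0.10, 0.15):
    print(f"share {share:.2f}:",
          f"SF12 ~1000 overall -> {expected_endgame_elo(1000.0, share):.0f} endgame;",
          f"SF12 103.7 endgame -> {implied_overall_gap(103.7, share):.0f} overall;",
          f"Lc0 0.0 endgame -> {implied_overall_gap(0.0, share):.0f} overall")
```

With f = 0.10 the model reproduces the thread's numbers: about 100 Elo of endgame edge for a 1000 Elo overall edge, and no implied overall edge for Lc0's 0.0 result. With f = 0.15 the SF12 result implies roughly 690 Elo overall, which shows how strongly the 1000 Elo figure depends on the assumed share.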

Post by **jp** » Sun Oct 04, 2020 10:57 am

Laskos wrote: ↑Sat Oct 03, 2020 11:22 pm

Yes, it looks like he was desperately hoping that the endgame positions in the test were so drawish that SF too would have equal performance to Fruit, but that's not what happened.

From your results, it's obvious that Leela on an RTX 2070 is massively inferior in endgames to SF on 1 core (despite the Leela team introducing non-zero elements into the training to try to boost Leela's performance, where by "zero" I mean not using any outside information not produced by Leela itself).

Post by **Guenther** » Sun Oct 04, 2020 12:19 pm

Laskos wrote: ↑Sat Oct 03, 2020 10:53 pm

I don't see a games download for this test, only for the previous one?

Post by **Laskos** » Sun Oct 04, 2020 12:50 pm

Guenther wrote: ↑Sun Oct 04, 2020 12:19 pm

I can't seem to find the PGN in the default folder now. I will replay the match and post the PGN.

Post by **Laskos** » Sun Oct 04, 2020 1:09 pm

Laskos wrote: ↑Sun Oct 04, 2020 12:50 pm

Replay:

Score of SF_12 vs Fruit_21: 47 - 15 - 38  [0.660] 100
...      SF_12 playing White: 23 - 8 - 19  [0.650] 50
...      SF_12 playing Black: 24 - 7 - 19  [0.670] 50
...      White vs Black: 30 - 32 - 38  [0.490] 100
Elo difference: 115.2 +/- 55.1, LOS: 100.0 %, DrawRatio: 38.0 %
Finished match

PGN:
https://gofile.io/d/f4WjZl

The openings are the same as in the Lc0 match against Fruit, possibly in scrambled order.
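Because both SF12 runs started from the same positions, one rough way to use them together is to add the W-L-D counts and recompute the score. A minimal sketch; treating all 200 games as independent is an assumption (repeated openings correlate the two runs), so the pooled figure is a point estimate rather than a tighter bound:

```python
import math


def elo_from_score(s: float) -> float:
    """Logistic Elo difference for an expected score, valid for 0 < s < 1."""
    return 400.0 * math.log10(s / (1.0 - s))


def pool(runs):
    """Sum (wins, losses, draws) over runs; return pooled counts, score and Elo."""
    wins = sum(r[0] for r in runs)
    losses = sum(r[1] for r in runs)
    draws = sum(r[2] for r in runs)
    score = (wins + 0.5 * draws) / (wins + losses + draws)
    return wins, losses, draws, score, elo_from_score(score)


# SF_12 vs Fruit_21 in cutechess-cli order (wins, losses, draws).
first_run = (38, 9, 53)
replay = (47, 15, 38)
for name, run in (("first run", first_run), ("replay", replay)):
    print(name, "Elo %.1f" % pool([run])[4])
wins, losses, draws, score, elo = pool([first_run, replay])
print(f"pooled {wins} - {losses} - {draws} over {wins + losses + draws} games: "
      f"score {score:.4f}, Elo {elo:.1f}")
```

The pooled 85 - 24 - 91 result gives a score of 0.6525, about 109 Elo, between the two individual runs (103.7 and 115.2).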

Post by **MMarco** » Mon Oct 05, 2020 2:06 am

Laskos wrote: ↑Sat Oct 03, 2020 8:00 pm

Interesting. I guess that was at a fast time control. I ran Lc0 tcec-19 (on an RTX 2060) vs Stockfish tcec-19 (1 core of a Ryzen 9 4900H) at 100s + 1s with 5-man Syzygy tablebases, on your test set. Under these conditions, Lc0 and Stockfish are usually about on par (see my tournaments here: http://talkchess.com/forum3/viewtopic.p ... 56#p860072 ). My test conditions are such that the engines search about 1000-1500 times fewer nodes per move than at TCEC.

   # PLAYER               :  RATING  ERROR  PLAYED    (%)   CFS    W    D    L   D(%)
   1 Lc0 tcec-19          :     0.0   23.1     100  50.00    50   32   36   32  36.00
   2 Stockfish tcec-19    :     0.0   23.1     100  50.00   ---   32   36   32  36.00

White advantage = -21.05 +/- 29.75
Draw rate (equal opponents) = 36.10 % +/- 4.51

Games: https://gofile.io/d/xz9EBb

I would guess that the bad result against Fruit is due to Leela missing tactics at low depth. Given a reasonable time control, Leela is on par with Stockfish in this endgame test.
