87% of the poll voted that SF will win TCEC Sufi 19.

Laskos · Post by **Laskos** » Wed Oct 14, 2020 10:14 am

What is a bit disappointing about Leela, and about deep NN solvers of this type in general, is their inability to deal with problems that are completely new to them and were not encountered in training. Sometimes Leela is like a Prolog or Oracle expert system with a large database. As I showed in this thread http://talkchess.com/forum3/viewtopic.php?f=2&t=75124, in sound, quiet opening positions Leela is dominated by its policy (more than 80%), with little contribution from the eval (less than 20%), even after many dozens of thousands of searched nodes. Meanwhile, with Chess960 and 2moves_v1 random openings, the contribution of the eval rises to more than 50%, and Leela becomes weaker because its search is weak. So Leela's main strength in good openings is its policy tables. Not so nice, as far as the strength of inference goes.

Alayan · Post by **Alayan** » Wed Oct 14, 2020 5:14 pm

Leela's starting eval as White in the opening pair it won: 0.50
Leela's starting eval as White in the opening pairs Stockfish won: 0.44, 0.41, 0.46, 0.47, 0.38, 0.34, 0.49, 0.36.

You read that right: Leela gave a higher exit eval to the position where it won a minimatch than to any of the 8 where it lost the minimatch.

Leela has shown ZERO superiority from balanced openings, but of course those are extremely drawish, because losing one of them requires far too many mistakes for these engines.

Leela has shown clear inferiority in more biased openings.

Arguing that only the start position is a good measure of engine strength is arguing that a weak solution (knowing what to play after having played good moves) is better than a strong solution (knowing what to play from any position). If Leela can't get wins from balanced positions (at this level, you need to be massively stronger for that to happen) and is unable to defend or attack imbalanced positions that Stockfish can, then it is a worse engine, period. Playing good moves from any position is more remarkable than playing good moves from a narrow subset of positions.

Dann Corbit · Post by **Dann Corbit** » Wed Oct 14, 2020 5:49 pm

I think that is one value of contests like TCEC: they expose flaws in engines.
Early TCEC contests had engines crashing left and right because SMP was finally being stressed in a mighty way.
Bleeding-edge hardware is interesting for more than the high-end chess result. It also strains programs to their limits.

So we see a limitation of the current NN technique exposed. I guess someone will figure out how to fix it, now that the problem is understood.

Laskos · Post by **Laskos** » Thu Oct 15, 2020 11:29 am

Glanced at the standing:

Sf destroys Leela 9:1 =36 pairs. Poor Leela.

Raphexon · Post by **Raphexon** » Thu Oct 15, 2020 1:08 pm

Laskos wrote: Thu Oct 15, 2020 11:29 am
Glanced at the standing:

Sf destroys Leela 9:1 =36 pairs. Poor Leela.

Soon to be 10:1.

Dann Corbit · Post by **Dann Corbit** » Thu Oct 15, 2020 3:07 pm

Here is a price of 720:
https://www.videocardbenchmark.net/gpu. ... ER&id=4123

But after thinking about it carefully, Milos is right. I was thinking about things like the CCRL and CEGT standings, but those lists are hopelessly behind right now because they have only tested SF 12. Judging by Pohl's charts, which show a full doubling of strength, the CPU version is probably more bang for your buck.

Dann Corbit · Post by **Dann Corbit** » Thu Oct 15, 2020 3:12 pm

Raphexon wrote: Thu Oct 15, 2020 1:08 pm
Laskos wrote: Thu Oct 15, 2020 11:29 am
Glanced at the standing:

Sf destroys Leela 9:1 =36 pairs. Poor Leela.
Soon to be 10:1.

Of course the error bar is huge, but that would be 60 Elo higher.
That is quite a bit higher than the 12 Elo difference listed at TCEC.
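The ~60 Elo figure appears to come from scoring game pairs rather than individual games. A minimal sketch, assuming the standard logistic Elo model on a 400-point scale and treating each pair as a single game (won pair = 1, drawn pair = 0.5, lost pair = 0); the function names are hypothetical:

```python
import math


def elo_from_score(score: float) -> float:
    """Logistic Elo difference (400-point scale) for a score strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def pair_elo(won: int, lost: int, drawn: int) -> float:
    """Score each game pair as one game (won = 1, drawn = 0.5, lost = 0)."""
    total = won + lost + drawn
    return elo_from_score((won + 0.5 * drawn) / total)


# Current standing (9:1, 36 drawn pairs) and the projected 10:1.
for won, lost, drawn in [(9, 1, 36), (10, 1, 36)]:
    print(f"{won}:{lost} ={drawn} pairs -> {pair_elo(won, lost, drawn):.1f} Elo")
```

This gives roughly 61 and 67 Elo. Pair scoring counts a 1.5-0.5 pair the same as a 2-0 pair, so it magnifies the gap compared with scoring the individual games.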

Guenther · Post by **Guenther** » Thu Oct 15, 2020 3:38 pm

Dann Corbit wrote: Thu Oct 15, 2020 3:12 pm
Raphexon wrote: Thu Oct 15, 2020 1:08 pm
Laskos wrote: Thu Oct 15, 2020 11:29 am
Glanced at the standing:

Sf destroys Leela 9:1 =36 pairs. Poor Leela.
Soon to be 10:1.
Of course the error bar is huge, but that would be 60 Elo higher.
That is quite a bit higher than the 12 Elo difference they have listed at tcec

10:1 in game pairs, which is currently W18:L9:D67 in total, is surely not a 60 Elo difference, unless you discount the draws
and calculate only from the uneven pairs, or whatever one likes.

I just calculated it with Ordo and 50 simulations (starting rating: 3500):


   # PLAYER                                    :   RATING  ERROR  POINTS  PLAYED    (%)
   1 Stockfish 202009282242_nn-baeb9ef2d183    :  3518.37  17.62    51.5      94  54.79
   2 LCZero v0.26.3-rc1_T60.SV.JH.92-190       :  3481.63  17.62    42.5      94  45.21

White advantage = 104.70 +/- 17.88
Draw rate (equal opponents) = 99.93 % +/- 0.22

Head to head statistics:

1) Stockfish 202009282242_nn-baeb9ef2d183 3518.37 :     94 (+18,=67,-9),  54.8 %

   vs.                                            :  games (  +,  =, -),   (%) :     Diff,     SD, CFS (%)
   LCZero v0.26.3-rc1_T60.SV.JH.92-190            :     94 ( 18, 67, 9),  54.8 :   +36.74,  17.98,   98.0

2) LCZero v0.26.3-rc1_T60.SV.JH.92-190    3481.63 :     94 (+9,=67,-18),  45.2 %

   vs.                                            :  games ( +,  =,  -),   (%) :     Diff,     SD, CFS (%)
   Stockfish 202009282242_nn-baeb9ef2d183         :     94 ( 9, 67, 18),  45.2 :   -36.74,  17.98,    2.0
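The CFS column can be sanity-checked against the Diff and SD columns. A minimal sketch, assuming CFS behaves roughly like the probability that the true difference is positive under a normal approximation, i.e. Φ(Diff/SD); Ordo's actual computation may differ, so only approximate agreement should be expected. The helper names are hypothetical:

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF expressed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cfs_estimate(diff: float, sd: float) -> float:
    """Normal-approximation probability that the true Elo difference is above zero."""
    return normal_cdf(diff / sd)


# Diff and SD copied from the head-to-head statistics.
diff, sd = 36.74, 17.98
print(f"Stockfish stronger: {100 * cfs_estimate(diff, sd):.1f} %")
print(f"LCZero stronger:    {100 * cfs_estimate(-diff, sd):.1f} %")
```

This lands at roughly 98% and 2%, close to the reported CFS values; small differences are expected because Diff and SD are rounded in the output.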

Laskos · Post by **Laskos** » Thu Oct 15, 2020 3:57 pm

Guenther wrote: Thu Oct 15, 2020 3:38 pm
Just calculated it with Ordo and 50 simuls (start rating given: 3500)
   # PLAYER                                    :   RATING  ERROR  POINTS  PLAYED    (%)
   1 Stockfish 202009282242_nn-baeb9ef2d183    :  3518.37  17.62    51.5      94  54.79
   2 LCZero v0.26.3-rc1_T60.SV.JH.92-190       :  3481.63  17.62    42.5      94  45.21


Hmm... my Ordo has calibration issues too, even though I set the Elo difference for a 76% expectancy (the -z parameter) correctly.
A 54.79% score (4.79% above 50%) should be less than 35 Elo points according to the correct logistic curve or Elo table, not close to 37 Elo points.
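Laskos's point can be checked directly. As described in the post, -z is the Elo difference assigned to a 76% expectancy, which fixes the scale of a logistic curve. A minimal sketch, assuming a pure logistic model with no white-advantage or draw-rate terms (the Ordo output does report fitted values for both, which may account for part of the gap). The value z = 202 is used as the default here; that it matches Ordo's own default is an assumption, and the helper names are hypothetical:

```python
import math

# Anchor described in the post: -z is the Elo difference for a 76% expectancy.
ANCHOR = 0.76


def scale_for_z(z: float) -> float:
    """Logistic scale s such that an Elo difference of z yields the ANCHOR expectancy."""
    return z / math.log10(ANCHOR / (1.0 - ANCHOR))


def elo_from_score(score: float, z: float = 202.0) -> float:
    """Elo difference for a given score under the z-anchored logistic model."""
    return scale_for_z(z) * math.log10(score / (1.0 - score))


def implied_z(score: float, elo: float) -> float:
    """The -z value a pure logistic model would need for `score` to map to `elo`."""
    return elo * math.log10(ANCHOR / (1.0 - ANCHOR)) / math.log10(score / (1.0 - score))


score = 51.5 / 94  # Stockfish: 18 wins + 67 draws / 2, out of 94 games
print(f"score              = {score:.4f}")
print(f"Elo (z = 202)      = {elo_from_score(score):.2f}")
print(f"Elo (400 scale)    = {400 * math.log10(score / (1 - score)):.2f}")
print(f"z needed for 36.74 = {implied_z(score, 36.74):.1f}")
```

Both scalings put a 54.79% score at roughly 33 to 34 Elo, under the 35 mentioned above, and a pure logistic model would need a -z of about 220 to reach Ordo's 36.74.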

Dann Corbit · Post by **Dann Corbit** » Thu Oct 15, 2020 4:28 pm

I just used this Elo calculator with the pair results, 10W : 1L : 36D:
https://www.3dkingdoms.com/chess/elo.htm
