87% of the poll voted that SF will win Tcec Sufi 19.


Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: 87% of the poll voted that SF will win Tcec Sufi 19.

Post by Laskos »

What is a bit disappointing with Leela, and generally with these types of deep NN solvers, is their inability to deal with problems that are completely new to them, not encountered in their training. Sometimes Leela is like a Prolog or Oracle expert system with a large database. I showed in this thread http://talkchess.com/forum3/viewtopic.php?f=2&t=75124 that sound, quiet opening positions are dominated by policy in Leela (more than 80%) with little eval contribution (less than 20%), even with many tens of thousands of searched nodes. Meanwhile, with Chess960 and 2moves_v1 random openings, the contribution of the eval rises to more than 50% and Leela becomes weaker, because its search is weak. So the main strength of Leela in good openings is its policy tables. Not so nice as far as the strength of inference goes.
Alayan
Posts: 550
Joined: Tue Nov 19, 2019 8:48 pm
Full name: Alayan Feh

Re: 87% of the poll voted that SF will win Tcec Sufi 19.

Post by Alayan »

Leela starting eval as white in the opening pair it won: 0.50
Leela starting eval as white in the opening pairs Stockfish won: 0.44, 0.41, 0.46, 0.47, 0.38, 0.34, 0.49, 0.36.

You read it right, Leela gave a higher exit eval to the position where it won a minimatch than to the 8 where it lost the minimatch.

Leela has shown ZERO superiority from balanced openings, but of course those are extremely drawish, because losing one of them requires far too many mistakes for these engines.

Leela has shown clear inferiority in more biased openings.

Arguing that only the start position is a good measure of engine strength is arguing that a weak solution (knowing what to play after having played good moves) is better than a strong solution (knowing what to play from whatever position). If Leela can't get wins from balanced positions (at this level, you need to be massively stronger for it to happen) and is unable to defend/attack imbalanced positions that Stockfish can, then it is a worse engine, period. Good moves from any position are more remarkable than good moves from a narrow subset of positions.
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: 87% of the poll voted that SF will win Tcec Sufi 19.

Post by Dann Corbit »

I think that is one value of contests like TCEC: to expose flaws in engines.
Early TCEC contests had engines crashing left and right because the SMP was finally stressed in a mighty way.
Bleeding-edge hardware is interesting for more than the high-end chess results. It also strains programs to their limits.

So we see exposed a limitation of the current NN technique. I guess someone will figure out how to fix it, now that they understand the problem.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: 87% of the poll voted that SF will win Tcec Sufi 19.

Post by Laskos »

Glanced at the standing:

Sf destroys Leela 9:1 =36 pairs. Poor Leela.
Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: 87% of the poll voted that SF will win Tcec Sufi 19.

Post by Raphexon »

Laskos wrote: Thu Oct 15, 2020 11:29 am Glanced at the standing:

Sf destroys Leela 9:1 =36 pairs. Poor Leela.
Soon to be 10:1.
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: 87% of the poll voted that SF will win Tcec Sufi 19.

Post by Dann Corbit »

Here is a price of 720:
https://www.videocardbenchmark.net/gpu. ... ER&id=4123

But after thinking about it carefully, Milos is right. I was thinking about things like the CCRL and CEGT standings, but those contests are hopelessly behind right now because they have only tested SF 12. Judging by Pohl's charts, which show a full doubling of strength, the CPU version probably is more bang for your buck.
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: 87% of the poll voted that SF will win Tcec Sufi 19.

Post by Dann Corbit »

Raphexon wrote: Thu Oct 15, 2020 1:08 pm
Laskos wrote: Thu Oct 15, 2020 11:29 am Glanced at the standing:

Sf destroys Leela 9:1 =36 pairs. Poor Leela.
Soon to be 10:1.
Of course the error bar is huge, but that would be 60 Elo higher.
That is quite a bit higher than the 12 Elo difference they have listed at tcec
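To put a rough number on "the error bar is huge", here is a minimal sketch, not the method of any tool mentioned in this thread. It assumes the 10W : 1L : 36D tally is treated as 47 independent results, uses a normal approximation for the 95% margin, and converts scores to Elo with the standard logistic formula. The function names are made up for illustration.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score (logistic model)."""
    return 400.0 * math.log10(score / (1.0 - score))

def score_with_margin(wins: int, draws: int, losses: int, z: float = 1.96):
    """Mean score and a normal-approximation margin (z = 1.96 for ~95%)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-result variance of the score, with outcomes 1, 0.5 and 0.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n
    return mean, z * math.sqrt(var / n)

# Assumption: the pair tally is fed in as if each pair were one game.
mean, margin = score_with_margin(wins=10, draws=36, losses=1)
low = elo_from_score(mean - margin)
mid = elo_from_score(mean)
high = elo_from_score(mean + margin)
print(f"score {mean:.3f} +/- {margin:.3f}")
print(f"Elo {mid:+.0f}, rough 95% range {low:+.0f} to {high:+.0f}")
```

With so few decisive results the range spans many tens of Elo points, so a point estimate from this tally should be read with caution.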
Guenther
Posts: 4605
Joined: Wed Oct 01, 2008 6:33 am
Location: Regensburg, Germany
Full name: Guenther Simon

Re: 87% of the poll voted that SF will win Tcec Sufi 19.

Post by Guenther »

Dann Corbit wrote: Thu Oct 15, 2020 3:12 pm
Raphexon wrote: Thu Oct 15, 2020 1:08 pm
Laskos wrote: Thu Oct 15, 2020 11:29 am Glanced at the standing:

Sf destroys Leela 9:1 =36 pairs. Poor Leela.
Soon to be 10:1.
Of course the error bar is huge, but that would be 60 Elo higher.
That is quite a bit higher than the 12 Elo difference they have listed at tcec
10:1 game pairs, which is currently W18:L9:D67 in total, surely is not a 60 Elo difference, unless you marginalize the draw count
and calculate only uneven pairs, or whatever one likes.

Just calculated it with Ordo and 50 simuls (start rating given: 3500)


   # PLAYER                                    :   RATING  ERROR  POINTS  PLAYED    (%)
   1 Stockfish 202009282242_nn-baeb9ef2d183    :  3518.37  17.62    51.5      94  54.79
   2 LCZero v0.26.3-rc1_T60.SV.JH.92-190       :  3481.63  17.62    42.5      94  45.21

White advantage = 104.70 +/- 17.88
Draw rate (equal opponents) = 99.93 % +/- 0.22

Head to head statistics:

1) Stockfish 202009282242_nn-baeb9ef2d183 3518.37 :     94 (+18,=67,-9),  54.8 %

   vs.                                            :  games (  +,  =, -),   (%) :     Diff,     SD, CFS (%)
   LCZero v0.26.3-rc1_T60.SV.JH.92-190            :     94 ( 18, 67, 9),  54.8 :   +36.74,  17.98,   98.0

2) LCZero v0.26.3-rc1_T60.SV.JH.92-190    3481.63 :     94 (+9,=67,-18),  45.2 %

   vs.                                            :  games ( +,  =,  -),   (%) :     Diff,     SD, CFS (%)
   Stockfish 202009282242_nn-baeb9ef2d183         :     94 ( 9, 67, 18),  45.2 :   -36.74,  17.98,    2.0

https://rwbc-chess.de

trollwatch:
Chessqueen + chessica + AlexChess + Eduard + Sylwy
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: 87% of the poll voted that SF will win Tcec Sufi 19.

Post by Laskos »

Guenther wrote: Thu Oct 15, 2020 3:38 pm
Dann Corbit wrote: Thu Oct 15, 2020 3:12 pm
Raphexon wrote: Thu Oct 15, 2020 1:08 pm
Laskos wrote: Thu Oct 15, 2020 11:29 am Glanced at the standing:

Sf destroys Leela 9:1 =36 pairs. Poor Leela.
Soon to be 10:1.
Of course the error bar is huge, but that would be 60 Elo higher.
That is quite a bit higher than the 12 Elo difference they have listed at tcec
10:1 game pairs, which is currently W18:L9:D67 totally, surely is not 60 elo diff, except you marginalize draw count
and calculate only uneven pairs, or whatever one likes.

Just calculated it with Ordo and 50 simuls (start rating given: 3500)


Hmm... my Ordo has calibration issues too, and I set that 76% Elo difference (-z parameter) in Ordo correctly.
A 54.79% score (4.79% above 50%) should be less than 35 Elo points according to the correct logistic or Elo table, not close to 37 Elo points.
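The "less than 35 Elo" figure can be checked with the standard logistic Elo formula. This is only a sketch of that textbook conversion, not a reproduction of what Ordo does internally; the function name is made up for illustration, and the 51.5 points from 94 games are taken from the table above.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score under the logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))

points, games = 51.5, 94      # Stockfish's result in the Ordo table
score = points / games        # about 54.79%
diff = elo_from_score(score)
print(f"{score:.2%} -> {diff:+.1f} Elo")
```

Under this formula the result lands between 33 and 34 Elo, so the gap to Ordo's +36.74 is about 3 points.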
Dann Corbit
Posts: 12537
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: 87% of the poll voted that SF will win Tcec Sufi 19.

Post by Dann Corbit »

I just used this thing with 10W : 1L : 36D
https://www.3dkingdoms.com/chess/elo.htm
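For comparison, here is a minimal sketch of how much the input tally matters. It assumes the plain logistic Elo formula, which may differ from what the linked calculator uses, and compares the pair tally entered as if pairs were games (10W : 1L : 36D) with the game tally Guenther quotes (W18:L9:D67). The function name is made up for illustration.

```python
import math

def elo_from_wdl(wins: int, draws: int, losses: int) -> float:
    """Logistic Elo difference from a win/draw/loss tally (draw = half a point)."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    return 400.0 * math.log10(score / (1.0 - score))

pair_elo = elo_from_wdl(wins=10, draws=36, losses=1)   # pairs counted as games
game_elo = elo_from_wdl(wins=18, draws=67, losses=9)   # individual games

print(f"pairs as games: {pair_elo:+.1f} Elo")
print(f"actual games:   {game_elo:+.1f} Elo")
```

Counting each pair as a single result hides the games Leela won inside drawn or lost pairs, which is why the pair-based figure comes out roughly twice as high.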