IN 100 games if SF reaches 51 in TCEC it should be stopped

Zenmastur
Posts: 516
Joined: Sat May 31, 2014 6:28 am

Re: IN 100 games if SF reaches 51 in TCEC it should be stopped

Post by Zenmastur » Sat Oct 12, 2019 4:33 pm

Ovyron wrote:
Sat Oct 12, 2019 4:14 pm
Are you saying that Leela winning Season 15 was a fluke?
NO! I'm saying that Leela NOT winning season 16 wasn't a fluke.

Regards,

Zenmastur
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.

Ovyron
Posts: 2829
Joined: Tue Jul 03, 2007 2:30 am

Re: IN 100 games if SF reaches 51 in TCEC it should be stopped

Post by Ovyron » Sat Oct 12, 2019 4:43 pm

Zenmastur wrote:
Sat Oct 12, 2019 4:33 pm
NO! I'm saying that Leela NOT winning season 16 wasn't a fluke.
I think you need to re-parse that, because it's a double negative, so if you remove the "NOT"s you're left with "I'm saying that Leela winning season 16 was a fluke", but Leela isn't winning season 16. Can you reword what you're saying without using any negatives?
Great spirits have always encountered violent opposition from mediocre minds.

Leo
Posts: 844
Joined: Fri Sep 16, 2016 4:55 pm
Location: USA/Minnesota
Full name: Leo

Re: IN 100 games if SF reaches 51 in TCEC it should be stopped

Post by Leo » Sat Oct 12, 2019 4:44 pm

Dann Corbit wrote:
Sat Oct 12, 2019 12:59 am
Chessqueen wrote:
Fri Oct 11, 2019 10:16 pm
I do not know if it was luck that AllieStein v0.5-dev_7b41f8c-n11 got a better score than LC0, but AS did NOT do as well as LC0 against Stockfish 19092522. Probably next time around AllieStein with an update might be as strong as Stockfish, unless there is something better than the RTX 2080 waiting around the corner. Anyway, in 100 games, if SF reaches 51 should the match be stopped, or will they continue anyway? www.tcec-chess.com/
From TCEC 16 Rules and Information:
"Superfinal

The Superfinal consists of 100 games at TC 120+10, with 50 different openings, among them once the normal start position, so that each engine plays both black and white of the same opening position. The match will be presented with opening 1 used in games 1 and 2, then opening 2 used in games 3 and 4 etc.
If the match is theoretically won for one side before game 100, the match will still continue until all 100 games have been played."

SF has already won 51 games, and they are playing on, so the rules are being followed.
I like that, because the games produce really interesting data.
I think advertisers have paid for 100 games.
Advanced Micro Devices fan.

Leo
Posts: 844
Joined: Fri Sep 16, 2016 4:55 pm
Location: USA/Minnesota
Full name: Leo

Re: IN 100 games if SF reaches 51 in TCEC it should be stopped

Post by Leo » Sat Oct 12, 2019 4:45 pm

What has SF done to improve so much? I am really surprised.
Advanced Micro Devices fan.

Zenmastur
Posts: 516
Joined: Sat May 31, 2014 6:28 am

Re: IN 100 games if SF reaches 51 in TCEC it should be stopped

Post by Zenmastur » Sat Oct 12, 2019 4:49 pm

Ovyron wrote:
Sat Oct 12, 2019 4:43 pm
Zenmastur wrote:
Sat Oct 12, 2019 4:33 pm
NO! I'm saying that Leela NOT winning season 16 wasn't a fluke.
I think you need to re-parse that, because it's a double negative, so if you remove the "NOT"s you're left with "I'm saying that Leela winning season 16 was a fluke", but Leela isn't winning season 16. Can you reword what you're saying without using any negatives?
The two negatives aren't referring to the same subject, i.e., "NOT winning" and "WASN'T a fluke" don't cancel out since the subject isn't the same.

Regards,

Zenmastur
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.

Ovyron
Posts: 2829
Joined: Tue Jul 03, 2007 2:30 am

Re: IN 100 games if SF reaches 51 in TCEC it should be stopped

Post by Ovyron » Sat Oct 12, 2019 5:03 pm

But that leads to a contradiction:

A. Leela beat Stockfish in TCEC 15 because it was better (it wasn't a fluke.)
B. Allie advanced to the TCEC 16 superfinal because it was better than Leela (Leela finished third or fourth.)
C. Stockfish beat Allie because it is better (because NNs aren't mature enough, etc.)
D. Stockfish hasn't been improved significantly since TCEC 15.

So how did Stockfish become better than Allie and Leela without improving much? To resolve this contradiction one of these must be true:

a. Stockfish improved and is now better than Allie and Leela
b. Leela is still better than those but by a fluke it ended third before TCEC 16 superfinal.
c. Leela was never better than Stockfish and it won TCEC 15 by a fluke.
d. (something else that you're saying that I don't get)
Great spirits have always encountered violent opposition from mediocre minds.

Zenmastur
Posts: 516
Joined: Sat May 31, 2014 6:28 am

Re: IN 100 games if SF reaches 51 in TCEC it should be stopped

Post by Zenmastur » Sat Oct 12, 2019 5:34 pm

Ovyron wrote:
Sat Oct 12, 2019 5:03 pm
But that leads to a contradiction:

A. Leela beat Stockfish in TCEC 15 because it was better (it wasn't a fluke.)
B. Allie advanced to the TCEC 16 superfinal because it was better than Leela (Leela finished third or fourth.)
C. Stockfish beat Allie because it is better (because NNs aren't mature enough, etc.)
D. Stockfish hasn't been improved significantly since TCEC 15.

So how did Stockfish become better than Allie and Leela without improving much? To resolve this contradiction one of these must be true:

a. Stockfish improved and is now better than Allie and Leela
b. Leela is still better than those but by a fluke it ended third before TCEC 16 superfinal.
c. Leela was never better than Stockfish and it won TCEC 15 by a fluke.
d. (something else that you're saying that I don't get)
First it's possible that three programs can be equal and yet A beats B, B beats C, and C beats A.
Second, D is true as far as raw Elo goes. But its move selection could have changed (and did) in subtle ways that affect its ability to successfully defend/attack against Leela while remaining basically neutral against other A/B engines.
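
To make the first point concrete, here is a minimal sketch of three "equal" programs that still beat each other in a cycle. The 0.55 per-game expected score is an invented figure, and the engine labels A, B, C are placeholders, not measurements of any real engine.

```python
from fractions import Fraction

# Hypothetical per-game expected score of the first engine against the second.
# 11/20 (0.55) is an invented figure used only for illustration.
EDGE = Fraction(11, 20)
pairwise = {
    ("A", "B"): EDGE,  # A tends to beat B
    ("B", "C"): EDGE,  # B tends to beat C
    ("C", "A"): EDGE,  # C tends to beat A
}

def expected(x, y):
    """Expected score per game of engine x against engine y."""
    if (x, y) in pairwise:
        return pairwise[(x, y)]
    return 1 - pairwise[(y, x)]

def round_robin_totals(engines=("A", "B", "C")):
    """Expected total score of each engine after meeting every other engine once."""
    return {x: sum(expected(x, y) for y in engines if y != x) for x in engines}

if __name__ == "__main__":
    print(round_robin_totals())
```

Every engine ends up with the same expected round-robin total, so a rating list would call them equal, yet each head-to-head match has a favorite.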

Kai claims Leela is superior. Superiority is a moving target. Maybe it was/is. I guess time will tell if the target doesn't move too far in the meantime.

One fact remains, ALL NN engines are capable of producing gross blunders and most do so at somewhat regular intervals (say 1 per 500 moves). This figure needs to be greatly reduced.
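
As a rough back-of-the-envelope check on what "1 per 500 moves" would mean per game, here is a small sketch. It assumes blunders are independent from move to move with a constant rate, which is a simplification, and the function name and default rate are only illustrative.

```python
def prob_at_least_one_blunder(moves, blunder_rate=1 / 500):
    """Chance of at least one gross blunder in `moves` moves by one engine.

    Assumes each move is an independent trial with the same blunder rate,
    which is a simplifying assumption, not a measured property of any engine.
    """
    if moves < 0:
        raise ValueError("moves must be non-negative")
    return 1.0 - (1.0 - blunder_rate) ** moves

if __name__ == "__main__":
    # Game lengths here are arbitrary examples.
    for n in (40, 80, 150, 500):
        print(n, round(prob_at_least_one_blunder(n), 3))
```

Under those assumptions the per-game risk grows quickly with game length, which is why the per-move figure matters so much over a 100-game match.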

Regards,

Zenmastur
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.

Ovyron
Posts: 2829
Joined: Tue Jul 03, 2007 2:30 am

Re: IN 100 games if SF reaches 51 in TCEC it should be stopped

Post by Ovyron » Sat Oct 12, 2019 6:15 pm

Zenmastur wrote:
Sat Oct 12, 2019 5:34 pm
First it's possible that three programs can be equal and yet A beats B, B beats C, and C beats A.
In this case whoever wins wins by a fluke, not by being better than the others.
Zenmastur wrote:
Sat Oct 12, 2019 5:34 pm
One fact remains, ALL NN engines are capable of producing gross blunders and most do so at somewhat regular intervals (say 1 per 500 moves). This figure needs to be greatly reduced.
Okay, but if they still play at the level of A/B engines it means they blunder less generally, to compensate for the gross blunders. Any time Stockfish lost in the TCEC 15 final it was because it blundered, and it blundered more often than Leela, so I'd say it'd be more fruitful to reduce the figure of Stockfish's blunders than the ones from NNs, even if they're not as gross.

Because a blunder is a blunder, it'll lose you the game even if it's not a gross one, so I don't see the difference between Stockfish blunders and NN blunders (?? is bad enough to lose.)
Great spirits have always encountered violent opposition from mediocre minds.

Zenmastur
Posts: 516
Joined: Sat May 31, 2014 6:28 am

Re: IN 100 games if SF reaches 51 in TCEC it should be stopped

Post by Zenmastur » Sat Oct 12, 2019 6:39 pm

Ovyron wrote:
Sat Oct 12, 2019 6:15 pm
Zenmastur wrote:
Sat Oct 12, 2019 5:34 pm
First it's possible that three programs can be equal and yet A beats B, B beats C, and C beats A.
In this case whoever wins wins by a fluke, not by being better than the others.
If that's what you want to call it. With a set Elo difference between two opponents you can statistically predict how often each should win in a given match length. I don't call that a fluke. It's just the way it is.
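
A minimal sketch of that kind of prediction, using the standard logistic Elo expected-score formula. Elo alone only gives the expected score per game, so the draw rate here is an extra assumption supplied by the caller, and games are treated as independent; both function names are illustrative.

```python
def expected_score(elo_diff):
    """Standard logistic Elo expected score per game for the stronger side."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def match_outcome_probabilities(elo_diff, games, draw_rate):
    """Return (P(win match), P(tied match), P(lose match)) for the first engine.

    Assumes independent games with a fixed draw rate (an assumption, since
    Elo does not specify draws). Tracks the distribution of wins minus losses.
    """
    p_win = expected_score(elo_diff) - draw_rate / 2.0
    p_loss = 1.0 - draw_rate - p_win
    if p_win < 0 or p_loss < 0:
        raise ValueError("draw_rate is inconsistent with this Elo difference")
    dist = {0: 1.0}  # maps (wins - losses) to probability
    for _ in range(games):
        nxt = {}
        for diff, p in dist.items():
            for step, q in ((1, p_win), (0, draw_rate), (-1, p_loss)):
                nxt[diff + step] = nxt.get(diff + step, 0.0) + p * q
        dist = nxt
    win = sum(p for d, p in dist.items() if d > 0)
    tie = dist.get(0, 0.0)
    loss = sum(p for d, p in dist.items() if d < 0)
    return win, tie, loss

if __name__ == "__main__":
    # Hypothetical inputs: 20 Elo edge, 100 games, 80% draws.
    print(match_outcome_probabilities(20, 100, 0.80))
```

The weaker side's match-win probability comes out nonzero for any modest Elo gap, so an upset over 100 games is a predictable outcome at some frequency rather than something outside the model.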
Zenmastur wrote:
Sat Oct 12, 2019 5:34 pm
One fact remains, ALL NN engines are capable of producing gross blunders and most do so at somewhat regular intervals (say 1 per 500 moves). This figure needs to be greatly reduced.
Okay, but if they still play at the level of A/B engines it means they blunder less generally, to compensate for the gross blunders. Any time Stockfish lost in the TCEC 15 final it was because it blundered, and it blundered more often than Leela, so I'd say it'd be more fruitful to reduce the figure of Stockfish's blunders than the ones from NNs, even if they're not as gross.

Because a blunder is a blunder, it'll lose you the game even if it's not a gross one, so I don't see the difference between Stockfish blunders and NN blunders (?? is bad enough to lose.)
A/B engines tend more towards micro-blunders when they don't "understand" the position. And they generally produce many micro-blunders to lose the game. Death by a thousand small cuts. NN engines can produce gross blunders at ANY time, EVEN when they fully "understand" the position and/or have a commanding lead. When an A/B engine has a commanding lead it has an attainable goal and will rarely blunder it. When an A/B engine "thinks" the position is about even OR there is no clear goal for it, it tends to micro-blunder much more often. The nature, magnitude and number of the blunders tend to differ between the two types of engines.

Regards,

Zenmastur
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.

Laskos
Posts: 9517
Joined: Wed Jul 26, 2006 8:21 pm
Full name: Kai Laskos

Re: IN 100 games if SF reaches 51 in TCEC it should be stopped

Post by Laskos » Sat Oct 12, 2019 7:04 pm

Zenmastur wrote:
Sat Oct 12, 2019 6:39 pm


A/B engines tend more towards micro-blunders when they don't "understand" the position. And they generally produce many micro-blunders to lose the game. Death by a thousand small cuts. NN engines can produce gross blunders at ANY time, EVEN when they fully "understand" the position and/or have a commanding lead. When an A/B engine has a commanding lead it has an attainable goal and will rarely blunder it. When an A/B engine "thinks" the position is about even OR there is no clear goal for it, it tends to micro-blunder much more often. The nature, magnitude and number of the blunders tend to differ between the two types of engines.

Regards,

Zenmastur
IMO you defined the behavior of the two paradigms well. So, you do agree that in most tactically quiet, fairly balanced positions Leela is better (possibly much better)? Doesn't this lead to "take Leela as the base engine, and SF as tactical backup" for analysis? We disagreed on that IIRC.

I am not a Corr Chess player, and I might be wrong.
