In 100 games, if SF reaches 51 in TCEC, it should be stopped

Zenmastur · Post by **Zenmastur** » Sat Oct 12, 2019 6:33 pm

Ovyron wrote: ↑Sat Oct 12, 2019 6:14 pm Are you saying that Leela winning Season 15 was a fluke?

NO! I'm saying that Leela NOT winning season 16 wasn't a fluke.

Regards,

Zenmastur

Ovyron · Post by **Ovyron** » Sat Oct 12, 2019 6:43 pm

Zenmastur wrote: ↑Sat Oct 12, 2019 6:33 pm NO! I'm saying that Leela NOT winning season 16 wasn't a fluke.

I think you need to re-parse that, because it's a double negative, so if you remove the "NOT"s you're left with "I'm saying that Leela winning season 16 was a fluke", but Leela isn't winning season 16. Can you reword what you're saying without using any negatives?

Leo · Post by **Leo** » Sat Oct 12, 2019 6:44 pm

Dann Corbit wrote: ↑Sat Oct 12, 2019 2:59 am
Chessqueen wrote: ↑Sat Oct 12, 2019 12:16 am I do not know if it was luck that AllieStein v0.5-dev_7b41f8c-n11 got a better score than Lc0, but AS did NOT do as well as Lc0 against Stockfish 19092522. Next time around, an updated AllieStein might be as strong as Stockfish, unless there is something better than the RTX 2080 waiting around the corner. Anyway, in 100 games, if SF reaches 51 it should be stopped, or will they continue it anyway? www.tcec-chess.com/
From TCEC 16 Rules and Information:
"Superfinal

The Superfinal consists of 100 games at TC 120+10, with 50 different openings, among them once the normal start position, so that each engine plays both black and white of the same opening position. The match will be presented with opening 1 used in games 1 and 2, then opening 2 used in games 3 and 4 etc.
If the match is theoretically won for one side before game 100, the match will still continue until all 100 games have been played."

SF has already reached 51 points, and they are playing on, so the rules are being followed.
I like that, because the games produce really interesting data.
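To make the "theoretically won" rule concrete, here is a minimal sketch assuming the usual scoring (win = 1, draw = ½, loss = 0) and ignoring any tiebreak rules; the function name `match_clinched` is hypothetical. Because the two scores always add up to the number of games played, the trailing engine's best possible final score is the match length minus the leader's score, so under these assumptions a leader with 50.5 points has already clinched a 100-game match.

```python
def match_clinched(leader_points: float, total_games: int = 100) -> bool:
    """Return True once the leader can no longer be caught.

    Assumes win = 1, draw = 0.5, loss = 0, so both scores together always
    equal the number of games played. A possible final tie counts as not
    clinched.
    """
    # Even if the trailer wins every remaining game, its final score is at
    # most total_games - leader_points, however many games have been played.
    best_trailer_final = total_games - leader_points
    return leader_points > best_trailer_final
```

For example, `match_clinched(50.0)` is False (a 50-50 finish is still possible), while `match_clinched(50.5)` is True.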

I think advertisers have paid for 100 games.

Leo · Post by **Leo** » Sat Oct 12, 2019 6:45 pm

What has SF done to improve so much? I am really surprised.

Zenmastur · Post by **Zenmastur** » Sat Oct 12, 2019 6:49 pm

Ovyron wrote: ↑Sat Oct 12, 2019 6:43 pm
Zenmastur wrote: ↑Sat Oct 12, 2019 6:33 pm NO! I'm saying that Leela NOT winning season 16 wasn't a fluke.
I think you need to re-parse that, because it's a double negative, so if you remove the "NOT"s you're left with "I'm saying that Leela winning season 16 was a fluke", but Leela isn't winning season 16. Can you reword what you're saying without using any negatives?

The two negatives don't refer to the same subject, i.e., "NOT winning" and "WASN'T a fluke" don't cancel out, because each applies to something different.

Regards,

Zenmastur

Ovyron · Post by **Ovyron** » Sat Oct 12, 2019 7:03 pm

But that leads to a contradiction:

A. Leela beat Stockfish in TCEC 15 because it was better (it wasn't a fluke).
B. Allie advanced to the TCEC 16 superfinal because it was better than Leela (Leela finished third or fourth).
C. Stockfish beat Allie because it is better (because NNs aren't mature enough, etc.)
D. Stockfish hasn't improved significantly since TCEC 15.

So how did Stockfish become better than Allie and Leela without improving much? To resolve this contradiction, one of these must be true:

a. Stockfish improved and is now better than Allie and Leela.
b. Leela is still better than both, but by a fluke it finished third in the stage before the TCEC 16 superfinal.
c. Leela was never better than Stockfish and it won TCEC 15 by a fluke.
d. (something else that you're saying that I don't get)

Zenmastur · Post by **Zenmastur** » Sat Oct 12, 2019 7:34 pm

Ovyron wrote: ↑Sat Oct 12, 2019 7:03 pm But that leads to a contradiction:

A. Leela beat Stockfish in TCEC 15 because it was better (it wasn't a fluke).
B. Allie advanced to the TCEC 16 superfinal because it was better than Leela (Leela finished third or fourth).
C. Stockfish beat Allie because it is better (because NNs aren't mature enough, etc.)
D. Stockfish hasn't improved significantly since TCEC 15.

So how did Stockfish become better than Allie and Leela without improving much? To resolve this contradiction, one of these must be true:

a. Stockfish improved and is now better than Allie and Leela.
b. Leela is still better than both, but by a fluke it finished third in the stage before the TCEC 16 superfinal.
c. Leela was never better than Stockfish and it won TCEC 15 by a fluke.
d. (something else that you're saying that I don't get)

First, it's possible for three programs to be equally strong and yet A beats B, B beats C, and C beats A.
Second, D is true as far as raw Elo goes. But its move selection could have changed (and did) in subtle ways that affect its ability to defend and attack successfully against Leela while remaining basically neutral against other A/B engines.
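The first point, that three equally strong programs can still beat each other in a cycle, can be illustrated with made-up numbers. The pairwise expected scores below (0.55 for each "favourite") are purely hypothetical, as are the helper names `expected` and `round_robin_totals`; the sketch only shows that in a round robin each engine then finishes with exactly half the points, even though every head-to-head pairing is lopsided.

```python
# Hypothetical pairwise expected scores: PAIR_SCORE[(x, y)] is x's expected
# score per game against y. The numbers are invented to form a cycle.
PAIR_SCORE = {("A", "B"): 0.55, ("B", "C"): 0.55, ("C", "A"): 0.55}


def expected(x: str, y: str) -> float:
    """Expected score of x against y (the two sides of a pairing sum to 1)."""
    if (x, y) in PAIR_SCORE:
        return PAIR_SCORE[(x, y)]
    return 1.0 - PAIR_SCORE[(y, x)]


def round_robin_totals(engines: list[str], games_per_pair: int) -> dict[str, float]:
    """Expected total points for each engine in a round robin."""
    totals = {e: 0.0 for e in engines}
    for i, x in enumerate(engines):
        for y in engines[i + 1:]:
            s = expected(x, y)
            totals[x] += s * games_per_pair
            totals[y] += (1.0 - s) * games_per_pair
    return totals


print(round_robin_totals(["A", "B", "C"], games_per_pair=100))
```

With 100 games per pairing, each engine expects about 100 of its 200 possible points, so none is "better" overall, yet A beats B, B beats C, and C beats A.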

Kai claims Leela is superior. Superiority is a moving target. Maybe it was, or still is. I guess time will tell, as long as the target doesn't move too far in the meantime.

One fact remains: ALL NN engines are capable of producing gross blunders, and most do so at fairly regular intervals (say, 1 per 500 moves). This figure needs to be greatly reduced.
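To put a "1 per 500 moves" rate in perspective, here is a rough sketch assuming every move is an independent trial with the same blunder chance r (a simplification; the 1/500 figure is the rough estimate from the post, and the move counts are arbitrary examples). The chance of at least one blunder in n moves is then 1 - (1 - r)^n.

```python
def prob_at_least_one_blunder(rate: float, moves: int) -> float:
    """P(at least one blunder in `moves` moves), assuming each move is an
    independent trial with the same blunder probability `rate`."""
    return 1.0 - (1.0 - rate) ** moves


RATE = 1 / 500  # the rough per-move estimate quoted in the post
for moves in (60, 100, 500):
    print(f"{moves:4d} moves: {prob_at_least_one_blunder(RATE, moves):.3f}")
```

Whether a given blunder actually costs the game is a separate question that this sketch does not model.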

Regards,

Zenmastur

Ovyron · Post by **Ovyron** » Sat Oct 12, 2019 8:15 pm

Zenmastur wrote: ↑Sat Oct 12, 2019 7:34 pm First, it's possible for three programs to be equally strong and yet A beats B, B beats C, and C beats A.

In this case, whoever wins does so by a fluke, not by being better than the others.

Zenmastur wrote: ↑Sat Oct 12, 2019 7:34 pm One fact remains: ALL NN engines are capable of producing gross blunders, and most do so at fairly regular intervals (say, 1 per 500 moves). This figure needs to be greatly reduced.

Okay, but if they still play at the level of A/B engines, they must blunder less in general to compensate for the gross blunders. Every time Stockfish lost in the TCEC 15 final, it was because it blundered, and it blundered more often than Leela, so I'd say it would be more fruitful to reduce Stockfish's blunder rate than that of NNs, even if Stockfish's blunders aren't as gross.

A blunder is a blunder: it'll lose you the game even if it's not a gross one, so I don't see the difference between Stockfish blunders and NN blunders (a ?? is bad enough to lose).

Zenmastur · Post by **Zenmastur** » Sat Oct 12, 2019 8:39 pm

Ovyron wrote: ↑Sat Oct 12, 2019 8:15 pm
Zenmastur wrote: ↑Sat Oct 12, 2019 7:34 pm First, it's possible for three programs to be equally strong and yet A beats B, B beats C, and C beats A.
In this case, whoever wins does so by a fluke, not by being better than the others.

If that's what you want to call it. Given a fixed Elo difference between two opponents, you can statistically predict how often each should win over a given match length. I don't call that a fluke; it's just the way it is.
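The standard logistic Elo formula gives the expected score per game, E = 1 / (1 + 10^(-D/400)) for a rating edge D, but not how that score splits into wins and draws. This sketch therefore takes per-game win and draw probabilities as assumed inputs and treats games as independent (ignoring colour, openings, and pairing effects); the function names are hypothetical. It computes the exact probability of winning, tying, or losing a match of a given length by tracking the score distribution in half-points.

```python
def expected_score(elo_diff: float) -> float:
    """Logistic Elo expectation per game (win = 1, draw = 0.5) for the side
    that is elo_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def match_outcome_probs(p_win: float, p_draw: float, games: int = 100):
    """Return (P(win match), P(tie), P(lose match)) for side A over `games`
    independent games with fixed per-game win/draw probabilities."""
    p_loss = 1.0 - p_win - p_draw
    # dist[h] = probability that A has scored h half-points so far.
    dist = [1.0]
    for _ in range(games):
        new = [0.0] * (len(dist) + 2)
        for h, p in enumerate(dist):
            new[h] += p * p_loss      # loss: +0 half-points
            new[h + 1] += p * p_draw  # draw: +1 half-point
            new[h + 2] += p * p_win   # win:  +2 half-points
        dist = new
    # Half of the total points (games / 2) equals `games` half-points.
    win = sum(dist[games + 1:])
    tie = dist[games]
    lose = sum(dist[:games])
    return win, tie, lose
```

Splitting an Elo-derived expected score into `p_win` and `p_draw` needs a draw model, which is deliberately left to the caller here.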

Zenmastur wrote: ↑Sat Oct 12, 2019 7:34 pm One fact remains: ALL NN engines are capable of producing gross blunders, and most do so at fairly regular intervals (say, 1 per 500 moves). This figure needs to be greatly reduced.
Okay, but if they still play at the level of A/B engines, they must blunder less in general to compensate for the gross blunders. Every time Stockfish lost in the TCEC 15 final, it was because it blundered, and it blundered more often than Leela, so I'd say it would be more fruitful to reduce Stockfish's blunder rate than that of NNs, even if Stockfish's blunders aren't as gross.

A blunder is a blunder: it'll lose you the game even if it's not a gross one, so I don't see the difference between Stockfish blunders and NN blunders (a ?? is bad enough to lose).

A/B engines tend more toward micro-blunders when they don't "understand" the position, and they generally need many micro-blunders to lose a game: death by a thousand small cuts. NN engines can produce gross blunders at ANY time, EVEN when they fully "understand" the position and/or have a commanding lead. When an A/B engine has a commanding lead, it has an attainable goal and will rarely blunder it away. When an A/B engine "thinks" the position is about even OR has no clear goal, it tends to micro-blunder much more often. The nature, magnitude, and number of blunders tend to differ between the two types of engines.

Regards,

Zenmastur

Laskos · Post by **Laskos** » Sat Oct 12, 2019 9:04 pm

Zenmastur wrote: ↑Sat Oct 12, 2019 8:39 pm

A/B engines tend more toward micro-blunders when they don't "understand" the position, and they generally need many micro-blunders to lose a game: death by a thousand small cuts. NN engines can produce gross blunders at ANY time, EVEN when they fully "understand" the position and/or have a commanding lead. When an A/B engine has a commanding lead, it has an attainable goal and will rarely blunder it away. When an A/B engine "thinks" the position is about even OR has no clear goal, it tends to micro-blunder much more often. The nature, magnitude, and number of blunders tend to differ between the two types of engines.

Regards,

Zenmastur

IMO you defined the behavior of the two paradigms well. So do you agree that in most tactically quiet, fairly balanced positions Leela is better (possibly much better)? Doesn't that lead to "take Leela as the base engine and SF as tactical backup" for analysis? We disagreed on that, IIRC.

I am not a Corr Chess player, and I might be wrong.
