IN 100 games if SF reaches 51 in TCEC it should be stopped

Zenmastur · Post by **Zenmastur** » Sat Oct 12, 2019 10:01 pm

Laskos wrote: ↑Sat Oct 12, 2019 9:04 pm
Zenmastur wrote: ↑Sat Oct 12, 2019 8:39 pm

A/B engines tend more towards micro-blunders when they don't "understand" the position. And they generally produce many micro blunders to lose the game. Death by a thousand small cuts. NN engines can produce gross blunders at ANY time, EVEN when they fully "understand" the position and/or have a commanding lead. When an A/B engine has a commanding lead it has an attainable goal and will rarely blunder it. When an A/B engine "thinks" the position is about even OR there is no clear goal for it, they tend to micro-blunder much more often. The nature, magnitude and number of the blunders tend to differ between the two types of engines.

Regards,

Zenmastur
IMO you defined the behavior of the two paradigms well. So, you do agree that in most tactically quiet, fairly balanced positions Leela is better (possibly much better)?

Probably true, with a few caveats.

Doesn't this lead to "take Leela as the base engine, and SF as tactical backup" for analysis? We disagreed on that IIRC.

I am not a Corr Chess player, and I might be wrong.

I wish it were that simple!

I haven't tried analysis using both yet. It's on my to-do list, but I want a new computer that's set up for this before I try it in earnest. From a practical point of view, my sense is that using Leela in the opening and the early middle game while checking its lines of play with SF would be prudent, at least until a sizable advantage is obtained. Once this sub-goal is attained, SF can drive the advantage to fruition while checking with Leela to uncover any missed strategic lines of play. There are certain end games where SF has no clue; I think Leela can help with these.

A problem I see is when Leela thinks sac'ing material for a strategic advantage is the right plan. Sometimes SF will “see” (be able to prove) the wisdom of such moves. Other times, not so much. With a little practice I think a human can discern which course is correct. Trying to do this in a program is fraught with danger and great programming complexity. A special NN just for this might be useful if the idea is to meld the two types of engines into a monolithic and cohesive whole. I'm sure something like this is a way forward. How soon we see an effective implementation is another story.

It's actually a lot more complicated than this but I don't want to write a dissertation on the subject because my ideas are still in flux about many aspects of how this might/should work.

Regards,

Zenmastur

Leo · Post by **Leo** » Sat Oct 12, 2019 10:48 pm

And then there's the people that say you need thousands of games to determine a superior engine. I agree that there is a big difference between Lc0's blunders and SF's blunders. I thought that SF would never win another super final. I am glad it is back on top.
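To put a rough number on the "thousands of games" point, here is a minimal sketch of how wide the error bar on a match result is. It assumes each game is an independent draw from a fixed win/draw/loss distribution and uses a normal approximation; the function name `match_elo_interval` and the example scores are made up for illustration, not taken from any TCEC result.

```python
import math


def match_elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (score, standard_error, elo_low, elo_high) for a match result.

    Assumption: games are independent and identically distributed, and the
    interval is a plain normal approximation (z = 1.96 is roughly 95%).
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("need at least one game")

    # Mean points per game, counting a draw as half a point.
    score = (wins + 0.5 * draws) / games

    # Per-game variance of the result around that mean.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / games
    std_err = math.sqrt(variance / games)

    def to_elo(s: float) -> float:
        # Clamp so a 0% or 100% bound does not blow up the logarithm.
        s = min(max(s, 1e-6), 1.0 - 1e-6)
        return 400.0 * math.log10(s / (1.0 - s))

    return score, std_err, to_elo(score - z * std_err), to_elo(score + z * std_err)


if __name__ == "__main__":
    # Hypothetical results with the same score fraction at two match lengths.
    for w, d, l in ((15, 75, 10), (150, 750, 100)):
        score, se, lo, hi = match_elo_interval(w, d, l)
        print(f"+{w} ={d} -{l}: score {score:.3f}, Elo interval [{lo:+.1f}, {hi:+.1f}]")
```

With the same score fraction, the interval shrinks roughly with the square root of the number of games, which is why a short match can crown a winner without pinning down the rating gap very precisely.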

Ovyron · Post by **Ovyron** » Sun Oct 13, 2019 12:37 am

Zenmastur wrote: ↑Sat Oct 12, 2019 8:39 pm

With a set ELO difference between two opponents you can statistically predict how often each should win in a given match length.

You should inform TCEC about that so they don't need to run the Super Final at all, they can just use the set ELO difference to predict how often Stockfish or Allie would win and name the champion who would win most often.

We could also get rid of the world chess championship of humans and just award it to whoever has the highest rating, as ELO predicts they'd win more often than not.

Or ELO predictions aren't good enough and actual games should be played. It may turn out that it is ELO which should be adjusted, because it's a shadow that chases the chess entity's real strength, not something that can predict the outcome of matches yet to happen. That's the other way around: ELO needs to see more games to be more accurate, instead of more games being needed to approach the ELO.
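For reference, the prediction being argued about here is the standard logistic Elo expectation. A minimal sketch follows; the function names are just illustrative, and note that the formula gives expected points per game (draws counted as half), not a win probability and not the chance of actually winning a given match.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected points per game for the side rated elo_diff points higher.

    Standard logistic Elo curve: E = 1 / (1 + 10 ** (-elo_diff / 400)).
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score: float) -> float:
    """Inverse of expected_score: rating gap implied by a score fraction."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


if __name__ == "__main__":
    # Arbitrary example gaps, chosen only to show the shape of the curve.
    for diff in (0, 20, 50, 100):
        e = expected_score(diff)
        print(f"{diff:+4d} Elo -> {e:.3f} per game, about {100 * e:.1f} points in 100 games")
```

The inverse function is the direction described above: a rating gap is inferred from results already played, and that estimate only tightens as more games come in.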

Zenmastur wrote: ↑Sat Oct 12, 2019 8:39 pm

A/B engines tend more towards micro-blunders when they don't "understand" the position. And they generally produce many micro blunders to lose the game. Death by a thousand small cuts. NN engines can produce gross blunders at ANY time, EVEN when they fully "understand" the position and/or have a commanding lead. When an A/B engine has a commanding lead it has an attainable goal and will rarely blunder it. When an A/B engine "thinks" the position is about even OR there is no clear goal for it, they tend to micro-blunder much more often. The nature, magnitude and number of the blunders tend to differ between the two types of engines.

Have you actually checked the games of the TCEC 15 final where Leela beat Stockfish? I'd not call them micro-blunders. Actually, I'll claim that what counts as a "gross blunder" isn't well defined. For starters you have to come up with a definition that doesn't depend on A/B engines, because in those games that Stockfish lost against Leela, the ?? moves it played would have been considered best and the alternatives gross blunders, but instead they were losing moves and Stockfish didn't have a clue about it.

Leela does not outplay Stockfish slowly, making it play micro-blunders until the errors accumulate and it is beaten into submission. Leela brings Stockfish to tactically quiet positions that it doesn't understand and then Stockfish plays the losing move, the blunder (gross or not). But another NN could reach the same position and just play normally and draw the game. So before the blunder happened no micro-blunders happened; the moves were fine, but A/B engines just don't know how to continue after that. Some moves are only good if your opponent is an A/B engine.

M ANSARI · Post by **M ANSARI** » Sun Oct 13, 2019 9:19 am

If you look at the games that LC0 was beating SF impressively … they all seemed to take advantage of a few holes in SF evaluation. Obviously SF could see that and with enough data from lost games SF was able to plug up the weaknesses. This probably exaggerated the appearance of Lc0 being a much stronger engine as many of the losses were repeat losses to the same evaluation weaknesses. Also let's not forget that Lc0 had glaring tactical weaknesses and probably some hardware problems … yet still managed to win convincingly. The SF of today has gained a tremendous amount of ELO strength simply by having Lc0 point out its weaknesses and it is a much improved engine as a result. The gains in ELO strength would probably never have happened or at least would have taken much longer to happen if it wasn't for Lc0. I think NN engines still have room to grow and this story is not over by any means. Most likely Allie was tuned against Lc0 but not against SF, as it does seem to be losing very badly to SF.

dannyb · Post by **dannyb** » Mon Oct 14, 2019 6:54 pm

M ANSARI wrote: ↑Sun Oct 13, 2019 9:19 am If you look at the games that LC0 was beating SF impressively … they all seemed to take advantage of a few holes in SF evaluation. Obviously SF could see that and with enough data from lost games SF was able to plug up the weaknesses. This probably exaggerated the appearance of Lc0 being a much stronger engine as many of the losses were repeat losses to the same evaluation weaknesses. Also let's not forget that Lc0 had glaring tactical weaknesses and probably some hardware problems … yet still managed to win convincingly. The SF of today has gained a tremendous amount of ELO strength simply by having Lc0 point out its weaknesses and it is a much improved engine as a result. The gains in ELO strength would probably never have happened or at least would have taken much longer to happen if it wasn't for Lc0. I think NN engines still have room to grow and this story is not over by any means. Most likely Allie was tuned against Lc0 but not against SF, as it does seem to be losing very badly to SF.

Maybe the Stockfish team should start tuning against Leela as well.
