Live Chess 960 Match Stockfish 051220 x2 Power vs Stockfish 051220 (TC=90m+30s)(32 Threads)

mwyoung · Post by **mwyoung** » Mon Dec 07, 2020 7:56 pm

Match Update 3

Stockfish 051220 x2 Power vs Stockfish 051220

+0 =14 -0, TP = 0 Elo
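
For readers who want to check the "0 Elo" figure, here is a minimal sketch of the usual logistic Elo conversion from a score fraction. It assumes "TP" stands for tournament performance; the function name `elo_diff` is made up for illustration and is not taken from the GUI that produced the result line.

```python
import math

def elo_diff(score_fraction):
    """Elo difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return 400.0 * math.log10(score_fraction / (1.0 - score_fraction))

# 14 draws in 14 games is 7 points out of 14, i.e. a 50% score.
print(elo_diff(7 / 14))
```

An all-draw match always gives a 50% score, so the implied difference stays at zero no matter how many games are added.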

Live Stream:

mwyoung · Post by **mwyoung** » Fri Dec 11, 2020 5:51 am

Update 4.

DESKTOP-CORSAIR, Slow 90.0min+30.0sec  0

1   Stockfish 051220           +0/=31/-0 50.00%   15.5/31  240.25
2   Stockfish 051220 x2 Power  +0/=31/-0 50.00%   15.5/31  240.25

Nordlandia · Post by **Nordlandia** » Fri Dec 11, 2020 11:43 am

Testing looks like a waste of resources. What is the point if 50 or 100 games need to be played before a single game is won?

lkaufman · Post by **lkaufman** » Fri Dec 11, 2020 11:22 pm

I think that it is a very useful test if it shows that SFNNUE is strong enough to draw every time (or nearly every time) against a double-time version of itself even in chess960, where at least some of the positions should be fairly tough to draw. It means that there is almost no room for further Elo improvement unless we test with significantly unbalanced openings. Play is still far from perfect, but apparently close enough to draw given a balanced start position, even though one side has some advantage due to moving first. This is quite a significant contribution to our knowledge of where chess engines stand now. It also implies that correspondence chess is totally unplayable now unless bad openings are specified. What has been the experience of correspondence players at high level in the last couple of months? Presumably all draws against competent opponents?

Nordlandia · Post by **Nordlandia** » Fri Dec 11, 2020 11:38 pm

You're right, considering that it's 960.

mwyoung · Post by **mwyoung** » Sat Dec 12, 2020 3:01 am

Update 5

DESKTOP-CORSAIR, Slow 90.0min+30.0sec  0

1   Stockfish 051220           +0/=40/-0 50.00%   20.0/40  400.00
2   Stockfish 051220 x2 Power  +0/=40/-0 50.00%   20.0/40  400.00
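
As a rough guide to what an unbroken run of draws can and cannot show, the sketch below computes the largest per-game chance of a decisive result that is still compatible with seeing none in n games. It assumes the games are independent and share the same decisive-game probability, which is only an approximation for a match played from varying chess960 start positions; `max_decisive_rate` is a hypothetical helper name.

```python
def max_decisive_rate(games, confidence=0.95):
    """Largest per-game decisive probability still consistent, at the given
    confidence, with zero decisive results in `games` independent games."""
    if games <= 0:
        raise ValueError("games must be positive")
    # Solve (1 - p) ** games = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / games)

# Game counts taken from the match updates in this thread.
for n in (14, 31, 40):
    print(n, round(max_decisive_rate(n), 3))
```

Under these assumptions, 40 straight draws still leave room for a decisive-game rate of several percent, so a long run is needed before "nearly every game is drawn" can be stated with confidence.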

mwyoung · Post by **mwyoung** » Sat Dec 12, 2020 3:16 am

This is a long test, and it will take time.
Right now we can see that this is a Stockfish NNUE issue. It is not clear that this would happen to Dragon.

So is this an NNUE issue or a Stockfish NNUE issue? It will be interesting to see whether Dragon scales at all. Or are we completely bottlenecked, and if so, why?

I am unsure how NNUE and Dragon count plies, but if they count the same or nearly the same way, it is clear that SF NNUE prunes much more than Dragon. So if Dragon does scale, this could mean SF NNUE would be helped by a wider search at longer time controls.

Uri Blass · Post by **Uri Blass** » Sat Dec 12, 2020 3:37 am

Or maybe it is a general engine issue:
engines at some high level cannot win in FRC by getting twice as fast.

Of course, weak engines may need a slower time control to get to the right level.

lkaufman · Post by **lkaufman** » Sat Dec 12, 2020 4:40 am

There is no significant difference in how SF, SFNNUE, Komodo, or (Komodo) Dragon count plies. SF(NNUE) gets more depth than Dragon, apparently due to more reducing, which is the only reason it still comes out ahead vs. Dragon. We can equal SF depth easily (at least in single-thread testing) by reducing more, but we lose Elo if we do so. If we can figure out why we can't reduce profitably as much as SF, we'll pass SF, I think. If your results are due to a SFNNUE issue, then Dragon should pass SF at long time controls. But as Uri suggests, it may just be that the error rate is now too small to lose games at long time controls on many threads if the initial position is symmetrical with only the side-to-move advantage for White. If Uri still plays or follows correspondence chess, he can tell us whether the draw percentage at top level has risen noticeably since SF 12 came out.

Uri Blass · Post by **Uri Blass** » Sat Dec 12, 2020 7:12 am

I no longer play correspondence chess, but finished correspondence tournaments are usually at least 2 years old.

The last finished world championship ran from 20.6.2017 to 2.10.2019:
127 draws out of 136 games.

The champion won only 2 games out of 16.

Usually we clearly get more than 90% draws in finished top tournaments, which is already more than we had in the past.

There are results from 3 candidates' tournaments that started on 20.9.2017:
WCCC37CT01 is still not finished (6 wins out of 104 games and 1 unfinished game)
WCCC37CT02 20.9.2017-12.2.2020 (11 wins out of 105 games)
WCCC37CT03 20.9.2017-29.10.2019 (7 wins out of 105 games)
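
The draw percentages behind these figures can be worked out with a few lines of Python. This is only a sketch: it assumes that "wins" means decisive games and that the 104 games listed for WCCC37CT01 are the finished ones, and the `draw_rate` helper is hypothetical.

```python
def draw_rate(decisive, games):
    """Fraction of finished games that were drawn."""
    if games <= 0 or not 0 <= decisive <= games:
        raise ValueError("need 0 <= decisive <= games and games > 0")
    return (games - decisive) / games

# (event, decisive games, finished games), figures as quoted in the post above
events = [
    ("World championship final", 136 - 127, 136),
    ("WCCC37CT01", 6, 104),  # one further game still unfinished
    ("WCCC37CT02", 11, 105),
    ("WCCC37CT03", 7, 105),
]

for name, decisive, games in events:
    print(f"{name}: {draw_rate(decisive, games):.1%} draws")
```

On these numbers the final and two of the three candidates' events come out above 90% draws, with WCCC37CT02 just under.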
