Live Chess 960 Match Stockfish 051220 x2 Power vs Stockfish 051220 (TC=90m+30s)(32 Threads)

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Live Chess 960 Match Stockfish 051220 x2 Power vs Stockfish 051220 (TC=90m+30s)(32 Threads)

Post by mwyoung »

Match Update 3

Stockfish 051220 x2 Power vs Stockfish 051220

+0 =14 -0, TP = 0 Elo

Live Stream:
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Post by mwyoung »

Update 4

Code:

DESKTOP-CORSAIR, Slow 90.0min+30.0sec  0

#   Engine                     W/D/L     Score    Points   SB
1   Stockfish 051220           +0/=31/-0 50.00%   15.5/31  240.25
2   Stockfish 051220 x2 Power  +0/=31/-0 50.00%   15.5/31  240.25
Nordlandia
Posts: 2831
Joined: Fri Sep 25, 2015 9:38 pm
Location: Sortland, Norway

Post by Nordlandia »

This testing looks like a waste of resources. What is the point if 50 or 100 games need to be played before a single game is won?
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Post by lkaufman »

Nordlandia wrote: Fri Dec 11, 2020 11:43 am
This testing looks like a waste of resources. What is the point if 50 or 100 games need to be played before a single game is won?
I think that it is a very useful test if it shows that SFNNUE is strong enough to draw every time (or nearly every time) against a double-time version of itself, even in chess960, where at least some of the positions should be fairly tough to draw. It means that there is almost no room for further Elo improvement unless we test with significantly unbalanced openings. Play is still far from perfect, but apparently close enough to draw given a balanced start position, even though one side has some advantage due to moving first. This is quite a significant contribution to our knowledge of where chess engines stand now. It also implies that correspondence chess is totally unplayable now unless bad openings are specified. What has been the experience of correspondence players at high level in the last couple of months? Presumably all draws against competent opponents?
Komodo rules!
Nordlandia
Posts: 2831
Joined: Fri Sep 25, 2015 9:38 pm
Location: Sortland, Norway

Post by Nordlandia »

You're right, considering that it's 960.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Post by mwyoung »

Update 5

Code:

DESKTOP-CORSAIR, Slow 90.0min+30.0sec  0

#   Engine                     W/D/L     Score    Points   SB
1   Stockfish 051220           +0/=40/-0 50.00%   20.0/40  400.00
2   Stockfish 051220 x2 Power  +0/=40/-0 50.00%   20.0/40  400.00
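
For readers who want to put numbers on a table like the one above, here is a minimal Python sketch. It converts a win/draw/loss record into an Elo difference using the standard logistic Elo formula, and it gives an upper bound on the per-game chance of a decisive result when none has been seen so far. The function names are made up for illustration, and the bound assumes the games are independent with the same decisive probability, which is only an approximation for a match played from varied 960 start positions.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def match_elo(wins: int, draws: int, losses: int) -> float:
    """Elo difference for the first engine given its W/D/L record."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    return elo_from_score(score)


def max_decisive_rate(games: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-game decisive rate after `games` draws in a row.

    Assumption: games are independent and share one decisive probability p.
    Then P(no decisive game in n games) = (1 - p) ** n, and the largest p
    still compatible with that outcome at the given confidence solves
    (1 - p) ** n = 1 - confidence.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / games)


if __name__ == "__main__":
    # Record from the table above: +0 =40 -0
    print(f"Elo difference: {match_elo(0, 40, 0):+.1f}")
    print(f"95% upper bound on decisive rate: {max_decisive_rate(40):.1%}")
```

On the +0 =40 -0 record this gives an Elo difference of zero and a bound of roughly 7% decisive games, so under these assumptions 40 straight draws do not yet rule out an occasional win over a longer match.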
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Post by mwyoung »

lkaufman wrote: Fri Dec 11, 2020 11:22 pm
I think that it is a very useful test if it shows that SFNNUE is strong enough to draw every time (or nearly every time) against a double-time version of itself, even in chess960, where at least some of the positions should be fairly tough to draw. It means that there is almost no room for further Elo improvement unless we test with significantly unbalanced openings. Play is still far from perfect, but apparently close enough to draw given a balanced start position, even though one side has some advantage due to moving first. This is quite a significant contribution to our knowledge of where chess engines stand now. It also implies that correspondence chess is totally unplayable now unless bad openings are specified. What has been the experience of correspondence players at high level in the last couple of months? Presumably all draws against competent opponents?
This is a long test, and it will take time.
Right now we can see this is a Stockfish NNUE issue. It is not clear this would happen to Dragon.

So is this an NNUE issue, or a Stockfish NNUE issue? It will be interesting to see if Dragon will scale at all. Or are we completely bottlenecked, and if so, why?

I am unsure how NNUE and Dragon count plies. But if they are the same or close, it is clear SF NNUE prunes much more than Dragon. So if Dragon does scale, this could mean SF NNUE could be helped with a wider search at longer time controls.
Uri Blass
Posts: 11164
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Post by Uri Blass »

mwyoung wrote: Sat Dec 12, 2020 3:16 am
This is a long test, and it will take time.
Right now we can see this is a Stockfish NNUE issue. It is not clear this would happen to Dragon.

So is this an NNUE issue, or a Stockfish NNUE issue? It will be interesting to see if Dragon will scale at all. Or are we completely bottlenecked, and if so, why?

I am unsure how NNUE and Dragon count plies. But if they are the same or close, it is clear SF NNUE prunes much more than Dragon. So if Dragon does scale, this could mean SF NNUE could be helped with a wider search at longer time controls.
Or maybe it is a general engine issue:
engines at a sufficiently high level cannot win in FRC just by being twice as fast.

Of course, weaker engines may need a slower time control to reach that level.
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Post by lkaufman »

mwyoung wrote: Sat Dec 12, 2020 3:16 am
This is a long test, and it will take time.
Right now we can see this is a Stockfish NNUE issue. It is not clear this would happen to Dragon.

So is this an NNUE issue, or a Stockfish NNUE issue? It will be interesting to see if Dragon will scale at all. Or are we completely bottlenecked, and if so, why?

I am unsure how NNUE and Dragon count plies. But if they are the same or close, it is clear SF NNUE prunes much more than Dragon. So if Dragon does scale, this could mean SF NNUE could be helped with a wider search at longer time controls.
There is no significant difference in how SF, SFNNUE, Komodo, or (Komodo) Dragon count plies. SF(NNUE) gets more depth than Dragon, apparently due to more reducing, which is the only reason it still comes out ahead vs. Dragon. We can equal SF depth easily (at least in single-thread testing) by reducing more, but we lose Elo if we do so. If we can figure out why we can't reduce profitably as much as SF, we'll pass SF, I think. If your results are due to a SFNNUE issue, then Dragon should pass SF at long time controls. But as Uri suggests, it may just be that the error rate is now too small to lose games at long time controls on many threads if the initial position is symmetrical, with only the side-to-move advantage for White. If Uri still plays or follows correspondence chess, he can tell us whether the draw percentage at top level has risen noticeably since SF 12 came out.
Uri Blass
Posts: 11164
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Post by Uri Blass »

I no longer play correspondence chess, but correspondence tournaments that have finished are usually at least 2 years old.

The last world championship that has finished ran 20.6.2017-2.10.2019:
127 draws out of 136 games.

The champion won only 2 games out of 16.

Usually we clearly get more than 90% draws in finished top tournaments, and that is already more than what we had in the past.

There are results from 3 candidate tournaments that started on 20.9.2017:
WCCC37CT01 is still not finished (6 wins out of 104 games and 1 unfinished game)
WCCC37CT02 20.9.2017-12.2.2020 (11 wins out of 105 games)
WCCC37CT03 20.9.2017-29.10.2019 (7 wins out of 105 games)
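
The draw rates implied by these figures can be checked with a few lines of Python. This is only a sketch: the `Event` type and the short event labels are made up for illustration, "wins" is read as games with a decisive result, and the 104 games for WCCC37CT01 are assumed to be the finished ones, with the unfinished game left out.

```python
from typing import NamedTuple


class Event(NamedTuple):
    name: str
    games: int     # finished games counted in the post
    decisive: int  # games that ended with a winner


# Figures as quoted in the post. The world championship entry is derived
# from "127 draws out of 136 games", i.e. 9 decisive games.
EVENTS = [
    Event("World Ch.", 136, 136 - 127),
    Event("WCCC37CT01", 104, 6),  # assumption: the unfinished game is excluded
    Event("WCCC37CT02", 105, 11),
    Event("WCCC37CT03", 105, 7),
]


def draw_rate(games: int, decisive: int) -> float:
    """Fraction of finished games that were drawn."""
    return (games - decisive) / games


def overall_draw_rate(events: list[Event]) -> float:
    """Draw rate over all listed events pooled together."""
    games = sum(e.games for e in events)
    decisive = sum(e.decisive for e in events)
    return draw_rate(games, decisive)


for e in EVENTS:
    print(f"{e.name:<11} {draw_rate(e.games, e.decisive):.1%}")
print(f"{'Overall':<11} {overall_draw_rate(EVENTS):.1%}")
```

On these figures each event lands between roughly 89% and 94% draws, with WCCC37CT02 the only one slightly under 90%.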