Progress of Stockfish in 6 days


Moderator: Ras

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Progress of Stockfish in 6 days

Post by Laskos »

marsell wrote: Fri Aug 14, 2020 1:28 pm -mwyoung- You're doing the right thing. Draws are part of the game, if anyone doesn't like that they should not play the game. Especially when testing, the draws are essential.
I am attaching a very useful 2-mover unbalanced EPD opening file for cases where the draw rate from balanced openings is above 70%. This 2-mover opening suite will bring the draw rate down to some 50%, even from a 95% draw rate before. The Elo differences will be large even at LTC and on strong hardware. Real error margins are calculated using pentanomial variance, and are often much smaller than those shown in a UI or Ordo (up to 2 times smaller than those shown by trinomial rating calculators). The suite contains ~1750 different 2-mover positions.
2moves_80_100.rar
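To illustrate the pentanomial vs. trinomial point, here is a minimal sketch in Python. It assumes each opening is played twice with colours reversed, so the natural sample is the game pair rather than the single game. The pair counts are hypothetical, made up only to resemble an unbalanced-opening match; the function names are not from any rating tool.

```python
import math

def trinomial_se(results):
    """Standard error of the mean score, treating every game as independent."""
    n = len(results)
    mean = sum(results) / n
    var = sum((r - mean) ** 2 for r in results) / n
    return mean, math.sqrt(var / n)

def pentanomial_se(pairs):
    """Standard error of the mean score, using each game pair as one sample."""
    pair_scores = [(a + b) / 2 for a, b in pairs]  # 0, 0.25, 0.5, 0.75 or 1
    n = len(pair_scores)
    mean = sum(pair_scores) / n
    var = sum((p - mean) ** 2 for p in pair_scores) / n
    return mean, math.sqrt(var / n)

# Hypothetical pair counts (NOT from a real match). Each key is the engine's
# result in the two games of one opening, colours reversed in the second game.
pair_counts = {
    (1.0, 0.0): 250,  # won one side of the opening, lost the other
    (0.5, 0.5): 50,
    (1.0, 0.5): 120,
    (0.5, 0.0): 60,
    (1.0, 1.0): 15,
    (0.0, 0.0): 5,
}
pairs = [pair for pair, count in pair_counts.items() for _ in range(count)]
games = [result for pair in pairs for result in pair]

tri_mean, tri_se = trinomial_se(games)
pen_mean, pen_se = pentanomial_se(pairs)
print(f"score {tri_mean:.3f}: trinomial SE {tri_se:.4f}, pentanomial SE {pen_se:.4f}")
```

Both estimates give the same mean score. With unbalanced openings many pairs are a win plus a loss, which the game-by-game calculation counts as two large deviations while the pair-based one counts it as an exactly average pair, so the pair-based error margin comes out smaller.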
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Progress of Stockfish in 6 days

Post by mwyoung »

marsell wrote: Fri Aug 14, 2020 1:28 pm -mwyoung- You're doing the right thing. Draws are part of the game, if anyone doesn't like that they should not play the game. Especially when testing, the draws are essential.
Yes. And anyone who does not like the results needs to complain about CCRL and other testing sites, as this is a pretty standard setup for testing chess engines.

The issue for some here is the rapid falloff of the Stockfish+NNUE rating as time controls get longer. And this only gets worse for Stockfish+NNUE with more time and threads. They want to hype Stockfish+NNUE to the point of ridiculousness, claiming 4x CPU power.....
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Progress of Stockfish in 6 days

Post by Laskos »

Another +13 Elo points in 2 days. After 8 days of gradual SF development, that is a gain of +48 Elo points.

Games Completed = 1000 of 1000 (Avg game length = 23.583 sec)
Settings = Gauntlet/128MB/6000ms+100ms/M 700cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_80_100.epd(1749)
Time = 6044 sec elapsed, 0 sec remaining
 1.  SF NNUE 14 Aug              	569.0/1000	312-174-514  	(L: m=0 t=0 i=0 a=174)	(D: r=317 i=103 f=34 s=7 a=53)	(tpm=180.8 d=17.68 nps=1327473)
 2.  SF NNUE 06 Aug             	431.0/1000	174-312-514  	(L: m=2 t=0 i=0 a=310)	(D: r=317 i=103 f=34 s=7 a=53)	(tpm=182.7 d=17.34 nps=1215253)
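The +48 Elo figure can be checked from the score in the table above. This is a minimal sketch in Python using the usual logistic Elo model; the function name is an assumption for illustration and is not part of LittleBlitzer.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction under the logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))

# Win/loss/draw counts for SF NNUE 14 Aug against SF NNUE 06 Aug, from the table.
wins, losses, draws = 312, 174, 514
games = wins + losses + draws

score = (wins + 0.5 * draws) / games  # 569.0 / 1000
draw_rate = draws / games
gain = elo_diff(score)

print(f"score {score:.3f}, draw rate {draw_rate:.1%}, Elo difference {gain:+.1f}")
```

A score of 56.9% corresponds to roughly +48 Elo, and the draw rate with this opening suite is 51.4%.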
lkaufman
Posts: 6297
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Progress of Stockfish in 6 days

Post by lkaufman »

mwyoung wrote: Fri Aug 14, 2020 3:48 pm
Yes. And anyone who does not like the results needs to complain about CCRL and other testing sites, as this is a pretty standard setup for testing chess engines.

The issue for some here is the rapid falloff of the Stockfish+NNUE rating as time controls get longer. And this only gets worse for Stockfish+NNUE with more time and threads. They want to hype Stockfish+NNUE to the point of ridiculousness, claiming 4x CPU power.....
I ran two tests so far on this. First I ran SFNNUE Aug 13 on one thread vs. final pre-NNUE SF (July 31) on four threads at 30" +.5" (at 5 GHz), and got a 56 to 54 win for NNUE. Then I quadrupled the time limit to 2' + 2" and got 53 to 51 for NNUE. So there doesn't appear to be a falling-off with more time, and at least based on one thread the 4x CPU claim appears to be true. Now it must be repeated with more threads, like 32 vs 8 or even 64 vs 16, to see if it is really generally valid. Perhaps you'd like to run this, if not I can.
Komodo rules!
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Progress of Stockfish in 6 days

Post by mwyoung »

Laskos wrote: Fri Aug 14, 2020 10:36 am -mwyoung- is known not to change his mind at any cost, even in the face of overwhelming evidence that he might do much better. Leave him alone, I never follow his tests; he used to use some 36 threads on a 16-core machine, and I don't know what other silly things he is doing.
I do silly, crazy things like testing chess engines at much longer than 1 minute per game, and testing chess engines using more than 1 thread. Then I get even crazier by posting my settings and live streaming the engine test.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Progress of Stockfish in 6 days

Post by mwyoung »

lkaufman wrote: Fri Aug 14, 2020 4:32 pm
I ran two tests so far on this. First I ran SFNNUE Aug 13 on one thread vs. final pre-NNUE SF (July 31) on four threads at 30" +.5" (at 5 GHz), and got a 56 to 54 win for NNUE. Then I quadrupled the time limit to 2' + 2" and got 53 to 51 for NNUE. So there doesn't appear to be a falling-off with more time, and at least based on one thread the 4x CPU claim appears to be true. Now it must be repeated with more threads, like 32 vs 8 or even 64 vs 16, to see if it is really generally valid. Perhaps you'd like to run this, if not I can.
That's why we need to test with more than one thread and longer time controls. And yes, I have been running these tests and will continue to test NNUE. Unlike some, I do not average 23 seconds per game, test at 1 thread, call it good, and say +100 Elo. The truth is more down to earth....


DESKTOP-CORSAIR, Rapid 30.0min+30.0sec 0

1 SF+NNUE PO 290720 x64 popc +17 +7/=95/-2 52.40% 54.5/104
2 Stockfish 170720 64 POPCNT -17 +2/=95/-7 47.60% 49.5/104
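For a sense of how much a 104-game sample can resolve, here is a rough sketch in Python of a 95% interval around the +17 shown above. It assumes the logistic Elo model, a normal approximation, and independent games (plain trinomial variance); the function names are made up for illustration.

```python
import math

def elo_diff(score):
    """Elo difference implied by a score fraction under the logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))

def elo_interval(wins, draws, losses, z=1.96):
    """Return (low, centre, high) Elo using trinomial variance, normal approx."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    margin = z * math.sqrt(var / n)  # margin on the score fraction
    return elo_diff(score - margin), elo_diff(score), elo_diff(score + margin)

# SF+NNUE PO 290720 vs Stockfish 170720: +7 =95 -2 over 104 games.
low, centre, high = elo_interval(wins=7, draws=95, losses=2)
draw_rate = 95 / 104

print(f"Elo {centre:+.1f}, 95% interval {low:+.1f} to {high:+.1f}, "
      f"draw rate {draw_rate:.1%}")
```

Under these assumptions the centre reproduces the +17 in the table, while the interval is wide and its lower end sits slightly below zero.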
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Progress of Stockfish in 6 days

Post by mwyoung »

lkaufman wrote: Fri Aug 14, 2020 4:32 pm
Perhaps you'd like to run this, if not I can.
You should run your own testing. But just a reminder: if you are running 1 thread vs 2, 3, 4... threads, or some kind of x4 core combination, you need to go into the BIOS and lock all the cores to the same speed. Otherwise the one-core setup will be running much faster. Have you done this in the testing so far?

This is why I always run all testing with the speed locked on all cores, because throttling is a bitch when testing chess engines!
lkaufman
Posts: 6297
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Progress of Stockfish in 6 days

Post by lkaufman »

mwyoung wrote: Fri Aug 14, 2020 6:25 pm
lkaufman wrote: Fri Aug 14, 2020 4:32 pm
Perhaps you'd like to run this, if not I can.
You should run your own testing. But just a reminder: if you are running 1 thread vs 2, 3, 4... threads, or some kind of x4 core combination, you need to go into the BIOS and lock all the cores to the same speed. Otherwise the one-core setup will be running much faster. Have you done this in the testing so far?

This is why I always run all testing with the speed locked on all cores, because throttling is a bitch when testing chess engines!
No, I have never done that. I do test with hyperthreading off, which probably reduces the problem, but I suppose it is still an issue. I got an 89 to 81 score for NNUE vs final SF with 8 threads vs 2 at 30" + .5", better than my four vs one thread results, but I'll leave it to you to follow up with more threads if you wish, in view of the throttling issue. Maybe it will turn out that quadruple CPU power is overstated; anyway, it wasn't my statement.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: Progress of Stockfish in 6 days

Post by mwyoung »

Jouni wrote: Wed Aug 12, 2020 9:36 pm Yes, SF NNUE is equal to quadrupling your CPU cores for free. Incredible!!
Update 1....


DESKTOP-CORSAIR, Rapid 15.0min+15.0sec 0


1 Stockfish 140820+NNUE +0/=21/-0 50.00% 10.5/21 110.25
2 Stockfish 140820 +0/=21/-0 50.00% 10.5/21 110.25

Live Stream:
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Progress of Stockfish in 6 days

Post by Milos »

mwyoung wrote: Fri Aug 14, 2020 4:44 pm
Laskos wrote: Fri Aug 14, 2020 10:36 am -mwyoung- is known not to change his mind at any cost, even in the face of overwhelming evidence that he might do much better. Leave him alone, I never follow his tests; he used to use some 36 threads on a 16-core machine, and I don't know what other silly things he is doing.
I do silly, crazy things like testing chess engines at much longer than 1 minute per game, and testing chess engines using more than 1 thread. Then I get even crazier by posting my settings and live streaming the engine test.
The only thing I noticed with your "tests" is that you get higher draw rate than a typical correspondence chess match of today. ;)
That really makes them super uninteresting for anything.