I am attaching a very useful unbalanced 2-mover EPD opening file for cases where balanced openings give draw rates above 70%. This 2-mover opening suite decreases the draw rate to around 50%, even where it was 95% before, so Elo differences remain large even at LTC on strong hardware. Real error margins are calculated using pentanomial variance and are often much smaller than those shown in the UI or by Ordo (up to 2 times smaller than those shown by trinomial rating calculators). The suite contains ~1750 different 2-mover positions.
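The gap between trinomial and pentanomial error margins can be illustrated with a minimal Python sketch. It assumes games are played in pairs on the same opening with colors reversed, and it uses made-up pair results, not data from this thread. The trinomial estimate treats every game as independent, while the pentanomial estimate treats each pair's average score (0, 0.25, 0.5, 0.75 or 1) as one sample, so the correlation that an unbalanced opening creates inside a pair is no longer counted as noise. The function names are hypothetical.

```python
import math

def trinomial_stats(games):
    """Mean score and its standard error, treating every game as independent.
    games: per-game scores (1 = win, 0.5 = draw, 0 = loss)."""
    n = len(games)
    mean = sum(games) / n
    var = sum((g - mean) ** 2 for g in games) / n
    return mean, math.sqrt(var / n)

def pentanomial_stats(pairs):
    """Mean score and its standard error, treating each game pair
    (same opening, colors reversed) as a single sample.
    pairs: (score_in_game_1, score_in_game_2) tuples."""
    pair_scores = [(a + b) / 2 for a, b in pairs]  # 0, 0.25, 0.5, 0.75 or 1
    n = len(pair_scores)
    mean = sum(pair_scores) / n
    var = sum((p - mean) ** 2 for p in pair_scores) / n
    return mean, math.sqrt(var / n)

# Hypothetical match: 100 pairs (200 games) on unbalanced openings.
pairs = ([(1, 0)] * 40 + [(1, 0.5)] * 30 +
         [(0.5, 0.5)] * 20 + [(0.5, 0)] * 10)
games = [score for pair in pairs for score in pair]

tri_mean, tri_se = trinomial_stats(games)
pen_mean, pen_se = pentanomial_stats(pairs)
print(f"score {tri_mean:.3f}; SE trinomial {tri_se:.4f}, pentanomial {pen_se:.4f}")
print(f"trinomial / pentanomial SE ratio: {tri_se / pen_se:.2f}")
```

Both estimates give the same score (0.55), but for this invented data the trinomial standard error is about 1.8 times the pentanomial one. The real ratio depends on the openings and engines, so treat this only as an illustration of why paired testing on unbalanced openings can shrink the margins.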
Progress of Stockfish in 6 days
Moderator: Ras
-
Laskos
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Progress of Stockfish in 6 days
-
mwyoung
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: Progress of Stockfish in 6 days
Yes. Anyone who does not like the results needs to complain about CCRL and other testing sites, as this is a pretty standard setup for testing chess engines.
The issue for some here is the rapid falloff of Stockfish+NNUE's rating as time controls get longer, and it only gets worse for Stockfish+NNUE with more time and threads. They want to hype Stockfish+NNUE to the point of ridiculousness, claiming 4x CPU power.....
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
-
Laskos
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Progress of Stockfish in 6 days
Another +13 Elo in 2 days. After 8 days of gradual SF development, the gain is +48 Elo.
Games Completed = 1000 of 1000 (Avg game length = 23.583 sec)
Settings = Gauntlet/128MB/6000ms+100ms/M 700cp for 3 moves, D 120 moves/EPD:C:\LittleBlitzer\2moves_80_100.epd(1749)
Time = 6044 sec elapsed, 0 sec remaining
1. SF NNUE 14 Aug 569.0/1000 312-174-514 (L: m=0 t=0 i=0 a=174) (D: r=317 i=103 f=34 s=7 a=53) (tpm=180.8 d=17.68 nps=1327473)
2. SF NNUE 06 Aug 431.0/1000 174-312-514 (L: m=2 t=0 i=0 a=310) (D: r=317 i=103 f=34 s=7 a=53) (tpm=182.7 d=17.34 nps=1215253)
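The +48 figure follows from the 569.0/1000 score under the standard logistic Elo model. The minimal Python sketch below reproduces it from the 312-174-514 (W-L-D) line and adds an approximate 95% interval. Because the log only gives W/L/D totals, the interval uses trinomial (per-game) variance, which, per the pentanomial remark at the top of the thread, is likely wider than the true margin for paired openings. The helper names are hypothetical.

```python
import math

def elo(score):
    """Logistic Elo difference for an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(wins, losses, draws, z=1.96):
    """Elo estimate and approximate 95% interval from W/L/D totals,
    using the trinomial (per-game) variance of the score."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    var = (wins + 0.25 * draws) / n - score ** 2   # E[x^2] - E[x]^2
    se = math.sqrt(var / n)
    return elo(score), elo(score - z * se), elo(score + z * se)

est, low, high = elo_interval(wins=312, losses=174, draws=514)
print(f"SF 14 Aug vs 06 Aug: {est:+.1f} Elo (95% interval {low:+.1f} to {high:+.1f})")
```

The point estimate comes out at about +48 Elo, matching the figure above.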
-
lkaufman
- Posts: 6297
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Progress of Stockfish in 6 days
I ran two tests so far on this. First I ran SFNNUE Aug 13 on one thread vs. the final pre-NNUE SF (July 31) on four threads at 30" + .5" (at 5 GHz), and got a 56 to 54 win for NNUE. Then I quadrupled the time limit to 2' + 2" and got 53 to 51 for NNUE. So there doesn't appear to be a falling-off with more time, and, at least based on one thread, the 4x CPU claim appears to be true. Now it must be repeated with more threads, like 32 vs 8 or even 64 vs 16, to see if it is really generally valid. Perhaps you'd like to run this; if not, I can.
mwyoung wrote: ↑Fri Aug 14, 2020 3:48 pm
Yes. Anyone who does not like the results needs to complain about CCRL and other testing sites, as this is a pretty standard setup for testing chess engines.
The issue for some here is the rapid falloff of Stockfish+NNUE's rating as time controls get longer, and it only gets worse for Stockfish+NNUE with more time and threads. They want to hype Stockfish+NNUE to the point of ridiculousness, claiming 4x CPU power.....
Komodo rules!
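For a sense of scale, Larry's two match scores can be converted to Elo with the same logistic formula. This is a minimal Python sketch with a hypothetical helper name; only the points totals are given, so no error margin is attempted.

```python
import math

def elo_from_points(points_for, points_against):
    """Logistic Elo difference implied by a match score (draws count 0.5)."""
    score = points_for / (points_for + points_against)
    return -400.0 * math.log10(1.0 / score - 1.0)

# NNUE on 1 thread vs pre-NNUE SF on 4 threads, as reported above.
short_tc = elo_from_points(56, 54)   # 30" + .5"
long_tc = elo_from_points(53, 51)    # 2' + 2"
print(f"30s+0.5s: {short_tc:+.1f} Elo, 2m+2s: {long_tc:+.1f} Elo")
```

Both come out at roughly +6 to +7 Elo for NNUE, i.e. close to an even match.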
-
mwyoung
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: Progress of Stockfish in 6 days
I do silly, crazy things, like testing chess engines at much longer than 1 minute per game and using more than 1 thread. Then I get even crazier by posting my settings and live-streaming the engine test.
Laskos wrote: ↑Fri Aug 14, 2020 10:36 am
-mwyoung- is known to not change his mind at any cost, even in the face of overwhelming evidence that he might do much better. Leave him alone. I never follow his tests; he used to use some 36 threads on a 16-core machine, and I don't know what other silly things he is doing.
-
mwyoung
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: Progress of Stockfish in 6 days
That's why we need to test with more than one thread and at longer time controls. And yes, I have been running these tests and will continue to test NNUE. Unlike some, I do not average 23 seconds per game, test with 1 thread, call it good and say +100 Elo. The truth is more down to earth....
lkaufman wrote: ↑Fri Aug 14, 2020 4:32 pm
I ran two tests so far on this. First I ran SFNNUE Aug 13 on one thread vs. the final pre-NNUE SF (July 31) on four threads at 30" + .5" (at 5 GHz), and got a 56 to 54 win for NNUE. Then I quadrupled the time limit to 2' + 2" and got 53 to 51 for NNUE. So there doesn't appear to be a falling-off with more time, and, at least based on one thread, the 4x CPU claim appears to be true. Now it must be repeated with more threads, like 32 vs 8 or even 64 vs 16, to see if it is really generally valid. Perhaps you'd like to run this; if not, I can.
DESKTOP-CORSAIR, Rapid 30.0min+30.0sec 0
1 SF+NNUE PO 290720 x64 popc +17 +7/=95/-2 52.40% 54.5/104
2 Stockfish 170720 64 POPCNT -17 +2/=95/-7 47.60% 49.5/104
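The Elo column in the table above can be checked from the +7/=95/-2 line. The minimal Python sketch below (hypothetical helper name) also computes the draw ratio, which is the quantity the unbalanced 2-mover suite described at the top of the thread is meant to reduce.

```python
import math

def match_summary(wins, draws, losses):
    """Score fraction, logistic Elo difference and draw ratio for one engine."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return score, elo, draws / games

score, elo, draw_ratio = match_summary(wins=7, draws=95, losses=2)
print(f"score {score:.2%}, Elo {elo:+.1f}, draws {draw_ratio:.1%}")
```

This gives 52.40% and roughly +17 Elo, matching the table, with about 91% of the games drawn.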
-
mwyoung
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: Progress of Stockfish in 6 days
You should run your own testing. But just a reminder: if you are running 1 thread vs 2, 3, 4... threads, or some other combination with 4x the cores, you need to go into the BIOS and lock all the cores to the same speed. Otherwise the one-core setup will be running much faster. Have you done this in the testing so far?
lkaufman wrote: ↑Fri Aug 14, 2020 4:32 pm
Perhaps you'd like to run this; if not, I can.
This is why I always run all testing with the speed locked on all cores, because throttling is a bitch when testing chess engines!
-
lkaufman
- Posts: 6297
- Joined: Sun Jan 10, 2010 6:15 am
- Location: Maryland USA
- Full name: Larry Kaufman
Re: Progress of Stockfish in 6 days
No, I have never done that. I do test with hyperthreading off, which probably reduces the problem, but I suppose it is still an issue. I got an 89 to 81 score for NNUE on 2 threads vs. final SF on 8 threads at 30" + .5", better than my four vs. one thread results, but in view of the throttling issue I'll leave it to you to follow up with more threads if you wish. Maybe it will turn out that quadruple CPU power is overstated; anyway, it wasn't my statement.
mwyoung wrote: ↑Fri Aug 14, 2020 6:25 pm
You should run your own testing. But just a reminder: if you are running 1 thread vs 2, 3, 4... threads, or some other combination with 4x the cores, you need to go into the BIOS and lock all the cores to the same speed. Otherwise the one-core setup will be running much faster. Have you done this in the testing so far?
This is why I always run all testing with the speed locked on all cores, because throttling is a bitch when testing chess engines!
-
mwyoung
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: Progress of Stockfish in 6 days
Update 1....
DESKTOP-CORSAIR, Rapid 15.0min+15.0sec 0
1 Stockfish 140820+NNUE +0/=21/-0 50.00% 10.5/21 110.25
2 Stockfish 140820 +0/=21/-0 50.00% 10.5/21 110.25
-
Milos
- Posts: 4190
- Joined: Wed Nov 25, 2009 1:47 am
Re: Progress of Stockfish in 6 days
The only thing I noticed with your "tests" is that you get a higher draw rate than a typical correspondence chess match of today.
mwyoung wrote: ↑Fri Aug 14, 2020 4:44 pm
I do silly, crazy things, like testing chess engines at much longer than 1 minute per game and using more than 1 thread. Then I get even crazier by posting my settings and live-streaming the engine test.
That really makes them super uninteresting for anything.