mwyoung wrote: ↑Sat Oct 10, 2020 7:23 am
Lasko's Law----What's not clear? 3 doublings in cores mean nowadays at least 2.5 real effective doublings in TC. Each effective doubling in TC in these blitz conditions means at very least 40 Elo points, therefore at very least 80 Elo points 1 core -> 8 cores. In fact more likely 120 - 140 Elo points. That result posted in OP and discrepancy beyond doubt break the Elo model.
It is clear to me that Stockfish NNUE does not obey Lasko's Law as stated above. CCRL most likely does not have flawed testing. As suspected, the issue is with Stockfish NNUE, not with CCRL testing. It took me many hours of testing to show this result, and the full results will be posted once the testing is completed. As you know, testing can take days to resolve this kind of anomaly, or false assumption.
All matches were run under the same conditions: TC = 2m+1s, the same settings, and the same opening book (Perfect Book 2019). The CPU was a 2950X with all cores locked to 4.1 GHz.
Stockfish 11, with its classical evaluation, obeys Lasko's Law. But the assumption that Stockfish 12, a hybrid with the new NN evaluation, would follow the same pattern was in error: Stockfish 12 does not obey Lasko's Law.
I tested two versions of Stockfish 12, version 12 and version 12 (051020), to make sure this behavior was not limited to the original Stockfish 12.
Stockfish 11 1 vs 8 cores +147.2 Elo
Stockfish 12 1 vs 8 cores +77.7 Elo
Stockfish 051020 1 vs 8 cores +54.3 Elo
Result:
--------------------------------------------------------------------------------------
#  name                      games  wins  draws  losses  score   los%  elo+/-
1. Stockfish 12 dup 8 cores    200    44    156       0  122.0  100.0    77.7
2. Stockfish 12 dup 1 core     200     0    156      44   78.0    0.0   -77.7
Cross table:
--------------------------------------------------------------------------------------
# name score games 1 2
1. Stockfish 12 dup 8 cores 122.0 200 x 1===1===============1======11======1==1===========1=1=====1===11===1===11==1===1========1=1=1==1=1========1===1=========1==1===========1===11======1====1====1==1====1=====1======1====111==1=11===1===1
2. Stockfish 12 dup 1 core 78.0 200 0===0===============0======00======0==0===========0=0=====0===00===0===00==0===0========0=0=0==0=0========0===0=========0==0===========0===00======0====0====0==0====0=====0======0====000==0=00===0===0 x
Tech:
--------------------------------------------------------------------------------------
Tech (average nodes, depths, time/m per move, others per game), counted for computing moves only, ignored moves with zero nodes:
#  name                      nodes/m       NPS  depth/m  time/m  moves   time
1. Stockfish 12 dup 8 cores   28475K   9905390     35.2     2.9   48.0  137.8
2. Stockfish 12 dup 1 core     3474K   1180298     29.1     2.9   48.1  141.4
   all ---                    15585K   5486571     32.1     2.9   48.0  139.6
Tournament finished! Elapsed: 15:46:49
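For anyone who wants to check the Elo figure against the raw score, here is a minimal sketch. It assumes the tool reports the plain logistic Elo difference implied by the score fraction; the function name `elo_from_score` is mine, not something from the tournament software.

```python
import math


def elo_from_score(points: float, games: int) -> float:
    """Elo difference implied by a match score under the logistic Elo model."""
    fraction = points / games
    if not 0.0 < fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return 400.0 * math.log10(fraction / (1.0 - fraction))


# Stockfish 12, 8 cores vs 1 core: 122.0 points out of 200 games.
print(round(elo_from_score(122.0, 200), 1))
```

With 122.0 points from 200 games this reproduces the 77.7 shown in the elo+/- column.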
Stockfish 11 1 vs 8 cores +147.2 Elo
Stockfish 12 1 vs 8 cores +77.7 Elo
Stockfish 051020 1 vs 8 cores +54.3 Elo
Stockfish 051020 8 vs 16 cores +1.7 Elo
Result:
-------------------------------------------------------------------------------------------
#  name                          games  wins  draws  losses  score   los%  elo+/-
1. Stockfish 051020 dup 16 cores   200     2    197       1  100.5   71.8     1.7
2. Stockfish 051020 dup 8 cores    200     1    197       2   99.5   28.2    -1.7
Cross table:
-------------------------------------------------------------------------------------------
# name score games 1 2
1. Stockfish 051020 dup 16 cores 100.5 200 x ==============================================================0=1============================================================================1==========================================================
2. Stockfish 051020 dup 8 cores 99.5 200 ==============================================================1=0============================================================================0========================================================== x
Tech:
-------------------------------------------------------------------------------------------
Tech (average nodes, depths, time/m per move, others per game), counted for computing moves only, ignored moves with zero nodes:
#  name                          nodes/m       NPS  depth/m  time/m  moves   time
1. Stockfish 051020 dup 16 cores  57514K  20152492     41.9     2.9   49.5  141.2
2. Stockfish 051020 dup 8 cores   29868K  10471507     39.0     2.9   49.5  141.2
   all ---                        42662K  15311939     40.4     2.9   49.5  141.2
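The los% column can be checked the same way. This is only a sketch: I am assuming the tool uses the common normal approximation for likelihood of superiority, which looks at wins and losses only and ignores draws. The thread does not say which formula the tool actually uses, and the name `los_percent` is mine.

```python
import math


def los_percent(wins: int, losses: int) -> float:
    """Likelihood of superiority in percent (normal approximation, draws ignored)."""
    decisive = wins + losses
    if decisive == 0:
        return 50.0  # no decisive games, so no evidence either way
    z = (wins - losses) / math.sqrt(2.0 * decisive)
    return 50.0 * (1.0 + math.erf(z))


# Stockfish 051020, 16 cores vs 8 cores: 2 wins, 1 loss, 197 draws.
print(round(los_percent(2, 1), 1))
# Stockfish 12, 8 cores vs 1 core: 44 wins, 0 losses, 156 draws.
print(round(los_percent(44, 0), 1))
```

Under that assumption, 2 wins against 1 loss gives the 71.8 from the table, which is a long way from the 100.0 of the 8 vs 1 core match.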
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
All in all, at a quick glance:
Openings, draw rate, contempt in SF11, a weaker SF11: none of these, alone or combined, is ruled out as the main culprit here. There is no Laskos rule for 41:0 =159 results, and even less for a 2:1 =197 result. W/L is nuts in all your 8 vs 1 core examples. Sure, worse multicore scaling of NNUE SF is quite possible here, but I guess one would need somewhat clearer matches. Anyway, thanks for this long test.
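To put a number on how clear a 200-game match with this many draws is, one can attach a rough error margin to the Elo figure. The sketch below is not the tool's method: it assumes a normal approximation on the per-game score (win = 1, draw = 0.5, loss = 0) and a 95% level (z = 1.96). Both are my choices, and the function names are hypothetical.

```python
import math


def elo_from_fraction(fraction: float) -> float:
    """Logistic Elo difference for a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(fraction / (1.0 - fraction))


def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Approximate (low, high) Elo interval from win/draw/loss counts."""
    games = wins + draws + losses
    mean = (wins + 0.5 * draws) / games
    # Per-game variance of the score around its mean.
    variance = (
        wins * (1.0 - mean) ** 2
        + draws * (0.5 - mean) ** 2
        + losses * (0.0 - mean) ** 2
    ) / games
    margin = z * math.sqrt(variance / games)
    # Keep the bounds inside (0, 1) so the logarithm stays defined.
    eps = 1e-9
    low = max(eps, mean - margin)
    high = min(1.0 - eps, mean + margin)
    return elo_from_fraction(low), elo_from_fraction(high)


for label, wdl in {"SF12 8 vs 1": (44, 156, 0), "SF 051020 16 vs 8": (2, 197, 1)}.items():
    low, high = elo_interval(*wdl)
    print(f"{label}: {low:+.1f} .. {high:+.1f} Elo")
```

Under these assumptions the 16 vs 8 core interval straddles zero, while the 8 vs 1 core interval stays clearly positive.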
I am all for more testing, because "Houston, we've had a problem."
And I used your exact words, "Lasko's Law", for why you said that CCRL are... "Yes, underperformance of 8CPU SF12 is statistically significant" - Lasko
And I asked WHY?
And then came Lasko's Law, with no data, agreeing with the claim of flawed testing by CCRL. And this was your PROOF!
Lasko's Law----"What's not clear? 3 doublings in cores mean nowadays at least 2.5 real effective doublings in TC. Each effective doubling in TC in these blitz conditions means at very least 40 Elo points, therefore at very least 80 Elo points 1 core -> 8 cores. In fact more likely 120 - 140 Elo points. That result posted in OP and discrepancy beyond doubt break the Elo model."
As I told a member of CCRL about this thread: "As always on CCC, too much speculation and not enough data."
Yes, I stand by my statement "Yes, underperformance of 8CPU SF12 is statistically significant". As for the rest, about doublings, I used that as an estimate for CCRL blitz conditions and usual testing. Once they populate the list with 8-core engines (fairly tested), you will see what I am talking about.
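The doublings estimate is simple arithmetic, so here it is spelled out next to the measured 1 vs 8 core results from this thread. Treat it as a sketch: the 2.5 effective doublings, the 40 Elo per doubling, and the floor of 80 are the figures quoted above, while the function and variable names are mine.

```python
def expected_gain(effective_doublings: float, elo_per_doubling: float) -> float:
    """Estimated Elo gain: effective time-control doublings times Elo per doubling."""
    return effective_doublings * elo_per_doubling


# Figures quoted in the thread for 1 core -> 8 cores.
product_estimate = expected_gain(2.5, 40.0)
quoted_floor = 80.0

# Measured 1 vs 8 core results posted in this thread.
measured = {"Stockfish 11": 147.2, "Stockfish 12": 77.7, "Stockfish 051020": 54.3}
below_floor = [name for name, elo in measured.items() if elo < quoted_floor]

print(f"2.5 doublings x 40 Elo = {product_estimate:.0f} Elo")
for name, elo in measured.items():
    verdict = "below" if elo < quoted_floor else "at or above"
    print(f"{name}: {elo:+.1f} Elo, {verdict} the quoted floor of {quoted_floor:.0f}")
```

Note that 2.5 doublings at 40 Elo each comes to 100 Elo, so the quoted floor of 80 is the more conservative number. Only Stockfish 11 clears it in these matches.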
We will see, but your PROOF has been busted. And whatever the cause of the SF 12 result in CCRL testing, I think we can now both agree that "CCRL flawed testing : SF12 above SF12 8CPU" is clearly unfair in light of the data we have.
I think CCRL takes pride in their work, whether you agree or disagree with their methods of testing.
Otherwise you would not see this from CCRL:
Modern Times: "It is doing my head in for sure." ... "The somewhat unstructured and ad-hoc nature of our testing doesn't help in this situation either, although with enough games that usually eventually resolves itself. To get to the bottom of it you need to do some structured testing with exactly the same opponents, same hardware and testing conditions - which you and others have done or are doing."
mwyoung wrote: ↑Sat Oct 10, 2020 7:23 am
Lasko's Law----What's not clear? 3 doublings in cores mean nowadays at least 2.5 real effective doublings in TC. Each effective doubling in TC in these blitz conditions means at very least 40 Elo points, therefore at very least 80 Elo points 1 core -> 8 cores. In fact more likely 120 - 140 Elo points. That result posted in OP and discrepancy beyond doubt break the Elo model.
It is clear to me that Stockfish NNUE does not obey Lasko's law as stated above. CCRL most likely does not have flawed testing.. And as suspected. The issues is with Stockfish NNUE. It took me many hours to testing to show this result, and the full results will be shown soon. When the testing is completed. The bottom line is the issue is with Stockfish NNUE, and not with CCRL testing. Full results coming soon. As you know testing can take days to answer this kind of anomaly, or false assumption.
All results were tested under the same conditions with a TC = 2m+1s. With the same book, and settings, with Perfect Book 2019. CPU was a 2950x with all cores locked to 4.1 Ghz.
Stockfish 11 with a classical evaluation obeys Lasko's Law. But assuming Stockfish 12 a hybrid with the new NN evaluation will also obey Stockfish's classical pattern was in error. Stockfish 12 does not obey Lasko's Law.
I tested two versions of Stockfish 12, version 12, and version 12 (051020). To make sure this behavior was not with just the original Stockfish 12.
Stockfish 11 1 vs 8 cores +147.2 Elo
Stockfish 12 1 vs 8 cores +77.7 Elo
Stockfish 051020 1 vs 8 cores +54.3 Elo
Result:
--------------------------------------------------------------------------------------
# name games wins draws losses score los% elo+/-
1. Stockfish 12 dup 8 cores 200 44 156 0 122.0 100.0 77.7
2. Stockfish 12 dup 1 core 200 0 156 44 78.0 0.0 -77.7
Cross table:
--------------------------------------------------------------------------------------
# name score games 1 2
1. Stockfish 12 dup 8 cores 122.0 200 x 1===1===============1======11======1==1===========1=1=====1===11===1===11==1===1========1=1=1==1=1========1===1=========1==1===========1===11======1====1====1==1====1=====1======1====111==1=11===1===1
2. Stockfish 12 dup 1 core 78.0 200 0===0===============0======00======0==0===========0=0=====0===00===0===00==0===0========0=0=0==0=0========0===0=========0==0===========0===00======0====0====0==0====0=====0======0====000==0=00===0===0 x
Tech:
--------------------------------------------------------------------------------------
Tech (average nodes, depths, time/m per move, others per game), counted for computing moves only, ignored moves with zero nodes:
# name nodes/m NPS depth/m time/m moves time
1. Stockfish 12 dup 8 cores 28475K 9905390 35.2 2.9 48.0 137.8
2. Stockfish 12 dup 1 core 3474K 1180298 29.1 2.9 48.1 141.4
all --- 15585K 5486571 32.1 2.9 48.0 139.6
Tournament finished! Elapsed: 15:46:49
Stockfish 11 1 vs 8 cores +147.2 Elo
Stockfish 12 1 vs 8 cores +77.7 Elo
Stockfish 051020 1 vs 8 cores +54.3 Elo
Stockfish 051020 8 vs 16 cores +1.7 Elo
Result:
-------------------------------------------------------------------------------------------
# name games wins draws losses score los% elo+/-
1. Stockfish 051020 dup 16 cores 200 2 197 1 100.5 71.8 1.7
2. Stockfish 051020 dup 8 cores 200 1 197 2 99.5 28.2 -1.7
Cross table:
-------------------------------------------------------------------------------------------
# name score games 1 2
1. Stockfish 051020 dup 16 cores 100.5 200 x ==============================================================0=1============================================================================1==========================================================
2. Stockfish 051020 dup 8 cores 99.5 200 ==============================================================1=0============================================================================0========================================================== x
Tech:
-------------------------------------------------------------------------------------------
Tech (average nodes, depths, time/m per move, others per game), counted for computing moves only, ignored moves with zero nodes:
#  name                             nodes/m  NPS        depth/m  time/m  moves  time
1. Stockfish 051020 dup 16 cores    57514K   20152492   41.9     2.9     49.5   141.2
2. Stockfish 051020 dup 8 cores     29868K   10471507   39.0     2.9     49.5   141.2
   all ---                          42662K   15311939   40.4     2.9     49.5   141.2
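The Tech tables also let you see how raw speed scaled with cores. A small sketch using the NPS figures posted in this thread follows. Keep in mind that an NPS ratio is not the same thing as an effective speedup (time to depth), so this only bounds the "doublings" from above. The helper name is hypothetical.

```python
import math

def nps_scaling(nps_more_cores, nps_fewer_cores):
    """Return (NPS ratio, equivalent number of doublings in raw speed)."""
    ratio = nps_more_cores / nps_fewer_cores
    return ratio, math.log2(ratio)

# Average NPS values copied from the Tech tables above.
cases = [
    ("Stockfish 12, 1 -> 8 cores", 9905390, 1180298),
    ("Stockfish 051020, 8 -> 16 cores", 20152492, 10471507),
]
for label, fast, slow in cases:
    ratio, doublings = nps_scaling(fast, slow)
    print(f"{label}: {ratio:.2f}x NPS, {doublings:.2f} doublings")
```

So the extra cores do deliver the nodes in both matches; the question is how much of that raw speed turns into Elo.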
All in all quickly glancing:
Openings, draw rate, contempt in SF11, weaker SF11 --- all combined are not ruled out as the main culprits here. There is no Laskos rule for 41:0 =159 results and even less for a 2:1 =197 result. W/L is nuts in all your examples of 8 vs 1 core. Sure, a worse multicore scaling of NNUE SF is quite possible here, but I guess one would need somewhat clearer matches. Anyway, thanks for this long test.
I am all for more testing. Because "Houston, we've had a problem"
And I used your exact words, "Lasko's Law", on why you said that CCRL are... "Yes, underperformance of 8CPU SF12 is statistically significant" -Lasko
and I asked WHY?
And then came Lasko's Law with no Data! Agreeing with the Flawed testing of CCRL. And this was your PROOF!
Lasko's Law----"What's not clear? 3 doublings in cores mean nowadays at least 2.5 real effective doublings in TC. Each effective doubling in TC in these blitz conditions means at very least 40 Elo points, therefore at very least 80 Elo points 1 core -> 8 cores. In fact more likely 120 - 140 Elo points. That result posted in OP and discrepancy beyond doubt break the Elo model."
As I told a member of CCRL about this thread....."As always on CCC, too much speculation and not enough data."
Yes, I agree with my statement "Yes, underperformance of 8CPU SF12 is statistically significant". For the rest with doublings, I used that as an estimate for CCRL blitz conditions and usual testing. When they populate the list with 8 cored engines (fairly tested) you will see what I am talking about.
We will see, but your PROOF has been busted. And whatever the cause with SF 12 and CCRL testing, I think we can now both agree that "CCRL flawed testing : SF12 above SF12 8CPU" is clearly unfair in light of the data we have.
I think CCRL takes pride in their work. Agree or disagree with their methods of testing.
Or you would not see this by CCRL.
Modern Times-"It is doing my head in for sure."....."The somewhat unstructured and ad-hoc nature of our testing doesn't help in this situation either, although with enough games that usually eventually resolves itself. To get to the bottom of it you need to do some structured testing with exactly the same opponents, same hardware and testing conditions - which you and others have done or are doing."
It could be that SF NNUE scales badly to 8 cores, I haven't ruled that out. Maybe you are onto something, if the high draw rate is not explained by contempt and absolute strength.
Laskos wrote: ↑Sun Oct 11, 2020 5:56 pm
... W/L is nuts in all you examples 8 vs 1 core.
I find it amazing too that SF NNUE 8 cores goes undefeated in 400 games! +75,=325,-0 (59.375%).
I got the same kind of results here, against an opponent that was almost on par and a higher win rate (66.5%), most likely due to the "low draws" openings:
Too few (fast) games, but the draw rate is already significantly higher with NNUE with this small sample, compressing the Elo difference.
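To illustrate the compression effect, here is a minimal sketch that holds the win/loss ratio fixed and varies only the draw rate. The 2:1 W/L ratio and the draw rates are illustrative inputs, not measurements, except that 98.5% draws at 2:1 corresponds to the 2:1 =197 result quoted in this thread. The function name is hypothetical.

```python
import math

def elo_from_wl_ratio(wl_ratio, draw_rate):
    """Elo difference (logistic model) for a given W/L ratio and draw rate."""
    decisive = 1.0 - draw_rate
    wins = decisive * wl_ratio / (1.0 + wl_ratio)
    score = wins + 0.5 * draw_rate
    return 400.0 * math.log10(score / (1.0 - score))

# Same 2:1 W/L ratio throughout; only the share of drawn games changes.
for draw_rate in (0.50, 0.80, 0.985):
    print(f"draw rate {draw_rate:.1%}: {elo_from_wl_ratio(2.0, draw_rate):+.1f} Elo")
```

The same W/L ratio is worth fewer and fewer Elo points as draws take over, which is one reason a high draw rate makes 1 vs 8 core comparisons between engines hard to read.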
"It could be that SF NNUE scales badly to 8 cores, I haven't ruled that out. Maybe you are into something, if the high draw rate is not explained by contempt and absolute strength."
I agree, and Stockfish 12 and above scales even worse from 8 cores to 16 cores! +1.7 Elo!!!
And that is why I assume nothing, and test what I think to be true.
I have been testing chess engines for 40 years.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
"It could be that SF NNUE scales badly to 8 cores, I haven't ruled that out. Maybe you are into something, if the high draw rate is not explained by contempt and absolute strength."
I agree, and Stockfish 12 and above scales even worse from 8 cores to 16 cores!
And that is why I assume nothing, and test what I think to be true.
I am not sure what I have assumed. I said that the result is statistically anomalous (not a fluke) and that the choice of opponents can break the Elo model. I didn't rule out anything, like bad scaling to multicore of SF12.
"It could be that SF NNUE scales badly to 8 cores, I haven't ruled that out. Maybe you are into something, if the high draw rate is not explained by contempt and absolute strength."
I agree, and Stockfish 12 and above scales even worst with 8 cores to 16 cores!
And that is why I assume nothing, and test what I think to be true.
I am not sure what I have assumed. I said that the result is statistically anomalous (not a fluke) and that the choice of opponents can break the Elo model. I didn't rule out anything, like bad scaling to multicore of SF12.
This is easy to answer. You have a great mind. And you are clearly very smart. But your bias and your laziness clearly cloud your judgement. And I always read your comments, as it makes me a better critical thinker!
And I did notice that you chopped my response in this thread!
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.