Alayan wrote: ↑Tue Oct 06, 2020 6:00 pm
Both got a very different mix of opponents. Neither has that many games, so the small sample size doesn't help, but:
Elo transitivity flat out doesn't work, and we can get absurd results like this if the opponent mix is different enough.
Code: Select all
Stockfish 12 64-bit       3666 +22 −22 89.3% −325.7 21.0%
Stockfish 12 64-bit 8CPU  3658 +28 −28 63.4% −75.1 70.7%

Laskos wrote: ↑Tue Oct 06, 2020 6:16 pm
5 out of 6 opponents of SF12 8CPU are Leela-like MCTS engines, which compress Elo differences when playing against AB engines (this was discussed more than a year ago here). Yes, the underperformance of 8CPU SF12 is statistically significant, despite the not-so-large number of games.

mwyoung wrote: ↑Tue Oct 06, 2020 6:48 pm
"Yes, underperformance of 8CPU SF12 is statistically significant"
Why is this true?
What should the Elo difference be when testing SF 12 on 1 core vs SF 12 on 8 cores at this fast TC?
By CCRL's own testing results, SF 12 on 8 cores could have a rating of 3686, and SF 12 on 1 core could have a rating of 3644.
This is not even considering the hardware CCRL is using, and whether it is configured correctly. Did they lock the CPU core speed, deal with CPU ramping, and other considerations? This can have a big impact on performance at these TCs.

Laskos wrote: ↑Tue Oct 06, 2020 7:37 pm
The difference should be (much) in excess of 80 Elo points in these conditions; here it is -8 +/- 36 Elo points (2 standard deviations), therefore the mismatch is highly statistically significant. The explanation is that Leela-like MCTS engines in a pool of AB engines don't obey the Elo model, and this was discussed a while ago here.

mwyoung wrote: ↑Tue Oct 06, 2020 7:43 pm
"The difference should be (much) in excess of 80 Elo points in these conditions."
Why should it be over 80 Elo with SF 12, 1 core vs 8 cores? I have not tested this. What results are you looking at that do not agree with CCRL?
If you are correct, then why is it so far off? As I said before...
This is not even considering the hardware CCRL is using, and whether it is configured correctly. Did they lock the CPU core speed, deal with CPU ramping, and other considerations? This can have a big impact on performance at these TCs.
So this could be an issue with Stockfish 12 on 8 cores, and CCRL's testing could be correct.

Laskos wrote: ↑Tue Oct 06, 2020 8:02 pm
What's not clear? 3 doublings in cores mean nowadays at least 2.5 real effective doublings in TC. Each effective doubling in TC in these blitz conditions means at the very least 40 Elo points, therefore at the very least 80 Elo points for 1 core -> 8 cores. In fact, more likely 120 - 140 Elo points. The result posted in the OP and the discrepancy break the Elo model beyond doubt.

mwyoung wrote: ↑Tue Oct 06, 2020 8:09 pm
Then you are assuming this is also true for STOCKFISH 12. So you have no data! This is why you always fall off the rails.

I checked on 4 cores with my i7 CPU, and the result against 1 core in bullet was in excess of 100 Elo points. It's very unlikely that it regresses at 8 cores. In fact, I saw results on the SF testing framework on many cores (64?) showing good scaling with cores for SF NNUE, at least at short time controls.
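The two readings of the CCRL table argued over above can be reproduced with a little arithmetic. The sketch below is only an illustration under stated assumptions, not CCRL's or any rating tool's actual method: it treats the two list entries as independent, reads the ± columns as two-standard-deviation bounds (as Laskos does), and combines them in quadrature. The helper name `rating_gap` is made up for this example, and the 80 Elo figure is simply the lower bound claimed in the thread.

```python
import math

def rating_gap(r_a, err_a, r_b, err_b):
    """Return (difference, combined error) for two ratings assumed independent."""
    diff = r_a - r_b
    combined = math.sqrt(err_a ** 2 + err_b ** 2)  # errors added in quadrature
    return diff, combined

# (rating, +/- error) taken from the CCRL table quoted in the thread.
sf12_1cpu = (3666, 22)
sf12_8cpu = (3658, 28)

diff, err_2sd = rating_gap(*sf12_8cpu, *sf12_1cpu)

# Laskos's claimed lower bound for 1 core -> 8 cores (an assumption of his argument).
expected_gap = 80
# err_2sd is read as two standard deviations, so one standard deviation is half of it.
z = (expected_gap - diff) / (err_2sd / 2)

# mwyoung's reading: push both ratings to the favourable edge of their error bars.
best_case = (sf12_8cpu[0] + sf12_8cpu[1]) - (sf12_1cpu[0] - sf12_1cpu[1])

print(f"8CPU - 1CPU: {diff:+d} +/- {err_2sd:.0f} Elo (2 SD)")
print(f"Shortfall against {expected_gap} Elo: {z:.1f} standard deviations")
print(f"Best-case gap inside the error bars: {best_case} Elo")
```

Under these assumptions the difference comes out at -8 +/- 36 Elo, matching the figure Laskos quotes, while the best case inside the error bars is the 42 Elo implied by mwyoung's 3686 and 3644. Ratings computed from one shared pool of games are not truly independent, so the combined error is a rough guide at best.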
CCRL flawed testing : SF12 above SF12 8CPU
Moderator: Ras
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: CCRL flawed testing : SF12 above SF12 8CPU
-
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: CCRL flawed testing : SF12 above SF12 8CPU
Laskos wrote: ↑Tue Oct 06, 2020 8:21 pm
I checked on 4 cores with my i7 CPU, and the result against 1 core in bullet was in excess of 100 Elo points. It's very unlikely that it regresses at 8 cores. In fact, I saw results on the SF testing framework on many cores (64?) showing good scaling with cores for SF NNUE, at least at short time controls.

Now we have some limited data to work with. If it is true, the data says we have an issue. Why? Bad testing, bad hardware configuration, or is there still an issue with SF 12 on 8 cores? We only have CCRL data for SF 12 on 8 cores.
We need to rule out an SF 12 issue first, before looking at other causes such as CCRL's testing.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: CCRL flawed testing : SF12 above SF12 8CPU
mwyoung wrote: ↑Tue Oct 06, 2020 8:35 pm
Now we have some limited data to work with. If it is true, the data says we have an issue. Why? Bad testing, bad hardware configuration, or is there still an issue with SF 12 on 8 cores? We only have CCRL data for SF 12 on 8 cores.
We need to rule out an SF 12 issue first, before looking at other causes such as CCRL's testing.
You are free to rule out anything you want.
-
- Posts: 4718
- Joined: Wed Oct 01, 2008 6:33 am
- Location: Regensburg, Germany
- Full name: Guenther Simon
Re: CCRL flawed testing : SF12 above SF12 8CPU
I guess you are both right to an unknown extent ;-)
1. 8CPU testing is in its infancy at CCRL - there are only three entries with 8 cores out of over 2800 in the blitz list, and none in the rapid list - which means a problem with the 8CPU machine, or its speed relative to other hardware, cannot currently be ruled out
2. The 'non-mix' of opponents for the first SF12 8CPU results sure has an impact
-
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: CCRL flawed testing : SF12 above SF12 8CPU
Laskos wrote: ↑Tue Oct 06, 2020 8:53 pm
You are free to rule out anything you want.

Thanks,
If CCRL really has bad data here, I would like to be fair and show it with some kind of data.
Stockfish 12 (1 core) vs Stockfish 12 (8 cores) (TC=2m+1s) (200 Rounds)
Live:
-
- Posts: 1205
- Joined: Thu Dec 25, 2008 9:07 pm
- Full name: Herbert L
Re: CCRL flawed testing : SF12 above SF12 8CPU
Interesting GUI, what GUI is that?
-
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: CCRL flawed testing : SF12 above SF12 8CPU
-
- Posts: 550
- Joined: Tue Nov 19, 2019 8:48 pm
- Full name: Alayan Feh
Re: CCRL flawed testing : SF12 above SF12 8CPU
What's your final result ?
-
- Posts: 2727
- Joined: Wed May 12, 2010 10:00 pm
Re: CCRL flawed testing : SF12 above SF12 8CPU
It was not 140 Elo or 120 Elo, and not even 80 Elo. It was 72. So if Laskos's numbers are correct, this may not be entirely a CCRL issue.
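To put the Elo figures in this exchange on a common footing, they can be converted to expected match scores with the standard logistic Elo formula. The sketch below is illustrative only: the helper names are made up, and it says nothing about the error margin of the 72 Elo result, which would need the win/draw/loss counts that are not given in the thread.

```python
import math

def expected_score(elo_diff):
    """Expected score (0..1) for the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def elo_from_score(score):
    """Inverse of expected_score; score must be strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Elo gaps mentioned in the thread: the measured 72 and the predicted 80, 120, 140.
for gap in (72, 80, 120, 140):
    print(f"{gap:>3} Elo -> expected score {expected_score(gap):.1%}")
```

A 72 Elo gap corresponds to a score of roughly 60% for the 8-core side. With only 200 rounds, the uncertainty around that figure is likely to be large enough that 72 and 80 Elo are hard to tell apart.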
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: CCRL flawed testing : SF12 above SF12 8CPU
Can't it just be a scaling problem with Stockfish? According to CEGT, Stockfish 11 @ 8CPU is only 6 Elo stronger than Stockfish 11 @ 4CPU:
http://www.cegt.net/40_40%20Rating%20Li ... liste.html
This would probably mean that, if you ran a long enough test between Stockfish 11 @ 4CPU and 8CPU, the result would be almost equal, if not exactly equal.