Engine                   Rating    +    −  Score  Av. Op.  Draws
Stockfish 12 64-bit        3666  +22  −22  89.3%   −325.7  21.0%
Stockfish 12 64-bit 8CPU   3658  +28  −28  63.4%    −75.1  70.7%
Alayan wrote: ↑Tue Oct 06, 2020 6:00 pm
Both got a very different mix of opponents. Neither has that many games, so the small sample size doesn't help, but Elo transitivity flat out doesn't work, and we can get absurd results like this if the opponent mix is different enough.

5 out of 6 opponents of SF12 8CPU are Leela-like MCTS engines, which compress Elo differences when playing against AB engines (this was discussed more than a year ago here). Yes, underperformance of 8CPU SF12 is statistically significant, despite the not-that-large number of games.
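Both posts lean on the logistic Elo model, in which results are transitive: the score odds of A against C (score divided by one minus score) are the product of A's odds against B and B's odds against C. Here is a minimal Python sketch of that model; the function names and the 64% match scores are hypothetical and only illustrate the arithmetic. Laskos's claim is that results between MCTS and AB engines do not follow this curve, which is why ratings built from different opponent mixes can disagree.

```python
import math


def expected_score(diff: float) -> float:
    """Logistic Elo model: expected score for a player rated `diff` points higher."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def elo_gap(score: float) -> float:
    """Inverse of expected_score: Elo gap implied by a score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


# Hypothetical results: A scores 64% against B, and B scores 64% against C.
gap_ab = elo_gap(0.64)
gap_bc = elo_gap(0.64)

# Transitivity: the model predicts A vs C from the sum of the two gaps,
# which is equivalent to multiplying the score odds 0.64 / 0.36 twice.
predicted_ac = expected_score(gap_ab + gap_bc)
odds = (0.64 / 0.36) ** 2

print(f"A-B gap: {gap_ab:.0f} Elo, B-C gap: {gap_bc:.0f} Elo")
print(f"predicted A vs C: {predicted_ac:.1%} (odds check: {odds / (1 + odds):.1%})")
```

Run as a script, it reports a gap of about 100 Elo for each match and a predicted 76.0% for A against C; both ways of computing the last figure agree because adding Elo gaps is the same as multiplying odds.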
Laskos wrote: ↑Tue Oct 06, 2020 6:16 pm
"Yes, underperformance of 8CPU SF12 is statistically significant"
Rank Name                       Elo    +    -  games  score  oppo.  draws
   1 stockfish.11                90   50   49     96    54%     67    43%
   2 minic_2.50_uci_nnue_8t      67   18   18    767    62%     -8    47%
   3 stockfish.10                55   49   50     96    48%     67    44%
   4 minic_2.50_uci_nnue_4t      35   46   46     96    44%     67    69%
   5 stockfish.9                  8   48   49     96    40%     67    49%
   6 stockfish.8                -20   50   51     96    36%     67    43%
   7 minic_2.50_uci_nnue_2t     -40   48   49     96    31%     67    56%
   8 stockfish.7                -72   52   55     96    29%     67    33%
   9 minic_2.50_uci_nnue       -123   52   56     95    21%     67    41%
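The minic rows of this table can be read as a rough thread-scaling curve, which is the quantity under dispute in this thread. The sketch below only does arithmetic on the table. It assumes the unsuffixed minic_2.50_uci_nnue entry is the 1-thread build, and it ignores the error bars, which are ±46 to ±56 Elo for every row except the 8-thread one, so the individual steps are noisy.

```python
import math

# Elo column of the minic rows above, keyed by thread count.
# Assumption: the unsuffixed "minic_2.50_uci_nnue" row is the 1-thread build.
minic_elo = {1: -123, 2: -40, 4: 35, 8: 67}


def gains_per_doubling(ratings):
    """Elo gained between consecutive thread counts, in increasing order."""
    threads = sorted(ratings)
    return {(lo, hi): ratings[hi] - ratings[lo] for lo, hi in zip(threads, threads[1:])}


steps = gains_per_doubling(minic_elo)
for (lo, hi), gain in steps.items():
    print(f"{lo}t -> {hi}t: {gain:+d} Elo")

total = minic_elo[8] - minic_elo[1]
doublings = math.log2(8)
print(f"1t -> 8t: {total:+d} Elo over {doublings:.0f} doublings "
      f"(about {total / doublings:.0f} per doubling)")
```

It prints gains of +83, +75 and +32 Elo for the three doublings, +190 in total, or about 63 Elo per doubling. These are minic numbers, not Stockfish 12 numbers.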
mwyoung wrote: ↑Tue Oct 06, 2020 6:48 pm
"Yes, underperformance of 8CPU SF12 is statistically significant"
Why is this true?
What should the Elo difference be between SF 12 on 1 core and SF 12 on 8 cores at this fast TC?
By CCRL's own testing results, SF 12 on 8 cores could have a rating of 3686, and SF 12 on 1 core a rating of 3644.
This is not even considering the hardware CCRL is using, and whether it is configured correctly: did they lock the CPU core speed, handle CPU ramping, and so on? This can have a big impact on performance at these TCs.

The difference should be (much) in excess of 80 Elo points in these conditions; here it is -8 +/- 36 Elo points (2 standard deviations), so the mismatch is highly statistically significant. The explanation is that Leela-like MCTS engines in a pool of AB engines don't obey the Elo model, and this was discussed a while ago here.
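Laskos's -8 +/- 36 figure can be reproduced from the two CCRL lines in the first post, assuming the ±22 and ±28 error bars are independent and combine in quadrature (the thread does not say how the 36 was obtained) and treating the result as 2 standard deviations, as he does. The 80 Elo threshold is his stated lower bound, not a measured value.

```python
import math


def rating_gap(r_a, err_a, r_b, err_b):
    """Return (r_b - r_a, combined error bar).

    Assumes the two error bars are independent, so they add in quadrature.
    """
    return r_b - r_a, math.hypot(err_a, err_b)


# SF12 1 CPU (3666 +/- 22) vs SF12 8CPU (3658 +/- 28), from the CCRL lines in the first post
gap, two_sigma = rating_gap(3666, 22, 3658, 28)
sigma = two_sigma / 2        # the bar is read as 2 standard deviations
expected_gap = 80            # Laskos's stated lower bound for 1 -> 8 cores
z = (expected_gap - gap) / sigma

print(f"8CPU - 1CPU: {gap:+d} +/- {two_sigma:.1f} Elo (2 sigma)")
print(f"the {expected_gap} Elo floor is {z:.1f} sigma above the measured gap")
```

This gives -8 +/- 35.6 Elo, and the stated 80 Elo floor sits about 4.9 standard deviations above the measured gap.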
Laskos wrote: ↑Tue Oct 06, 2020 7:37 pm
"The difference should be (much) in excess of 80 Elo points in these conditions."
mwyoung wrote: ↑Tue Oct 06, 2020 7:43 pm
"The difference should be (much) in excess of 80 Elo points in these conditions."
Why should it be over 80 Elo with SF 12, 1 core vs 8 cores? I have not tested this. What results are you looking at that do not agree with CCRL?
If you are correct, then why is it so far off? As I said before...
This is not even considering the hardware CCRL is using, and whether it is configured correctly: did they lock the CPU core speed, handle CPU ramping, and so on? This can have a big impact on performance at these TCs.

What's not clear? 3 doublings in cores nowadays mean at least 2.5 real effective doublings in TC. Each effective doubling in TC under these blitz conditions is worth at the very least 40 Elo points, therefore at the very least 80 Elo points from 1 core to 8 cores; in fact, more likely 120-140 Elo points. The result posted in the OP, and this discrepancy, break the Elo model beyond doubt.
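Laskos's estimate is a short chain of multiplications, and mwyoung's 3686 and 3644 are the ends of the CCRL error bars (3658 + 28 and 3666 - 22). A sketch that puts the two side by side, taking the thread's figures at face value: the linear "Elo per doubling" model is the thread's assumption, and the function name is hypothetical. Note that 2.5 effective doublings at 40 Elo each give 100 Elo, so the 80 Elo floor in the post is more conservative than the product of his own minimums.

```python
import math


def elo_from_core_scaling(effective_doublings, elo_per_doubling):
    """Elo gain if extra cores act like `effective_doublings` doublings of
    thinking time, each worth a fixed `elo_per_doubling` (linear assumption)."""
    return effective_doublings * elo_per_doubling


nominal_doublings = math.log2(8)    # 1 -> 8 cores is 3 nominal doublings
effective_doublings = 2.5           # "at least 2.5 real effective doublings in TC"
elo_per_doubling = 40               # "at very least 40 Elo points" per doubling at blitz

speedup = 2 ** effective_doublings  # equivalent time-control multiplier
estimate = elo_from_core_scaling(effective_doublings, elo_per_doubling)

# mwyoung's 3686 and 3644 are the ends of the CCRL error bars: 3658 + 28 and 3666 - 22
ccrl_best_case = (3658 + 28) - (3666 - 22)

print(f"{nominal_doublings:.0f} nominal doublings, about {speedup:.1f}x effective speedup")
print(f"lower-bound estimate: {estimate:.0f} Elo vs CCRL best case of {ccrl_best_case} Elo")
```

It reports 3 nominal doublings, roughly a 5.7x effective speedup, and a 100 Elo lower-bound estimate against the 42 Elo best case that can be read off the CCRL error bars.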
RogerC wrote: ↑Tue Oct 06, 2020 7:38 pm
I'm not a fan of CCRL any more, just for that. They mix multicore and single-core results with completely different opponents. No doubt the results will necessarily be biased!
I only look at the CEGT 40/4 and 40/20 tournaments, which are more accurate in their Elo calculations for single- and multicore engines (1, 4, 8 and 12 threads):
http://www.cegt.net/40_4_Ratinglist/40_ ... liste.html
http://www.cegt.net/40_40%20Rating%20Li ... liste.html
If you want to focus on the competition between SF and LC0 (the 2 best engines in the world for now), look at the Stefan Pohl Computer Chess tournaments. There you will find the best-net tests for LC0 and the results of the best LC0 net vs the latest SF dev:
https://www.sp-cc.de/nn-vs-sf-testing.htm
Laskos wrote: ↑Tue Oct 06, 2020 8:02 pm
What's not clear? 3 doublings in cores nowadays mean at least 2.5 real effective doublings in TC. Each effective doubling in TC under these blitz conditions is worth at the very least 40 Elo points, therefore at the very least 80 Elo points from 1 core to 8 cores.

Then you are assuming this holds for STOCKFISH 12, so you have no data! This is why you always fall off the rails.