future of top engines:how much more elo?
Moderator: Ras
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: future of top engines:how much more elo?
Dann Corbit wrote: ↑Thu Aug 01, 2019 9:09 pm
Here are some rating lists for SF at 40/40, 40/20, and 40/4.
Which measurements exactly do you have an issue with, and what evidence do you propose to show that the measurements are wrong?
Code:
CCRL
40/40
Stockfish 10 64-bit 4CPU 3461 +18 -18
Stockfish 9 64-bit 4CPU 3432 +15 -15
So SF 10 at 40/40 could be as high as 3479 or as low as 3443
So SF 9 at 40/40 could be as high as 3447 or as low as 3417
We are not totally sure which one is stronger

40/4
Stockfish 9 64-bit 4CPU 3547 +12 -12
Stockfish 10 64-bit 4CPU 3546 +12 -12
So SF 9 at 40/4 could be as high as 3559 or as low as 3535
So SF 10 at 40/4 could be as high as 3558 or as low as 3534
We are not totally sure which one is stronger

CEGT
40/20
Stockfish 10.0 x64 8CPU 3518 +21 -21
Stockfish 9.0 x64 8CPU 3493 +24 -24
So Stockfish 10 at 40/20 could be as high as 3539 or as low as 3497
So Stockfish 9 at 40/20 could be as high as 3517 or as low as 3469
We are not totally sure which one is stronger

40/4
Stockfish 10.0 x64 4CPU 3548 +16 -16
Stockfish 9.0 x64 4CPU 3503 +17 -17
So SF 10 at 40/4 could be as high as 3564 or as low as 3532
So SF 9 at 40/4 could be as high as 3520 or as low as 3486
It appears that SF 10 is stronger, within a couple of standard deviations, but not by much.

So, CCRL and CEGT ratings at 40/4 4CPU for SF10 compared to SF9 are not compatible to 3 standard deviations, which is very bad. The conditions are very similar. This CCRL rating seems dubious, in the context given.
-
- Posts: 12778
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: future of top engines:how much more elo?
Laskos wrote: ↑Thu Aug 01, 2019 9:33 pm
So, CCRL and CEGT ratings at 40/4 4CPU for SF10 compared to SF9 are not compatible to 3 standard deviations, which is very bad. The conditions are very similar. This CCRL rating seems dubious, in the context given.

The thing that is interesting is that three of the measurements are very close so that we cannot distinguish which is stronger, and one has a clear result.
So if a measurement is wrong, which one is it?
I thought the error bars were two standard deviations.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: future of top engines:how much more elo?
Dann Corbit wrote: ↑Thu Aug 01, 2019 9:53 pm
The thing that is interesting is that three of the measurements are very close so that we cannot distinguish which is stronger, and one has a clear result.
So if a measurement is wrong, which one is it?
I thought the error bars were two standard deviations.

CCRL 40/4 4CPU stands out in your post. The error margins there are 2SD, but the discrepancy with CEGT seems larger than that, about 3SD. Did the CCRL group have crashes or losses on time with SF10 in the 40/4 4CPU list? The SF10 compile does have a small bug, which manifests itself only rarely.
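
The "about 3SD" figure can be checked from the numbers quoted in this thread. The sketch below is only a rough check, under two assumptions that are not stated by either rating list: the quoted +/- margins are 2SD, and all four rating errors are independent (ratings inside one list are usually correlated, so this is an approximation). The function names are made up for illustration.

```python
import math

def sd_from_bar(bar, sigmas=2.0):
    """Convert a +/- margin quoted at `sigmas` standard deviations to 1 SD."""
    return bar / sigmas

def difference(r_a, bar_a, r_b, bar_b):
    """Rating difference a - b and its SD, treating both errors as independent."""
    diff = r_a - r_b
    sd = math.hypot(sd_from_bar(bar_a), sd_from_bar(bar_b))
    return diff, sd

# SF10 minus SF9 at 40/4 4CPU, numbers taken from the lists quoted above.
ccrl_diff, ccrl_sd = difference(3546, 12, 3547, 12)
cegt_diff, cegt_sd = difference(3548, 16, 3503, 17)

# How far apart are the two measured SF10-SF9 differences, in SD units?
gap = cegt_diff - ccrl_diff
gap_sd = math.hypot(ccrl_sd, cegt_sd)
z = gap / gap_sd

print(f"CCRL: {ccrl_diff:+d} Elo (SD {ccrl_sd:.1f})")
print(f"CEGT: {cegt_diff:+d} Elo (SD {cegt_sd:.1f})")
print(f"gap: {gap} Elo, {z:.2f} SD")
```

Under those assumptions the two lists disagree on the SF10-SF9 difference by 46 Elo, a little over 3 SD, which matches the estimate in the post.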
-
- Posts: 44197
- Joined: Sun Feb 26, 2006 10:52 am
- Location: Auckland, NZ
Re: future of top engines:how much more elo?
Time losses are not included unless the result is an obvious one.
gbanksnz at gmail.com
-
- Posts: 4558
- Joined: Tue Jul 03, 2007 4:30 am
Re: future of top engines:how much more elo?
Ozymandias wrote: ↑Thu Aug 01, 2019 9:04 am
I'd say the latter is a different problem which has to do with credibility.

Wait, what? I thought S10 wasn't topping 40/4 because no tester had been interested in testing it yet. Now I see it's been tested and is ranked below S9.

Now I feel like I've been talking about some ant running around the room, missing the big elephant...
Your beliefs create your reality, so be careful what you wish for.
-
- Posts: 2
- Joined: Sun Dec 01, 2019 7:20 pm
- Full name: jeremy rutman
Re: future of top engines:how much more elo?
I found an answer of ~5200 ELO for perfect play by extrapolating drawrate vs. ELO to the point where drawrate=1.

More info here https://unclejerry9466728.wordpress.com/2018/12/20/172/
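
For readers who want to try the same kind of extrapolation, here is a minimal sketch. It assumes a plain least-squares straight line through (Elo, draw rate) points, which the post does not specify, and the data points are synthetic placeholders chosen so the arithmetic is easy to check. They are not taken from the linked blog and do not reproduce the ~5200 figure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def elo_at_draw_rate(elos, draw_rates, target=1.0):
    """Extrapolate the fitted line to the Elo where the draw rate hits `target`."""
    slope, intercept = fit_line(elos, draw_rates)
    return (target - intercept) / slope

# Synthetic placeholder data (NOT real measurements): an exact straight line.
elos = [2000.0, 2500.0, 3000.0, 3500.0]
draw_rates = [0.2, 0.3, 0.4, 0.5]

perfect_play_elo = elo_at_draw_rate(elos, draw_rates)
print(f"draw rate reaches 1.0 near {perfect_play_elo:.0f} Elo")
```

With real data the answer depends heavily on which rating list and which range of Elo is fitted, so the extrapolated value should be read as an estimate only.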

-
- Posts: 6927
- Joined: Wed Nov 18, 2009 7:16 pm
- Location: Gutweiler, Germany
- Full name: Frank Quisinsky
Re: future of top engines:how much more elo?
Hi,
that's interesting.
I was searching for an answer around 2010 and made some stats, with a result of ~4700 Elo.
But I don't remember how I calculated it; I must search in my own backups.
But I was also working with the draw rate!
I believe that play in the "transition into endgame" phase is near perfection today (for the strongest engines, maybe only 200 Elo more are possible).
If the current Stockfish has around 3500 Elo with 1 core (looking at my still-running FCP Tourney-2021 vs. 40 opponents), then 5200 - 3500 = 1700 Elo remain.
I can't believe that between the end of the opening and the end of the middlegame 1500 Elo (1700 Elo minus the 200 Elo I have in mind for the transition into endgame) are possible.
I calculated 4700 Elo when Shredder 12 with 1 core was at 2800 Elo!
But 10 years later, with more possibilities and game material ... maybe 5200 is more right than 4700.
If so ... with 1 core = 34 years (if 50 Elo per year is possible).
With more cores and better and better hardware the prognosis can be ... 25 years!
54 years (my age) + 25 years = 79
Maybe I will be lucky ...
Best
Frank
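
The arithmetic in the post above can be written out as a small sketch. It assumes constant linear progress in Elo per year, as the post does. The function name is hypothetical, and the 4700 target is included only for comparison with the older estimate.

```python
def years_to_target(current_elo, target_elo, elo_per_year):
    """Years needed to close the Elo gap at a constant yearly gain."""
    if elo_per_year <= 0:
        raise ValueError("elo_per_year must be positive")
    return (target_elo - current_elo) / elo_per_year

# Figures from the post: ~3500 Elo today on 1 core, 50 Elo gained per year.
years_to_5200 = years_to_target(3500, 5200, 50)
years_to_4700 = years_to_target(3500, 4700, 50)

print(f"to 5200 Elo: {years_to_5200:.0f} years")
print(f"to 4700 Elo: {years_to_4700:.0f} years")
```

The 5200 target gives the 34 years quoted in the post. The shorter 25-year prognosis in the post comes from assuming extra gains from more cores and better hardware, which this sketch does not model.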
-
- Posts: 2
- Joined: Sun Dec 01, 2019 7:20 pm
- Full name: jeremy rutman
Re: future of top engines:how much more elo?
I'd be interested to see how you reached a calculation of 4700 ELO, not so far from my prediction.
Incidentally, I also calculated the time until a perfect computer: according to the surprisingly linear progress, it will occur around the year 2059.
But maybe, with the last point being higher than the trend and with quantum computers on the way, we can expect faster progress - then we'll both have a chance to see god playing chess!


-
- Posts: 6927
- Joined: Wed Nov 18, 2009 7:16 pm
- Location: Gutweiler, Germany
- Full name: Frank Quisinsky
Re: future of top engines:how much more elo?
Great!
My system was very easy ...
Shredder = 2800 Elo; I collected the draws vs. 2600, 2400, 2200, 2000, 1800 Elo
AnMon = 2350 Elo; I collected the draws vs. 2150, 1950, 1750, 1550, 1350 Elo
Simulation for Shredder: results for 3000, 3200, 3400, 3600, 3800 Elo
Simulation for AnMon: results for 2550, 2750, 2950, 3150, 3350 Elo (100% agreement with real games vs. 2550 and 2750 Elo).
Simulation of the simulation with 4000, 4200, 4400, 4600, 4800 Elo
Simulation of the simulation with 3550, 3750, 3950, 4150, 4350 Elo
I mixed the results for AnMon and Shredder!
And I looked at which point of the simulation the possible draws go to zero!
The final result was 4700 Elo / 38000
After all this ... the main question for me is this one:
How many games does Shredder 12, with 2800 Elo, need for a draw vs. the perfect playing chess program?
It needs ~38000 games before Shredder 12 with 2800 Elo can play a draw vs. the perfect playing chess program (if 4700 Elo).
Higher than 4700 Elo made no sense for the calculation I made. So I keep thinking ... the max. possible strength must be ~4700 or a bit higher.
Again, this is all from what I remember ...
I must look at my older Excel stats (if I have them in my backup).
Just for fun, not more, not less ... I never published this idea for a calculation!
If we have the perfect playing chess program we can calculate playing strength with the move average of games.
Best
Frank
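
As a sanity check on the "4700 Elo / 38000 games" pair, here is a minimal sketch using the standard logistic Elo expectation. This is not the draw-simulation method described in the post. It also assumes the weaker engine never wins, so its whole expected score comes from draws worth half a point each. Function names are made up for illustration.

```python
import math

def expected_score(rating, opponent):
    """Standard logistic Elo expected score for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def games_per_draw(rating, opponent):
    """Average games per draw, assuming the weaker side scores only by drawing."""
    p_draw = 2.0 * expected_score(rating, opponent)
    return 1.0 / p_draw

def gap_for_games_per_draw(games):
    """Elo gap at which the weaker side draws once per `games` games (same assumption)."""
    expected = 0.5 / games
    return 400.0 * math.log10(1.0 / expected - 1.0)

n_games = games_per_draw(2800, 4700)
gap = gap_for_games_per_draw(38000)

print(f"2800 vs 4700: one draw per about {n_games:.0f} games")
print(f"one draw per 38000 games: gap of about {gap:.0f} Elo")
```

Under these assumptions a 2800 engine draws a 4700 opponent roughly once per 28,000 games, and one draw per 38,000 games corresponds to a gap of roughly 1950 Elo, i.e. an opponent near 4750. So the two numbers in the post are at least of the same order as what the plain Elo formula gives.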
-
- Posts: 6927
- Joined: Wed Nov 18, 2009 7:16 pm
- Location: Gutweiler, Germany
- Full name: Frank Quisinsky
Re: future of top engines:how much more elo?
Hi,
I will ask Klaus Wlotzka.
Klaus is the Excel expert for such things.
Example:
The FEOBOS opening book project, which can be found on my website: the Excel file has 16.7 million formulas, most in macros.
Klaus is now working on the FCP Tourney-2021 Excel stats.
Klaus calculates the German football league (Fussball Bundesliga in German) with Elo calculation, and his prognoses for possible results are just great.

Very interesting!
But 5200 Elo can be right.
Sure, different ways to calculate it are possible!
The topic is very interesting!!
Best
Frank