future of top engines:how much more elo?

Laskos · Post by **Laskos** » Thu Aug 01, 2019 9:33 pm

Dann Corbit wrote: ↑Thu Aug 01, 2019 9:09 pm Here are some rating lists for SF at 40/40, 40/20, and 40/4

CCRL
40/40
Stockfish 10 64-bit 4CPU	3461	+18	-18
Stockfish  9 64-bit 4CPU	3432	+15	-15

So SF 10 at 40/40 could be as high as 3479 or as low as 3443
So SF 9 at 40/40 could be as high as 3447 or as low as 3417
We are not totally sure which one is stronger

40/4
Stockfish  9 64-bit 4CPU	3547	+12	-12
Stockfish 10 64-bit 4CPU	3546	+12	-12

So SF 9 at 40/4 could be as high as 3559 or as low as 3535
So SF 10 at 40/4 could be as high as 3558 or as low as 3534
We are not totally sure which one is stronger

CEGT 
40/20
Stockfish 10.0 x64 8CPU	3518	+21	-21
Stockfish  9.0 x64 8CPU	3493	+24	-24

So Stockfish 10 at 40/20 could be as high as 3539 or as low as 3497
So Stockfish 9 at 40/20 could be as high as 3517 or as low as 3469
We are not totally sure which one is stronger

40/4
Stockfish 10.0 x64 4CPU	3548	+16	-16
Stockfish  9.0 x64 4CPU	3503	+17	-17

So SF 10 at 40/4 could be as high as 3564 or as low as 3532
So SF 9 at 40/4 could be as high as 3520 or as low as 3486
It appears that SF 10 is stronger, within a couple of standard deviations, but not by much.

Which measurements exactly do you have an issue with, and what evidence do you propose to show that the measurements are wrong?

So, the CCRL and CEGT ratings at 40/4 4CPU for SF10 compared to SF9 disagree by about 3 standard deviations, which is very bad. The conditions are very similar. This CCRL rating seems dubious in the given context.

Dann Corbit · Post by **Dann Corbit** » Thu Aug 01, 2019 9:53 pm

Laskos wrote: ↑Thu Aug 01, 2019 9:33 pm
Dann Corbit wrote: ↑Thu Aug 01, 2019 9:09 pm Here are some rating lists for SF at 40/40, 40/20, and 40/4
So, the CCRL and CEGT ratings at 40/4 4CPU for SF10 compared to SF9 disagree by about 3 standard deviations, which is very bad. The conditions are very similar. This CCRL rating seems dubious in the given context.

The thing that is interesting is that three of the measurements are very close so that we cannot distinguish which is stronger, and one has a clear result.

So if a measurement is wrong, which one is it?

I thought the error bars were two standard deviations.

Laskos · Post by **Laskos** » Thu Aug 01, 2019 10:04 pm

Dann Corbit wrote: ↑Thu Aug 01, 2019 9:53 pm
Laskos wrote: ↑Thu Aug 01, 2019 9:33 pm
Dann Corbit wrote: ↑Thu Aug 01, 2019 9:09 pm Here are some rating lists for SF at 40/40, 40/20, and 40/4
So, the CCRL and CEGT ratings at 40/4 4CPU for SF10 compared to SF9 disagree by about 3 standard deviations, which is very bad. The conditions are very similar. This CCRL rating seems dubious in the given context.
The thing that is interesting is that three of the measurements are very close so that we cannot distinguish which is stronger, and one has a clear result.

So if a measurement is wrong, which one is it?

I thought the error bars were two standard deviations.

CCRL 40/4 4CPU stands out in your post. The error margins there are 2SD, but the discrepancy with CEGT seems larger than that, about 3SD. Did the CCRL group have crashes or losses on time with SF10 in the 40/4 4CPU list? The SF10 compile does have a small bug, which manifests itself only rarely.
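
The "about 3SD" figure can be checked with a few lines of arithmetic. This is only a minimal sketch: it assumes the published +/- margins are 2 standard deviations and that the four rating errors are independent, which is only approximately true for ratings taken from a common pool. The helper name `diff_with_sd` is made up for the illustration.

```python
import math


def diff_with_sd(r_a, err_a, r_b, err_b, bars_in_sd=2.0):
    """Return (a - b, standard deviation of that difference).

    err_a / err_b are the list's +/- margins, assumed to span
    `bars_in_sd` standard deviations and to be independent.
    """
    sd = math.hypot(err_a / bars_in_sd, err_b / bars_in_sd)
    return r_a - r_b, sd


# SF10 minus SF9 at 40/4 4CPU, numbers taken from the lists quoted above.
ccrl_diff, ccrl_sd = diff_with_sd(3546, 12, 3547, 12)
cegt_diff, cegt_sd = diff_with_sd(3548, 16, 3503, 17)

# How far apart are the two lists on the SF10-vs-SF9 gain?
gap = cegt_diff - ccrl_diff
gap_sd = math.hypot(ccrl_sd, cegt_sd)
z = gap / gap_sd

print(f"CCRL gain: {ccrl_diff} Elo, CEGT gain: {cegt_diff} Elo")
print(f"gap: {gap} Elo, SD of gap: {gap_sd:.1f}, z = {z:.1f}")
```

Under these assumptions the two lists differ by 46 Elo on the SF10 gain, with a standard deviation of roughly 14.4 Elo, i.e. a little over 3 SD.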

Graham Banks · Post by **Graham Banks** » Thu Aug 01, 2019 10:41 pm

Laskos wrote: ↑Thu Aug 01, 2019 10:04 pm Did the CCRL group have crashes or losses on time with SF10 in the 40/4 4CPU list? The SF10 compile does have a small bug, which manifests itself only rarely.

Time losses are not included unless the result is an obvious one.

Ovyron · Post by **Ovyron** » Fri Aug 02, 2019 7:52 am

Ozymandias wrote: ↑Thu Aug 01, 2019 9:04 am
Ovyron wrote: ↑Wed Jul 31, 2019 10:57 pm I guess all these discussions are useless, the rating lists are built from volunteer work and what those volunteers want to test (that's why Stockfish 9 tops the 40/4 list...
I'd say the latter is a different problem which has to do with credibility.

Wait, what? I thought S10 wasn't topping 40/4 because no tester had been interested in testing it yet. Now I see it's been tested and is ranked below S9.

Now I feel like I've been talking about some ant running around the room, missing the big elephant...

jeremyrutman · Post by **jeremyrutman** » Wed Feb 17, 2021 9:10 pm

I found an answer of ~5200 Elo for perfect play by extrapolating draw rate vs. Elo to the point where the draw rate equals 1.

More info here https://unclejerry9466728.wordpress.com/2018/12/20/172/
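
The extrapolation idea can be sketched as follows. This assumes a straight-line fit of draw rate against Elo, which may differ from the fit used in the linked post, and the data points are made-up placeholders rather than measurements, so the result is not an estimate of anything. The function name `elo_at_full_draw_rate` is hypothetical.

```python
def elo_at_full_draw_rate(elos, draw_rates):
    """Least-squares line through (Elo, draw rate), solved for draw rate = 1."""
    n = len(elos)
    mean_x = sum(elos) / n
    mean_y = sum(draw_rates) / n
    sxx = sum((x - mean_x) ** 2 for x in elos)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(elos, draw_rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Solve slope * elo + intercept = 1 for elo.
    return (1.0 - intercept) / slope


# Placeholder points only (NOT real data): an exactly linear toy series.
toy_elos = [2000, 3000, 4000]
toy_draw_rates = [0.2, 0.4, 0.6]

limit = elo_at_full_draw_rate(toy_elos, toy_draw_rates)
print(f"toy extrapolation reaches draw rate 1 at about {limit:.0f} Elo")
```

With real data the answer depends strongly on which rating pool and time control the draw rates come from, and on whether a straight line is a reasonable model near draw rate 1.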

Frank Quisinsky · Post by **Frank Quisinsky** » Wed Feb 17, 2021 10:21 pm

Hi,

That's interesting.
I was looking for an answer around 2010 and produced some stats that gave a result of ~4700 Elo.
But I no longer remember how I calculated it; I'll have to search my own backups.

But I also worked with the draw rate!

I believe that the game phase "transition into endgame" is close to perfection today (for the strongest engines, maybe only 200 Elo more is possible there).
If the current Stockfish has around 3500 Elo with 1 core (judging by my still-running FCP Tourney-2021 vs. 40 opponents), that leaves 5200 - 3500 = 1700 Elo.
I can't believe that 1500 Elo (1700 Elo minus the 200 Elo I have in mind for the transition into the endgame) is still possible between the end of the opening and the end of the middlegame.

I calculated 4700 Elo assuming Shredder 12 with 1 core = 2800 Elo!
But 10 years later, with more possibilities and game material ... maybe 5200 is more accurate than 4700.

If so ... with 1 core that means 34 years (if 50 Elo per year is possible).
With more cores and ever better hardware, the forecast could be ... 25 years!

54 years (my age) + 25 years = 79
Maybe I'll be lucky ...
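
The 34-year figure is simple division. Here is a small sketch using only the numbers from this thread, assuming a constant rate of progress; the function name `years_to_target` is made up.

```python
def years_to_target(current_elo, target_elo, elo_per_year):
    """Years needed to close the Elo gap at a constant yearly gain."""
    return (target_elo - current_elo) / elo_per_year


# 3500 Elo today (1 core), 50 Elo per year, two candidate ceilings.
years_5200 = years_to_target(3500, 5200, 50)
years_4700 = years_to_target(3500, 4700, 50)
print(years_5200, years_4700)
```

With a 5200 Elo ceiling this gives 34 years; with the older 4700 Elo estimate it would be 24 years.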

Best
Frank

jeremyrutman · Post by **jeremyrutman** » Wed Feb 17, 2021 10:37 pm

I'd be interested to see how you arrived at 4700 Elo, which is not so far from my prediction.
Incidentally, I also calculated the time until a perfect computer: according to the surprisingly linear progress, it will occur around the year 2059.
But with the last point being higher than the trend, and with quantum computers on the way, maybe we can expect faster progress - then we'll both have a chance to see god playing chess!

Frank Quisinsky · Post by **Frank Quisinsky** » Wed Feb 17, 2021 11:17 pm

Great!
My system was very easy ...

Shredder = 2800 Elo, I collected the draws vs. 2600, 2400, 2200, 2000, 1800 Elo
AnMon = 2350 Elo, I collected the draws vs. 2150, 1950, 1750, 1550, 1350 Elo

Simulation for Shredder results for 3000, 3200, 3400, 3600, 3800 Elo
Simulation for AnMon results for 2550, 2750, 2950, 3150, 3350 Elo (100% agreement with real games vs. 2550 and 2750 Elo).

Simulation of Simulation with 4000, 4200, 4400, 4600, 4800 Elo
Simulation of Simulation with 3550, 3750, 3950, 4150, 4350 Elo

I mixed the results for AnMon and Shredder!

Then I looked at which point of the simulation the possible draws go to zero!
The final result was 4700 Elo / 38000 games

After all this ... the main question for me was this one:
How many games does Shredder 12, with 2800 Elo, need to get a draw vs. the perfectly playing chess program?

It needs ~38000 games before Shredder 12 with 2800 Elo can get a draw vs. the perfectly playing chess program (if that is 4700 Elo).
Anything higher than 4700 Elo made no sense in the calculation I made. So I have been thinking all along ... the max. possible strength must be ~4700 or a bit higher.
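
For comparison, the standard logistic Elo formula gives a similar order of magnitude. This is not the simulation described above, just a rough cross-check under two assumptions: the usual 400-point logistic expectation holds at a 1900 Elo gap, and the weaker side never wins, so all of its points come from draws.

```python
def expected_score(rating, opponent_rating):
    """Expected score per game under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


# Shredder 12 at 2800 Elo vs. a hypothetical perfect player at 4700 Elo.
score = expected_score(2800, 4700)

# If every point comes from draws, a draw happens with probability 2 * score.
games_per_draw = 1.0 / (2.0 * score)

print(f"expected score per game: {score:.2e}")
print(f"about one draw every {games_per_draw:.0f} games")
```

Under these assumptions the model predicts about one draw every 28,000 games, which is in the same range as the ~38000 games quoted above.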

Again, that is all I remember ...
I must look at my older Excel stats (if I still have them in my backup).
Just for fun, no more, no less ... I never published this idea for a calculation!

If we had the perfectly playing chess program, we could calculate playing strength with a move average of games.

Best
Frank

Frank Quisinsky · Post by **Frank Quisinsky** » Wed Feb 17, 2021 11:57 pm

Hi,

I will ask Klaus Wlotzka.
Klaus is the Excel expert for such things.

Example:
The FEOBOS opening book project, which can be found on my website: the Excel file has 16.7 million formulas, most of them in macros.
Klaus is now working on the FCP Tourney-2021 Excel stats.
Klaus also models the German football league (Fussball-Bundesliga in German) with Elo calculations, and his forecasts for possible results are just great.

Very interesting!
But 5200 Elo could be right.
Sure, different ways to calculate it are possible!
The topic is very interesting!!

Best
Frank
