future of top engines:how much more elo?

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: future of top engines:how much more elo?

Post by Laskos »

Dann Corbit wrote: Thu Aug 01, 2019 9:09 pm Here are some rating lists for SF at 40/40, 40/20, and 40/4


CCRL
40/40
Stockfish 10 64-bit 4CPU	3461	+18	-18
Stockfish  9 64-bit 4CPU	3432	+15	-15

So SF 10 at 40/40 could be as high as 3479 or as low as 3443
So SF 9 at 40/40 could be as high as 3447 or as low as 3417
We are not totally sure which one is stronger

40/4
Stockfish  9 64-bit 4CPU	3547	+12	-12
Stockfish 10 64-bit 4CPU	3546	+12	-12

So SF 9 at 40/4 could be as high as 3559 or as low as 3535
So SF 10 at 40/4 could be as high as 3558 or as low as 3534
We are not totally sure which one is stronger

CEGT 
40/20
Stockfish 10.0 x64 8CPU	3518	+21	-21
Stockfish  9.0 x64 8CPU	3493	+24	-24

So Stockfish 10 at 40/20 could be as high as 3539 or as low as 3497
So Stockfish 9 at 40/20 could be as high as 3517 or as low as 3469
We are not totally sure which one is stronger

40/4
Stockfish 10.0 x64 4CPU	3548	+16	-16
Stockfish  9.0 x64 4CPU	3503	+17	-17

So SF 10 at 40/4 could be as high as 3564 or as low as 3532
So SF 9 at 40/4 could be as high as 3520 or as low as 3486
It appears that SF 10 is stronger, within a couple of standard deviations, but not by much.
Which measurements exactly do you have an issue with, and what evidence do you propose to show that the measurements are wrong?
So the CCRL and CEGT results at 40/4 4CPU for SF10 relative to SF9 disagree at about the 3-standard-deviation level, which is very bad. The conditions are very similar, so this CCRL rating seems dubious in this context.
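The "about 3 standard deviations" figure can be checked with a short Python sketch. It assumes, as stated later in the thread, that each ± margin is 2 SD, and it treats all four ratings as independent, which is only an approximation since each list rates both engines within a shared pool. The helper names are illustrative, not taken from either rating list.

```python
import math


def sd_from_margin(margin):
    """Convert a +/- error margin to one standard deviation, assuming a 2-SD (~95%) margin."""
    return margin / 2.0


def diff_and_sd(r10, m10, r9, m9):
    """Return SF10 - SF9 and its standard deviation, treating the two ratings as independent."""
    diff = r10 - r9
    sd = math.hypot(sd_from_margin(m10), sd_from_margin(m9))
    return diff, sd


# 40/4 4CPU figures quoted above: SF10 rating, SF10 margin, SF9 rating, SF9 margin
ccrl_diff, ccrl_sd = diff_and_sd(3546, 12, 3547, 12)
cegt_diff, cegt_sd = diff_and_sd(3548, 16, 3503, 17)

# How far apart are the two lists' SF10 - SF9 differences, in units of their combined SD?
z = (cegt_diff - ccrl_diff) / math.hypot(ccrl_sd, cegt_sd)
print(f"CCRL: {ccrl_diff:+d}, CEGT: {cegt_diff:+d}, discrepancy: {z:.1f} SD")
```

With these inputs it reports a discrepancy of about 3.2 SD, in line with the estimate above.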
Dann Corbit
Posts: 12778
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: future of top engines:how much more elo?

Post by Dann Corbit »

Laskos wrote: Thu Aug 01, 2019 9:33 pm
So the CCRL and CEGT results at 40/4 4CPU for SF10 relative to SF9 disagree at about the 3-standard-deviation level, which is very bad. The conditions are very similar, so this CCRL rating seems dubious in this context.
The thing that is interesting is that three of the measurements are very close so that we cannot distinguish which is stronger, and one has a clear result.

So if a measurement is wrong, which one is it?

I thought the error bars were two standard deviations.
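The interval reading above can be compared with a direct test on the difference in a minimal Python sketch, again assuming each ± margin is 2 SD and that the two engines' rating errors are independent (a simplification). `MEASUREMENTS` and both helpers are hypothetical names for this illustration.

```python
import math

# (list, time control, SF10 rating, SF10 margin, SF9 rating, SF9 margin), as quoted above
MEASUREMENTS = [
    ("CCRL", "40/40", 3461, 18, 3432, 15),
    ("CCRL", "40/4", 3546, 12, 3547, 12),
    ("CEGT", "40/20", 3518, 21, 3493, 24),
    ("CEGT", "40/4", 3548, 16, 3503, 17),
]


def intervals_overlap(r1, m1, r2, m2):
    """True if the intervals [r - m, r + m] of the two ratings overlap."""
    return r1 - m1 <= r2 + m2 and r2 - m2 <= r1 + m1


def z_of_difference(r1, m1, r2, m2):
    """Difference r1 - r2 in SD units, assuming margins are 2 SD and errors are independent."""
    return (r1 - r2) / math.hypot(m1 / 2, m2 / 2)


results = {}
for lst, tc, r10, m10, r9, m9 in MEASUREMENTS:
    overlap = intervals_overlap(r10, m10, r9, m9)
    z = z_of_difference(r10, m10, r9, m9)
    results[(lst, tc)] = (overlap, z)
    print(f"{lst} {tc}: intervals overlap={overlap}, SF10 - SF9 = {z:+.1f} SD")
```

It reproduces the pattern described: the intervals overlap in three cases and only CEGT 40/4 separates cleanly. Overlapping intervals is a stricter criterion than testing the difference itself, though, so under these assumptions CCRL 40/40 already comes out near 2.5 SD.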
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: future of top engines:how much more elo?

Post by Laskos »

Dann Corbit wrote: Thu Aug 01, 2019 9:53 pm
The thing that is interesting is that three of the measurements are very close so that we cannot distinguish which is stronger, and one has a clear result.

So if a measurement is wrong, which one is it?

I thought the error bars were two standard deviations.

The CCRL 40/4 4CPU result stands out in your post. The error margins there are 2 SD, but the discrepancy with CEGT seems larger than that, about 3 SD. Did the CCRL group have crashes or time losses with SF10 in the 40/4 4CPU list? The SF10 compile does have a small bug, which manifests itself only rarely.
Graham Banks
Posts: 44197
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: future of top engines:how much more elo?

Post by Graham Banks »

Laskos wrote: Thu Aug 01, 2019 10:04 pm Did the CCRL group have crashes or time losses with SF10 in the 40/4 4CPU list? The SF10 compile does have a small bug, which manifests itself only rarely.
Time losses are not included unless the result is an obvious one.
gbanksnz at gmail.com
Ovyron
Posts: 4558
Joined: Tue Jul 03, 2007 4:30 am

Re: future of top engines:how much more elo?

Post by Ovyron »

Ozymandias wrote: Thu Aug 01, 2019 9:04 am
Ovyron wrote: Wed Jul 31, 2019 10:57 pm I guess all these discussions are useless, the rating lists are built from volunteer work and what those volunteers want to test (that's why Stockfish 9 tops the 40/4 list...
I'd say the latter is a different problem which has to do with credibility.
Wait, what? I thought S10 wasn't topping 40/4 because no tester had been interested in testing it yet. Now I see it's been tested and is ranked below S9 :shock:

Now I feel like I've been talking about some ant running around the room, missing the big elephant...
Your beliefs create your reality, so be careful what you wish for.
jeremyrutman
Posts: 2
Joined: Sun Dec 01, 2019 7:20 pm
Full name: jeremy rutman

Re: future of top engines:how much more elo?

Post by jeremyrutman »

I arrived at ~5200 Elo for perfect play by extrapolating the draw rate vs. Elo to the point where the draw rate reaches 1.


More info here https://unclejerry9466728.wordpress.com/2018/12/20/172/
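One way to reproduce this kind of extrapolation is a least-squares straight line through (Elo, draw rate) points, solved for draw rate = 1. This is a minimal Python sketch: the straight-line model is an assumption (the linked post may fit a different curve), and the data points below are hypothetical, chosen only to exercise the code.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def elo_where_draw_rate_is_one(elos, draw_rates):
    """Extrapolate the fitted straight line to the Elo at which the draw rate reaches 1.0."""
    a, b = fit_line(elos, draw_rates)
    if b <= 0:
        raise ValueError("draw rate must rise with Elo for this extrapolation to make sense")
    return (1.0 - a) / b


# HYPOTHETICAL points for illustration only, not the data from the post
elos = [2800, 3000, 3200, 3400]
draw_rates = [0.40, 0.46, 0.52, 0.58]
print(round(elo_where_draw_rate_is_one(elos, draw_rates)))
```

With these made-up points the line reaches a draw rate of 1 at 4800 Elo. A draw rate cannot exceed 1, so the answer depends heavily on which curve is fitted.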
Frank Quisinsky
Posts: 6927
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: future of top engines:how much more elo?

Post by Frank Quisinsky »

Hi,

that's interesting.
I searched for an answer around 2010 and compiled some stats, with a result of ~4700 Elo.
But I no longer remember how I calculated it; I must search my own backups.

But I also worked with the draw rate!

I believe the "transition into endgame" phase is already close to perfection for the strongest engines today (maybe only about 200 Elo remain to be gained there).
If the current Stockfish is around 3500 Elo on 1 core (judging from my still-running FCP Tourney-2021 against 40 opponents), the remaining gap is 5200 - 3500 = 1700 Elo.
I can't believe that 1500 Elo (the 1700 Elo minus the 200 Elo I attribute to the transition into the endgame) can still be gained in the phase from the end of the opening to the end of the middlegame.

I got 4700 Elo when I assumed Shredder 12 on 1 core = 2800 Elo!
But 10 years later, with more possibilities and more game material ... maybe 5200 is closer to the truth than 4700.

If so ... with 1 core it would take 34 years (if 50 Elo of progress per year is possible).
With more cores and ever-better hardware, the prognosis could be ... 25 years!
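The 34-year figure is simply gap / yearly gain, assuming a constant rate of progress. A small Python sketch, with hypothetical helper names, also shows what yearly gain the 25-year prognosis would imply.

```python
def years_to_ceiling(ceiling_elo, current_elo, elo_per_year):
    """Years needed to close the gap to the ceiling at a constant gain per year."""
    if elo_per_year <= 0:
        raise ValueError("elo_per_year must be positive")
    return (ceiling_elo - current_elo) / elo_per_year


def elo_per_year_needed(ceiling_elo, current_elo, years):
    """Constant yearly gain needed to reach the ceiling within the given number of years."""
    if years <= 0:
        raise ValueError("years must be positive")
    return (ceiling_elo - current_elo) / years


# Figures from the post: 5200 ceiling, ~3500 for Stockfish on 1 core
print(years_to_ceiling(5200, 3500, 50))     # 34.0
print(elo_per_year_needed(5200, 3500, 25))  # 68.0
```

So the 25-year prognosis corresponds to roughly 68 Elo per year.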

54 years (my age) + 25 years = 79.
Maybe I'll be lucky ...

Best
Frank
jeremyrutman
Posts: 2
Joined: Sun Dec 01, 2019 7:20 pm
Full name: jeremy rutman

Re: future of top engines:how much more elo?

Post by jeremyrutman »

I'd be interested to see how you arrived at 4700 Elo, not so far from my prediction.
Incidentally, I also calculated the time until a perfect computer: judging by the surprisingly linear progress, it should arrive around the year 2059.
But since the last point lies above the trend, and with quantum computers on the way, we can expect faster progress; then we'll both have a chance to see god playing chess!
Frank Quisinsky
Posts: 6927
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: future of top engines:how much more elo?

Post by Frank Quisinsky »

Great!
My system was very simple ...

Shredder = 2800 Elo: I collected its draws against opponents of 2600, 2400, 2200, 2000, and 1800 Elo.
AnMon = 2350 Elo: I collected its draws against opponents of 2150, 1950, 1750, 1550, and 1350 Elo.

Then I simulated Shredder's results against 3000, 3200, 3400, 3600, and 3800 Elo.
And I simulated AnMon's results against 2550, 2750, 2950, 3150, and 3350 Elo (100% agreement with real games against 2550 and 2750 Elo).

Then a simulation of the simulation against 4000, 4200, 4400, 4600, and 4800 Elo (Shredder),
and against 3550, 3750, 3950, 4150, and 4350 Elo (AnMon).

I combined the results for AnMon and Shredder!

And I looked at the point in the simulation where the possible draws go to zero!
The final result was 4700 Elo / 38000 games.

After all this ... the main question for me was this one:
How many games does Shredder 12, at 2800 Elo, need to get one draw against the perfect chess program?

It needs ~38000 games for Shredder 12 at 2800 Elo to draw one game against the perfect chess program (if that program is 4700 Elo).
Anything higher than 4700 Elo made no sense in the calculation I made. So I keep thinking ... the maximum possible strength must be ~4700 or a bit higher.
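For comparison, the standard logistic Elo formula gives another estimate of the same quantity. Against a perfect player the weaker side never wins, so all of its score comes from draws and its draw rate is twice its expected score. This Python sketch assumes the logistic model still holds at a 1900-point gap, which is an assumption at such an extreme gap, and it is not the draw-rate simulation described above.

```python
def expected_score(elo_gap):
    """Logistic Elo expected score of the weaker player; elo_gap = stronger minus weaker rating."""
    return 1.0 / (1.0 + 10.0 ** (elo_gap / 400.0))


def games_per_draw_vs_perfect(elo_gap):
    """The weaker side never wins against a perfect player, so its whole score comes from
    draws worth 0.5 each: draw_rate = 2 * expected_score. Games per draw = 1 / draw_rate."""
    draw_rate = 2.0 * expected_score(elo_gap)
    return 1.0 / draw_rate


# Shredder 12 at 2800 vs a hypothetical 4700-Elo perfect player
print(round(games_per_draw_vs_perfect(4700 - 2800)))
```

Under that model the answer is roughly 28,000 games, the same order of magnitude as the ~38000 from the simulation.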

Again, this is all from memory ...
I must look at my older Excel stats (if I still have them in my backup).
Just for fun, no more, no less ... I never published this idea for a calculation!

If we had the perfect chess program, we could calculate playing strength from the average move quality in games.

Best
Frank
Frank Quisinsky
Posts: 6927
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: future of top engines:how much more elo?

Post by Frank Quisinsky »

Hi,

I will ask Klaus Wlotzka.
Klaus is the Excel expert for such things.

Example:
FEOBOS opening book project (can be found on my website): the Excel file has 16.7 million formulas, most of them in macros.
Klaus is now working on the FCP Tourney-2021 Excel stats.
Klaus rates the German football league (Fussball Bundesliga in German) with Elo calculations, and his predictions of possible results are just great.

:-)

Very interesting!
But 5200 Elo could be right.
Sure, different ways to calculate it are possible!
The topic is very interesting!!

Best
Frank