Estimated Elo Perfect Play

megamau · Post by **megamau** » Thu Mar 24, 2016 4:55 am

I have done some analysis of the CCRL rating list data, correlating the draw rate with the strength of the programs.

As Elo increases, so does the draw rate, which suggests the hypothesis that the game is a draw with perfect play. Under the same hypothesis, we can extrapolate the draw-rate curve and infer the Elo rating of perfect play, where the draw rate would reach 100%.

In this case, we have filtered out the programs that played much stronger or much weaker opponents on average (as this would skew the draw rate down). Only programs whose average opposition is within 20 Elo of their own rating have been considered.

With all the caveats of extrapolation, it seems that perfect play is nearer than I thought, between 3800 and 5000 Elo.
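As a rough illustration of the extrapolation described above, here is a minimal Python sketch. It assumes a straight-line fit of draw rate against Elo, which is only one possible choice of curve and not necessarily the one used for the CCRL analysis. The `Engine` record, the function names and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    name: str
    elo: float         # the engine's own rating
    opp_elo: float     # average rating of its opponents
    draw_rate: float   # fraction of its games that were drawn, 0..1


def filter_balanced(engines, max_gap=20.0):
    """Keep engines whose average opposition is within max_gap Elo."""
    return [e for e in engines if abs(e.elo - e.opp_elo) <= max_gap]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def perfect_play_elo(engines, max_gap=20.0):
    """Elo at which the fitted line reaches a 100% draw rate."""
    kept = filter_balanced(engines, max_gap)
    slope, intercept = fit_line([e.elo for e in kept],
                                [e.draw_rate for e in kept])
    if slope <= 0:
        raise ValueError("draw rate does not rise with Elo; cannot extrapolate")
    return (1.0 - intercept) / slope


# Made-up numbers for illustration only (NOT real CCRL data).
sample = [
    Engine("A", 2400, 2410, 0.30),
    Engine("B", 2800, 2790, 0.40),
    Engine("C", 3200, 3210, 0.50),
    Engine("D", 3000, 2700, 0.25),  # dropped: opposition is 300 Elo weaker
]
print(round(perfect_play_elo(sample)))
```

A saturating curve (one that approaches 100% without ever reaching it) would give a very different answer from a straight line, so the choice of curve matters at least as much as the data.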

Laskos · Post by **Laskos** » Thu Mar 24, 2016 5:24 am

Yes, your result (3800 to 5000 CCRL) is very close to what I got doing a different extrapolation (Elo gain per doubling of time). IIRC my results were a bit closer to 5000 than to 3800.

yurikvelo · Post by **yurikvelo** » Thu Mar 24, 2016 9:01 am

I have a different interpretation of the reasons for draws, and a method to verify or reject it.

The draw rate of 2500 engine A vs 2500 engine B might be low because they have different strengths and different weaknesses (as in a triathlon, where two athletes finish with the same time, but one is a very bad swimmer and the other a very bad runner).
Weak engines are usually immature: they lack various kinds of knowledge and have not been tested under varied conditions.
They usually behave quite differently under different conditions, so there is a high probability that a strength of engine A meets a weakness of engine B in a particular game.

Strong engines aren't only strong; historically they have been developed using the very same ideas, techniques and testing methods. They might draw not because of perfect play, but because they exploit the very same weaknesses in all types of positions.

To verify my hypothesis, compare the draw rate of the same pair of engines (e.g. latest K vs SF) at 0.1 MN per move and at 100 MN per move (HW+TC combination). That is about +500 to 600 Elo and should give a 20-30% increase in draw rate.
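The check proposed above could be sketched roughly as follows in Python. The two-proportion z-test is an addition of this sketch (one standard way to judge whether a difference in draw rate exceeds noise), not something proposed in the post, and the game tallies are hypothetical.

```python
import math


def draw_rate(wins, draws, losses):
    """Fraction of games that were drawn."""
    return draws / (wins + draws + losses)


def draw_rate_change(fast, slow):
    """Compare two (wins, draws, losses) tallies for the same engine pair.

    Returns (difference in draw rate, z-score of a two-proportion test).
    """
    n1, n2 = sum(fast), sum(slow)
    p1, p2 = draw_rate(*fast), draw_rate(*slow)
    pooled = (fast[1] + slow[1]) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se if se > 0 else 0.0
    return p2 - p1, z


# Hypothetical tallies (wins, draws, losses), NOT measured results.
fast_games = (300, 400, 300)  # e.g. 0.1 MN per move
slow_games = (200, 650, 150)  # e.g. 100 MN per move
diff, z = draw_rate_change(fast_games, slow_games)
print(f"draw rate change: {diff:+.1%}, z = {z:.1f}")
```

Running the same comparison for a second pair of equally rated but very differently built engines would then show whether the rise in draw rate depends on the engines sharing a design.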

I also suggest that if someone develops an engine of the same strength as SF/K, but employing extremely different techniques (e.g. the AlphaGo approach), the draw rate might be much lower despite equal overall Elo strength.

cdani · Post by **cdani** » Thu Mar 24, 2016 1:32 pm

yurikvelo wrote:I have a different interpretation of the reasons for draws, and a method to verify or reject it. […]

I think the same as you. In some years the best engines will probably exploit at least some different areas of play, and consequently push back how far away the horizon seems to be.

gerold · Post by **gerold** » Thu Mar 24, 2016 3:00 pm

yurikvelo wrote:I have a different interpretation of the reasons for draws, and a method to verify or reject it. […]

Plus one.

Ozymandias · Post by **Ozymandias** » Thu Mar 24, 2016 7:47 pm

yurikvelo wrote:I have a different interpretation of the reasons for draws, and a method to verify or reject it. […]

I've heard this sort of argument before, in some form or another, but still haven't seen any indication of a paradigm shift. Most probably because it can't be done.

cdani · Post by **cdani** » Thu Mar 24, 2016 8:08 pm

Ozymandias wrote: I've heard this sort of argument before, in some form or another, but still haven't seen any indication of a paradigm shift. Most probably because it can't be done.

I can propose, for example, that in many games there should be some lines that can win a position, but as they are not related in any way to tactics or to typical weaknesses known by engines, only to deep ideas, the engines simply cannot see them. So something big, or a lot of small improvements, would be needed to reach this.

Ozymandias · Post by **Ozymandias** » Thu Mar 24, 2016 9:03 pm

cdani wrote:[…] in many games there should be some lines that can win a position, but as they are not related in any way to tactics or typical weaknesses known by engines, but only to deep ideas, the engines simply cannot see them.

I guess you mean even deeper ideas, because engines already handle pretty deep "ideas" nowadays. That could happen, but you'd need very complex positions for those ideas to thrive, and books already take care of those. I hope there are some uncharted ones left, but I'm afraid they're nowhere near "many".

mjlef · Post by **mjlef** » Thu Mar 24, 2016 9:21 pm

Laskos wrote:Yes, your result (3800 to 5000 CCRL) is very close to what I got doing a different extrapolation (Elo gain per doubling of time). IIRC my results were a bit closer to 5000 than to 3800.

Andreas at fastgm.de gathered some data here using Komodo 9.3:

http://fastgm.de/time-control4.html

There is a clear decline in Elo gain from successive doublings. Of course it all depends on what kind of curve you fit to the data, but it seems that, at least for Komodo 9.3, you would eventually reach a point where more CPU power/time would not help improve Elo.

Then again, this is just one program. But interesting.

Mark
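To make the "gain per doubling" extrapolation concrete, here is a minimal Python sketch. It assumes the gain shrinks by a constant factor with each doubling (geometric decay), which is just one candidate curve; the function name and the numbers are hypothetical and are not the fastgm.de measurements.

```python
def limiting_elo(base_elo, gains):
    """Extrapolate a rating ceiling from Elo gains of successive doublings.

    Assumes each doubling gains a fixed fraction r of the previous one.
    r is estimated as the mean ratio of consecutive gains.
    """
    if len(gains) < 2:
        raise ValueError("need at least two doublings to estimate the decay")
    ratios = [b / a for a, b in zip(gains, gains[1:])]
    r = sum(ratios) / len(ratios)
    if not 0 < r < 1:
        raise ValueError("gains are not shrinking; no finite limit in this model")
    reached = base_elo + sum(gains)
    # Gains still to come: last*r + last*r**2 + ... = last * r / (1 - r)
    remaining = gains[-1] * r / (1 - r)
    return reached + remaining


# Hypothetical gains per doubling, for illustration only.
gains = [120, 96, 76.8, 61.44]
print(round(limiting_elo(3000, gains)))
```

Under a different assumption, for example gains shrinking like 1/n, the sum would never converge, which is exactly the curve-choice caveat raised above.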

Laskos · Post by **Laskos** » Thu Mar 24, 2016 11:49 pm

mjlef wrote:Andreas at fastgm.de gathered some data here using Komodo 9.3: http://fastgm.de/time-control4.html […]

Thanks, very interesting, cleaner results than mine (more games, up to longer time controls).

Several programs seem to converge to a similar limiting value; I did tests with Houdini, Komodo and SF.

From Andreas's results, a rough extrapolation would give a limiting value of about 4400-4800 Elo points on CCRL 40/40. Very similar to my results.
Although some may doubt the extrapolation, if one assumes that chess from the standard opening position is a draw, then the extrapolation is natural and straightforward.
