Estimated Elo Perfect Play

Uri · Post by **Uri** » Fri Mar 25, 2016 5:08 pm

Chess engines are still very far from perfect play.

Chess engines still have certain weaknesses, for example:

1. The understanding of the opening and opening theory.

There are certain principles of play which need to be followed in order to play the opening successfully, like rapid development of the pieces (tempo), control of the center (space), castling for king safety, not moving the rooks and queens too early in the opening, etc.

2. Strategy and planning are still among the big weaknesses of even the best chess engines. For example, chess engines often underestimate mobility and piece activity.

Chess engines are good at certain tactics, but when it comes to understanding the opening and chess strategy they are surely lacking.

yurikvelo · Post by **yurikvelo** » Fri Mar 25, 2016 8:15 pm

1. The understanding of the opening and opening theory.

The fact that engines easily win handicap games proves they understand opening theory better than humans.
Of course, not in bullet games.

There are certain principles of play which need to be followed in order to play the opening successfully, like rapid development of the pieces (tempo), control of the center (space), castling for king safety, not moving the rooks and queens too early in the opening, etc.

Engines are perfect at this and are used to filter crap human opening books.

KWRegan · Post by **KWRegan** » Sun Mar 27, 2016 3:56 am

I follow others' use of two indicators that give wildly different answers, but which my "Intrinsic Performance Rating" metric combines into something reasonable. The two indicators individually do not depend on any details of my model---anyone using Guid-Bratko's or Charles Sullivan's TrueChess (http://www.truechess.com/) or chess-db.com's or the http://en.lichess.org/ methods should observe similar results.

1. Average Error Per Move: With enough data one will get an astonishingly good linear fit to human players' (FIDE) Elo ratings. Depending on the engine and depth-or-time of search and other settings, the line I get hits zero error between Elo 3200 and 3400. Well this is funny: several chess engine rating lists rate the top engines higher than most of this range.

2. Agreement with the Engine. Again depending on settings---not so much the engine but things like SinglePV vs. MultiPV, retain-hash vs. clear-hash before each move---the 2780+ crowd hits between 57.5% and 60%. At a little under 1.5% per 100 Elo, you hit 100% somewhere north of Elo 5000. If you count equal-value moves not just the first line as agreements, then 100% is arguably a little under Elo 5000. Well the engines agree with each other only upwards of 75-80% in human-typical positions (less on positions from engine vs. engine games), but even using 57%-to-75% or 60%-to-78% to get a gap of 18%, this still adds at least 1200 Elo to put it above Elo 4000.
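The extrapolation in item 2 is plain linear arithmetic and can be checked with a short sketch. The values below are assumptions for illustration only: the "2780+ crowd" is pinned at exactly Elo 2780 and the slope at exactly 1.5% per 100 Elo, whereas the post says "a little under 1.5%", which would push every result somewhat higher. The function name is hypothetical.

```python
def elo_at_agreement(base_elo, base_pct, target_pct, pct_per_100_elo):
    """Linearly extrapolate the Elo at which engine agreement reaches target_pct."""
    return base_elo + (target_pct - base_pct) / pct_per_100_elo * 100.0

BASE_ELO = 2780  # assumption: the "2780+ crowd" taken as exactly 2780
SLOPE = 1.5      # assumption: exactly 1.5% agreement per 100 Elo

# Elo at 100% agreement, starting from 60% and from 57.5% agreement.
full_low = elo_at_agreement(BASE_ELO, 60.0, 100.0, SLOPE)
full_high = elo_at_agreement(BASE_ELO, 57.5, 100.0, SLOPE)

# Elo at engine-vs-engine agreement levels: an 18% gap (57% to 75%).
engine_level = elo_at_agreement(BASE_ELO, 57.0, 75.0, SLOPE)

print(f"100% agreement: Elo {full_low:.0f} to {full_high:.0f}")
print(f"18% gap: Elo {engine_level:.0f}")
```

With these rounded inputs the 18% gap is worth exactly 1200 Elo; a slope slightly below 1.5% makes it "at least 1200", as stated above.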

I haven't yet been able to execute what I think will be a definitive test, which is measuring the curve of diminishing returns "intrinsically" for quality directly. Furthermore, my IPRs are still mainly indicators of accuracy (likewise all of the others for the most part) and haven't yet incorporated a robust measure of "Challenge Created" which will be based on the work reported at https://rjlipton.wordpress.com/2015/10/ ... tisficing/ But my deeper indicators so far point to a ceiling under 3600 (currently my formulas use 3475, in fact).

One point is that the "rating of perfect play" can only be defined using a universal quantification, not as the rating of any one machine. What I believe is that the current Komodo or Stockfish engine, equipped with a modicum of random choice so that an adversary could not "book up" against any failings in particular positions, could achieve at least 10% against any player P. In that case, the rating of P could never be higher than R + 366, where R is the rating of Stockfish or Komodo under the relevant tournament conditions and 366 is the rating difference for 90% by-game scoring expectation. I wrote about this and a few other musings at https://rjlipton.wordpress.com/2015/11/ ... dditivity/
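The 366 figure can be reproduced approximately with a short sketch. It assumes that the 90% expectation is read off a normal-distribution model of rating differences with standard deviation 2000/7 (my understanding of how the FIDE conversion table is built, not something stated in this post); the logistic Elo formula is shown only for comparison and gives a somewhat larger gap. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def gap_normal(p, sigma=2000 / 7):
    """Rating difference giving expected score p under a normal model.

    sigma = 2000/7 is an assumption about the table behind the 366 figure.
    """
    return NormalDist(0.0, sigma).inv_cdf(p)

def gap_logistic(p):
    """Rating difference giving expected score p under the logistic Elo formula."""
    return 400.0 * math.log10(p / (1.0 - p))

def rating_ceiling(r, p_strong=0.90):
    """Upper bound on P's rating if an engine rated r scores at least 1 - p_strong against P."""
    return r + gap_normal(p_strong)

print(f"normal model:   {gap_normal(0.90):.1f}")
print(f"logistic model: {gap_logistic(0.90):.1f}")
```

Either way the argument is the same: if the engine at rating R can hold 10% against any player P, then P's rating is bounded by R plus a fixed gap of a few hundred points.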

Guenther · Post by **Guenther** » Sun Mar 27, 2016 11:32 am

yurikvelo wrote:
1. The understanding of the opening and opening theory.
The fact that engines easily win handicap games proves they understand opening theory better than humans.
Of course, not in bullet games.

There are certain principles of play which need to be followed in order to play the opening successfully, like rapid development of the pieces (tempo), control of the center (space), castling for king safety, not moving the rooks and queens too early in the opening, etc.
Engines are perfect at this and are used to filter crap human opening books.

I completely disagree.

1. Handicap games prove nothing about understanding of opening theory.
Humanity has spent perhaps 150-200 years of serious effort creating opening theory for the start position. A single human playing a handicap game against a software opponent has spent perhaps a few hours on this scenario. The significance of your assumption is therefore nil.

2. Engines are still weak in the opening phase compared to the opening knowledge of good GMs. Without books one can still watch awful openings despite long time controls.
We have to admit, though, that they have improved a lot in this area, but there are still (not too few) dark spots in several (especially non-mainstream) openings, which often would be handled better by an average FM player.
All in all, the opening stage is still the area where engines are farthest from perfect play, and developers surely know it.

Günther

yurikvelo · Post by **yurikvelo** » Sun Mar 27, 2016 12:59 pm

Guenther wrote:Humanity spent perhaps 150-200 years of serious efforts to create opening theory for the start position.

That knowledge was used to develop and tune engine evaluation.
E.g. Deep Blue used only a generalization of human games to develop its evaluation function.

So the 200 years of effort are not lost with engines; they are put to use and taken to the next level.

Guenther wrote:2. Engines are still weak in the opening phase compared to the opening knowledge of good GMs. Without books one can still watch awful openings despite long time controls.

Please provide one or a couple of book positions where SF or K chooses an awful PV.

Opening knowledge of superGM is selective, inconsistent, full of mistakes and based on home preparation with engines.

Crap handicap play means humans do not understand opening theory but have just memorized a hundred popular lines.
Strong engine play in any position (including Chess 960 and handicap initial positions) means they truly understand opening theory, rather than relying on memorization.

Also, talking about engines we have to agree on terms and conditions.
Computers have unlimited error-free memory, which allows infinite storage of any data, like opening books, learning/session files, permanent hashes, EG-tables.
Deep Blue was not calculating e2-e4; it took it from a book which was "developed for 200 years of serious efforts".
AlphaGo uses an enormous amount of storage, calculated previously, and cannot play without it.

Saying "engines are bad in the opening", we imply that engines are handicapped by being forbidden to store any precomputed data, the only allowed exception being EGTB tables.

bob · Post by **bob** » Mon Mar 28, 2016 12:05 am

Laskos wrote:
mjlef wrote:
Laskos wrote:Yes, your result (3800 to 5000 CCRL) is very close to what I got doing a different extrapolation (ELO gain per doubling time). IIRC my results were a bit closer to 5000 than to 3800.
Andreas at fastgm.de gathered some data here using Komodo 9.3:

http://fastgm.de/time-control4.html

There is a clear decline in Elo gain from successive doublings. Of course it all depends on what kind of curve you fit to the data, but it seems that, at least for Komodo 9.3, you would eventually reach a point where more CPU power/time would not help improve Elo.

Then again, this is just one program. But interesting.

Mark
Thanks, very interesting, cleaner results than mine (more games to longer time control).

Several programs seem to converge to a similar limiting value, I did tests with Houdini, Komodo and SF.

Roughly extrapolating from Andreas's results would give a limiting value of about 4400-4800 Elo points on CCRL 40/40. Very similar to my results.
Although some may doubt the extrapolation, if one assumes that Chess from standard opening position is a draw, then the extrapolation is natural and straightforward.
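One simple way to see how a declining gain per doubling leads to a finite limiting value is a geometric-decay sketch. This is not the curve either poster fitted; it assumes, purely for illustration, that each doubling yields a fixed fraction of the previous doubling's gain. The numbers in the usage lines are made up and are not taken from the fastgm.de data.

```python
def gain_after(first_gain, ratio, doublings):
    """Total Elo gained after a number of doublings, each worth ratio times the previous one."""
    return sum(first_gain * ratio ** k for k in range(doublings))

def limiting_gain(first_gain, ratio):
    """Limit of the total gain as doublings go to infinity (geometric series)."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1) for the gain to converge")
    return first_gain / (1.0 - ratio)

# Hypothetical inputs: 100 Elo for the first doubling, 80% of that for each next one.
print(gain_after(100.0, 0.8, 3))   # gain after three doublings
print(limiting_gain(100.0, 0.8))   # total gain still available in the limit
```

Under this assumption the ceiling is the current rating plus the limiting gain; a different decay law (for example, one where the gains shrink more slowly) could give a much higher or even unbounded limit, which is why the choice of fitted curve matters.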

Only drawback to this stuff is that it has little to do with "perfect play". You might (or might not) see a significant jump (or drop) in Elo should a program be able to search the entire game tree. Until we have a true perfect player, these are still just guesses...

If you study 100m sprint times, you might conclude that there is some lower bound. For humans as they exist and have existed. But what happens when eventually one is born with perfection in fast-twitch muscles in terms of alignment, mass, etc? Perfectly optimized angles and lever lengths in bone structure? Etc... Won't know until he hits the track.

tpoppins · Post by **tpoppins** » Mon Mar 28, 2016 4:03 am

yurikvelo wrote: Computers have unlimited error-free memory, which allows infinite storage of any data, like opening books, learning/session files, permanent hashes, EG-tables.

Can you check your calendar and report the year displayed?
Just for reference, it's still 2016 on this end.

yurikvelo · Post by **yurikvelo** » Mon Mar 28, 2016 7:53 am

tpoppins wrote:
yurikvelo wrote: Computers have unlimited error-free memory, which allows infinite storage of any data, like opening books, learning/session files, permanent hashes, EG-tables.
Can you check your calendar and report the year displayed?
Just for reference, it's still 2016 on this end.

A low-end home PC can store and handle even the biggest (1-2 GB) *.CTG opening books without a single error, and has been able to do so for decades, since the Deep Blue era.

A superGM hardly knows 0.1% of such a book and tends to make spontaneous errors.

AFAIK, FIDE engine championships have never forbidden using *.CTG.

CRoberson · Post by **CRoberson** » Mon Apr 11, 2016 4:45 am

Years ago, I came up with 4000 - 4800. I did it with much simpler statistics. If we assume the Elo standard deviation is 400 points, then how many standard deviations are needed to cover the full playing field?

If we assume that there are 5 std's to the left and right of the norm then we get 4000. If 6, we get 4800.
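The arithmetic behind these two figures, as a minimal sketch: it takes the 400-point standard deviation from the post as given and reads "5 std's to the left and right" as a total span of 10 standard deviations starting from zero. The function name is hypothetical.

```python
def rating_span(std_devs_each_side, sigma=400):
    """Width of the rating scale covering the given number of standard deviations on each side of the mean."""
    return 2 * std_devs_each_side * sigma

print(rating_span(5))  # 4000
print(rating_span(6))  # 4800
```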
