Don Dailey

Post subject: Re: Comparative nodes per second
Posted: Fri Apr 20, 2012 12:31 pm

mcostalba wrote:
 bob wrote: Someone has previously suggested, although I have not given it much thought, that I could produce a pretty good eval -> winning percentage formula on my cluster stuff.

This eval <-> winning percentage stuff is really misleading. I don't know where it came from, but a lot of people (normally chess players more than programmers) blindly believe in this fetish. An eval score has no meaning taken as a single absolute number. It only makes sense relative to another eval score from another position.

P.S. It's my opinion that the biggest problem in computer chess is that even though programs are good at comparing similar positions, they are much less capable of comparing positions that are significantly different, and even worse at comparing positions that are unbalanced. This is another way of saying that they do not have very good evaluation functions. So really a single number should be all that is required, because an evaluation function should be transitive. In chess programs it is not, simply because chess programs are broken with respect to evaluation.

With a properly transitive evaluation function you should be able to compare any 2 positions, even if they are totally different, and determine which one is "best." But in practice that doesn't work very well. If you have a choice of two ways to proceed that lead to 2 completely different kinds of positions your chess program probably doesn't have a clue which is better - unless one is significantly better than the other.

Part of the reason I like the logistic function is that it imposes a definite meaning on a score, or at least it attempts to. What does a pawn up really mean? You are never really just a pawn up; giving up a pawn usually gets you some compensation, even if it's not enough. If you watch 2 master players and one is a pawn up or down after just a few moves, he probably has (at least some) compensation. Thus you sometimes hear the phrase "a pawn down but with compensation" or "equal chances." But as chess programs have improved over the years, you will notice that they are not quite as materialistic as they used to be. Komodo and most other good programs, when given a gambit position to analyse, will return a score fairly close to zero. So at least we are starting to think more in terms of positional chess and not head count.
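To make "a definite meaning to a score" concrete, here is a minimal Python sketch of a logistic mapping from centipawns to an expected game score (1 = win, 0.5 = draw, 0 = loss) and back. The base-10 form and the SCALE of 400 are assumptions borrowed from the familiar Elo curve; the post does not say which logistic Komodo uses, and each program would need its own fitted scale.

```python
import math

# Hypothetical scale (centipawns). 400 mirrors the Elo logistic; it is a
# placeholder, not a value taken from any real engine.
SCALE = 400.0


def expected_score(cp: float, scale: float = SCALE) -> float:
    """Map a centipawn score to an expected game score in [0, 1]."""
    return 1.0 / (1.0 + 10.0 ** (-cp / scale))


def centipawns(p: float, scale: float = SCALE) -> float:
    """Inverse mapping: turn an expected score back into centipawns."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return scale * math.log10(p / (1.0 - p))
```

With these assumptions a score of 0 maps to 0.5, and +100 centipawns maps to about 0.64. The curve is symmetric, so -100 maps to about 0.36, and a "pawn up" only means something once you say which expected score it stands for.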

Very often you will find that 2 chess programs have different "scales" when it comes to scoring positions; one tends to be more aggressive about scoring than the other. For example, where Komodo thinks 25 centipawns, Stockfish thinks 40 or 50 centipawns. We could standardize the meaning of 1 centipawn for each program by applying a simple calibration function - a few hundred test games could easily do this.
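One way such a calibration could look, as a sketch only: collect (eval in centipawns, game result) pairs from test games, with the result from the evaluating side's point of view, and fit the logistic scale that best predicts the results. Which eval to record from each game, the cross-entropy loss, the ternary search and the search bounds are all assumptions made for illustration, not the method any particular program uses.

```python
import math
from typing import Sequence, Tuple

LN10 = math.log(10.0)


def logistic(cp: float, k: float) -> float:
    """Expected score 1 / (1 + 10**(-k*cp)), computed without overflow."""
    x = k * cp * LN10
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def log_loss(k: float, samples: Sequence[Tuple[float, float]]) -> float:
    """Mean cross-entropy between predictions and results (1, 0.5 or 0)."""
    eps = 1e-12
    total = 0.0
    for cp, result in samples:
        p = min(max(logistic(cp, k), eps), 1.0 - eps)
        total -= result * math.log(p) + (1.0 - result) * math.log(1.0 - p)
    return total / len(samples)


def fit_scale(samples: Sequence[Tuple[float, float]],
              k_lo: float = 1.0 / 5000.0,
              k_hi: float = 1.0 / 20.0,
              iters: int = 200) -> float:
    """Return the scale (centipawns) that best fits the samples.

    The loss is convex in k = 1/scale, so a ternary search on k
    narrows in on the minimum.
    """
    lo, hi = k_lo, k_hi
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_loss(m1, samples) < log_loss(m2, samples):
            hi = m2
        else:
            lo = m1
    return 2.0 / (lo + hi)  # 1 / (midpoint of the final k interval)


def convert_cp(cp_a: float, scale_a: float, scale_b: float) -> float:
    """Map program A's score onto program B's scale (same expected score)."""
    return cp_a * scale_b / scale_a
```

Once both programs have a fitted scale, equal expected scores mean cp_a / scale_a = cp_b / scale_b. If, hypothetically, the fits gave Komodo 250 and Stockfish 450, then Komodo's 25 centipawns would correspond to 45 on Stockfish's scale, in line with the 40 or 50 mentioned above.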

What I suggest, however, is not just something superficial; it should really be the way we think about an evaluation score. For example, if your program says you are up 100 centipawns, does that mean the same thing in the ending as in the opening? If not, you have a source of error, and your program will be lousy at comparing these two positions, thinking they are equally good when they are not.
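The opening-versus-ending question can be phrased the same way: if the fitted scale depends on the game phase, a raw score can be rescaled so that equal numbers predict equal expected scores everywhere. The sketch below assumes a phase value in [0, 1] (1.0 = opening, 0.0 = bare endgame) and linear interpolation between two per-phase scales; OPENING_SCALE and ENDGAME_SCALE are hypothetical placeholders, not measurements.

```python
# Hypothetical per-phase scales (centipawns). In practice each would be
# fitted separately from opening and endgame positions in test games.
OPENING_SCALE = 300.0
ENDGAME_SCALE = 180.0


def phase_scale(phase: float) -> float:
    """Interpolate the scale linearly: phase 1.0 = opening, 0.0 = endgame."""
    if not 0.0 <= phase <= 1.0:
        raise ValueError("phase must be in [0, 1]")
    return phase * OPENING_SCALE + (1.0 - phase) * ENDGAME_SCALE


def expected_score(cp: float, phase: float) -> float:
    """Expected game score for a raw score at the given phase."""
    return 1.0 / (1.0 + 10.0 ** (-cp / phase_scale(phase)))


def normalized_cp(cp: float, phase: float,
                  reference_scale: float = OPENING_SCALE) -> float:
    """Rescale a raw score so equal values mean equal expected scores."""
    return cp * reference_scale / phase_scale(phase)
```

With these placeholder numbers, +100 means about 0.78 in a bare endgame but only about 0.68 in the opening, which is exactly the kind of mismatch that makes a program treat the two positions as equally good when they are not.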

Quote: So it is just one part of a pair used to compare positions, but what counts for the engine search is the pair. Don, instead, said an interesting thing: that the same eval does not mean the same thing if it is returned at the leaves or high in the tree.


"Your superior intellect is no match for our puny weapons." -Kang and Kodos
