Stable and/or accurate eval

smirobth · Post by **smirobth** » Fri Jul 06, 2007 12:19 am

jwes wrote:From a programming perspective, the only thing that matters is that better positions have higher evaluations. The actual values of the evaluations are largely irrelevant.

Exactly. Except that I would say that the actual values are totally irrelevant (not just largely), except for when a human looking at them is trying to interpret, based on the evaluation, the likelihood of a particular position being won or not. And this will vary from engine to engine.

Hart · Post by **Hart** » Fri Jul 06, 2007 6:04 am

I am not a chess programmer or anything close so I could very well be wrong here, but the impression that I get is that an accurate evaluation in terms of centipawns and winning chances could be very useful. If the score on the best move is .30 and the score on the second is .29, might it be prudent to search the mainline further, or perhaps force a quicker cutoff? Would a more accurate evaluation not help you make more intelligent and quicker "cutoffs"? Another example: the best move stays at a constant (say) .30 while the second starts at -.40 and rises steadily, (say) linearly, with the depth of the search. If this is the case, and your eval. can semi-accurately predict winning chances, I'd think it'd be wise to investigate the line further. Maybe the latter example is only beneficial for timed matches (I don't know), but I imagine there would be a way to carry this over to the main search function. I have seen Rybka play like this (sort of): if the first move is better than the second by X pawns at depth Y, then she will play the move almost instantly. So could an accurate evaluation also tell you how much more time would be useful looking at refutations or alternatives?

smirobth · Post by **smirobth** » Fri Jul 06, 2007 4:47 pm

Hart wrote:I am not a chess programmer or anything close so I could very well be wrong here, but the impression that I get is that an accurate evaluation in terms of centipawns and winning chances could be very useful. If the score on the best move is .30 and the score on the second is .29, might it be prudent to search the mainline further, or perhaps force a quicker cutoff? Would a more accurate evaluation not help you make more intelligent and quicker "cutoffs"? Another example: the best move stays at a constant (say) .30 while the second starts at -.40 and rises steadily, (say) linearly, with the depth of the search. If this is the case, and your eval. can semi-accurately predict winning chances, I'd think it'd be wise to investigate the line further. Maybe the latter example is only beneficial for timed matches (I don't know), but I imagine there would be a way to carry this over to the main search function. I have seen Rybka play like this (sort of): if the first move is better than the second by X pawns at depth Y, then she will play the move almost instantly. So could an accurate evaluation also tell you how much more time would be useful looking at refutations or alternatives?

I am not a programmer either, but I do know that classical minimax/alpha-beta does not care about the magnitude of evaluation differences. If the eval for move "A" > "B" by a small amount, it is no different than if "A" > "B" by a large amount. But of course not all programs are simple minimax, and I believe there are programs that do look at the magnitude of evaluation differences. However, even in such a case, as long as the evaluation differences are consistent it should make no difference. If every evaluation is multiplied by 10X and the rest of the program is adjusted accordingly, the result should still be exactly the same.
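To make the scaling claim concrete, here is a minimal sketch, assuming a hand-made toy tree rather than a real chess position. The tree, the `scale` helper and the leaf values are all made up for illustration. Plain alpha-beta only ever compares scores, so multiplying every leaf by 10 should leave the chosen move untouched and simply multiply the reported score by 10.

```python
import math
from typing import List, Tuple, Union

# A toy game tree: a leaf is an integer static eval (from the root player's
# point of view); an inner node is a list of child subtrees.
Tree = Union[int, List["Tree"]]


def alphabeta(node: Tree, alpha: float, beta: float, maximizing: bool) -> float:
    """Plain fail-soft alpha-beta. It only compares scores, never measures gaps."""
    if isinstance(node, int):
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # beta cutoff
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True))
        beta = min(beta, best)
        if alpha >= beta:
            break  # alpha cutoff
    return best


def best_move(tree: List[Tree]) -> Tuple[int, float]:
    """Return (index, score) of the best root move; the first move wins ties."""
    best_index, best_score = 0, -math.inf
    for index, child in enumerate(tree):
        score = alphabeta(child, best_score, math.inf, False)
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score


def scale(tree: Tree, factor: int) -> Tree:
    """Multiply every leaf eval by a positive factor (hypothetical helper)."""
    if isinstance(tree, int):
        return tree * factor
    return [scale(child, factor) for child in tree]


# Made-up centipawn values: three root moves, two replies each.
toy_tree = [
    [30, 45],   # move A: opponent steers to 30
    [29, 80],   # move B: opponent steers to 29
    [-40, 10],  # move C: opponent steers to -40
]

centipawn_result = best_move(toy_tree)             # (0, 30)
millipawn_result = best_move(scale(toy_tree, 10))  # (0, 300)
```

Move A is picked in both runs even though it beats move B by only one centipawn; the size of the gap never enters the search.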

AGove · Post by **AGove** » Fri Jul 06, 2007 5:50 pm

as long as the evaluation differences are consistent it should make no difference. If every evaluation is multiplied by 10X and the rest of the program is adjusted accordingly, the result should still be exactly the same.

No! If "the rest of the program is adjusted accordingly" then it isn't merely a 10X engine. The adjustments would have to extend into the heart of the code since they affect piece values, contempt settings, etc.

Your point, I think, was that a 10X engine might make inconvenient reports as far as a human viewer is concerned, but otherwise wouldn't be any different in play and analysis. Well, we should ask in passing what the reports (the evaluations) are for. But even more importantly, the reports are not arbitrary or a matter of style. Evaluations are initially grounded on piece imbalances. If an evaluation isn't grounded, at some point, to some extent, on piece values, then we are entitled to ask on what it is grounded. The 10X engine looks like a simple arithmetical aberration - looks like the programmer misplaced a single decimal point. And if that's all it was, then of course it's no more than an inconvenience to the human viewer. But it is more than that. Firstly, it requires substantially more "adjustments" than what the viewer may suppose from the reports; secondly, it divorces the program from conventional piece values, when that was the point of having the reports in the form that they are presently, usually given. This entire thread is a discussion of the accuracy of those evaluations. It's no help to anyone to say that the evaluations could be anything you like.

Dirt · Post by **Dirt** » Fri Jul 06, 2007 6:55 pm

Hart wrote:I am not a chess programmer or anything close so I could very well be wrong here, but the impression that I get is that an accurate evaluation in terms of centipawns and winning chances could be very useful. If the score on the best move is .30 and the score on the second is .29, might it be prudent to search the mainline further, or perhaps force a quicker cutoff?

To get an accurate evaluation of the second best move (two variation mode), the computer has to give up some analysis depth. I'm not aware of anyone finding an approach that makes good enough use of the second evaluation to make it worthwhile, but I'm not sure it's impossible.

AGove · Post by **AGove** » Fri Jul 06, 2007 8:14 pm

To get an accurate evaluation of the second best move (two variation mode), the computer has to give up some analysis depth. I'm not aware of anyone finding an approach that makes good enough use of the second evaluation to make it worthwhile, but I'm not sure it's impossible.

In positions where there is only one legal move, it makes sense to make that one and only move immediately. In a sense, the engine would make the move on the basis of the evaluation of the second-best move (which in this case happens to be non-existent).

In positions where a simple exchange (perhaps with check) is taking place, the second-best move would be clearly worse because it loses a piece. It makes sense for the engine to play the recapturing move quite quickly, on the basis of the evaluation of the second-best move.

Being able to evaluate the second-best moves seems to be advantageous in these two cases. There are other such cases involving, for example, the avoidance of draws by repetition, or indeed the avoidance of checkmate.
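The two cases above can be sketched as a simple time-management rule. This is only an illustration under stated assumptions: the function name `think_time_fraction`, the 200-centipawn margin and the 10% figure are made up here and are not taken from Rybka or any other engine.

```python
from typing import List, Tuple


def think_time_fraction(
    scored_moves: List[Tuple[str, int]],
    easy_margin_cp: int = 200,    # assumed threshold, not from any real engine
    easy_fraction: float = 0.1,   # assumed share of the normal time budget
) -> float:
    """Decide how much of the normal time budget to spend on this move.

    scored_moves holds (move, score in centipawns) pairs from a shallow
    search of every legal move.
    """
    if not scored_moves:
        raise ValueError("no legal moves: the game is already over")
    if len(scored_moves) == 1:
        return 0.0  # forced move: play it immediately
    scores = sorted((score for _, score in scored_moves), reverse=True)
    gap = scores[0] - scores[1]  # lead of the best move over the second best
    if gap >= easy_margin_cp:
        return easy_fraction  # e.g. an obvious recapture: play quickly
    return 1.0  # close call: use the full budget


forced = think_time_fraction([("Kh1", -50)])
recapture = think_time_fraction([("Rxd5", 20), ("a3", -480)])
close_call = think_time_fraction([("Nf3", 30), ("d4", 29)])
```

Note that, unlike the search itself, this rule does depend on the size of the gap, so the margin has to be expressed in the same units as the evaluation.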

smirobth · Post by **smirobth** » Sat Jul 07, 2007 5:43 am

AGove wrote:
as long as the evaluation differences are consistent it should make no difference. If every evaluation is multiplied by 10X and the rest of the program is adjusted accordingly, the result should still be exactly the same.
No! If "the rest of the program is adjusted accordingly" then it isn't merely a 10X engine. The adjustments would have to extend into the heart of the code since they affect piece values, contempt settings, etc.

Your point, I think, was that a 10X engine might make inconvenient reports as far as a human viewer is concerned, but otherwise wouldn't be any different in play and analysis. Well, we should ask in passing what the reports (the evaluations) are for. But even more importantly, the reports are not arbitrary or a matter of style. Evaluations are initially grounded on piece imbalances. If an evaluation isn't grounded, at some point, to some extent, on piece values, then we are entitled to ask on what it is grounded. The 10X engine looks like a simple arithmetical aberration - looks like the programmer misplaced a single decimal point. And if that's all it was, then of course it's no more than an inconvenience to the human viewer. But it is more than that. Firstly, it requires substantially more "adjustments" than what the viewer may suppose from the reports; secondly, it divorces the program from conventional piece values, when that was the point of having the reports in the form that they are presently, usually given. This entire thread is a discussion of the accuracy of those evaluations. It's no help to anyone to say that the evaluations could be anything you like.

I think you are wrong. Pure minimax programs with 10X evals will require ZERO program adjustment, and the adjustments for any others that require it will be trivial. The 10X evals would be millipawns instead of centipawns, that is all. The lines searched, the PVs, and everything else will remain unchanged. There was once even a program that produced reports where a pawn = 128 instead of 100, and it played just fine and was very strong. Absolute eval values ARE arbitrary; only relative values matter. Obviously you wouldn't want a program to evaluate queens as less valuable than rooks, but as long as the eval ratios are accurate their absolute values will not matter to program strength. In fact absolute values will not matter for anything at all except human interpretation and human usage; the programs won't care. Given that Shredder gives extremely optimistic evals compared to other engines (as it does), and Rybka gives pessimistic evals compared to other engines (as it also does), it is more or less obvious that absolute eval values are arbitrary as far as playing strength is concerned; if they were not, these couldn't be two of the strongest engines available.
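To illustrate what a "trivial adjustment" might look like, here is a rough sketch of a margin-based pruning test. A futility-style check is used purely as an example of search code that does look at magnitudes. The function name, the 1.5-pawn margin and the sample positions are assumptions made up for this post. If the margin is written in pawns and multiplied by whatever the engine calls one pawn, the decisions come out the same for pawn = 100, 128 or 1000.

```python
from fractions import Fraction
from typing import List


def is_futile(static_eval, alpha, pawn_unit: int,
              margin_in_pawns: Fraction = Fraction(3, 2)) -> bool:
    """True if even static_eval plus the margin cannot reach alpha.

    static_eval and alpha are in the engine's internal units; the margin is
    given in pawns and converted with pawn_unit, which is the only
    "adjustment" a rescaled engine needs in this check.
    """
    return static_eval + margin_in_pawns * pawn_unit <= alpha


# Made-up positions, expressed in pawns so they can be converted to any unit.
evals_in_pawns = [Fraction(-2), Fraction(-1, 2), Fraction(3, 10)]
alpha_in_pawns = Fraction(1)


def decisions(pawn_unit: int) -> List[bool]:
    """Run the same pruning test with everything expressed in pawn_unit."""
    return [
        is_futile(e * pawn_unit, alpha_in_pawns * pawn_unit, pawn_unit)
        for e in evals_in_pawns
    ]


centipawns = decisions(100)    # pawn = 100
pawn_128 = decisions(128)      # pawn = 128
millipawns = decisions(1000)   # pawn = 1000
```

All three lists are identical, because a positive scale factor multiplies both sides of the comparison. `Fraction` is used only to keep the toy arithmetic exact; a real engine would use integers.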

I stand by my claim that there is no such thing as "absolute evaluation accuracy". If you can find even one chess programmer who disagrees I will be very surprised.

smirobth · Post by **smirobth** » Sat Jul 07, 2007 6:18 am

AGove wrote:
To get an accurate evaluation of the second best move (two variation mode), the computer has to give up some analysis depth. I'm not aware of anyone finding an approach that makes good enough use of the second evaluation to make it worthwhile, but I'm not sure it's impossible.
In positions where there is only one legal move, it makes sense to make that one and only move immediately. In a sense, the engine would make the move on the basis of the evaluation of the second-best move (which in this case happens to be non-existent).

In positions where a simple exchange (perhaps with check) is taking place, the second-best move would be clearly worse because it loses a piece. It makes sense for the engine to play the recapturing move quite quickly, on the basis of the evaluation of the second-best move.

Being able to evaluate the second-best moves seems to be advantageous in these two cases. There are other such cases involving, for example, the avoidance of draws by repetition, or indeed the avoidance of checkmate.

The examples you give for looking at the second best move could be done very quickly, with a shallow search taking a small fraction of a second. If the move is forced the program could then make the move. But I don't think any program uses or would benefit from using info from a deep (more than a small fraction of a second) search of the second best move. The time penalty is too severe. Searching two moves to full depth and width (which is required to get the eval for the second move) will often take twice as long, or even more, compared to searching only the first move. In that same doubling of time programs with an excellent branching factor (2) can search an entire extra ply.
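The arithmetic behind that last sentence, as a back-of-the-envelope sketch: it assumes search time grows roughly like b^d for effective branching factor b and depth d, which is a simplification, and the function name is made up.

```python
import math


def extra_plies(time_factor: float, branching_factor: float) -> float:
    """Extra depth bought by multiplying search time by time_factor.

    Assumes time ~ branching_factor ** depth, so
    extra depth = log(time_factor) / log(branching_factor).
    """
    return math.log(time_factor) / math.log(branching_factor)


with_bf_2 = extra_plies(2, 2)  # doubling the time at b = 2 buys one full ply
with_bf_4 = extra_plies(2, 4)  # at b = 4 the same doubling buys about half a ply
```

So under this model, the time spent getting an exact score for the second move costs a low-branching-factor program about one ply of depth on the main line.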
