engine evaluation and chess informant symbols

casaschi · Post by **casaschi** » Wed Mar 28, 2012 4:26 pm

Is there any general/agreed standard to correlate engine evaluation (assumed in pawn units) and chess informant evaluation symbols (+-, +/-, +/=, =, =/+, -/+, -+)?

Or, in the absence of a uniform standard, if you do that translation when displaying info to the user, what threshold values do you use?

For instance, using thresholds [4.0, 1.3, 0.3]:

ev > 4.0 is +-
else ev > 1.3 is +/-
else ev > 0.3 is +/=
and so on for black advantage
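
A minimal sketch of that mapping, assuming the evaluation is in pawns from White's point of view and that Black's side simply mirrors the same thresholds (the function name `informant_symbol` is made up for illustration, it is not pgn4web code):

```python
# Thresholds from the post, in pawn units: [4.0, 1.3, 0.3].
# Each entry: (limit, symbol if White is ahead, symbol if Black is ahead).
THRESHOLDS = [(4.0, "+-", "-+"), (1.3, "+/-", "-/+"), (0.3, "+/=", "=/+")]


def informant_symbol(ev: float) -> str:
    """Map an evaluation in pawns (positive = White is better) to a symbol."""
    for limit, white_symbol, black_symbol in THRESHOLDS:
        if ev > limit:
            return white_symbol
        if ev < -limit:
            return black_symbol
    return "="


for ev in (5.2, 1.5, 0.4, 0.0, -0.4, -1.5, -5.2):
    print(f"{ev:+.1f} -> {informant_symbol(ev)}")
```

A score exactly on a threshold falls into the weaker category here, because the comparisons are strict.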

Don · Post by **Don** » Mon Apr 02, 2012 2:10 pm

casaschi wrote:Is there any general/agreed standard to correlate engine evaluation (assumed in pawn units) and chess informant evaluation symbols (+-, +/-, +/=, =, =/+, -/+, -+)?

Or, in the absence of a uniform standard, if you do that translation when displaying info to the user, what threshold values do you use?

For instance, using thresholds [4.0, 1.3, 0.3]:

ev > 4.0 is +-
else ev > 1.3 is +/-
else ev > 0.3 is +/=
and so on for black advantage

There is a fairly significant difference in how programs scale their evaluation functions, so there is no standard that would make sense. You will notice that some programs will evaluate a big space advantage (usually for white) as half a pawn or more even though theoretically white has only a small advantage. Other programs will be quite conservative.

You can use a logistic function to convert an evaluation score into a "win probability" (which is really ill-defined, but in practice works pretty well assuming equal players). If that is calibrated for each program separately, then we could assign probabilities instead of scores. For example, the opening position is considered +/= and this corresponds to about a 54% winning chance. One would have to assign ranges to this: what should count as even chances? Is 52% close enough to consider even, or should it be +/=? That would have to be determined and a scale worked out.
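
As a rough sketch of the logistic idea: the base-10 form and the default `scale` of 400 below are assumptions borrowed from the Elo formula, not values from any engine, and the 20 centipawn opening score in the usage lines is only a placeholder. The point is that the scale has to be fitted per program, for example so that the program's own opening score maps to about 54%.

```python
import math


def win_probability(cp: float, scale: float = 400.0) -> float:
    """Logistic mapping from a centipawn score to an expected score in [0, 1].

    `scale` is an assumed per-engine calibration constant; it would have to
    be fitted separately for each program.
    """
    return 1.0 / (1.0 + 10.0 ** (-cp / scale))


def calibrate_scale(cp: float, target: float) -> float:
    """Solve for the scale at which score `cp` maps to probability `target`."""
    return -cp / math.log10(1.0 / target - 1.0)


# Placeholder: suppose some engine scores the opening position at +20 cp.
opening_cp = 20.0
scale = calibrate_scale(opening_cp, 0.54)
print(f"scale = {scale:.1f}")
print(f"P(+20 cp)  = {win_probability(opening_cp, scale):.3f}")
print(f"P(+100 cp) = {win_probability(100.0, scale):.3f}")
```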

Another way is to talk in terms of Elo advantage. It is said that white has approximately a 35 Elo advantage from getting the first move. So we could build ranges from intervals of 34 Elo or so:

17-50 Elo +/=
51-84 Elo +/-
etc ...

Elo can be converted to win percentage and/or scores, but scores are program specific.

The interval doesn't have to be 34 Elo of course; we could look at a sample of many different authors and their analysis and try to make this come out roughly the same.
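
A small sketch of the Elo version, using the standard Elo expected-score formula. The band for "=" (about -17 to +17) and the choice to call everything past the second band "+-" are assumptions filling in the "etc ..." above; the names are made up for illustration.

```python
import math

WHITE_SYMBOLS = ["=", "+/=", "+/-", "+-"]
BLACK_SYMBOLS = ["=", "=/+", "-/+", "-+"]


def expected_score(elo_diff: float) -> float:
    """Standard Elo expected score for a side that is `elo_diff` points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_symbol(elo_diff: float, width: float = 34.0) -> str:
    """Pick a symbol from bands of `width` Elo centered on 0, 34, 68, ...

    With width 34 this gives 17-50 -> +/= and 51-84 -> +/-, as in the post.
    Anything beyond the last listed band is lumped into +- (an assumption).
    """
    band = math.floor(abs(elo_diff) / width + 0.5)
    band = min(band, len(WHITE_SYMBOLS) - 1)
    symbols = WHITE_SYMBOLS if elo_diff >= 0 else BLACK_SYMBOLS
    return symbols[band]


for diff in (0, 35, 60, 120, -35):
    print(f"{diff:+4d} Elo  score {expected_score(diff):.3f}  {elo_symbol(diff)}")
```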

Don

casaschi · Post by **casaschi** » Wed Jan 02, 2013 3:01 pm

@Don: I was really looking for something simpler. I understand that different engines assign different values to each component of their evaluation function.
However, there seems to be some consensus around certain values, such as:
- the evaluation unit is usually expressed in pawns (or centipawns, it does not matter, just multiply by 100)
- a minor piece is usually around 3 pawns (give or take half a pawn, depending on the engine and on the position)
- a rook is around 5 pawns (again give or take half a pawn)
- a queen is around 9 pawns (give or take one pawn)

Assuming for instance that we are in a position where non-material considerations are even, but White has an extra pawn; the evaluation is about +1: does this translate to +/= or to +/- ?
Similarly if, instead of a pawn, White has an extra minor piece; the evaluation is +3; does this translate to +/- or to +- ?

Reason for asking, in the pgn4web popup analysis window (that appears on this forum as well if you click square E8 on the javascript chessboard) I use the chess informant symbols to show visually the position evaluation.

At the moment, the threshold between = and +/= is +0.35, the threshold between +/= and +/- is 1.35 and the threshold between +/- and +- is 3.95

Given the intended purpose of showing a quick visual clue to the position evaluation and (with all due respect, it's the best around) given the quality of the evaluation of a javascript engine, you'll understand that the whole analysis of winning probability that you suggest is complete overkill.

Any suggestion about the 0.35/1.35/3.95 values I use?

Don · Post by **Don** » Wed Jan 02, 2013 3:43 pm

casaschi wrote:@Don: I was really looking for something simpler. I understand that different engines assign different values to each component of their evaluation function.
However, there seems to be some consensus around certain values, such as:
- the evaluation unit is usually expressed in pawns (or centipawns, it does not matter, just multiply by 100)
- a minor piece is usually around 3 pawns (give or take half a pawn, depending on the engine and on the position)
- a rook is around 5 pawns (again give or take half a pawn)
- a queen is around 9 pawns (give or take one pawn)

Assuming for instance that we are in a position where non-material considerations are even, but White has an extra pawn; the evaluation is about +1: does this translate to +/= or to +/- ?
Similarly if, instead of a pawn, White has an extra minor piece; the evaluation is +3; does this translate to +/- or to +- ?

Reason for asking, in the pgn4web popup analysis window (that appears on this forum as well if you click square E8 on the javascript chessboard) I use the chess informant symbols to show visually the position evaluation.

At the moment, the threshold between = and +/= is +0.35, the threshold between +/= and +/- is 1.35 and the threshold between +/- and +- is 3.95

Given the intended purpose of showing a quick visual clue to the position evaluation and (with all due respect, it's the best around) given the quality of the evaluation of a javascript engine, you'll understand that the whole analysis of winning probability that you suggest is complete overkill.

Any suggestion about the 0.35/1.35/3.95 values I use?

Ok, I'll start with a "strawman" proposal and others can suggest ways to modify it if they consider it wrong or improvable:

+/= is generally the advantage given to the white player. For a given program you could test that with a 1 or 2 minute search from the opening position and average the last 2 or 4 iteration scores. For Komodo it comes out to be about 20 centipawns. For Houdini it is just about the same, but for Stockfish it is about 30 centipawns. So base it on the program you are using to judge this.

Let's go with 20 which is what Komodo would use. For Komodo this corresponds to +/= advantage.

So in this example we might consider 20 centipawns to be the "width of the interval", so anything from -10 to +10 is "=", anything from 10 to 30 is "+/=", everything from 30 to 50 is "+/-" and so on.

My intuition tells me that we might want to expand the width of the interval with each grouping, because +/- is supposed to be a "decisive advantage" and 30 centipawns does not seem very decisive to me. So one idea is to widen the interval by half (multiply it by 1.5) for each grouping. In other words, use intervals of 20, 30, 45 for each grouping. That would give this table:

-10 to +10 = (about equal)
+10 to +40 +/= (white has slight advantage)
+40 to +85 +/- (white has a decisive advantage)

The category after this means the position is resignable.
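
The table can be generated from just the base width and the growth factor. This is only a sketch: the function names are invented, a score exactly on a boundary is put in the weaker category (the table shares its endpoints, so that is an arbitrary choice), and Black's side is assumed to mirror White's.

```python
from bisect import bisect_left

WHITE_SYMBOLS = ["=", "+/=", "+/-", "+-"]
BLACK_SYMBOLS = ["=", "=/+", "-/+", "-+"]


def growing_bounds(base_width: float = 20.0, growth: float = 1.5, count: int = 3) -> list:
    """Upper bounds in centipawns of "=", "+/=", "+/-" for White's side."""
    bounds = []
    upper = base_width / 2.0  # "=" spans -base_width/2 .. +base_width/2
    width = base_width
    for _ in range(count):
        bounds.append(upper)
        width *= growth       # each grouping is half again as wide
        upper += width
    return bounds


def classify(cp: float, bounds: list) -> str:
    """Map a centipawn score (positive = White is better) to a symbol."""
    index = bisect_left(bounds, abs(cp))  # number of bounds strictly below |cp|
    symbols = WHITE_SYMBOLS if cp >= 0 else BLACK_SYMBOLS
    return symbols[index]


bounds = growing_bounds()
print(bounds)
for cp in (5, 20, 60, 120, -60):
    print(f"{cp:+4d} cp -> {classify(cp, bounds)}")
```

Calibrating for another program then only means changing `base_width`, for example 30 instead of 20.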

I'll warn you now that no 2 chessplayers are likely to agree on the scale used for this, as the meaning of these glyphs is subject to both judgement and interpretation. This is a table that I would personally feel more or less comfortable with. I would like to hear Larry's opinion on this, and that of other strong players.

The primary consideration is that if we come to some general agreement it STILL has to be calibrated by program because some programs vary by 50% as you see in the case of Stockfish. I have seen Stockfish give a +0.70 advantage to positions that were basically close to equal. No getting around the fact that every program will mis-evaluate from time to time. But it's a GREAT idea to show these glyphs with your widget - I think that would be a wonderful addition to your already wonderful widget!

Don

Don · Post by **Don** » Wed Jan 02, 2013 3:48 pm

casaschi wrote:@Don: I was really looking for something simpler. I understand that different engines assign different values to each component of their evaluation function.
However, there seems to be some consensus around certain values, such as:
- the evaluation unit is usually expressed in pawns (or centipawns, it does not matter, just multiply by 100)
- a minor piece is usually around 3 pawns (give or take half a pawn, depending on the engine and on the position)
- a rook is around 5 pawns (again give or take half a pawn)
- a queen is around 9 pawns (give or take one pawn)

Assuming for instance that we are in a position where non-material considerations are even, but White has an extra pawn; the evaluation is about +1: does this translate to +/= or to +/- ?
Similarly if, instead of a pawn, White has an extra minor piece; the evaluation is +3; does this translate to +/- or to +- ?

Reason for asking, in the pgn4web popup analysis window (that appears on this forum as well if you click square E8 on the javascript chessboard) I use the chess informant symbols to show visually the position evaluation.

At the moment, the threshold between = and +/= is +0.35, the threshold between +/= and +/- is 1.35 and the threshold between +/- and +- is 3.95

Given the intended purpose of showing a quick visual clue to the position evaluation and (with all due respect, it's the best around) given the quality of the evaluation of a javascript engine, you'll understand that the whole analysis of winning probability that you suggest is complete overkill.

Any suggestion about the 0.35/1.35/3.95 values I use?

Your thresholds are way too aggressive. If a chess program shows a 1.2 pawn advantage it is far better than just +/=, and a 0.32 advantage is not "about equal".

See my last post for my own suggestion.

Don · Post by **Don** » Wed Jan 02, 2013 3:58 pm

casaschi wrote: Given the intended purpose of showing a quick visual clue to the position evaluation and (with all due respect, it's the best around) given the quality of the evaluation of a javascript engine, you'll understand that the whole analysis of winning probability that you suggest is complete overkill.

I just wanted to add that the quality of the engine is not that important for this purpose as long as it has the basic evaluation terms. A poor evaluation will only mean it is wrong more often but shouldn't change the magnitude of the values you use for thresholds. Is there a web site I can hit to load the javascript into my browser so that I can experiment?

casaschi · Post by **casaschi** » Wed Jan 02, 2013 4:07 pm

Don wrote:Is there a web site I can hit to load the javascript into my browser so that I can experiment?

Just click on square E8 in the chessboard below and move pieces around in the popup window (click the from square, then the to square; no drag&drop)
[pgn] e4 [/pgn]
At the moment it shows the chess informant symbols using the scale I described above, but if you hover the mouse on the symbol a tooltip shows the numeric value.
Note that google chrome is the fastest browser for the chess engine; under IE the engine does not work at all because of missing web worker functionality.

About the values, the reason I set those much larger than you suggest was that I looked at those values from the point of view of the viewer, not necessarily the engine.
I agree that +0.7 might be +- for a strong engine or a GM, but it's definitely not a +- (i.e. a sure win) for me. I wonder whether those evaluations do (or should) in fact depend on the average viewer.

Don · Post by **Don** » Wed Jan 02, 2013 4:07 pm

casaschi wrote:Is there any general/agreed standard to correlate engine evaluation (assumed in pawn units) and chess informant evaluation symbols (+-, +/-, +/=, =, =/+, -/+, -+)?

Or, in the absence of a uniform standard, if you do that translation when displaying info to the user, what threshold values do you use?

For instance, using thresholds [4.0, 1.3, 0.3]:

ev > 4.0 is +-
else ev > 1.3 is +/-
else ev > 0.3 is +/=
and so on for black advantage

I have a better idea. Someone could put together a sample of a few hundred GM annotated game positions from a variety of Grandmasters and that contain a variety of these glyphs. It could be packaged up in such a way as to take the grunt work out of it. You basically run the program against the glyphs and out pops the thresholds you should use for annotations that would be the most compatible.

Where would I get these annotated games? I will put it together if someone will provide the games or give me a pointer to them.
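
To make the idea concrete, here is a toy sketch of the fitting step. The sample list is entirely made up as a placeholder (no real annotated games were used), glyphs are encoded as ranks 0 to 3 for "=", "+/=", "+/-", "+-", and only White's side is handled. For each boundary it picks the cut that misclassifies the fewest positions.

```python
def fit_threshold(samples, k):
    """Best cut between ranks < k and ranks >= k, by misclassification count.

    `samples` is a list of (engine_score_cp, annotator_rank) pairs.
    Candidate cuts are the midpoints between neighbouring distinct scores.
    """
    scores = sorted({cp for cp, _ in samples})
    candidates = [(a + b) / 2.0 for a, b in zip(scores, scores[1:])]

    def errors(cut):
        # A position is misclassified if it lies on the wrong side of the cut.
        return sum((cp > cut) != (rank >= k) for cp, rank in samples)

    return min(candidates, key=errors)


# PLACEHOLDER data, invented purely to exercise the code.
toy_samples = [
    (5, 0), (12, 0),      # annotated "="
    (25, 1), (38, 1),     # annotated "+/="
    (60, 2), (90, 2),     # annotated "+/-"
    (150, 3), (300, 3),   # annotated "+-"
]

thresholds = [fit_threshold(toy_samples, k) for k in (1, 2, 3)]
print(thresholds)
```

With real annotations the classes will overlap, so the error count will not reach zero and a few hundred positions per engine would be needed before the cuts mean much.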

Don

casaschi · Post by **casaschi** » Wed Jan 02, 2013 4:12 pm

Don wrote:I have a better idea. Someone could put together a sample of a few hundred GM annotated game positions from a variety of Grandmasters and that contain a variety of these glyphs. It could be packaged up in such a way as to take the grunt work out of it. You basically run the program against the glyphs and out pops the thresholds you should use for annotations that would be the most compatible.

Where would I get these annotated games? I will put it together if someone will provide the games or give me a pointer to them.

Interesting approach.
Not every annotator uses the same meaning for the signs, but averaging across a number of them would give you the average meaning.
The issue is where to find annotated games; annotated games are typically NOT free. Chess Informant is an obvious high-quality source, ChessBase possibly as well, but I don't have either in suitable digital form.

Don · Post by **Don** » Wed Jan 02, 2013 4:14 pm

casaschi wrote:
Don wrote:Is there a web site I can hit to load the javascript into my browser so that I can experiment?
Just click on square E8 in the chessboard below and move pieces around in the popup window (click the from square, then the to square; no drag&drop)
[pgn] e4 [/pgn]
At the moment it shows the chess informant symbols using the scale I described above, but if you hover the mouse on the symbol a tooltip shows the numeric value.
Note that google chrome is the fastest browser for the chess engine; under IE the engine does not work at all because of missing web worker functionality.

About the values, the reason I set those much larger than you suggest was that I looked at those values from the point of view of the viewer, not necessarily the engine.
I agree that +0.7 might be +- for a strong engine or a GM, but it's definitely not a +- (i.e. a sure win) for me. I wonder whether those evaluations do (or should) in fact depend on the average viewer.

Based on a very quick (and superficial) analysis you could use the scale I proposed but multiply it by 1.5.
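
Spelled out, assuming "the scale I proposed" means the -10/+10, +40, +85 centipawn table from the earlier post (the variable names are just for illustration):

```python
# Upper bounds of "=", "+/=", "+/-" in centipawns, from the earlier table.
PROPOSED_BOUNDS_CP = [10.0, 40.0, 85.0]
VIEWER_FACTOR = 1.5

scaled_bounds = [bound * VIEWER_FACTOR for bound in PROPOSED_BOUNDS_CP]
print(scaled_bounds)  # [15.0, 60.0, 127.5]

# For comparison, the thresholds currently used in the widget, in centipawns.
current_bounds = [35.0, 135.0, 395.0]
for scaled, current in zip(scaled_bounds, current_bounds):
    print(f"scaled {scaled:6.1f} cp   vs   current {current:6.1f} cp")
```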

The evaluation of this program is reasonable. I put a Morra Gambit in and the score is not ridiculous. A classic Morra position should return a score close to zero despite White being a pawn down, and should probably be considered approximately equal. I get that white is down about 0.1, which means the evaluation is good enough to see a lot of compensation.
