Towards a standard analysis output format

mhull · Post by **mhull** » Tue Mar 22, 2011 8:18 pm

sje wrote:
Dann Corbit wrote:The most important thing is to standardize for which side you are reporting the score.
Yes; I suspect that there are nearly irreconcilable differences on this point. In part for reasons of symmetry, my code always reports scores from the perspective of the side to move. I'm not sure that this is much better than some alternatives, but it certainly isn't any worse.

For those who might choose White POV score reporting, I ask why is that any different from Black POV reporting? Either seems to be arbitrary.

Probably for the same reason white always moves first. That is not arbitrary.

sje wrote:And so what does "-MateIn1" really mean? It could be "LoseIn1" or maybe either of "LoseIn2" or "Checkmated" depending upon the coder's interpretation. That's another of the reasons I use side-to-move score perspective.

Side-to-move shares the same weakness as descriptive notation. For purposes of disambiguation, white POV scoring eliminates confusion, once it becomes the standard.

sje wrote:The tablebase probe count also has commas as needed. I'm afraid the use of commas over spaces is an Americanism and may not be the best.

Maybe there should be a book probe count as well.

marcelk · Post by **marcelk** » Thu Mar 24, 2011 8:52 am

sje wrote: The most important thing is to standardize for which side you are reporting the score.

Suggestion 1:

"1.5w" means white is 1.5 pawn up
"0.5b" black is half a pawn up
No ambiguity.
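
A minimal sketch of Suggestion 1, assuming the engine keeps a side-to-move score in pawns internally. The name `format_score` and the choice to print a level score without a suffix are illustrative assumptions, not part of the proposal.

```python
def format_score(stm_score: float, white_to_move: bool) -> str:
    """Render a side-to-move score (in pawns) with an explicit w/b suffix."""
    # Convert to White's point of view first.
    white_score = stm_score if white_to_move else -stm_score
    if white_score == 0:
        return "0.0"  # assumption: a level score carries no suffix
    side = "w" if white_score > 0 else "b"
    return f"{abs(white_score):.1f}{side}"


print(format_score(1.5, white_to_move=True))   # 1.5w
print(format_score(0.5, white_to_move=False))  # 0.5b
```

The suffix names the side that is ahead, so the string reads the same no matter which perspective the engine uses internally.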

Suggestion 2:

Don't report in pawns but in winning chance using 1 / (1 + 10 ** (-eval/4.0)) from the chess programming wiki.

70.3% is equivalent to white being 1.5 pawns up
42.9% is equivalent to white being half a pawn down.

Such numbers are also something you can put on television, as is done in reporting poker matches.
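
The formula in Suggestion 2 can be checked with a few lines of Python. This is only a sketch: the function name `win_chance` and the `scale` parameter are made up for illustration, and the eval is assumed to be in pawns from White's point of view.

```python
def win_chance(eval_pawns: float, scale: float = 4.0) -> float:
    """Winning chance for White from a White-POV eval in pawns."""
    return 1.0 / (1.0 + 10.0 ** (-eval_pawns / scale))


for e in (1.5, -0.5):
    print(f"{e:+.1f} -> {win_chance(e):.1%}")
# +1.5 -> 70.3%
# -0.5 -> 42.9%
```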

bob · Post by **bob** » Thu Mar 24, 2011 7:42 pm

marcelk wrote:
sje wrote: The most important thing is to standardize for which side you are reporting the score.
Suggestion 1:

"1.5w" means white is 1.5 pawn up
"0.5b" black is half a pawn up
No ambiguity.

Suggestion 2:

Don't report in pawns but in winning chance using 1 / (1 + 10 ** (-eval/4.0)) from the chess programming wiki.

70.3% is equivalent to white being 1.5 pawns up
42.9% is equivalent to white being half a pawn down.

Such numbers are also something you can put on television, as is done in reporting poker matches.

How accurate is that? I could probably run (say) 100K games on the cluster, convert Crafty's scores to that, and create a large file that showed each value in the range of 0 to 100% and recorded the actual result. Then combine it to see if 70% really wins 70% of the games, or something significantly better or worse...

I was thinking about a large array inside Crafty, one entry per move in the game. I record that "percentage" by transforming the eval. Then, at the end of the game, dump each one, but with the real result paired with it. Would not be hard to combine all that data, run it thru a simple program and show for each "percentage of wins based on eval score" what the actual winning percentage was...
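
The "simple program" could look roughly like the sketch below. It assumes each dumped record is a pair `(eval_pawns, result)` with the result scored 1, 0.5 or 0 from the same point of view as the eval; the record layout, the names `calibration_table` and `bucket_width`, and the sample data are all hypothetical, not Crafty's actual output.

```python
from collections import defaultdict


def win_chance(eval_pawns, scale=4.0):
    return 1.0 / (1.0 + 10.0 ** (-eval_pawns / scale))


def calibration_table(samples, bucket_width=10):
    """Group samples by predicted percentage and average the real results.

    Returns {bucket_start_percent: (count, mean_actual_score)}.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for eval_pawns, result in samples:
        predicted = 100.0 * win_chance(eval_pawns)
        # Clamp so that a prediction of exactly 100% lands in the top bucket.
        bucket = min(int(predicted // bucket_width) * bucket_width,
                     100 - bucket_width)
        totals[bucket][0] += 1
        totals[bucket][1] += result
    return {b: (n, s / n) for b, (n, s) in sorted(totals.items())}


# Made-up records, purely for illustration.
samples = [(1.5, 1), (1.5, 0.5), (1.6, 1), (-0.5, 0), (-0.5, 0.5)]
for bucket, (n, actual) in calibration_table(samples).items():
    print(f"predicted {bucket}-{bucket + 10}%: {n} samples, actual {actual:.1%}")
```

If the formula is well calibrated, the actual score in each bucket should fall inside the bucket's predicted range once enough games are collected.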

Dann Corbit · Post by **Dann Corbit** » Thu Mar 24, 2011 8:03 pm

bob wrote:
marcelk wrote:
sje wrote: The most important thing is to standardize for which side you are reporting the score.
Suggestion 1:

"1.5w" means white is 1.5 pawn up
"0.5b" black is half a pawn up
No ambiguity.

Suggestion 2:

Don't report in pawns but in winning chance using 1 / (1 + 10 ** (-eval/4.0)) from the chess programming wiki.

70.3% is equivalent to white being 1.5 pawns up
42.9% is equivalent to white being half a pawn down.

Such numbers are also something you can put on television, as is done in reporting poker matches.
How accurate is that? I could probably run (say) 100K games on the cluster, convert Crafty's scores to that, and create a large file that showed each value in the range of 0 to 100% and recorded the actual result. Then combine it to see if 70% really wins 70% of the games, or something significantly better or worse...

I was thinking about a large array inside Crafty, one entry per move in the game. I record that "percentage" by transforming the eval. Then, at the end of the game, dump each one, but with the real result paired with it. Would not be hard to combine all that data, run it thru a simple program and show for each "percentage of wins based on eval score" what the actual winning percentage was...

For winning percentage, the program could connect to a database of chess games and form the estimate from that (if and only if the position is in the database). I don't believe you can make an accurate winning percentage from a ce value alone unless it is a mate position.

Houdini · Post by **Houdini** » Thu Mar 24, 2011 8:21 pm

marcelk wrote:Suggestion 2:

Don't report in pawns but in winning chance using 1 / (1 + 10 ** (-eval/4.0)) from the chess programming wiki.

70.3% is equivalent to white being 1.5 pawns up
42.9% is equivalent to white being half a pawn down.

Such numbers are also something you can put on television, as is done in reporting poker matches.

Surely the coefficient 4.0 in the formula cannot be correct; it implies that an eval of +4 would only give a 90% win chance.

For Houdini, +1 eval corresponds to about 85% win, +3 eval is higher than 99% win.
In other words, the formula for Houdini is probably close to 1 / (1 + 10**(-eval*0.75)) .

Robert
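
To compare the two coefficients side by side, both formulas can be written as 1 / (1 + 10**(-k*eval)), where dividing by 4.0 corresponds to k = 0.25. The sketch below is only an illustration; the helper name `win_chance` and the choice of sample evals are assumptions.

```python
def win_chance(eval_pawns, k):
    """Logistic mapping with slope k: 1 / (1 + 10**(-k * eval))."""
    return 1.0 / (1.0 + 10.0 ** (-k * eval_pawns))


for k in (0.25, 0.75):
    row = ", ".join(f"{e:+d}: {win_chance(e, k):.1%}" for e in (1, 3, 4))
    print(f"k={k}: {row}")
```

With k = 0.25 an eval of +4 comes out near 91%, while k = 0.75 gives about 85% at +1 and more than 99% at +3, matching the figures quoted above.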

hgm · Post by **hgm** » Thu Mar 24, 2011 8:37 pm

That seems too much. A +1 score should correspond to Pawn odds, and Pawn odds is supposed to produce around a 72% score. That would only be a 72% win rate in the complete absence of draws, while it is in fact more likely that most of the points of the side that is behind come from draws.

Have you tried self-play of Houdini with Pawn odds?
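
The difference between a 72% score and a 72% win rate is simple arithmetic, since score = wins + draws / 2. A small sketch, with draw rates that are hypothetical and chosen only to show the effect:

```python
def win_loss_split(score: float, draw_rate: float):
    """Split a match score into (win, draw, loss) fractions of games played."""
    wins = score - draw_rate / 2.0
    losses = 1.0 - wins - draw_rate
    if wins < 0 or losses < 0:
        raise ValueError("draw rate is inconsistent with this score")
    return wins, draw_rate, losses


# Hypothetical draw rates for a 72% score.
for d in (0.0, 0.2, 0.4):
    w, _, l = win_loss_split(0.72, d)
    print(f"draws {d:.0%}: wins {w:.0%}, losses {l:.0%}")
```

At a 40% draw rate, a 72% score means winning only about 52% of the games, which is well below the 85% win figure quoted for a +1 eval.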

bob · Post by **bob** » Thu Mar 24, 2011 9:08 pm

hgm wrote:That seems too much. A +1 score should correspond to Pawn odds, and Pawn odds is supposed to produce around a 72% score. That would only be a 72% win rate in the complete absence of draws, while it is in fact more likely that most of the points of the side that is behind come from draws.

Have you tried self-play of Houdini with Pawn odds?

I wonder if I ought to actually do this formally as an experiment? I could certainly take Crafty and play it against the gauntlet, normally, then at -1.00, and then -2.00 to see just what effect removing one or two pawns does in terms of Elo...

Don · Post by **Don** » Thu Mar 24, 2011 9:14 pm

Houdini wrote:
marcelk wrote:Suggestion 2:

Don't report in pawns but in winning chance using 1 / (1 + 10 ** (-eval/4.0)) from the chess programming wiki.

70.3% is equivalent to white being 1.5 pawns up
42.9% is equivalent to white being half a pawn down.

Such numbers are also something you can put on television, as is done in reporting poker matches.
Surely the coefficient 4.0 in the formula cannot be correct; it implies that an eval of +4 would only give a 90% win chance.

For Houdini, +1 eval corresponds to about 85% win, +3 eval is higher than 99% win.
In other words, the formula for Houdini is probably close to 1 / (1 + 10**(-eval*0.75)) .

Robert

The coefficient must be tuned to your program, but the basic formula is fairly realistic. This is used in various learning algorithms.

It would be an interesting display feature for a user interface: given the difference in Elo between two programs that are playing each other, you could estimate the winning chances based on the current position. For example, if your program is 300 Elo stronger but shows a score of -2.15, it would be nice to get a sense of what its chances are.

Don
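
One possible way to sketch this, under assumptions that are not from the thread: the eval is converted to an Elo-equivalent at a fixed rate and simply added to the rating difference before applying the usual Elo expectation formula. The default of 100 Elo per pawn is what the /4.0 version of the formula amounts to; the function name `expected_score` and the `elo_per_pawn` parameter are hypothetical and would need tuning per engine.

```python
def expected_score(eval_pawns, elo_diff, elo_per_pawn=100.0):
    """Expected score from a position eval plus a rating difference.

    Assumption: eval and rating advantage add linearly on the Elo scale.
    """
    effective = elo_diff + elo_per_pawn * eval_pawns
    return 1.0 / (1.0 + 10.0 ** (-effective / 400.0))


# The example above: 300 Elo stronger, but the engine shows -2.15.
print(f"{expected_score(-2.15, 300):.1%}")
```

Under these assumptions the stronger program would still be a slight favorite, at roughly 62%.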

hgm · Post by **hgm** » Thu Mar 24, 2011 9:33 pm

That depends strongly on game phase. (Unless the engine does not report in centi-Pawn, but already scales it with game phase, as Rybka does.) In KPPK micro-Max would think itself at +2, but its winning chances would be close to 100% even when it faces the 1000-Elo stronger Houdini...

hgm · Post by **hgm** » Thu Mar 24, 2011 9:43 pm

bob wrote:I wonder if I ought to actually do this formally as an experiment? I could certainly take Crafty and play it against the gauntlet, normally, then at -1.00, and then -2.00 to see just what effect removing one or two pawns does in terms of Elo...

I think such 'direct' piece-value measurements are quite interesting. I played many tens of thousands of such material-imbalance games, not only with Pawn odds, but also deleting Bishop vs Knight, Bishop vs Knight + Pawn, Queen vs 2 Bishops + 1 Knight, etc. (Mostly on 10x8 boards.) In principle you could measure the score of any advantage that way. E.g. if you want to know how much castling rights are worth, play one side without castling rights, and see by how much it loses, in comparison to the Pawn-odds score.

To make it work you need an engine that randomizes well (as there are no books for such imbalanced positions), or you need to shuffle the initial positions (e.g. in Chess960 fashion). And deleting multiple Pawns sometimes gave inconsistent results (the second Pawn having negative effective value), presumably because you give that side a tremendous advantage in development, which can be very dangerous in 10x8 Chess.
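
The comparison against the Pawn-odds score could be sketched as follows. This assumes that advantages scale linearly on the Elo scale, which is an assumption made here for illustration and not something measured in the thread; the function names and the 60% figure are hypothetical.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a match score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def value_in_pawns(feature_score: float, pawn_odds_score: float) -> float:
    """Express an advantage in Pawn units by comparing implied Elo differences."""
    return elo_from_score(feature_score) / elo_from_score(pawn_odds_score)


# Hypothetical numbers: Pawn odds scores 72%, the tested advantage scores 60%.
print(round(elo_from_score(0.72)))            # about 164 Elo for Pawn odds
print(round(value_in_pawns(0.60, 0.72), 2))   # advantage as a fraction of a Pawn
```

The Pawn-odds score has to differ from 50% for the ratio to be defined.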
