Statistical assessment of chess opening book moves

Edmund · Post by **Edmund** » Sat Feb 27, 2016 5:42 pm

Opening books usually provide only simple summary information for each move in a position: frequency, average score, draw rate, average Elo and maximum Elo. Comparing moves on this information alone is difficult, if not impossible. I propose a statistical method to (a) estimate the quality of each move, (b) calculate the precision of the estimate and (c) use the information to compare moves.

I define my move quality score by analogy with the Elo formula, as a logit transformation of the winning probability. A score of 0 implies that equal players have equal winning chances in a given position. Winning probability = (1+exp(-score/400))^(-1). This has the advantage that measures of precision are straightforward to interpret in a second step.

Next I need to postulate a certain model for chess outcomes. I refer to the paper by Shawul and Coulom (2013) and apply the Davidson model. This model predicts – for each score – the expected win, draw and loss-ratio.

Next, for a given move in a position, I consider all instances in my database where it was played. I collect the relative Elo of the players and the game outcome. I estimate my move score through maximum likelihood: it is the parameter that shifts the relative Elo linearly and thus best explains the realized outcomes.

So I am estimating the $score that maximizes: SUM[for each game] LN(x)
Where for won games x = 1/(1+EXP(-($delta/400+$score))+ $v * EXP(-($delta/400+$score)/2))
for lost games x = 1/(1+EXP( ($delta/400+$score))+ $v * EXP( ($delta/400+$score)/2))
for drawn games x = 1/(1+EXP(-($delta/400+$score)/2)*(1+EXP($delta/400+$score))/$v)
$delta is the Elo difference between the players and $v is a constant from the Davidson model related to the draw rate (a possible value for $v could be 1).
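
As an illustration, the three formulas above can be written out as a small Python sketch. The function names `davidson_probs` and `log_likelihood` and the `(delta, result)` tuple format are assumptions made for this example, not part of any existing tool.

```python
import math

def davidson_probs(delta, score, v=1.0):
    """Return (win, draw, loss) probabilities for one game.

    delta: Elo difference of the players (hypothetical input format),
    score: move score on the unscaled logit scale,
    v:     Davidson draw constant.
    """
    d = delta / 400.0 + score
    win = 1.0 / (1.0 + math.exp(-d) + v * math.exp(-d / 2.0))
    loss = 1.0 / (1.0 + math.exp(d) + v * math.exp(d / 2.0))
    draw = 1.0 / (1.0 + math.exp(-d / 2.0) * (1.0 + math.exp(d)) / v)
    return win, draw, loss

def log_likelihood(games, score, v=1.0):
    """Sum of LN(x) over all games.

    games: iterable of (delta, result), result in {"win", "draw", "loss"}.
    """
    total = 0.0
    for delta, result in games:
        win, draw, loss = davidson_probs(delta, score, v)
        x = {"win": win, "draw": draw, "loss": loss}[result]
        total += math.log(x)
    return total
```

The three probabilities sum to 1 for any $score, which is a quick sanity check on the formulas.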

You might ask how this can be estimated efficiently. It appears that the log-likelihood as a function of $score can be approximated very well by a quadratic: L(x) = ax^2+bx+c.
So for a quick approximation I propose calculating L at three specific scores (-1, 0 and +1) and then solving for the coefficients:
c=L(0)
b=(L(1)-L(-1))/2
a=(L(1)+L(-1))/2-c
We find the maximum by setting the first derivative to 0: score = -b/(2*a)
Applying the Fisher information, we can directly derive the standard deviation of our estimate. The variance is the inverse of the negated second derivative (a is negative at a maximum), so: sd = SQRT(-1/(2*a))
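
A minimal sketch of this three-point fit in Python follows. The names `log_likelihood` and `estimate_score` and the `(delta, result)` input format are assumptions for the example. The coefficient a is taken as (L(1)+L(-1))/2 - L(0), which follows from evaluating the parabola at the three points, and sd is the square root of the negated inverse second derivative.

```python
import math

def log_likelihood(games, score, v=1.0):
    """Davidson log-likelihood; games is a list of (delta, result)."""
    total = 0.0
    for delta, result in games:
        d = delta / 400.0 + score
        if result == "win":
            x = 1.0 / (1.0 + math.exp(-d) + v * math.exp(-d / 2.0))
        elif result == "loss":
            x = 1.0 / (1.0 + math.exp(d) + v * math.exp(d / 2.0))
        else:  # draw
            x = 1.0 / (1.0 + math.exp(-d / 2.0) * (1.0 + math.exp(d)) / v)
        total += math.log(x)
    return total

def estimate_score(games, v=1.0):
    """Fit L(x) = a*x^2 + b*x + c through x = -1, 0, +1.

    Returns (score, sd) on the unscaled logit scale.
    """
    l_minus = log_likelihood(games, -1.0, v)
    l_zero = log_likelihood(games, 0.0, v)
    l_plus = log_likelihood(games, 1.0, v)
    c = l_zero
    b = (l_plus - l_minus) / 2.0
    a = (l_plus + l_minus) / 2.0 - c
    if a >= 0.0:
        # Happens only with no data; the parabola has no maximum.
        raise ValueError("log-likelihood is not concave; no games?")
    score = -b / (2.0 * a)
    sd = math.sqrt(-1.0 / (2.0 * a))
    return score, sd

# Usage with made-up data: 8 wins, 2 losses between equal players.
games = [(0.0, "win")] * 8 + [(0.0, "loss")] * 2
score, sd = estimate_score(games)
print(score * 400.0, sd * 400.0)  # reported on the x400 scale
```

Since this is only a quadratic approximation, the result could be refined by repeating the fit around the first estimate if more accuracy is needed.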

I have explained how to estimate the score of a certain move and the standard deviation of this score. For easier interpretation, and given the approximate relationship between centipawns and Elo, I suggest reporting score and sd multiplied by 400.

If you then want to compare different moves against each other, I propose a one-sided Welch test, which yields a value similar in principle to LOS (likelihood of superiority).
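
A rough sketch of such a comparison in Python is shown here. It is a simplification: it uses the normal distribution in place of the t distribution (reasonable only when each move has many games), since the Welch degrees of freedom are not available from score and sd alone. The function name `prob_better` is made up for this example.

```python
import math

def prob_better(score_a, sd_a, score_b, sd_b):
    """One-sided comparison of two move scores with unequal variances.

    Returns an LOS-like value: the approximate probability that move A's
    true score exceeds move B's, under a normal approximation.
    """
    z = (score_a - score_b) / math.sqrt(sd_a ** 2 + sd_b ** 2)
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Usage with made-up numbers on the x400 scale.
print(prob_better(40.0, 20.0, 0.0, 20.0))
```

Scores and sds must be on the same scale for both moves, either both unscaled or both multiplied by 400.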

jdart · Post by **jdart** » Mon Feb 29, 2016 4:15 am

I have seen many examples where the score (at least from a reasonable-length search) does not predict the winning chances for a move. In correspondence play especially it is common to see moves that don't look at all good initially but some moves later show a different eval. These are players that analyze a single position for days. It can take that long to find the optimal line.

And if you take a look at engine matches that use limited-length books (such as CCRL) you will see many, many games where an engine plays into a known inferior opening line, often on the first move they search.

--Jon

Edmund · Post by **Edmund** » Mon Feb 29, 2016 8:37 pm

jdart wrote: I have seen many examples where the score (at least from a reasonable-length search) does not predict the winning chances for a move. In correspondence play especially it is common to see moves that don't look at all good initially but some moves later show a different eval. These are players that analyze a single position for days. It can take that long to find the optimal line.

And if you take a look at engine matches that use limited-length books (such as CCRL) you will see many, many games where an engine plays into a known inferior opening line, often on the first move they search.

--Jon

Maybe I was imprecise. My "score" has nothing to do with engine evaluations. I am proposing an improvement for presenting move information from a database of played chess games, where currently only very limited summary statistics are generated.

jdart · Post by **jdart** » Mon Feb 29, 2016 11:26 pm

Ok, sorry I didn't pick up that you were using Win/Loss/Draw statistics, although you did say that. But that is just as problematic, if not more so. Typically strong players play a move until it no longer works (has a refutation), then they switch to something else. The losing move might well have a good win/loss ratio but that is only because it worked for a considerable time. So "goodness" is a function of time.

--Jon
