How Do You Automatically Tune Your Evaluation Tables

Post by **tpetzke** » Wed Mar 12, 2014 1:20 pm

I think a potential problem with this method is that it does not consider the interaction between eval and search.

If you tune your eval by game play, the tuned weights also implicitly account for things like the pruning margins present in the engine. So unless you also retune the search control parameters afterwards, this method is at a slight disadvantage compared with a tuning process that considers both.

I might experiment a bit more in the future, because exercises like this are fun even if they don't improve the engine. Elo is nice, but there is more to it than that.

Thomas...

Post by **Steve Maughan** » Wed Mar 12, 2014 1:21 pm

Hi Thomas,

I think it's a little more than cosmetic. The constant in the logistic function determines the probability of a win for a given evaluation difference. By fixing a pawn at 100 you're (slightly) over-constraining the optimization.

Others in the thread have said they derived values for a pawn of +40 in the middlegame and +150 in the endgame.
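To make the role of that constant concrete, here is a minimal sketch of such a mapping. It assumes the base-10 logistic form 1 / (1 + 10^(-K*s/400)) used in the texel tuning description, with s in centipawns; other engines may write the formula differently (for example on pawn units), and the function name is hypothetical.

```python
def win_probability(score_cp: float, k: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) predicted for an
    evaluation of score_cp centipawns.

    Assumes the base-10 logistic form 1 / (1 + 10 ** (-k * s / 400)).
    A larger k makes the same evaluation advantage count for more.
    """
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))
```

With k fixed, an evaluation of +100 always maps to the same expected score, so the value you assign to a pawn directly decides how winning "one pawn up" is predicted to be.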

Steve

Post by **tpetzke** » Wed Mar 12, 2014 1:44 pm

Hi Steve,

I use -0.58 as the constant to translate my pawn-based values into probabilities, based on the formula in the wiki for the tuning process.

In normal operation my engine just outputs cp values.

Thomas...

Post by **AlvaroBegue** » Wed Mar 12, 2014 1:59 pm

I think Steve has a point. You shouldn't fix both the value of the pawn and the constant in the logistic function. A constant of -0.58 is fine, so that the scale of the evaluation stays roughly where it is for other engines, but then you should let the value of the pawn vary.
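The degeneracy behind this can be checked numerically. The sketch below assumes a linear evaluation (score = sum of weight times feature) and the base-10 logistic 1 / (1 + 10^(-K*s/400)); the helper names are hypothetical. Multiplying every weight by c while dividing K by c leaves every prediction unchanged, so the overall scale and K cannot both be free. Fixing K removes that degeneracy; fixing the pawn value as well takes away a degree of freedom the data may actually need.

```python
def predicted(k, weights, features):
    """Logistic prediction for a linear evaluation sum(w_i * f_i), in centipawns."""
    score = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + 10.0 ** (-k * score / 400.0))


def rescaled(k, weights, c):
    """Scale all weights by c and compensate by dividing k by c.
    Returns a new (k, weights) pair; the inputs are not modified."""
    return k / c, [w * c for w in weights]
```

For example, weights [100.0, 30.0] (a pawn value and one positional term) and the rescaled set from `rescaled(1.0, weights, 0.4)` give the same predicted probability for any position.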

Post by **jdart** » Wed Mar 12, 2014 7:54 pm

Also see:

http://en.wikipedia.org/wiki/NEWUOA and related algorithms. CONDOR (http://www.applied-mathematics.net/) is one implementation derived from these. Another is in the dlib library (http://dlib.net/optimization.html).

These algorithms are for optimization problems where the function to be optimized is possibly expensive to compute and noisy.

--Jon

Post by **Evert** » Thu Mar 13, 2014 9:50 am

Let me see if I understand correctly what the issue is here.

In a simple case, where weights do not depend on game phase, the evaluation function should return values such that if E1 > E2, then the probability of winning the position corresponding to E1 (say P1) is larger than that for the position corresponding to E2 (say P2). The absolute scale of the evaluation is arbitrary, but one typically fixes the value of the pawn (say, VP = 100, though this is again arbitrary).

In practice, the evaluation is really a 2-vector of middle-game and end-game scores, which is converted to a scalar by a linear weight that depends on the material on the board ("game phase"). We should still have the property that if E1 > E2, then the probability of winning P1 is larger than the probability of winning P2. However, we now have two scale factors: one for the middle game and one for the end game. One is still arbitrary, but the other needs to be fixed somehow, otherwise the interpolation makes no sense.

A simple way to do this is to set VP(middle game) = VP(end game); tuning will then make sure that other terms that become relatively more important in the end game get a higher end-game weight than terms that are less important.
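A minimal sketch of the 2-vector evaluation described here. It assumes a common phase convention that engines vary on: phase runs from 0 (pure end game) to a hypothetical maximum of 24 (full middle-game material), and the names are illustrative only.

```python
from dataclasses import dataclass

MAX_PHASE = 24  # hypothetical: phase value with all non-pawn material on the board


@dataclass
class Score:
    mg: int  # middle-game component
    eg: int  # end-game component

    def __add__(self, other: "Score") -> "Score":
        # Evaluation terms are summed component-wise before tapering.
        return Score(self.mg + other.mg, self.eg + other.eg)


def taper(score: Score, phase: int) -> float:
    """Linearly interpolate between the end-game value (phase 0) and the
    middle-game value (phase MAX_PHASE); out-of-range phases are clamped."""
    phase = max(0, min(phase, MAX_PHASE))
    return (score.mg * phase + score.eg * (MAX_PHASE - phase)) / MAX_PHASE
```

For a pawn worth `Score(80, 120)`, `taper` gives 80.0 at full material, 120.0 in a bare end game and 100.0 halfway. Scaling both components by the same factor changes nothing essential, but scaling only `eg` changes how positions from different phases compare, which is why one of the two scales cannot be chosen freely.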

However, fixing the value of the pawn comes at a price: it is not guaranteed that the probability of winning the game given a +1 evaluation (say) is the same in the middle-game as it is in the end game. In fact, this is very probably not true (and I'm not even thinking about pathological cases where the extra pawn is meaningless; those you should detect and score as nearly equal anyway), but this is a tacit assumption in the tuning method described by Peter.

Instead, one should keep the probability P(win; eval) constant for a given evaluation score across different game phases. But if we also fix the value of the pawn in the end game, the problem is overdetermined (there is no free parameter to play with). Conclusion: the end game value of the pawn has to be allowed to vary along with the other weights in order for this tuning method to work properly.

Does that sound right? It does explain why I got nothing sensible when I tried to apply it to just positional weights (but not piece values).

Post by **tpetzke** » Thu Mar 13, 2014 12:34 pm

Hi Evert,

I guess you're right.

My reasoning was that the other endgame related terms in the evaluation can modify the pawn material value enough if required.

A simple case would be a constant to increase/decrease the PSQ values of the pawn. But as there are more pawn related terms there are multiple ways to do that.

However, I'm not sure whether it can really be compensated for. Maybe I'll rerun the test later.

Thomas...

Post by **Evert** » Thu Mar 13, 2014 12:56 pm

tpetzke wrote: My reasoning was that the other endgame related terms in the evaluation can modify the pawn material value enough if required.

A simple case would be a constant to increase/decrease the PSQ values of the pawn. But as there are more pawn related terms there are multiple ways to do that.

That was my initial thought too: the value of the pawn is just an arbitrary constant, so what does it matter? The other terms just scale to get the correct relative score. But then it occurred to me that the mapping of evaluation score to winning chances is not necessarily constant and could be messed up by fixing the value of the pawn.

I guess one could test if this is true by plotting the winning rate for a given evaluation advantage as a function of game phase (so a 3-D plot). Ideally you'd probably want that to be independent of game phase.
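One way to run that test, sketched under the assumption that you have (phase, eval in cp, result) records taken from games, with the result scored 1/0.5/0 from the same point of view as the evaluation. The bin widths are arbitrary choices and the function name is hypothetical.

```python
from collections import defaultdict


def win_rate_table(records, phase_bin=6, eval_bin=50):
    """Average result per (phase bin, eval bin).

    records: iterable of (phase, eval_cp, result) tuples, result in {0, 0.5, 1}.
    Returns {(phase_bin_index, eval_bin_index): (mean_result, count)}.
    Floor division puts negative evaluations in negative bins.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for phase, eval_cp, result in records:
        key = (phase // phase_bin, eval_cp // eval_bin)
        sums[key] += result
        counts[key] += 1
    return {key: (sums[key] / counts[key], counts[key]) for key in counts}
```

Plotting `mean_result` against the eval bin, with one curve per phase bin, gives the 3-D picture; if the curves roughly coincide, a single logistic curve describes all phases.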

Note that things get messy again when you consider that the communication protocol states that the printed score is supposed to be in "centi-pawns", which more or less implies fixing the (typical) value of the pawn at 100 for both middle and end game. I guess most people ignore this anyway.
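If the tuned pawn value is allowed to float, the printed score can still be normalized at output time. A minimal sketch, assuming a single reference pawn value (hypothetical parameter `pawn_value`; with tapered values you would have to decide which pawn value to normalize by) and the UCI `info ... score cp` field as the output format:

```python
def to_centipawns(internal_score: int, pawn_value: int) -> int:
    """Rescale an internal score so that pawn_value internal units print as 100 cp."""
    return round(internal_score * 100 / pawn_value)


def uci_score_line(internal_score: int, pawn_value: int, depth: int) -> str:
    """Build a UCI info string reporting the normalized score."""
    return f"info depth {depth} score cp {to_centipawns(internal_score, pawn_value)}"
```

This only changes what is printed; search and tuning keep working on the internal scale.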

Post by **Steve Maughan** » Thu Mar 13, 2014 1:44 pm

Hi Evert,

Evert wrote:(...) But then it occurred to me that the mapping of evaluation score to winning chances is not necessarily constant and could be messed up by fixing the value of the pawn.(...)

Exactly!

This effect may turn out to be cosmetic. But if the pawn's value is significantly different in the opening than in the endgame (see Marco Belli's post earlier in the thread http://goo.gl/au9O39), then it could be interesting. I suspect a swing of 100% could easily account for the 20 Elo difference noted by Thomas.

Also, an engine with such a low pawn value in the opening is going to have an "interesting" playing style with lots of pawn sacrifices in return for active play.

Steve

Post by **petero2** » Thu Mar 13, 2014 10:25 pm

Evert wrote:However, fixing the value of the pawn comes at a price: it is not guaranteed that the probability of winning the game given a +1 evaluation (say) is the same in the middle-game as it is in the end game. In fact, this is very probably not true (and I'm not even thinking about pathological cases where the extra pawn is meaningless; those you should detect and score as nearly equal anyway), but this is a tacit assumption in the tuning method described by Peter.

I don't think there is any such assumption in my method. My method only fixes K, not the value of any evaluation weight (such as the nominal value of a pawn). Also, my method optimizes on the scores returned by the q-search function, which are not vector-valued. The interpolation between MG and EG scores has to happen before search sees the score, otherwise alpha-beta would not work.

In fact my method assumes that there is not a perfect match between q-search scores and win probability. The goal of the optimization is to adjust the weights to make the mis-match smaller.
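The mismatch is measured by an error E over the training positions. A minimal sketch, assuming the mean-squared-error form from the original texel tuning description (not quoted in this excerpt), the same base-10 logistic with constant k, q-search scores that have already been computed, and both scores and results from white's point of view. The function name is hypothetical.

```python
def tuning_error(qsearch_scores, results, k):
    """Mean squared difference between game results and the logistic
    prediction from each position's q-search score (in centipawns).

    results: 1 for a white win, 0.5 for a draw, 0 for a black win.
    """
    if len(qsearch_scores) != len(results) or not results:
        raise ValueError("need equally long, non-empty inputs")
    total = 0.0
    for score, result in zip(qsearch_scores, results):
        predicted = 1.0 / (1.0 + 10.0 ** (-k * score / 400.0))
        total += (result - predicted) ** 2
    return total / len(results)
```

The optimizer changes evaluation weights, re-runs q-search on the training positions and keeps changes that lower this value; a perfect match would give E = 0, which is not expected in practice.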

Also note that if the evaluation function internally uses (MG,EG) pairs for its weights, such weights would correspond to two parameters in the optimization problem. You could also use additional parameters to describe the weighting formula. For example, in texel many evaluation terms are weighted like this:

S = S_mg  if material >= param1
    S_eg  if material <= param2
    S_mg + (S_eg - S_mg) * (param1 - material) / (param1 - param2) otherwise

The parameters "param1" and "param2" are also included in the optimization problem.
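A direct Python transcription of that weighting formula, with param1 and param2 as ordinary arguments so an optimizer can vary them like any other weight. It uses floating-point division where an engine would more likely use integer arithmetic, and the requirement param1 > param2 is an assumption read off the pseudocode.

```python
def interpolate(s_mg: float, s_eg: float, material: float,
                param1: float, param2: float) -> float:
    """Middle-game value at or above param1 material, end-game value at or
    below param2, and a linear blend in between."""
    if param1 <= param2:
        raise ValueError("expected param1 > param2")
    if material >= param1:
        return s_mg
    if material <= param2:
        return s_eg
    return s_mg + (s_eg - s_mg) * (param1 - material) / (param1 - param2)
```

For example, `interpolate(10, 30, 50, 80, 20)` lies halfway between the thresholds and returns the midpoint 20.0.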

Evert wrote:Does that sound right? It does explain why I got nothing sensible when I tried to apply it to just positional weights (but not piece values).

From your post I get the impression that you are not really using the method I described, but instead a variation of it. In my method the q-search score is used to compute E, the set of training positions include non-quiet positions and positions where the evaluation score is very large, and no evaluation weights are fixed to a particular value (such as pawn=100).

It is possible to optimize only a subset of the evaluation parameters. However, as I wrote in the description of my method, one big objection to this method is that it assumes that "correlation implies causation". This is also a possible reason you didn't see any improvement. Nevertheless, I have so far increased the strength of texel by about 150 Elo points in 2.5 months using this method. It is quite likely that the method will stop working at some point, though, probably long before texel gets close to Stockfish strength.
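Optimizing only a subset simply means the search never touches the other parameters. A minimal sketch of a greedy coordinate-wise ±1 local search restricted to chosen indices; `error_fn` is a hypothetical callable returning E for a parameter vector, and this is an illustration, not texel's actual code.

```python
def local_search(params, error_fn, tunable, step=1, max_passes=100):
    """Greedy coordinate search: for each index in `tunable`, try +step,
    then -step, and keep the first change that lowers error_fn.
    Stops after a full pass without improvement or after max_passes passes.
    Parameters outside `tunable` are never modified."""
    best = list(params)
    best_e = error_fn(best)
    for _ in range(max_passes):
        improved = False
        for i in tunable:
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                e = error_fn(trial)
                if e < best_e:
                    best, best_e = trial, e
                    improved = True
                    break  # move on to the next parameter
        if not improved:
            break
    return best, best_e
```

With a toy error such as `(p[0] - 5) ** 2 + (p[1] - 3) ** 2` and `tunable=[0]`, only the first parameter moves (to 5) while the second stays at its starting value.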
