Tapered Evaluation and MSE (Texel Tuning)

hgm
Posts: 25887
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: Tapered Evaluation and MSE (Texel Tuning)

Post by hgm » Thu Jan 21, 2021 3:31 pm

Not sure what would be better, or why you would expect any different results from that at all. Better would be to link to Mersenne Twister, but for this application that seems overdoing it.

In contrast to what you say, this actually is Texel tuning as described in the original posting: use any optimization algorithm of your choice to converge on the closest minimum. Which for simple linear evaluations like this will of course always be the global minimum, or indistinguishably close to it.
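To make the procedure concrete, here is a minimal sketch of that kind of tuning loop in Python. It is not the code discussed in this thread: the three-feature linear evaluation, the logistic scaling constant k = 1/400, the synthetic test set and all function names are assumptions made for illustration only. The optimizer is a plain local search that probes +1 and -1 on each parameter and keeps whatever lowers the MSE.

```python
import random

def expected_result(score, k=1.0 / 400.0):
    # Logistic mapping from a centipawn score to an expected result in [0, 1].
    # The scaling constant k is an assumption; normally it is fitted to the data.
    return 1.0 / (1.0 + 10.0 ** (-k * score))

def evaluate(weights, features):
    # Linear evaluation: dot product of parameters and feature counts.
    return sum(w * f for w, f in zip(weights, features))

def mse(weights, positions):
    # positions is a list of (features, result) pairs, result in [0, 1].
    total = sum((result - expected_result(evaluate(weights, feats))) ** 2
                for feats, result in positions)
    return total / len(positions)

def local_search(weights, positions, step=1, max_passes=1000):
    # Probe +step / -step on each parameter; keep any change that lowers the MSE.
    weights = list(weights)
    best = mse(weights, positions)
    for _ in range(max_passes):
        improved = False
        for i in range(len(weights)):
            for delta in (step, -step):
                weights[i] += delta
                err = mse(weights, positions)
                if err < best:
                    best, improved = err, True
                    break            # keep the change
                weights[i] -= delta  # undo the probe
        if not improved:
            break
    return weights, best

def make_positions(n, true_weights, seed=1):
    # Hypothetical synthetic test set: random feature differences,
    # game results drawn from the model with the "true" weights.
    rng = random.Random(seed)
    positions = []
    for _ in range(n):
        feats = [rng.randint(-2, 2) for _ in true_weights]
        p_win = expected_result(evaluate(true_weights, feats))
        positions.append((feats, 1.0 if rng.random() < p_win else 0.0))
    return positions

positions = make_positions(500, [100, 300, 500])
start = [90, 310, 490]
tuned, err = local_search(start, positions)
print(tuned, err, mse(start, positions))
```

Any other optimizer could be substituted for local_search; the only thing that matters is that it walks downhill on the same MSE.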

There are of course no real 'conclusions': just empirical confirmation of the well-known mathematical facts that were mentioned before. Such as that a good distribution over the entire space of possible positions ensures benign behavior of the MSE, which makes it easy to converge to a unique optimum. And that this recovers the original parameters reasonably well, although they can be adapted to partly compensate for the terms you cannot fit with the evaluation you are tuning.

Tuning on random positions would definitely be a way to ensure that. Or making sure sufficiently many random positions are included in the test set. Note I already proposed before to generate games from opening lines played by random movers.

Also note that in this case the test set is always the same, as I don't initialize the seed of the PRNG from the time. Obviously, for small test sets (and 10k positions is not that large) you should expect minor effects of the actual set used. In the limit of an infinitely large set such effects would disappear.
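The effect of a fixed seed can be illustrated with a short Python sketch. The function name, the seed values and the three-feature "position" format are hypothetical; the point is only that a fixed seed reproduces the same test set on every run, while another seed gives a different sample from the same distribution. (Python's random module happens to use the Mersenne Twister as its generator.)

```python
import random

def random_test_set(n, seed=None):
    # seed=None lets Python seed itself (OS entropy or the time);
    # a fixed integer seed gives the same set on every run.
    rng = random.Random(seed)
    return [[rng.randint(-2, 2) for _ in range(3)] for _ in range(n)]

fixed_a = random_test_set(10000, seed=12345)
fixed_b = random_test_set(10000, seed=12345)
other = random_test_set(10000, seed=54321)

print(fixed_a == fixed_b)  # same seed, same test set
print(fixed_a == other)    # different seed, different sample
```

Tuning on several such sets with different seeds would be one way to see how much the fitted parameters depend on the particular sample.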

A more subtle observation is this: the quantization of the parameters can interfere with convergence to an actual minimum in a simple optimizer that only probes adjacent points of the integer lattice. E.g. if the MSE behaved locally as C + (10*dP - dQ)^2 - 0.001*dQ, you would obviously improve by increasing Q (i.e. dQ > 0), while keeping 10*dP - dQ at 0 through a compensating change in P. But if dQ has to be +/-1, then that compensating change would have to be +/-0.1, which the quantization would not allow. You would have to increase Q by 10 (and P by 1) in order to improve. The problem is not a local minimum, but unfortunate sampling of the MSE. If the parameters could have been tuned continuously (i.e. had been floats), you would not have this problem at all, and happily decrease the MSE through steps (dQ = 1, dP = 0.1) for as long as it takes. So life for the optimizer is easier when you make the parameters floats, and generate new trials by a homogeneously distributed fractional perturbation between -1 and 1.
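The quantization effect is easy to check numerically. The sketch below takes the example surface C + (10*dP - dQ)^2 - 0.001*dQ literally (with C = 0), verifies that none of the eight adjacent integer lattice points improves on the starting point, and then runs a simple float optimizer with uniform perturbations in [-1, 1]. The function names, the seed and the number of trials are arbitrary choices for illustration.

```python
import random

def local_mse(dp, dq, c=0.0):
    # Example surface from the text, as a function of the parameter changes.
    return c + (10 * dp - dq) ** 2 - 0.001 * dq

# All adjacent points of the integer lattice (each parameter changes by -1, 0 or +1).
neighbours = [(dp, dq) for dp in (-1, 0, 1) for dq in (-1, 0, 1)
              if (dp, dq) != (0, 0)]
stuck = all(local_mse(dp, dq) > local_mse(0, 0) for dp, dq in neighbours)
print("integer optimizer stuck:", stuck)

# A step of 10 in Q with 1 in P stays in the valley and does improve.
print("big integer step improves:", local_mse(1, 10) < local_mse(0, 0))
# So does the fractional step that quantization forbids.
print("fractional step improves:", local_mse(0.1, 1.0) < local_mse(0, 0))

def float_search(trials=20000, seed=1):
    # Float parameters; each trial perturbs both by a uniform amount in [-1, 1]
    # and is accepted only if it lowers the MSE.
    rng = random.Random(seed)
    dp, dq = 0.0, 0.0
    best = local_mse(dp, dq)
    for _ in range(trials):
        trial_dp = dp + rng.uniform(-1.0, 1.0)
        trial_dq = dq + rng.uniform(-1.0, 1.0)
        err = local_mse(trial_dp, trial_dq)
        if err < best:
            dp, dq, best = trial_dp, trial_dq, err
    return dp, dq, best

dp, dq, best = float_search()
print(dp, dq, best)
```

Only a small fraction of the random trials land in the narrow valley, so acceptance is slow, but unlike the integer probe the float search is never blocked outright.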
