Evaluation & Tuning in Chess Engines

AndrewGrant · Post by **AndrewGrant** » Mon Aug 24, 2020 2:58 am

For a while now I've been thinking about taking "Texel Tuning" to the next step. For a while I've been using a naive Gradient Decent framework to tune Ethereal's evaluation terms. But this only worked for Linear terms -- not King Safety or Complexity, as their derivatives are far more complex to compute. Additionally, I was hand waving an error between adjusting a terms endgame weight and having that be offset by Complexity.

So I finally bit the bullet, did the math, and implemented things. Since doing so I've pushed 3 tuning elo gainers to Ethereal. The first tuned all Linear terms in the evaluation for ~+10.0 elo. The second demonstrated that King Safety could be tuned for elo with a ~+3.4 elo patch. Finally, as a final proof of concept, I tuned all parameters at once -- Linear, Safety, & Complexity -- and found a gain of ~+2.3 elo.

This document has 4 sections. The first of which outlines how evaluation functions tend to work in modern alpha-beta chess engines. The second section explains the original Texel Tuning ideas, and how I have refined the original ideas and dataset generation. The third section goes through and actually finds the formulas needed to compute Gradients for the entire evaluation. The fourth section shares snippets of code from Ethereal's tuner, to give a general idea of how the system is implemented.

It is still a work in progress. Probably lots of typos, missing words, whatever. I do think the maths have been checked however, which is confirmed by the success of the tuning scheme.

TL;DR: AdaGrad implementation which fully computes Gradient's for evaluation functions which are 1) Linear, or 2) Can be described as a set of Linear operations which are later adjusted by a partially-differentiable function. This method could be applied to most the open source engines I am familiar with. Also, a new way to build data sets for tuning.

cucumber · Post by **cucumber** » Mon Aug 24, 2020 5:33 am

This is very cool, readable, and nicely typeset! Congrats on the progress you've made so far!

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Aug 24, 2020 7:56 am

Thank you. Another breakthrough in understanding evaluation tuning.

xr_a_y · Post by **xr_a_y** » Mon Aug 24, 2020 10:47 am

In Minic I'm using a much more lazy approach. I always compute the derivative numerically moving each parameters by a little delta.
This is of course very slow compared to what you do but works for everything.
Another approach can be to use method that does not need any derivatives (like PSO for instance).

hgm · Post by **hgm** » Mon Aug 24, 2020 11:53 am

For non-linear optimization problems I have always used the method of randomly perturbing the best solution I found so far, biased in the direction that gave me successful steps in the recent past. Every succesfull step adds to the bias, and increases the random perturbations around it, every rejected step shrinks both the bias and the size of the perturbations.

chrisw · Post by **chrisw** » Mon Aug 24, 2020 2:28 pm

hgm wrote: ↑Mon Aug 24, 2020 11:53 am For non-linear optimization problems I have always used the method of randomly perturbing the best solution I found so far, biased in the direction that gave me successful steps in the recent past. Every succesfull step adds to the bias, and increases the random perturbations around it, every rejected step shrinks both the bias and the size of the perturbations.

Back in 2000, having no initial idea of gradient descent coding, I tried something similar to train a two hidden layer NN Backgammon. Perturb weights, play game, if new weights win keep, else junk. Loop. Had no way of objectively testing against opposition, but it did improve and the 250,000 game version used to win against me and against a web based Backgammon.

I tried something vaguely similar for chess engine search. Maybe you have also tried? I had a list of about 50 search parameters and first shot was to play several tens of thousands of bullet games, each game on a different random parameter set. And built histograms for each parameter of winrate against parameter value. It was difficult to pick out any pattern.
Next stage (too many things to do) was the random perturbation, play game (batch), keep if wins, junk if not. I guess this will require more resources than I have, one needs to play reasonably deep games else the search is not being fully tested, and deeper games require more time. On the plus side, for eval tuning, you need a lot of positions in part because some eval features only figure occasionally, whereas for search tuning, a parameter change is almost certainly going to be game decisive, so maybe fewer games are needed to hone in on a result.
Seems likely others have already tried similar?

D Sceviour · Post by **D Sceviour** » Mon Aug 24, 2020 8:37 pm

Congradulations, and thank you for the PDF on non-linear tuning. The PST's for the NN methods are using an enlarged database. Has anybody tried to use a larger PST database such as:

PST [piece][square][enemy king position]

This could adapt quite well to a tuning method.

No4b · Post by **No4b** » Mon Aug 24, 2020 11:29 pm

I did not read a lot of AI and mathematics papers and I have really ~0 knowedlge in both topics, but I must say that this article is the most readable one i encountered so far.
I wish all papers on this topics would be written as simple and clear as this one.

towforce · Post by **towforce** » Tue Aug 25, 2020 12:10 am

I apologise if this comes across as critical, but I'd like to share a few quick thoughts on chess tuning.

If I denote C as a set of constants, and T as a set of position attributes to tune, and if, for the sake of writing a concise post I may be permitted to oversimplify, then, roughly speaking, the result of the tuning will be:

eval = C1*T1 + C2*T2 + C3*T3 + ... + Cn*Tn

Some issues arise:

* the constants are not going to be constant between position types (my remedy would be a polynomial relationship which would allow powers of the position attributes to multiply with each other. This would give rise to the problem of an infinite number of possible solutions, to which my remedy would be to define some rules for simplicity and then optimise for the simplest polynomial fit)

* most methods of optimising the eval expression are going to find local maxima rather than a global maximum. My remedy would be to use one of these big linear number crunchers like the type of software used to test supercomputer flops in "real world software" to find the global maximum

* it is very likely that there are going to be a large number of best-fit solutions - but common sense would tell us that there should only be one solution

* for it to be even possible to find an optimal solution, it is a mathematical certainty that the number of positions for which an evaluation exists would need to be greater than the number of position attributes to tune

* if exact numbers are used for the sample evaluations, it is very likely to be impossible to make a fit due to two (or more) expressions being incompatible. To make it work, I think the samples would need to use evaluation ranges rather than exact numbers

jdart · Post by **jdart** » Tue Aug 25, 2020 5:41 am

Thanks, will look at this. Arasan has tuned non-linear parameters for some time now, especially king safety terms. The tuning code explicitly calculates the gradient for these.

However, I wouldn't really recommend my code as a good model for how to do this. It is complex because the tuning parameters are maintained in a separate structure, and then these are transformed into the actual evaluation parameters read by the eval code during tuning. Then, at the end of tuning the tuner generates a code file (C++) that has the eval parameters written out as constants. There is probably a simpler way to do this and a way that doesn't spread the whole tuning apparatus across and into multiple files.

Btw. I have been recently looking at this algorithm (AdaBound) for optimization: https://www.luolc.com/publications/adabound/ but I haven't gotten it working yet.

--Jon

Evaluation & Tuning in Chess Engines

Evaluation & Tuning in Chess Engines

Re: Evaluation & Tuning in Chess Engines

Re: Evaluation & Tuning in Chess Engines

Re: Evaluation & Tuning in Chess Engines

Re: Evaluation & Tuning in Chess Engines

Re: Evaluation & Tuning in Chess Engines

Re: Evaluation & Tuning in Chess Engines

Re: Evaluation & Tuning in Chess Engines

Re: Evaluation & Tuning in Chess Engines

Re: Evaluation & Tuning in Chess Engines