I do have another question about how you guys deal with that problem.
How do you generate a lot of games which aren't too similar? Or, to phrase it differently, how do you generate a lot of unique positions?
Do you use a very large opening book as a starting point?
Greetings
Robin
tuning for the uninformed
Moderators: hgm, Rebel, chrisw
-
- Posts: 931
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: tuning for the uninformed
CheckersGuy wrote:
I do have another question about how you guys deal with that problem. How do you generate a lot of games which aren't too similar? Or, to phrase it differently, how do you generate a lot of unique positions? Do you use a very large opening book as a starting point?
Greetings
Robin
I downloaded games from CCRL, took positions from those games, and analyzed them with my program RuyDos. I saved the positions on which the evaluation function was called after searching 1000 nodes. I then labelled each position by running one very quick SF8-vs-SF8 game.
https://bitbucket.org/alonamaloh/ruy_tu ... th_results
EDIT: In that file each position has been replaced by the position from which quiescence search got its score.
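A minimal sketch of the de-duplication side of this (names hypothetical; assumes you already have a stream of FEN strings from wherever you extract positions):

```python
def unique_positions(fens, max_positions=1_000_000):
    """Collect distinct positions, ignoring the move counters so that
    the same position reached at different move numbers collapses to one."""
    seen = set()
    out = []
    for fen in fens:
        # Keep only piece placement, side to move, castling rights, en passant.
        key = " ".join(fen.split()[:4])
        if key not in seen:
            seen.add(key)
            out.append(fen)
            if len(out) >= max_positions:
                break
    return out
```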
-
- Posts: 349
- Joined: Sat Aug 06, 2016 8:31 pm
- Location: United States
Re: tuning for the uninformed
CheckersGuy wrote:
This is the local search algorithm, but I would assume that it is better to run some gradient-based algorithm first (maybe gradient descent or Gauss-Newton). Then, if the error doesn't change by much anymore, I would switch to local search.
Could you describe these pre-local-search algorithms for a non-mathematician?
Also, suppose you have a file of positions labeled with evals from an oracle engine instead of game results. Is there a way to adapt the error formula used in Texel tuning to take an eval instead of a game result? Perhaps the sigmoid of the oracle eval minus the tuning engine's sigmoid, instead of result - sigmoid?
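In code, that substitution might look something like this (just a sketch of the idea; all names are made up):

```python
def sigmoid(q, K=1.0):
    """Base-10 logistic, as commonly used in Texel tuning."""
    return 1.0 / (1.0 + 10.0 ** (-K * q))

def oracle_error(my_evals, oracle_evals, K=1.0):
    """Squared difference of the two sigmoids: the game result term
    (result - sigmoid) is replaced by (sigmoid(oracle) - sigmoid(mine))."""
    n = len(my_evals)
    return sum((sigmoid(o, K) - sigmoid(m, K)) ** 2
               for m, o in zip(my_evals, oracle_evals)) / n
```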
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: tuning for the uninformed
Texel tuning is basically a form of logistic regression. There is a large literature on this, and a lot of explanatory material online. As initially described it is slightly unorthodox because it is using a squared-error loss function to handle the 3 categories of win, loss, draw. But the general principle is still the same.
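A minimal sketch of that squared-error loss (names hypothetical; scores are the eval outputs and results are in {0, 0.5, 1}):

```python
def texel_error(scores, results, K=1.0):
    """Mean squared error between game results and the logistic
    prediction, as in the original Texel tuning description."""
    def sigmoid(q):
        # Base-10 sigmoid; K scales eval units into win probability.
        return 1.0 / (1.0 + 10.0 ** (-K * q))
    n = len(scores)
    return sum((r - sigmoid(q)) ** 2 for q, r in zip(scores, results)) / n
```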
--Jon
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: tuning for the uninformed
Henk wrote:
None of this works if the search space has a great many local optima and only very few global optima that you are interested in. But simulated annealing takes too long.
If the loss function is convex, and it generally is, then you are going to be able to converge to a minimum using local search or gradient descent.
Nobody uses simulated annealing. There are much better algorithms.
For large training sets, IMO you want to use a gradient-descent method such as Adam.
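A bare-bones sketch of Adam (hypothetical names; assumes the tuner supplies a grad(theta) function returning the loss gradient):

```python
import numpy as np

def adam(grad, theta, steps=1000, lr=0.001,
         beta1=0.9, beta2=0.999, eps=1e-8):
    """Plain Adam: exponential moving averages of the gradient and its
    square, with bias correction, applied to a parameter vector."""
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta
```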
--Jon
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: tuning for the uninformed
Proof of strict convexity for logistic loss function:
http://qwone.com/~jason/writing/convexLR.pdf
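The crux of that proof, paraphrased in standard logistic-regression notation (not quoted verbatim from the linked note):

```latex
\ell(w) = -\sum_i \Big[ y_i \log \sigma(x_i^\top w)
          + (1 - y_i) \log\big(1 - \sigma(x_i^\top w)\big) \Big], \qquad
\nabla^2 \ell(w) = X^\top S X, \quad
S = \operatorname{diag}\big(\sigma_i (1 - \sigma_i)\big),
```

so for any direction $z$, $z^\top \nabla^2 \ell(w)\, z = \sum_i \sigma_i (1-\sigma_i) (x_i^\top z)^2 \ge 0$: the Hessian is positive semidefinite, hence the loss is convex (strictly so under mild conditions on $X$).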
--Jon
-
- Posts: 349
- Joined: Sat Aug 06, 2016 8:31 pm
- Location: United States
Re: tuning for the uninformed
jdart wrote:
Texel tuning is basically a form of logistic regression. There is a large literature on this, and a lot of explanatory material online. As initially described it is slightly unorthodox because it uses a squared-error loss function to handle the 3 categories of win, loss, draw. But the general principle is still the same.
If this was in response to my post directly above it: I have Texel tuning implemented using the local optimization function given as pseudo-code on the CPW, but it sounds like there are ways to get closer first via gradient descent or Gauss-Newton. Unfortunately, what I've read of these so far was over my head.
Regarding the 3 categories (win, loss, draw) being unusual: does that mean using an oracle eval would actually be more appropriate? Would it just plug in where the result (Ri) goes, or does it have to be converted to a -1 to 1 range first (using the sigmoid function)?
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: tuning for the uninformed
The theory of logistic regression with discrete outcomes (dependent variables) deals mainly with three cases:
1. binary (0 or 1) outcomes
2. ordinal outcomes (such as poor, average, good).
3. discrete but unordered outcomes.
None of these commonly use a squared error distance to measure goodness of fit, as the Texel method does.
However, our problem is basically 2, but with a twist: there are three possible values for a game (0, 0.5 or 1), but the values are meaningful in the sense that 0.5 is really equidistant between 0 and 1. So in that case the Texel method may be a reasonable approach, it is just not as theoretically grounded.
As for using the eval as an oracle: yes, you can do this. Your label is then the oracle's value and you regress to find the best match between predicted and the oracle value. All this changes in the whole procedure is the "loss function" that measures goodness of fit. You could use mean absolute or squared difference between predicted and oracle eval, for example.
Yet another approach is to make predicted moves based on your eval match actually played moves by an "oracle" such as a strong program or a strong human player. This approach has been used in Shogi - there is a paper by Hoki and Kaneko.
But realize that then you are really solving a different problem. The regression will make your eval conform to the oracle's eval. In effect, this is a fancy way of reverse engineering the oracle's eval. It may or may not make your eval predict game results better, or actually play better, but if the oracle is much stronger then it likely will cause improvement.
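As a sketch of that oracle regression with a linear eval and squared loss (all names hypothetical; features are whatever terms your eval is linear in):

```python
import numpy as np

def fit_to_oracle(features, oracle_evals, steps=2000, lr=0.01):
    """Least-squares fit of linear eval weights to oracle evaluations,
    by gradient descent on the mean squared difference. The learning
    rate should be scaled to your feature magnitudes."""
    X = np.asarray(features, dtype=float)   # one row of features per position
    y = np.asarray(oracle_evals, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        residual = X @ w - y                # predicted minus oracle eval
        w -= lr * (2.0 / len(y)) * (X.T @ residual)
    return w
```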
--Jon
-
- Posts: 4367
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: tuning for the uninformed - Poisson?
It has occurred to me that the correct model here might be Poisson regression (https://en.wikipedia.org/wiki/Poisson_regression). If we ran a match of n games for each position, and if the probability of a win is some value p, then the game results (doubled to make them integers) could be modeled as a Poisson distribution, and the parameters could be tuned using that method to model the outcomes. I don't know if this is completely valid, but it seems plausible. Texel tuning as initially described is the special case of n = 1.
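A sketch of that idea with the usual log link, fit by gradient ascent on the Poisson log-likelihood (hypothetical names; counts would be the doubled match scores):

```python
import numpy as np

def poisson_fit(X, counts, steps=3000, lr=0.01):
    """Poisson regression with log link: counts ~ Poisson(exp(X @ w)).
    The log-likelihood gradient is X.T @ (counts - exp(X @ w))."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(counts, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        mu = np.exp(X @ w)                  # predicted mean count per position
        w += lr * X.T @ (y - mu) / len(y)   # ascend the log-likelihood
    return w
```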
--Jon
-
- Posts: 1334
- Joined: Sun Jul 17, 2011 11:14 am
Re: tuning for the uninformed - Poisson?
jdart wrote:
It has occurred to me that the correct model here might be Poisson regression (https://en.wikipedia.org/wiki/Poisson_regression). If we ran a match of n games for each position, and if the probability of a win is some value p, then the game results (doubled to make them integers) could be modeled as a Poisson distribution, and the parameters could be tuned using that method. Texel tuning as initially described is the special case of n = 1.
This is only mildly related to the topic, but Jon, I think you said that Arasan used a closed-form equation for the derivatives of your eval. Would you mind going over how you calculated those? It confuses me a lot.
For example, even if I take a stupid material-only tapered eval with mean-squared error, I get this:
Code: Select all
Let count_p_w, count_n_w etc be the count of pawns, knights etc for white.
Let count_p_b, count_n_b etc be the count of pawns, knights etc for black.
Let phase_p, phase_n etc be the phase weight of pawns, knights, etc for tapered eval.
Let value_p_o, value_n_o etc be the material value of pawns, knights, etc in the opening.
Let value_p_e, value_n_e etc be the material value of pawns, knights, etc in the endgame.
value_o = (count_p_w - count_p_b) * value_p_o + (count_n_w - count_n_b) * value_n_o + ...
value_e = (count_p_w - count_p_b) * value_p_e + (count_n_w - count_n_b) * value_n_e + ...
total_phase = 16 * phase_p + 4 * phase_n + ...
phase = (count_p_w + count_p_b) * phase_p + (count_n_w + count_n_b) * phase_n + ...
value = ((phase * value_o) + ((total_phase - phase) * value_e)) / total_phase
sigmoid = 1 / (1 + 10 ** (-K*value))
error = (result - sigmoid) * (result - sigmoid)
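For what it's worth, working the chain rule through that pseudo-code for one parameter, say value_p_o, gives something like this (just a sketch; the ln 10 factor appears because the sigmoid uses base 10, and d sigmoid / d value = K * ln(10) * sigmoid * (1 - sigmoid)):

```python
import math

def grad_value_p_o(result, value, K, count_p_w, count_p_b, phase, total_phase):
    """Derivative of (result - sigmoid)^2 with respect to the opening pawn
    value, by the chain rule. sigmoid = 1 / (1 + 10**(-K * value)), and
    value is linear in value_p_o with coefficient
    (count_p_w - count_p_b) * phase / total_phase."""
    s = 1.0 / (1.0 + 10.0 ** (-K * value))
    ds_dvalue = K * math.log(10.0) * s * (1.0 - s)
    dvalue_dparam = (count_p_w - count_p_b) * phase / total_phase
    return -2.0 * (result - s) * ds_dvalue * dvalue_dparam
```

The same pattern applies to every material and phase weight: only the dvalue_dparam factor changes, since value is linear in each weight.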
Some believe in the almighty dollar.
I believe in the almighty printf statement.