Experiments in generating Texel Tuning data

Discussion of chess software programming and technical issues.

Moderator: Ras

j.t.
Posts: 263
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: Experiments in generating Texel Tuning data

Post by j.t. »

algerbrex wrote: Fri Jul 15, 2022 3:03 am Right, makes sense. I figured that at this point they've been integrated so heavily that it would hurt badly to just rip them out of the eval.
I believe that at the time I first added them (which was also the first time I tried automatic parameter tuning) the gain was around 50 Elo, but I am not 100% sure about that.
algerbrex wrote: Fri Jul 15, 2022 3:03 am I'm not super familiar with Nim syntax, but you have a table that looks something like pst[2][64][64], correct?
In C-like syntax it could look like this (again, excluding phases): pst[2][64][7][64]
The [2] is not exactly the color of the king; rather, there is one sub-table for the enemy king and one for our own king. The idea is that one table encodes how to position your pieces to attack the enemy king, and the other how to defend your own king. The first [64] is then the square of that king.
The [7] covers the pieces pawn, knight, bishop, rook, queen, king, and passed pawn, and the final [64] is the square of that piece.
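For readers following along, a minimal C sketch of how such a table might be indexed (illustrative names, not Nalwald's actual code; phases again excluded):

Code: Select all

// Illustrative king-contextual PST lookup: pst[perspective][king_sq][piece][sq].
// perspective 0 = relative to the enemy king (attacking),
// perspective 1 = relative to our own king (defending).
enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, PASSED_PAWN, PIECE_TYPES };

int pst[2][64][PIECE_TYPES][64];

int pstScore(int enemyKingSq, int ownKingSq, int piece, int sq) {
    return pst[0][enemyKingSq][piece][sq]   // how to attack the enemy king
         + pst[1][ownKingSq][piece][sq];    // how to defend our own king
}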
algerbrex wrote: Fri Jul 15, 2022 3:03 am Out of curiosity, do you Texel-tune king safety as well? What model did you end up using that was relatively easy to differentiate? I ended up settling on a quadratic model for a first attempt, since the derivative is simple, of course, and it did surprisingly well. Next up is finding an exponential model to better capture the idea of king safety.
I have four eval features for king safety (besides the king-contextual PSTs):
- a bonus for a bishop, queen, or rook checking the enemy king
- a bonus for a bishop, queen, or rook attacking the immediate area around the enemy king (3x3)
- a penalty for each square from which a queen would check the king, considering only our own pawn structure
- a term for the number of enemy pieces near our king (5x5 area); this turned out to be either a penalty for more pieces (during the endgame) or, unintuitively, a bonus (during the opening)

For the last two, I simply have tables indexed by the count, e.g. when there are 3 pieces near the king, I look into that feature's table at index 3. This also makes it easy to compute a simple linear gradient for the respective table entry.
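As a rough illustration (hypothetical names, not actual engine code), such a count-indexed feature and its contribution to the gradient could look like this:

Code: Select all

#define MAX_NEAR 25   // at most 25 squares in the 5x5 area

int    piecesNearKingBonus[MAX_NEAR + 1];  // tunable table, indexed by count
double gradNearKing[MAX_NEAR + 1];         // accumulated gradient per entry

// The eval contribution is a plain table lookup ...
int nearKingTerm(int numEnemyPiecesNearKing) {
    return piecesNearKingBonus[numEnemyPiecesNearKing];
}

// ... so the partial derivative of the eval w.r.t. the looked-up entry is 1,
// and the loss derivative dErr_dEval from the outer tuning loop is simply
// added to that entry's gradient.
void accumulateNearKingGradient(int numEnemyPiecesNearKing, double dErr_dEval) {
    gradNearKing[numEnemyPiecesNearKing] += dErr_dEval;
}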

I am actually not 100% sure whether the 2nd and 4th points do my evaluation any good; I think the last time I tested them was together with a bunch of other new eval features, and the whole batch worked, so I just went with it.

I don't have any non-linear king safety features.
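For reference, the quadratic model mentioned in the quote could be as simple as the sketch below; attackUnits is a hypothetical aggregate of attack counts, not a term from either engine.

Code: Select all

// score = w * attackUnits^2; the derivative w.r.t. the tunable weight w
// is just attackUnits^2, which keeps the tuning gradient trivial.
double quadraticKingSafety(double w, int attackUnits) {
    return w * (double)attackUnits * (double)attackUnits;
}

// Partial derivative of the term with respect to w, for the gradient.
double quadraticKingSafetyGrad(int attackUnits) {
    return (double)attackUnits * (double)attackUnits;
}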
algerbrex wrote: Fri Jul 15, 2022 3:03 am Interestingly, I found that with my gradient-descent tuner I got the best results by switching to AdaGrad and then running several thousand iterations. The current evaluation parameters in Blunder were tuned from scratch using 50K iterations and 1M positions from my extended dataset. That took about four hours, much better than the eleven it took to tune the original evaluation parameters in Blunder 7.6.0 on only 400K positions.
Do you calculate the gradient over all positions each iteration? Because I do that, and I only do 200-400 iterations (though it still takes 3-6 hours with (inefficiently used) 30 threads).
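For concreteness, a full-batch gradient computed once per iteration might look roughly like the sketch below; the struct layout, names, and the exact sigmoid are assumptions, not code from either engine.

Code: Select all

// Full-batch texel-tuning gradient:
// loss = mean over all positions of (result - sigmoid(k * eval))^2.
#include <math.h>

typedef struct {
    double  result;       // game result from white's view: 1, 0.5, or 0
    int     numFeatures;  // number of eval features active in this position
    int    *featureIdx;   // index of each active feature (weight)
    double *featureVal;   // its coefficient in the linear eval
} Position;

static double sigmoid(double eval, double k) {
    return 1.0 / (1.0 + exp(-k * eval));
}

void fullBatchGradient(const Position *pos, int numPos,
                       const double *weights, double *grad,
                       int numWeights, double k) {
    for (int w = 0; w < numWeights; w++) grad[w] = 0.0;

    for (int p = 0; p < numPos; p++) {
        double eval = 0.0;
        for (int f = 0; f < pos[p].numFeatures; f++)
            eval += weights[pos[p].featureIdx[f]] * pos[p].featureVal[f];

        double s = sigmoid(eval, k);
        // d/d(eval) of (result - s)^2, chain rule through the sigmoid:
        double dErr_dEval = -2.0 * (pos[p].result - s) * s * (1.0 - s) * k;

        for (int f = 0; f < pos[p].numFeatures; f++)
            grad[pos[p].featureIdx[f]] += dErr_dEval * pos[p].featureVal[f] / numPos;
    }
}

Splitting the position loop across threads and summing the per-thread gradients at the end is the usual way to parallelize this.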
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Experiments in generating Texel Tuning data

Post by algerbrex »

j.t. wrote: Fri Jul 15, 2022 12:54 pm Do you calculate the gradient over all positions each iteration? Because I do that, and I only do 200-400 iterations (though it still takes 3-6 hours with (inefficiently used) 30 threads).
I do, actually, but my full dataset as of late is only 1M positions (Zurichess + Blunder self-play games), only about a quarter of the positions you're using, if I read your posts correctly. So iterating over the full dataset takes maybe a second or so.

When I first started experimenting with the tuner, I tried running only a couple hundred iterations, but I wasn't getting very good convergence. I remember eventually starting to record the error during the tuning sessions and plotting the values with matplotlib, and the error was bouncing all over the place.

I eventually switched to AdaGrad and got much better results. But I also remember fixing a couple of slight bugs in how I computed the gradient, so I can't say definitively that my original tuner wouldn't have worked. AdaGrad's convergence was a good bit slower, but the error dropped consistently and smoothly, and it converged to much better values. So I've stuck with it. I should experiment with running only a couple hundred iterations now that I've (afaik) worked all of the bugs out of the tuner.
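The AdaGrad step itself is tiny; a minimal sketch (illustrative names, not Blunder's actual tuner code):

Code: Select all

// AdaGrad update: each weight keeps a running sum of squared gradients and
// its step is scaled by the inverse square root of that sum.
#include <math.h>

void adagradStep(double *weights, const double *grad, double *gradSqSum,
                 int numWeights, double learningRate) {
    const double eps = 1e-8;  // avoids division by zero on the first step
    for (int w = 0; w < numWeights; w++) {
        gradSqSum[w] += grad[w] * grad[w];
        weights[w]   -= learningRate * grad[w] / (sqrt(gradSqSum[w]) + eps);
    }
}

The per-weight scaling is what tends to smooth the error curve: parameters that keep receiving large gradients have their effective step size shrunk over time.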