Experiments in generating Texel Tuning data

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
j.t.
Posts: 263
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: Experiments in generating Texel Tuning data

Post by j.t. »

I already mentioned it in another thread, but I got nice results with a new data set (probably around +30 Elo, but I didn't measure exactly).

Before, I used a combination of the quiet Zurichess set (~700,000 positions) and a very similarly produced quiet Nalwald set (~1,800,000 positions, but weighted at only 60%).

The new part is that I now also added 1,600,000 randomly selected CCRL4040 games (above 2700 Elo, at least one legal move, only quiet positions, no early opening positions). From each of these I let Nalwald play a game (80 ms per move) and stored the result. The final result of a position is now calculated as the average of the CCRL results (counting multiple appearances of a position) and the Nalwald game result. This set is also weighted at only 60%.
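The averaging step is simple; here is a minimal Python sketch of the idea (function and variable names are mine, not Nalwald's):

```python
def blended_result(ccrl_results, rescore_result):
    """Blend the CCRL game outcomes for a position (a position that
    occurred in several games contributes one entry per appearance)
    with the result of a fresh fast game played from that position.

    Results are from White's point of view: 1.0 win, 0.5 draw, 0.0 loss.
    """
    ccrl_avg = sum(ccrl_results) / len(ccrl_results)
    return (ccrl_avg + rescore_result) / 2

# A position that was won once and drawn once in CCRL games,
# and drawn in the 80 ms/move rescoring game:
print(blended_result([1.0, 0.5], 0.5))  # 0.625
```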

I have to mention though that my evaluation has roughly 70,000 parameters, so it may profit more from additional positions than smaller evaluations.
User avatar
j.t.
Posts: 263
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: Experiments in generating Texel Tuning data

Post by j.t. »

j.t. wrote: Wed Jul 13, 2022 10:33 pm ... The new part is that I now also added 1,600,000 randomly selected CCRL4040 games ...
EDIT: I mean 1,600,000 positions.
User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Experiments in generating Texel Tuning data

Post by algerbrex »

j.t. wrote: Wed Jul 13, 2022 10:33 pm I have to mention though that my evaluation has roughly 70,000 parameters, so it may profit more from additional positions than smaller evaluations.
Is that a typo :shock:

Why so many parameters? Are you doing something like having different PSQT for things like the king being on a different square? I think I remember reading about that somewhere in this forum, it may have been you.
User avatar
j.t.
Posts: 263
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: Experiments in generating Texel Tuning data

Post by j.t. »

algerbrex wrote: Thu Jul 14, 2022 12:22 am
j.t. wrote: Wed Jul 13, 2022 10:33 pm I have to mention though that my evaluation has roughly 70,000 parameters, so it may profit more from additional positions than smaller evaluations.
Is that a typo :shock:

Why so many parameters? Are you doing something like having different PSQT for things like the king being on a different square? I think I remember reading about that somewhere in this forum, it may have been you.
Yes, different PSQTs depending on where our own king is, and another set of PSQTs depending on where the enemy king is. Additionally, I have 3^9 parameters describing pawn structures on 3x3 areas (but most of these are optimized to 0, so I may have to look into whether I can reduce its size). Since I added gradient descent, I always have the urge to try out new big tables that describe something on the chess board and let the tuner figure out the rest.
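One natural way to get an index into a 3^9-entry table is a ternary encoding of the nine squares of the 3x3 area (empty / own pawn / enemy pawn). A Python sketch under that assumed encoding (the post doesn't spell out Nalwald's actual one):

```python
EMPTY, OWN_PAWN, ENEMY_PAWN = 0, 1, 2

def structure_index(area):
    """Map a 3x3 area (nine values in {EMPTY, OWN_PAWN, ENEMY_PAWN},
    listed in a fixed square order) to an index in 0..3**9 - 1."""
    assert len(area) == 9
    index = 0
    for square in area:
        index = index * 3 + square  # base-3 positional encoding
    return index

# One tunable parameter per possible 3x3 pawn pattern:
params = [0] * 3**9  # 19683 entries
print(structure_index([EMPTY] * 9))  # 0
```

Since most patterns never occur in quiet middlegame positions, it is unsurprising that most of those 19,683 parameters stay at 0 after tuning.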
Patrice Duhamel
Posts: 194
Joined: Sat May 25, 2013 11:17 am
Location: France
Full name: Patrice Duhamel

Re: Experiments in generating Texel Tuning data

Post by Patrice Duhamel »

algerbrex wrote: Wed Jul 13, 2022 2:32 pm What does your king safety scheme look like? For Blunder, right now, it's very simple, just collecting points from features such as semi-open files around our king, and how many squares different pieces attack around the "king-zone" (16 squares around the king), and then running those points through a quadratic model to get a centi-pawn value.
For king safety I'm using an exponential lookup table, and the index is based only on the number of attacks on the king zone, scaled for each type of piece.
I want to rewrite it.
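Such a scheme might look roughly like the following Python sketch; the piece weights and table constants here are invented placeholders, not the actual tuned values:

```python
# Hypothetical per-piece-type weights for an attack on the king zone.
ATTACK_WEIGHT = {"N": 2, "B": 2, "R": 3, "Q": 5}

# Precomputed table whose entries grow roughly exponentially with the
# index, capped so extreme attack counts don't explode the score.
MAX_INDEX = 64
SAFETY_TABLE = [min(500, round(1.5 ** (i / 4))) for i in range(MAX_INDEX)]

def king_safety_penalty(attacks):
    """attacks: piece type -> number of attacks on the king zone.
    The weighted attack count indexes the exponential table."""
    index = sum(ATTACK_WEIGHT[p] * n for p, n in attacks.items())
    return SAFETY_TABLE[min(index, MAX_INDEX - 1)]
```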
Anything that can go wrong will go wrong.
User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Experiments in generating Texel Tuning data

Post by algerbrex »

j.t. wrote: Thu Jul 14, 2022 12:45 am
algerbrex wrote: Thu Jul 14, 2022 12:22 am
j.t. wrote: Wed Jul 13, 2022 10:33 pm I have to mention though that my evaluation has roughly 70,000 parameters, so it may profit more from additional positions than smaller evaluations.
Is that a typo :shock:

Why so many parameters? Are you doing something like having different PSQT for things like the king being on a different square? I think I remember reading about that somewhere in this forum, it may have been you.
Yes, different PSQTs depending on where our own king is, and another set of PSQTs depending on where the enemy king is. Additionally, I have 3^9 parameters describing pawn structures on 3x3 areas (but most of these are optimized to 0, so I may have to look into whether I can reduce its size). Since I added gradient descent, I always have the urge to try out new big tables that describe something on the chess board and let the tuner figure out the rest.
Ah, I see. Makes sense. Gradient descent has allowed me to experiment a lot more with the evaluation than ever before. Knowing I can pop in a new evaluation term, spend a couple of minutes tuning it, and then see whether it's an Elo gain, instead of having to tune everything for hours, has made improving the eval much easier.

I'm curious, did you see a significant Elo gain with the king-centric PSQT? I'd imagine that'd be a good idea in practice, since the position of the king is probably one of the most important evaluation factors, but I'm not sure how feasible that'd be to encode in PSQTs.
User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Experiments in generating Texel Tuning data

Post by algerbrex »

Patrice Duhamel wrote: Thu Jul 14, 2022 3:49 pm
algerbrex wrote: Wed Jul 13, 2022 2:32 pm What does your king safety scheme look like? For Blunder, right now, it's very simple, just collecting points from features such as semi-open files around our king, and how many squares different pieces attack around the "king-zone" (16 squares around the king), and then running those points through a quadratic model to get a centi-pawn value.
For king safety I'm using an exponential lookup table, and the index is based only on the number of attacks on the king zone, scaled for each type of piece.
I want to rewrite it.
Ah, I see.

I was using the same model in Blunder 7.6.0, before I switched to using a quadratic model so I could more easily take the gradient of the evaluation model. Probably not as good as using a well-tuned exponential table, but it gave ~50 Elo on the first try, so I can't complain too much.
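For reference, a quadratic model like the one described, together with its gradient with respect to the tuned coefficients, can be sketched like this (the coefficient values below are hypothetical placeholders, not Blunder's tuned values):

```python
def king_safety(points, a, b):
    """Quadratic king-safety model: attack points -> centipawn penalty."""
    return a * points * points + b * points

def king_safety_gradient(points):
    """Partial derivatives of the penalty with respect to the tunable
    coefficients a and b; their simplicity is the whole appeal when
    tuning with gradient descent."""
    return (points * points, points)

# With hypothetical coefficients a=0.5, b=2, ten attack points give:
print(king_safety(10, 0.5, 2))  # 70.0 centipawns
```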

What new idea did you have in mind for a re-write?
User avatar
j.t.
Posts: 263
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: Experiments in generating Texel Tuning data

Post by j.t. »

algerbrex wrote: Fri Jul 15, 2022 12:09 am I'm curious, did you see a significant Elo gain with the king-centric PSQT? I'd imagine that'd be a good idea in practice, since the position of the king is probably one of the most important evaluation factors, but I'm not sure how feasible that'd be to encode in PSQTs.
I tested it yesterday against a small pool of engines, and at 15s+0.3s it was ~100 Elo worse when I used only two PSQTs (one for opening and one for endgame). But the real difference is probably a bit smaller, since everything I did since adding king-contextual PSQTs was optimized for having these instead of normal PSQTs.

A king contextual PSQT for one phase in Nalwald is done like this:

Code:

pst: array[ourKing..enemyKing, array[a1..h8, array[pawn..noPiece, array[a1..h8, ValueType]]]] # noPiece for passed pawns
The trick is to calculate the gradient mainly for the given king positions, but also a little bit for all other king positions, which works out better for learning.
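That idea can be sketched in Python (this is a sketch of the concept, not Nalwald's implementation; the leak factor is an assumed hyperparameter):

```python
NUM_SQUARES = 64
LEAK = 0.01  # assumed fraction of the gradient leaked to other king squares

def accumulate_pst_gradient(grad, king_sq, piece, piece_sq, g):
    """grad is indexed [king_sq][piece][piece_sq], mirroring the pst layout.
    Add the full per-sample gradient g for the king square that actually
    occurred, plus a small fraction for every other king square, so that
    rarely seen king placements still receive some learning signal."""
    for sq in range(NUM_SQUARES):
        weight = 1.0 if sq == king_sq else LEAK
        grad[sq][piece][piece_sq] += weight * g
```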

Also, because of the huge number of parameters, the tuning process is much slower than with only a few hundred, possibly because of memory constraints (cache and stuff like that). Could also be because my implementation is not so great, but at least I haven't found a good way to make it faster, so even when using multiple threads it takes a few hours.
User avatar
algerbrex
Posts: 608
Joined: Sun May 30, 2021 5:03 am
Location: United States
Full name: Christian Dean

Re: Experiments in generating Texel Tuning data

Post by algerbrex »

j.t. wrote: Fri Jul 15, 2022 12:38 am I tested it yesterday against a small pool of engines, and at 15s+0.3s it was ~100 Elo worse when I used only two PSQTs (one for opening and one for endgame). But the real difference is probably a bit smaller, since everything I did since adding king-contextual PSQTs was optimized for having these instead of normal PSQTs.
Right, makes sense. I figured at this point they've been integrated so heavily that it'd hurt badly to just rip them out of the eval.
j.t. wrote: Fri Jul 15, 2022 12:38 am A king contextual PSQT for one phase in Nalwald is done like this:

Code:

pst: array[ourKing..enemyKing, array[a1..h8, array[pawn..noPiece, array[a1..h8, ValueType]]]] # noPiece for passed pawns
I'm not super familiar with Nim syntax, but you have a table that looks something like pst[2][64][64], correct? So you index the table like pst[kingColor][kingSq][pieceSq]? Interesting, I was wondering how you went about modeling that. I like that idea a lot, and the gradient descent tuner makes it feasible, versus naive Texel tuning, where I wouldn't even want to think about tuning so many parameters unless I had access to many cores.
j.t. wrote: Fri Jul 15, 2022 12:38 am The trick is to calculate the gradient mainly for the given king positions, but also a little bit for all other king positions, which works out better for learning.
I hadn't considered what your evaluation function model looked like, but I think that makes sense to me. Out of curiosity, do you also Texel-tune king safety? What model did you end up using that was relatively easy to differentiate? I ended up settling on a quadratic model for a first attempt, since the derivative is simple, of course, and it did surprisingly well. Next up is to find an exponential model to better capture the idea of king safety.
j.t. wrote: Fri Jul 15, 2022 12:38 am Also, because of the huge number of parameters, the tuning process is much slower than with only a few hundred, possibly because of memory constraints (cache and stuff like that). Could also be because my implementation is not so great, but at least I haven't found a good way to make it faster, so even when using multiple threads it takes a few hours.
Interestingly, I found that with my gradient descent tuner I got the best results by switching to AdaGrad and then running several thousand iterations. The current evaluation parameters in Blunder were tuned from scratch using 50K iterations and 1M positions from my extended dataset. That took about four hours, much better than the eleven it took to tune the original evaluation parameters in Blunder 7.6.0 on only 400K positions.
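For reference, a bare-bones AdaGrad step is only a few lines (a generic sketch, not Blunder's implementation):

```python
import math

def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """One AdaGrad update. Each parameter's step is scaled by the inverse
    square root of its accumulated squared gradients, so parameters that
    receive large or frequent gradients take progressively smaller steps."""
    for i, g in enumerate(grads):
        accum[i] += g * g
        params[i] -= lr * g / (math.sqrt(accum[i]) + eps)

params = [0.0, 0.0]
accum = [0.0, 0.0]
adagrad_step(params, [1.0, 0.5], accum)
```

The per-parameter learning rates are what make it attractive for evaluations with wildly different feature frequencies: rare PSQT entries keep large steps while common ones settle down.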
Patrice Duhamel
Posts: 194
Joined: Sat May 25, 2013 11:17 am
Location: France
Full name: Patrice Duhamel

Re: Experiments in generating Texel Tuning data

Post by Patrice Duhamel »

algerbrex wrote: Fri Jul 15, 2022 12:10 am What new idea did you have in mind for a re-write?
I don't know, maybe the same basic idea using more parameters, and maybe trying tropism again?

If I can tune this with Texel's tuning method, it will be easier to see what works best, but it takes time.
Anything that can go wrong will go wrong.