Help with Texel's tuning

maksimKorzh · Post by **maksimKorzh** » Thu Jan 07, 2021 6:00 pm

mvanthoor wrote: ↑Thu Jan 07, 2021 12:41 pm Congratulations

Where did you find the explanation with regard to Texel tuning, and the millions of games to test against? Or, did you get all the explanation needed by PM and created the test data yourself by (for example) scraping lichess games?

I'd be _very_ much interested in a video about this topic, because I feel that evaluation function tuning is something I just can't get my head around properly. Maybe I'm just missing some details. Writing a Texel tuner (for me) seems infinitely more complex than understanding and writing a magic bitboard engine.

I'd love to implement Texel tuning as one of the first features after the transposition table, so I can use it right from the beginning, even when only using PSTs. I'm wondering how much Texel tuning could improve the PSTs. (It'll probably compensate for a huge lack of knowledge, at least in the beginning. For example, having the rook on the 7th rank is now in the PST, instead of being a discrete parameter.)

Marcel, I've just made a video on Texel's tuning:
I'm really interested in getting feedback from you :)

Pio · Post by **Pio** » Thu Jan 07, 2021 6:27 pm

maksimKorzh wrote: ↑Thu Jan 07, 2021 3:57 pm
Pio wrote: ↑Thu Jan 07, 2021 3:40 pm
BrianNeal wrote: ↑Thu Jan 07, 2021 3:16 pm Would shuffling the parameters before each iteration make sense (besides shuffling the training positions)?
It shouldn’t matter much but I think it will be a little bit worse. Doing my variable delta trick for each feature will speed up convergence and so will doing PCA without any reduction since it will orthogonalize the problem reducing the dependencies between different features.
I can't believe I've understood what you've just said... amazing :)
Ronald's explanation made me a smart monkey :)

Hi Maksim!

Doing the PCA trick is a little bit tricky since you are not using floating-point arithmetic. You can do it, but you will have to keep a residual delta for those features whose change was too small to register in the PCA representation, so that you can later add that remainder and see whether it has become big enough for the feature to get updated...
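The "rest delta" idea can be illustrated on its own. This is a minimal Python sketch, not Pio's actual code: it assumes the optimizer proposes real-valued changes (for example after mapping a step in the PCA basis back to the original features) while the engine stores integer weights. The function name `apply_integer_updates` and the choice to truncate toward zero are illustrative assumptions.

```python
import math


def apply_integer_updates(weights, float_deltas, residuals):
    """Apply real-valued updates to integer weights.

    The fractional part that cannot be applied yet is carried over in
    `residuals` (one entry per feature, updated in place), so many small
    updates eventually add up to a whole-unit change instead of being
    rounded away every time.
    """
    new_weights = []
    for i, (w, d) in enumerate(zip(weights, float_deltas)):
        total = residuals[i] + d      # change accumulated so far
        step = math.trunc(total)      # whole units that can be applied now
        residuals[i] = total - step   # remainder kept for later iterations
        new_weights.append(w + step)
    return new_weights


# A delta of 0.4 per iteration only moves the integer weight on the 3rd call.
weights, residuals = [100], [0.0]
history = []
for _ in range(3):
    weights = apply_integer_updates(weights, [0.4], residuals)
    history.append(weights[0])
print(history)  # [100, 100, 101]
```

Without the residual, rounding each 0.4 step would leave the weight at 100 forever.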

I have seen that lots of people try to minimise the squared error. I believe it might be better to minimise the absolute error. One advantage of minimising the absolute error is that the objective function won't be as sensitive to labelling errors, so it is more robust. It would be really interesting to see which gives you the best result: the version minimising the squared errors or the version minimising the absolute errors.
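Both objectives can be written side by side. A minimal Python sketch, assuming the sigmoid from the Chess Programming Wiki description of Texel's tuning, 1 / (1 + 10^(-K*q/400)), with a scaling constant K (left at 1.0 here; in practice it is fitted per engine) and results scored from white's point of view:

```python
def sigmoid(score_cp, k=1.0):
    """Expected result (1 = white win, 0.5 = draw, 0 = loss) for a score in centipawns."""
    return 1.0 / (1.0 + 10.0 ** (-k * score_cp / 400.0))


def mean_squared_error(scores, results, k=1.0):
    return sum((r - sigmoid(s, k)) ** 2 for s, r in zip(scores, results)) / len(scores)


def mean_absolute_error(scores, results, k=1.0):
    return sum(abs(r - sigmoid(s, k)) for s, r in zip(scores, results)) / len(scores)


# Two positions evaluated as 0.00: one game was drawn, one was won by white.
scores, results = [0, 0], [0.5, 1.0]
print(mean_squared_error(scores, results))   # 0.125
print(mean_absolute_error(scores, results))  # 0.25
```

Only the per-position term differs. Since every residual lies between 0 and 1, the difference is in relative weighting: under squared error a residual of 1.0 counts 100 times as much as a residual of 0.1, under absolute error only 10 times, so a few mislabelled positions pull less on the absolute-error objective. The absolute value is not differentiable at zero, which matters if you later switch to a gradient-based tuner.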

maksimKorzh · Post by **maksimKorzh** » Thu Jan 07, 2021 6:49 pm

Pio wrote: ↑Thu Jan 07, 2021 6:27 pm
I have seen that lots of people try to minimise the squared error. I believe it might be better to minimise the absolute error. One advantage of minimising the absolute error is that the objective function won't be as sensitive to labelling errors, so it is more robust. It would be really interesting to see which gives you the best result: the version minimising the squared errors or the version minimising the absolute errors.

I just feel like I need to play around with what I have so far, just to get used to this stuff, which is totally new to me.
I also have a couple of weird ideas based on this method that I'd like to try in Wukong.
I'll be happy to report the results if what's in my head actually works.

mvanthoor · Post by **mvanthoor** » Thu Jan 07, 2021 9:35 pm

maksimKorzh wrote: ↑Thu Jan 07, 2021 6:00 pm Marcel, I've just made a video on Texel's tuning:
I'm really interested in getting feedback from you :)

That's awesome

I'll have a look at it over the weekend.

It seems you got the (private) explanation you needed to get this done; I "know" how Texel tuning works, but detailed knowledge is sketchy.

It was also sketchy with regard to magic bitboards, and still is with search function optimization, but there's A LOT of information out there, which enabled me to piece everything together; I'm still missing a few 'dots on the i' to be able to implement Texel tuning, and I really hope your video has some of the dots in it

After the transposition table, this will probably be the first thing I'm going to implement, because I already hate changing the PSTs by hand.

My program has my own PSTs, but they have been created according to my own chess knowledge. This creates an ironic result. In a reasonable time frame of about 10 seconds per move, Rustic reaches depth 6-8... about the same as I can reach myself in a rapid game. Because Rustic's PSTs make it put the pieces where *I* would put them, and our search depths are similar, playing this initial version of Rustic eerily feels like playing a game against myself...

I'm still "stronger", because there are still things I know and Rustic doesn't (bishop pair, positional knowledge vs. material advantage, etc.). Therefore I can still 'trick' Rustic into doing stupid things with old anti-computer chess tactics and strategies.

mvanthoor · Post by **mvanthoor** » Thu Jan 07, 2021 9:40 pm

fabianVDW wrote: ↑Thu Jan 07, 2021 5:48 pm Depending on your mathematical education, you can read https://github.com/AndyGrant/Ethereal/b ... Tuning.pdf. I recommend at least some basic knowledge of multivariate calculus. Then it is easy. For more questions on the contents of this, you can also ask Andrew directly, I suppose, or me here.

Thanks

I'll try and have a look at it.

I have a passing knowledge of mathematics; what you would learn in high school, and the first 2 years of uni when studying computer science. I know the basics of several maths fields, but I'm not a mathematician. Some of that math hasn't been used for so long that I'll probably have to look it up (think, for example, combinatorial calculations, etc).

maksimKorzh · Post by **maksimKorzh** » Fri Jan 08, 2021 2:32 am

mvanthoor wrote: ↑Thu Jan 07, 2021 9:35 pm
It seems you got the (private) explanation you needed to get this done; I "know" how Texel tuning works, but detailed knowledge is sketchy.

It was also sketchy with regard to magic bitboards, and still is with search function optimization, but there's A LOT of information out there, which enabled me to piece everything together; I'm still missing a few 'dots on the i' to be able to implement Texel tuning, and I really hope your video has some of the dots in it

After the transposition table, this will probably be the first thing I'm going to implement, because I already hate changing the PST's

My program has my own PST's, but they have been created according to my own chess knowledge. This creates an ironic result. In a reasonable time frame of about 10 seconds per move, Rustic reaches depth 6-8.. about the same as I can reach myself in a rapid game. Because Rustic's PST's makes it put the pieces where *I* would put them, and our search depths are similar, playing this initial version of Rustic eerily feels like playing a game against myself...

I'm still "stronger", because there are still things I know and Rustic doesn't (bishop pair, positional knowledge vs. material advantage, etc). Therefore I can still 'trick' Rustic to do stupid things, with old anti-computer chess tactics and strategies.

Like I've already mentioned, without help from Ronald Friederich this would never have been possible; read my previous response to your original post.

re: My program has my own PST's, but they have been created according to my own chess knowledge
- I've tried that as well; it was always worse than existing PSTs, even the simplified eval from CPW.

re: In a reasonable time frame of about 10 seconds per move, Rustic reaches depth 6-8
- hmmm... it seems a bit too slow for a compiled language; Wukong JS reaches 7-9 in 1 second per move, and it's written in JavaScript.
I remember you mentioning 10-12 plies without a TT, or did I miss something?

Wukong JS is now maybe 10-15 Elo weaker than TSCP.
Does Rustic beat TSCP already?

mvanthoor · Post by **mvanthoor** » Fri Jan 08, 2021 10:17 am

maksimKorzh wrote: ↑Fri Jan 08, 2021 2:32 am re: In a reasonable time frame of about 10 seconds per move, Rustic reaches depth 6-8
- hmmm... it seems a bit too slow for a compiled language; Wukong JS reaches 7-9 in 1 second per move, and it's written in JavaScript.
I remember you mentioning 10-12 plies without a TT, or did I miss something?

It reaches 6-8 plies in the middle game. Note that Rustic doesn't have any search optimizations yet, except for MVV-LVA sorting, and it doesn't have a TT yet. In later stages of the game, it can indeed already reach depth 10-12.

Wukong JS is now maybe 10-15 Elo weaker than TSCP.
Does Rustic beat TSCP already?

Yes. If TSCP does not manage to create a passed pawn, it will lose. If TSCP can create an advanced passed pawn, it will win because Rustic can't see the danger yet (until the promotion comes within its search horizon).

I'm confident that, as soon as I include some (passed) pawn knowledge, it'll consistently beat TSCP, even without a TT or other optimizations.

I hope to get some more time to work on Rustic in the near future. (Maybe I'll just release Alpha 1 without XBoard, and include it when I finish that feature.)

AndrewGrant · Post by **AndrewGrant** » Fri Jan 08, 2021 10:46 am

mvanthoor wrote: ↑Thu Jan 07, 2021 9:40 pm
I have a passing knowledge of mathematics; what you would learn in high school, and the first 2 years of uni when studying computer science. I know the basics of several maths fields, but I'm not a mathematician. Some of that math hasn't been used for so long that I'll probably have to look it up (think, for example, combinatorial calculations, etc).

The math is quite light: simple derivatives, with some tricks for absolute value, min, and max, which are likely not taught in school but are interesting. The only weird-looking thing that is not easy to convince yourself of is the derivative of the sigmoid in section 3.something.
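One way to convince yourself of the sigmoid derivative is to check it numerically. This sketch assumes the usual Texel sigmoid 1 / (1 + 10^(-K*s/400)); the notation in section 3.something of the PDF may differ. Rewriting 10^(-K*s/400) as e^(-c*s) with c = K*ln(10)/400 turns it into the standard logistic function, whose derivative is c * sigmoid(s) * (1 - sigmoid(s)):

```python
import math


def sigmoid(s, k=1.0):
    return 1.0 / (1.0 + 10.0 ** (-k * s / 400.0))


def sigmoid_derivative(s, k=1.0):
    # With c = K * ln(10) / 400, sigmoid(s) = 1 / (1 + e^(-c*s)), so by the
    # chain rule d/ds sigmoid(s) = c * sigmoid(s) * (1 - sigmoid(s)).
    c = k * math.log(10) / 400.0
    p = sigmoid(s, k)
    return c * p * (1.0 - p)


def numeric_derivative(f, s, h=1e-3):
    # Central finite difference, used only as an independent check.
    return (f(s + h) - f(s - h)) / (2.0 * h)


for s in (-300.0, 0.0, 50.0, 400.0):
    print(s, sigmoid_derivative(s), numeric_derivative(sigmoid, s))
```

The two columns should agree to many decimal places; the analytic form is what a gradient-based tuner uses, because it reuses the sigmoid value it has already computed.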

maksimKorzh · Post by **maksimKorzh** » Fri Jan 08, 2021 5:04 pm

AndrewGrant wrote: ↑Fri Jan 08, 2021 10:46 am
The math is quite light: simple derivatives, with some tricks for absolute value, min, and max, which are likely not taught in school but are interesting. The only weird-looking thing that is not easy to convince yourself of is the derivative of the sigmoid in section 3.something.

Hi Andy

For noobs like me, the problem with your paper is not even the math; the problem is that it assumes the developer already understands how to set up the basic pipeline, how it works, what the inputs and outputs are, etc. Reading about optimizing evaluation tuning without a clear understanding of its basics simply doesn't make sense. Obviously your paper is a source of brilliance for experienced developers. This is one of the main reasons for my own research and for emphasizing the basic setup and a proof-of-concept implementation, so that the gap in beginners' understanding is closed.

Fun fact:
I've reread your PDF many times, but the only thing I realized each time was that I'm more stupid than I thought :)
Lots of questions about the basics came up along the way that I wanted to ask...
Only after Ronald Friederich kindly explained the basic idea of Texel's tuning to me, calculating the mean squared error in particular, did your work start making sense to me. After Ronald explained the optimizations in your article that go beyond plain gradient descent, and how you fought non-linearity issues, I realized how god-level your work is.
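For readers in the same position, the mean squared error Ronald explained is usually written as below, following the Chess Programming Wiki description of Texel's tuning method (the symbols are labels chosen here, not necessarily the PDF's): N is the number of training positions, R_i the game result of position i from white's point of view (1, 0.5 or 0), q_i(θ) the engine's quiescence score of that position with parameter vector θ, and K a scaling constant, typically chosen once so that E is minimal for the untuned parameters.

```latex
E(\theta) = \frac{1}{N} \sum_{i=1}^{N} \bigl( R_i - \sigma(q_i(\theta)) \bigr)^2,
\qquad
\sigma(q) = \frac{1}{1 + 10^{-Kq/400}}
```

Tuning means searching for the θ that makes E(θ) as small as possible; everything else (local search, gradients, PCA) is about how that search is done.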

The biggest problem with truly incredible articles, in terms of brilliance and insight, is the lack of explanations of the basics. Another big problem is a focus on tiny details while completely dropping the overall pipeline flow. If you added just a single paragraph to your article's explanation of Texel's tuning method (simply defining the pipeline and providing pseudocode for the phrase "then he adjusts the weights and recalculates MSE"), more people could make use of your work, but you probably don't really care about that.
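The requested paragraph might look roughly like this: a minimal Python sketch of the simple local search given as pseudocode on the Chess Programming Wiki page for Texel's tuning method (that basic scheme, not the gradient-based approach of the PDF). The linear `evaluate`, the two-feature positions and the synthetic results are hypothetical stand-ins for the engine's quiescence search and a real set of labelled positions.

```python
def sigmoid(score, k=1.0):
    return 1.0 / (1.0 + 10.0 ** (-k * score / 400.0))


def evaluate(weights, features):
    # Hypothetical linear evaluation: weight * feature count (e.g. material
    # difference). A real tuner would run the engine's quiescence search.
    return sum(w * f for w, f in zip(weights, features))


def mse(weights, data, k=1.0):
    return sum((r - sigmoid(evaluate(weights, f), k)) ** 2 for f, r in data) / len(data)


def local_optimize(weights, data, k=1.0):
    """Texel-style local search: nudge each parameter by +1, then -1, and
    keep any change that lowers the error; stop after a full pass with no
    improvement."""
    best = list(weights)
    best_e = mse(best, data, k)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            for delta in (1, -1):
                trial = list(best)
                trial[i] += delta
                e = mse(trial, data, k)
                if e < best_e:
                    best, best_e = trial, e
                    improved = True
                    break  # keep this change, move on to the next parameter
    return best, best_e


# Synthetic data: results are the expected scores of "true" weights [100, 300],
# so the error is exactly 0 there. Real data would use game results 1/0.5/0.
true_weights = [100, 300]
positions = [[1, 0], [2, 0], [-1, 0], [0, 1], [0, -1], [0, 2]]
data = [(f, sigmoid(evaluate(true_weights, f))) for f in positions]
tuned, err = local_optimize([90, 280], data)
print(tuned, err)  # [100, 300] 0.0
```

Each pass costs up to two error evaluations per parameter over the whole data set, which is why this simple scheme gets slow with hundreds of parameters and millions of positions, and is one motivation for gradient-based methods.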

I thought the problem was my stupidity, but it turned out to be simply a lack of basic knowledge.
The problem is that there are very, very few sources of basic knowledge; maybe only a couple of engines.

jdart · Post by **jdart** » Fri Jan 08, 2021 5:52 pm

Texel tuning is basically supervised learning using logistic regression. There is a very large literature on this, outside of the field of chess.
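Seen as logistic regression, the same problem can be attacked with gradient descent instead of ±1 probing. A minimal Python sketch, assuming a linear evaluation (score = sum of w_j * x_j over hypothetical feature counts) and keeping Texel's squared-error objective. The linked chapter trains with cross-entropy loss instead, which would only change the per-position factor `g` to c * (p - result). The learning rate is large because the sigmoid's slope per centipawn is tiny; its value is arbitrary for this toy data.

```python
import math


def sigmoid(score, k=1.0):
    return 1.0 / (1.0 + 10.0 ** (-k * score / 400.0))


def mse(weights, data, k=1.0):
    total = 0.0
    for features, result in data:
        q = sum(w * f for w, f in zip(weights, features))
        total += (result - sigmoid(q, k)) ** 2
    return total / len(data)


def gradient_step(weights, data, lr, k=1.0):
    """One batch gradient-descent step on the MSE of a linear evaluation."""
    c = k * math.log(10) / 400.0          # slope factor of the sigmoid
    grad = [0.0] * len(weights)
    for features, result in data:
        q = sum(w * f for w, f in zip(weights, features))
        p = sigmoid(q, k)
        # dE_i/dq = -2 * (result - p) * c * p * (1 - p); chain rule adds x_j
        g = -2.0 * (result - p) * c * p * (1.0 - p)
        for j, f in enumerate(features):
            grad[j] += g * f
    return [w - lr * gj / len(data) for w, gj in zip(weights, grad)]


# Toy data: targets generated from hypothetical "true" weights [100, 300].
positions = [[1, 0], [2, 0], [-1, 0], [0, 1], [0, -1], [0, 2]]
data = [(f, sigmoid(100 * f[0] + 300 * f[1])) for f in positions]
weights = [90.0, 280.0]
for _ in range(100):
    weights = gradient_step(weights, data, lr=1e5)
print(weights, mse(weights, data))
```

Unlike the ±1 search, every parameter moves in each step by an amount proportional to its gradient, so the weights are real numbers and need rounding (or residual carrying) before they go back into an integer engine.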

For just one example: https://web.stanford.edu/~jurafsky/slp3/5.pdf
