I've recently taken a stab at writing a texel tuner, and after a bit of work it seems to be working; it's pretty cool seeing the tuner automatically update the piece-square tables from the training set. However, I'm curious about two things right now.
The tuner seems reasonably fast and calculates the mean square error of my training set (~3.5M quiet positions from Zurichess) in about 8-12 seconds. Still, I'm trying to tune 768 weights (6 middle-game and 6 end-game piece-square tables, for tapered eval), so going by the time it takes to recalculate the mean square error after adjusting a single weight, one iteration through the entire set of weights would take anywhere between two and four hours. And I'm planning on adding more weights for tuning mobility bonuses and the like.
I'm currently just using the naive implementation of the error-minimization algorithm found on the CPW, so I knew it would be slow. What would be some good ways to speed it up? I know of gradient descent, and while I understand the concept, the exact math and implementation details are still pretty fuzzy to me, especially when variants of gradient descent are brought up. I'll keep looking into this, and any good resources would be appreciated.
Second, from some preliminary testing: while I can see the tuning is starting to encode basic chess principles into the piece-square tables, the tables are still pretty awful and don't even compete with handcrafted ones. I'm assuming that running through the full dataset every iteration, and running several hundred iterations, will significantly improve things? Along with going through several training sessions and shuffling the EPD file? Lastly, if it matters, I'm baking the baseline material values into the piece-square tables themselves and letting the tuner run from there (e.g. every square in the pawn middle-game table starts at 100, every square in the knight middle-game table starts at 300, etc.). That way, I don't need to count material separately.
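For anyone following along, the naive local-optimization loop from the CPW that's being discussed can be sketched roughly like this in Go. Everything here is an illustrative toy (the feature-vector eval, the two-position dataset, the function names), not any engine's actual code; the point is just the +1/-1 step-and-keep structure and the sigmoid-based mean square error:

```go
package main

import (
	"fmt"
	"math"
)

// Toy stand-ins for a real tuner: each "position" has a feature vector
// and a game result in [0, 1] (1 = white win, 0.5 = draw, 0 = black win).
type position struct {
	features []int   // how many times each weight fires in this position
	result   float64 // actual game outcome
}

// eval is a linear evaluation: dot product of weights and features.
func eval(weights []int, p position) int {
	score := 0
	for i, f := range p.features {
		score += weights[i] * f
	}
	return score
}

// meanSquaredError maps each eval through a sigmoid and compares it to
// the game result, as on the CPW Texel tuning page. k is the scaling
// constant that fits the sigmoid to the engine's eval scale.
func meanSquaredError(weights []int, positions []position, k float64) float64 {
	sum := 0.0
	for _, p := range positions {
		sigmoid := 1.0 / (1.0 + math.Pow(10, -k*float64(eval(weights, p))/400.0))
		diff := p.result - sigmoid
		sum += diff * diff
	}
	return sum / float64(len(positions))
}

// tune is the naive local search: nudge each weight by +1 and keep the
// change if the error drops; otherwise try -1; repeat full passes until
// no single step improves the error.
func tune(weights []int, positions []position, k float64) []int {
	bestErr := meanSquaredError(weights, positions, k)
	improved := true
	for improved {
		improved = false
		for i := range weights {
			weights[i]++
			if e := meanSquaredError(weights, positions, k); e < bestErr {
				bestErr = e
				improved = true
				continue
			}
			weights[i] -= 2
			if e := meanSquaredError(weights, positions, k); e < bestErr {
				bestErr = e
				improved = true
				continue
			}
			weights[i]++ // neither direction helped; restore
		}
	}
	return weights
}

func main() {
	positions := []position{
		{features: []int{1, 0}, result: 1.0},
		{features: []int{0, 1}, result: 0.0},
	}
	fmt.Println(tune([]int{0, 0}, positions, 1.0))
}
```

Note the cost structure this makes visible: every single +1/-1 probe pays for a full pass over the dataset, which is why one iteration over 768 weights and 3.5M positions gets so expensive.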
Texel tuning questions
Moderator: Ras
-
amanjpro
- Posts: 883
- Joined: Sat Mar 13, 2021 1:47 am
- Full name: Amanj Sherwany
Re: Texel tuning questions
algerbrex wrote: ↑Tue Aug 17, 2021 4:34 pm [full quote snipped; see above]
My engine is also written in Go, and I too implement the naive algorithm (really local optimization). You can do the following:
- Parse all the positions once, and load them into memory
- Add bounds to values where it makes sense; for example, you should probably never have a passed-pawn bonus that is negative!
- Add multi-threading (if you don't have it)
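The first suggestion, parsing once and keeping everything in memory, might look something like this sketch. The result-comment format shown is the one the Zurichess labeled EPD sets use (a `c9` opcode holding `"1-0"`, `"1/2-1/2"`, or `"0-1"`); the struct fields and function names are just illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// labeledPosition holds everything the tuner needs, parsed exactly once
// up front instead of re-parsed on every pass over the dataset.
type labeledPosition struct {
	fen    string  // the EPD fields before the result opcode
	result float64 // game outcome: 1.0 white win, 0.5 draw, 0.0 black win
}

// parseEPD converts Zurichess-style labeled EPD lines, where the game
// result is stored in a c9 opcode, into an in-memory slice. Malformed
// lines are skipped here for brevity; a real tuner should report them.
func parseEPD(lines []string) []labeledPosition {
	var positions []labeledPosition
	for _, line := range lines {
		idx := strings.Index(line, `c9 "`)
		if idx < 0 {
			continue
		}
		fen := strings.TrimSpace(line[:idx])
		var result float64
		switch {
		case strings.Contains(line[idx:], "1-0"):
			result = 1.0
		case strings.Contains(line[idx:], "1/2-1/2"):
			result = 0.5
		case strings.Contains(line[idx:], "0-1"):
			result = 0.0
		default:
			continue
		}
		positions = append(positions, labeledPosition{fen: fen, result: result})
	}
	return positions
}

func main() {
	sample := []string{
		`rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - c9 "1/2-1/2";`,
		`4k3/8/8/8/8/8/4P3/4K3 w - - c9 "1-0";`,
	}
	positions := parseEPD(sample)
	fmt.Println(len(positions), positions[0].result, positions[1].result)
}
```

In a real tuner you would go one step further and also convert each FEN into the engine's board representation (or even a feature vector) at load time, so the hot loop touches only in-memory data.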
-
AndrewGrant
- Posts: 1960
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: Texel tuning questions
https://github.com/AndyGrant/Ethereal/b ... Tuning.pdf
This was my best contribution to computer chess aside from OpenBench I suppose. Ethereal's quick rise was heavily due to the speed at which new terms could be created, inserted into the tuner, and tuned to a degree of certainty such that you could discard the term or not based just on the SPRT testing.
-
algerbrex
- Posts: 608
- Joined: Sun May 30, 2021 5:03 am
- Location: United States
- Full name: Christian Dean
Re: Texel tuning questions
amanjpro wrote: ↑Tue Aug 17, 2021 4:39 pm [full quote snipped; see above]
Hi Amanj, thanks for the advice. I'll definitely go ahead and make the change to load all the positions into memory.
With regards to your last two points:
- That sounds like a good idea, and you're right: I noticed that my tuner would sometimes assign stupidly high or low values. (Ironically enough, passers being valued too highly is my issue. I suppose the positions I sampled had passed pawns as the deciding factor, so I have piece-square tables where pawns on the 7th and 2nd ranks are valued the same as two minor pieces!) How would I go about doing this efficiently, though? I already have a good idea of which values should be bounded and where, but I have to think about how exactly I'd implement that. It'll probably occur to me soon enough.
- Another good idea, but I'm not sure how to go about multithreading the process either, particularly in Go. I'm aware of goroutines and so on, but it's not yet clear to me how to piece everything together, especially in the context of texel tuning. But I'll keep doing some more reading.
-
algerbrex
- Posts: 608
- Joined: Sun May 30, 2021 5:03 am
- Location: United States
- Full name: Christian Dean
Re: Texel tuning questions
AndrewGrant wrote: ↑Tue Aug 17, 2021 4:58 pm https://github.com/AndyGrant/Ethereal/b ... Tuning.pdf
Thanks, Andrew, I'll definitely look through that paper, although it'll probably take me a while to grok everything.
AndrewGrant wrote: ↑Tue Aug 17, 2021 4:58 pm Ethereal's quick rise was heavily due to the speed at which new terms could be created, inserted into the tuner, and tuned to a degree of certainty such that you could discard the term or not based just on the SPRT testing.
That makes sense. I've heard from several experienced chess engine writers that testing is the key to making a strong engine, so my hope is that once I invest in some better computing hardware, I can begin using tuning and SPRT testing to make strong progress.
-
algerbrex
- Posts: 608
- Joined: Sun May 30, 2021 5:03 am
- Location: United States
- Full name: Christian Dean
Re: Texel tuning questions
amanjpro wrote: ↑Tue Aug 17, 2021 4:39 pm [full quote snipped; see above]
By the way, did you need to run through many iterations of the full training set before you started to get good piece-square tables? I'm just trying to figure out the optimal ratio of iterations to positions evaluated. My intuition is that it's better to go through the full dataset with fewer iterations, and vice versa. The only problem is that one run through the full training set looks like it'll take an hour or so. But I'll go ahead and implement your other two suggestions and see what kind of speed improvement I get.
-
jdart
- Posts: 4410
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: Texel tuning questions
>did you need to run through many iterations of the full training set before you started to get good piece-square tables?
My tuner is multithreaded and typically takes about 150 iterations to converge. That's maybe an hour of runtime on a 24-core Xeon.
-
amanjpro
- Posts: 883
- Joined: Sat Mar 13, 2021 1:47 am
- Full name: Amanj Sherwany
Re: Texel tuning questions
Hey Christian,
You can definitely take a look at my code, and if you want, feel free to copy-adapt-paste (if you want to understand what is happening, that is a different story): https://github.com/amanjpro/zahak/blob/ ... _tuning.go
The idea for multi-threading is simple. Let's say you have 10 cores and 1M positions:
- A single-threaded loop tunes one single weight (+1 or -1)
- You launch a job to test the evaluation on the positions (Eval and MSE)
- Since Eval produces a single integer, and integer addition can be done in any order (x+y+z == z+y+x), you can run the eval function over the positions in any order and still get the same result. Knowing that, you can tell each core to compute the evals of 100K positions, then sum their results to get the MSE. And you continue doing that.
- Here is my multi-threaded MSE: https://github.com/amanjpro/zahak/blob/ ... #L401-L414
As for the bounds: my weights have both upper and lower bounds, and I track them naively. You don't need to squeeze out every bit of performance here, as this is not the bottleneck. Here is how I do it: https://github.com/amanjpro/zahak/blob/ ... ng.go#L327
I was lucky that my weights are ordered in a way that everything before index 768 can go below zero and everything after it cannot; you can do something similar.
If your tuner takes 2 hours, run it for 2 hours... you should give it time to produce optimal weights. I have no idea what the behaviour will be if you cut the tuner short.
-
algerbrex
- Posts: 608
- Joined: Sun May 30, 2021 5:03 am
- Location: United States
- Full name: Christian Dean
Re: Texel tuning questions
Hmm, ok, thanks. The reason I asked is that by my estimate, even one iteration through the full training set is going to take my tuner 2-4 hours. So for any reasonable convergence to occur, I'm looking at days to weeks of runtime, which seems a little extreme, especially given the numbers you posted. I'm going to try out some of the recommendations I've been given here and see if that speeds things up.
-
algerbrex
- Posts: 608
- Joined: Sun May 30, 2021 5:03 am
- Location: United States
- Full name: Christian Dean
Re: Texel tuning questions
amanjpro wrote: ↑Tue Aug 17, 2021 6:23 pm [full quote snipped; see above]
Thanks, Amanj, all of the links you posted were very helpful, and my code actually looks pretty much identical to yours, so that's encouraging.
I'll add the bounds the way you're doing it in your code, so hopefully that'll help.
And thanks for the explanation of multi-threading; I think I understand what you mean, so I'll try to implement it in my own code.