Reinforcement learning to tune handcrafted evals
Moderator: Ras
-
- Posts: 512
- Joined: Tue Sep 29, 2020 4:29 pm
- Location: Dublin, Ireland
- Full name: Madeleine Birchfield
Reinforcement learning to tune handcrafted evals
Is it possible to use reinforcement learning algorithms such as Q-learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Reinforcement learning to tune handcrafted evals
Madeleine Birchfield wrote: ↑Wed Dec 01, 2021 4:18 pm Is it possible to use reinforcement learning algorithms such as Q-learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
Sure, I did that already. Normally you tune the weights of a neural network - but you can also tune the parameters of your algorithm via genetic optimisation.
It's really just a Darwinian approach with multiple populations (so you don't get stuck in a local minimum).
So you generate 5 populations of N engines, each with randomised parameters.
Then you find out which 10% performed best and let the rest die (die = copy the best 10% over the other 90% and only slightly change some values again).
Then you repeat these steps and also enable crossover, where you take the best engine of each population, cross some of its "dna" (its float values) - this makes training faster - and copy the result over an engine marked for deletion.
This mirrors the real-world optimisation of life: reproduction, mutation, selection. It will optimise towards a score. Each population will get stuck in a (good) local minimum, but crossbreeding among populations will find even better solutions.
Gradient descent is bad here, as it will 100% get stuck in one optimum but won't find a better one.
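The recipe above can be sketched in a few lines. This is a toy sketch, not the original code: `fitness` stands in for playing games with a set of eval weights and measuring the score, and every name and constant here is illustrative.

```python
import random

random.seed(0)

N_POPS, POP_SIZE, N_PARAMS = 5, 50, 8
ELITE = POP_SIZE // 10  # the best 10% survive each generation

def fitness(params):
    # Stand-in for "play games with these eval weights and measure the score".
    # Here: the closer to a hidden target vector, the fitter.
    target = [3.0, -1.5, 0.5, 2.0, -0.5, 1.0, 0.0, 4.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def mutate(params, sigma=0.1):
    # "Only slightly change some values again."
    return [p + random.gauss(0, sigma) for p in params]

def crossover(a, b):
    # Cross some of the "dna" (float values) of two parents.
    return [random.choice(pair) for pair in zip(a, b)]

# Generate 5 populations of N engines with randomised parameters.
pops = [[[random.uniform(-5, 5) for _ in range(N_PARAMS)]
         for _ in range(POP_SIZE)]
        for _ in range(N_POPS)]

for generation in range(200):
    for i, pop in enumerate(pops):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:ELITE]
        # "die" = copy the best 10% over the other 90%, slightly mutated.
        pops[i] = elite + [mutate(random.choice(elite))
                           for _ in range(POP_SIZE - ELITE)]
    # Crossbreeding among populations: cross the best engines of two
    # populations and copy the child over an engine marked for deletion.
    a, b = random.sample(range(N_POPS), 2)
    pops[a][-1] = crossover(pops[a][0], pops[b][0])

best = max((max(p, key=fitness) for p in pops), key=fitness)
```

Because the elites are carried over unchanged, the best individual never gets worse from one generation to the next; only the copies are mutated.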
I implemented this a while ago, in my first year of school:
https://www.codeproject.com/Articles/79 ... he-Unknown
The image should make clear what I mean by "stuck in one optimum". The image is in 3 dimensions, but in practice you search for an optimum in 100+ dimensions, so there are a lot more spots that look good - but are really weak compared to what could be done.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Reinforcement learning to tune handcrafted evals
Madeleine Birchfield wrote: ↑Wed Dec 01, 2021 4:18 pm Is it possible to use reinforcement learning algorithms such as Q-learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
Now here's one thing that blew my mind. The "handcrafted evaluation function", where you have a 64-slot array for pawns or knights and each square has its own handpicked value: it's mathematically identical to a neural network with 1 layer and no activation function.
So the first layer of a neural network is like the classical evaluation (but implemented better, in terms of an optimised GEMM).
Each consecutive layer can replace more code and find things like forks etc., and the activation function creates the non-linear behaviour that is needed to see more and more steps ahead with each layer.
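The equivalence is easy to check directly: summing piece-square-table entries over the occupied squares gives exactly the same number as a dot product (one linear layer, no activation) over a one-hot encoding of the board. A minimal sketch with made-up table values:

```python
import random

# A handpicked 64-slot table for one piece type (the values are made up).
random.seed(1)
knight_psqt = [random.randint(-20, 20) for _ in range(64)]

# Squares currently occupied by our knights.
knight_squares = [1, 18, 44]

# Classical handcrafted eval: look up and sum the table entries.
classical = sum(knight_psqt[sq] for sq in knight_squares)

# The same thing as a 1-layer "network" with no activation function:
# a one-hot input vector of length 64, with the PSQT as the weights.
x = [1.0 if sq in knight_squares else 0.0 for sq in range(64)]
linear_layer = sum(w * xi for w, xi in zip(knight_psqt, x))

assert classical == linear_layer
```

A full eval would concatenate such one-hot blocks for every piece type (e.g. 12 x 64 = 768 inputs), but the identity is the same: the table lookup is just a sparse dot product.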
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Reinforcement learning to tune handcrafted evals
But you'd better ask on Reddit; this forum here is relatively dead.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 879
- Joined: Mon Dec 15, 2008 11:45 am
Re: Reinforcement learning to tune handcrafted evals
dangi12012 wrote: ↑Wed Dec 01, 2021 4:48 pm Sure I did that already. ...
You did it already. With what? Your Perft Move Generator?

-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Reinforcement learning to tune handcrafted evals
Desperado wrote: ↑Thu Dec 02, 2021 8:24 pm You did it already. With what? Your Perft Move Generator?
Nah, I have been programming in different fields my whole life. https://github.com/Gigantua/Raytrace
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 391
- Joined: Tue Oct 08, 2019 11:39 pm
- Full name: Tomasz Sobczyk
Re: Reinforcement learning to tune handcrafted evals
dangi12012 wrote: ↑Thu Dec 02, 2021 9:30 pm Nah I have been programming in different fields my whole life. https://github.com/Gigantua/Raytrace
You're still incapable of understanding that your first link is irrelevant to chess? You're great at writing a lot of bullshit that no one cares about.
dangi12012 wrote: No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.
Maybe you copied your stockfish commits from someone else too?
I will look into that.
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Reinforcement learning to tune handcrafted evals
Sopel wrote: ↑Fri Dec 03, 2021 1:06 am You're still incapable of understanding that your first link is irrelevant to chess? ...
I get asked a question. I answer the question. You take offence.
Great forum - but it needs stronger moderation.
Also, Sopel, you have embarrassed yourself enough in this thread: http://www.talkchess.com/forum3/viewtop ... =7&t=78798
So better not to continue that here. Please stop trolling. Thanks.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 263
- Joined: Wed Jun 16, 2021 2:08 am
- Location: Berlin
- Full name: Jost Triller
Re: Reinforcement learning to tune handcrafted evals
Madeleine Birchfield wrote: ↑Wed Dec 01, 2021 4:18 pm Is it possible to use reinforcement learning algorithms such as Q-learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
Reinforcement learning is just another way to optimise some policy - namely, a process where the optimisation doesn't need external data. In our case, this policy would probably be the evaluation function (or maybe some function that scores moves for move ordering). You could argue that an engine which uses simple Texel tuning, but starts with random parameters, then plays some games, labels these games, uses these games to optimise the parameters, and then does all this again, is doing reinforcement learning.
Generally, reinforcement learning for an HCE is the same as for a neural network, as long as you can get a gradient for your handcrafted evaluation function. From the outside, the HCE and the NN look the same: a differentiable function that takes as input all the parameters and the position, and outputs a single value describing the winning chances.
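A minimal Texel-style sketch of that optimisation step: squash the eval through a sigmoid into an expected result, compare against game results, and follow the gradient of the squared error. Everything here is illustrative, not a reference implementation - the "eval" is a linear toy function, the labelled positions are synthetic, and K and the learning rate are made up:

```python
import math
import random

random.seed(0)
K = 400.0  # scaling constant relating centipawns to expected score

# Toy "positions": each is a small feature vector (think material counts or
# PSQT one-hots), labelled with a game result between 0.0 and 1.0.
# Labels are generated from a hidden weight vector so the demo converges.
true_w = [100.0, 300.0, -50.0]
data = []
for _ in range(500):
    feats = [random.uniform(-2.0, 2.0) for _ in range(3)]
    score = sum(w * f for w, f in zip(true_w, feats))
    result = 1.0 / (1.0 + math.exp(-score / K))
    data.append((feats, result))

w = [0.0, 0.0, 0.0]  # the handcrafted-eval parameters we tune

def predict(feats):
    eval_cp = sum(wi * f for wi, f in zip(w, feats))
    return 1.0 / (1.0 + math.exp(-eval_cp / K))  # eval -> expected result

lr = 1e5  # large because the sigmoid/K squashing shrinks the gradient
for epoch in range(1000):
    grad = [0.0, 0.0, 0.0]
    for feats, result in data:
        p = predict(feats)
        # d/dw_i of (p - result)^2, using dp/d(eval) = p * (1 - p) / K
        common = 2.0 * (p - result) * p * (1.0 - p) / K
        for i, f in enumerate(feats):
            grad[i] += common * f
    for i in range(3):
        w[i] -= lr * grad[i] / len(data)

mse = sum((predict(f) - r) ** 2 for f, r in data) / len(data)
```

The self-play flavour described above would replace the fixed `data` with fresh games played by the current parameters before each round of tuning; the gradient step itself stays the same whether the eval is handcrafted or a network.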
-
- Posts: 391
- Joined: Tue Oct 08, 2019 11:39 pm
- Full name: Tomasz Sobczyk
Re: Reinforcement learning to tune handcrafted evals
dangi12012 wrote: ↑Fri Dec 03, 2021 1:40 am I get asked a question. I answer the question. You take offence.
Great Forum - but needs stronger moderation.
Also sopel you have embarrassed yourself enough on this thread: http://www.talkchess.com/forum3/viewtop ... =7&t=78798
So better not continue that here. Please stop trolling. Thanks.
Dude, your EGO is off the scale.
dangi12012 wrote: No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.
Maybe you copied your stockfish commits from someone else too?
I will look into that.