Reinforcement learning to tune handcrafted evals

Discussion of chess software programming and technical issues.

Moderator: Ras

Madeleine Birchfield
Posts: 512
Joined: Tue Sep 29, 2020 4:29 pm
Location: Dublin, Ireland
Full name: Madeleine Birchfield

Reinforcement learning to tune handcrafted evals

Post by Madeleine Birchfield »

Is it possible to use reinforcement learning algorithms such as Q learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Reinforcement learning to tune handcrafted evals

Post by dangi12012 »

Madeleine Birchfield wrote: Wed Dec 01, 2021 4:18 pm Is it possible to use reinforcement learning algorithms such as Q learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
Sure, I did that already. Normally you tune the weights of a neural network, but you can also tune the parameters of your algorithm via genetic optimisation.

It's really just a Darwinian approach with multiple populations (so as not to get stuck in a local minimum).

So you generate 5 populations of N engines, each with a randomized seed.
Then you find out which 10% performed best and let the rest die (die = copy the best 10% over the other 90% and slightly perturb some of their values again).
Then you repeat these steps and also enable crossover, where you take the best engine of each population, cross some of its "DNA" (its float-valued parameters; this makes training faster), and copy the result over an engine marked for deletion.

This mirrors the real-world optimisation of life: reproduction, mutation, selection. It will optimize towards a score. Each population will get stuck in a (good) local minimum, but crossbreeding among populations will find even better solutions.
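A minimal Python sketch of that loop, assuming a hypothetical play_tournament(params) fitness function (a stand-in for actually playing games with a given set of eval weights; none of this is the original code):

```python
import random

# Hypothetical fitness function: in a real tuner this would play games
# with the given eval weights and return a match score. Placeholder here.
def play_tournament(params):
    return -sum(p * p for p in params)

N_POPULATIONS = 5            # multiple populations, to escape local minima
POP_SIZE = 50                # N engines per population
N_PARAMS = 10                # number of eval weights being tuned
SURVIVORS = POP_SIZE // 10   # the best 10% survive each generation

def random_individual():
    return [random.uniform(-100, 100) for _ in range(N_PARAMS)]

def mutate(params, sigma=5.0):
    # "slightly perturb some of their values again"
    return [p + random.gauss(0, sigma) for p in params]

def crossover(a, b):
    # mix the "DNA" of two parents gene by gene
    return [random.choice(pair) for pair in zip(a, b)]

populations = [[random_individual() for _ in range(POP_SIZE)]
               for _ in range(N_POPULATIONS)]

for generation in range(100):
    for i, pop in enumerate(populations):
        # selection: rank by fitness, keep the best 10%
        pop.sort(key=play_tournament, reverse=True)
        best = pop[:SURVIVORS]
        # the rest "die": overwrite them with mutated copies of survivors
        populations[i] = best + [mutate(random.choice(best))
                                 for _ in range(POP_SIZE - SURVIVORS)]
    # crossbreeding: cross the champions of the populations and copy the
    # child over an individual marked for deletion
    champions = [pop[0] for pop in populations]
    for pop in populations:
        pop[-1] = crossover(random.choice(champions), random.choice(champions))
```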

Gradient descent is bad, as it will 100% get stuck in one local optimum and won't find a better one.

I implemented this a while ago, in my first year of school:
https://www.codeproject.com/Articles/79 ... he-Unknown

The image there should make clear what I mean by "stuck in one optimum". The image is in 3 dimensions, but in practice you search for an optimum in 100+ dimensions, so there are a lot more spots that look good - but are really weak compared to what could be done.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Reinforcement learning to tune handcrafted evals

Post by dangi12012 »

Madeleine Birchfield wrote: Wed Dec 01, 2021 4:18 pm Is it possible to use reinforcement learning algorithms such as Q learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
Now here's one thing that blew my mind. The "handcrafted evaluation function" where you have a 64-slot array for pawns or knights, and each square has its own handpicked value: it's mathematically identical to a neural network with 1 layer and no activation function.

So the first layer of a neural network is like the classical evaluation (but implemented better, in terms of an optimized GEMM).
Each consecutive layer can replace more code and find things like forks etc., and the activation function creates the non-linear behaviour that is needed to look more and more steps ahead with each layer.
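To make the equivalence concrete (a toy sketch, not from the post): encode the position as a 0/1 feature vector with one entry per (piece type, square) pair, and the piece-square-table evaluation is exactly a dot product with the weight vector, i.e. a single linear layer with no activation:

```python
import numpy as np

# 12 piece types (6 white, 6 black) x 64 squares = 768 binary features
N_FEATURES = 12 * 64

# Stand-in for the handpicked piece-square values; reshaped to 12x64,
# this is literally the classical piece-square table.
weights = np.random.randn(N_FEATURES)

def features(position):
    # position: iterable of (piece_index, square) pairs, 0..11 and 0..63
    x = np.zeros(N_FEATURES)
    for piece, square in position:
        x[piece * 64 + square] = 1.0
    return x

def hce_eval(position):
    # classical eval: sum the handpicked value of each piece on its square
    return sum(weights[piece * 64 + square] for piece, square in position)

def one_layer_eval(position):
    # one linear layer, no activation: a single dot product (a GEMV)
    return weights @ features(position)

pos = [(0, 28), (6, 36)]  # e.g. white pawn on e4, black pawn on e5
assert np.isclose(hce_eval(pos), one_layer_eval(pos))
```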
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Reinforcement learning to tune handcrafted evals

Post by dangi12012 »

But you'd better ask on Reddit; this forum here is relatively dead.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Reinforcement learning to tune handcrafted evals

Post by Desperado »

dangi12012 wrote: Wed Dec 01, 2021 4:48 pm
Madeleine Birchfield wrote: Wed Dec 01, 2021 4:18 pm Is it possible to use reinforcement learning algorithms such as Q learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
Sure I did that already. ...
You did it already. With what? Your Perft Move Generator? :P
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Reinforcement learning to tune handcrafted evals

Post by dangi12012 »

Desperado wrote: Thu Dec 02, 2021 8:24 pm
dangi12012 wrote: Wed Dec 01, 2021 4:48 pm
Madeleine Birchfield wrote: Wed Dec 01, 2021 4:18 pm Is it possible to use reinforcement learning algorithms such as Q learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
Sure I did that already. ...
You did it already. With what? Your Perft Move Generator? :P
Nah, I have been programming in different fields my whole life. https://github.com/Gigantua/Raytrace
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Reinforcement learning to tune handcrafted evals

Post by Sopel »

dangi12012 wrote: Thu Dec 02, 2021 9:30 pm
Desperado wrote: Thu Dec 02, 2021 8:24 pm
dangi12012 wrote: Wed Dec 01, 2021 4:48 pm
Madeleine Birchfield wrote: Wed Dec 01, 2021 4:18 pm Is it possible to use reinforcement learning algorithms such as Q learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
Sure I did that already. ...
You did it already. With what? Your Perft Move Generator? :P
Nah, I have been programming in different fields my whole life. https://github.com/Gigantua/Raytrace
You're still incapable of understanding that your first link is irrelevant to chess? You're great at writing a lot of bullshit that no one cares about.
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Reinforcement learning to tune handcrafted evals

Post by dangi12012 »

Sopel wrote: Fri Dec 03, 2021 1:06 am
dangi12012 wrote: Thu Dec 02, 2021 9:30 pm
Desperado wrote: Thu Dec 02, 2021 8:24 pm
dangi12012 wrote: Wed Dec 01, 2021 4:48 pm
Madeleine Birchfield wrote: Wed Dec 01, 2021 4:18 pm Is it possible to use reinforcement learning algorithms such as Q learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
Sure I did that already. ...
You did it already. With what? Your Perft Move Generator? :P
Nah, I have been programming in different fields my whole life. https://github.com/Gigantua/Raytrace
You're still incapable of understanding that your first link is irrelevant to chess? You're great at writing a lot of bullshit that no one cares about.
I get asked a question. I answer the question. You take offence.
Great forum - but it needs stronger moderation.
Also, Sopel, you have embarrassed yourself enough in this thread: http://www.talkchess.com/forum3/viewtop ... =7&t=78798

So better not to continue that here. Please stop trolling. Thanks.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
j.t.
Posts: 263
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: Reinforcement learning to tune handcrafted evals

Post by j.t. »

Madeleine Birchfield wrote: Wed Dec 01, 2021 4:18 pm Is it possible to use reinforcement learning algorithms such as Q learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
Reinforcement learning is just another way to optimize some policy; specifically, it is a process where the optimization doesn't need external data. In our case, this policy would probably be the evaluation function (or maybe some function that scores moves for move ordering). You could argue that an engine which uses simple Texel tuning but starts with random parameters, plays some games, labels those games, uses them to optimize the parameters, and then does all of this again, is doing reinforcement learning.

Generally, reinforcement learning for an HCE is the same as for neural networks, as long as you can get a gradient for your handcrafted evaluation function. From the outside, an HCE and an NN look the same: a differentiable function that takes all the parameters and the position as input, and outputs a single value describing the winning chances.
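As an illustrative sketch of that point (toy data, not a real engine): with a linear HCE, mapping the eval to winning chances through a sigmoid as in Texel tuning, the gradient of the prediction error with respect to the parameters is straightforward, and the optimization step of the self-play loop described above looks like this:

```python
import numpy as np

K = 1.0 / 400  # eval-to-probability scaling, as in Texel tuning

def sigmoid(x):
    # map a centipawn-style eval to an expected score in [0, 1]
    return 1.0 / (1.0 + np.exp(-K * x))

# Stand-in data: one feature vector per position plus the result of the
# game it came from (1 = win, 0.5 = draw, 0 = loss). In the loop above,
# these would be the positions and labels from the engine's own games.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 768)).astype(float)
results = rng.choice([0.0, 0.5, 1.0], size=1000)

w = np.zeros(768)   # the HCE parameters, starting from scratch
lr = 1000.0         # large step size because K shrinks the gradient
for epoch in range(200):
    evals = X @ w                 # linear HCE: differentiable in w
    pred = sigmoid(evals)         # predicted winning chances
    err = pred - results
    # gradient of the mean squared error through sigmoid and linear eval
    grad = X.T @ (err * K * pred * (1 - pred)) / len(X)
    w -= lr * grad                # one gradient descent step
```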
Sopel
Posts: 391
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: Reinforcement learning to tune handcrafted evals

Post by Sopel »

dangi12012 wrote: Fri Dec 03, 2021 1:40 am
Sopel wrote: Fri Dec 03, 2021 1:06 am
dangi12012 wrote: Thu Dec 02, 2021 9:30 pm
Desperado wrote: Thu Dec 02, 2021 8:24 pm
dangi12012 wrote: Wed Dec 01, 2021 4:48 pm
Madeleine Birchfield wrote: Wed Dec 01, 2021 4:18 pm Is it possible to use reinforcement learning algorithms such as Q learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
Sure I did that already. ...
You did it already. With what? Your Perft Move Generator? :P
Nah, I have been programming in different fields my whole life. https://github.com/Gigantua/Raytrace
You're still incapable of understanding that your first link is irrelevant to chess? You're great at writing a lot of bullshit that no one cares about.
I get asked a question. I answer the question. You take offence.
Great forum - but it needs stronger moderation.
Also, Sopel, you have embarrassed yourself enough in this thread: http://www.talkchess.com/forum3/viewtop ... =7&t=78798

So better not to continue that here. Please stop trolling. Thanks.
Dude, your EGO is off the scale.
dangi12012 wrote:No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.

Maybe you copied your stockfish commits from someone else too?
I will look into that.