Reinforcement learning to tune handcrafted evals
Moderator: Ras
-
- Posts: 512
- Joined: Tue Sep 29, 2020 4:29 pm
- Location: Dublin, Ireland
- Full name: Madeleine Birchfield
Reinforcement learning to tune handcrafted evals
Is it possible to use reinforcement learning algorithms such as Q-learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Reinforcement learning to tune handcrafted evals
Madeleine Birchfield wrote: ↑Wed Dec 01, 2021 4:18 pm Is it possible to use reinforcement learning algorithms such as Q-learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
Sure, I did that already. Normally you tune the weights of a neural network - but you can also tune the parameters of your algorithm via genetic optimisation.
It's really just a Darwinian approach with multiple populations (so you don't get stuck in a local minimum).
So you generate 5 populations of N engines, each with randomised parameters.
Then you find out which 10% performed best and let the rest die (die = copy the best 10% over the other 90% and only slightly change some values again).
Then you repeat these steps and also enable crossover, where you take the best engine of each population, cross some of its "dna" (its float values) - this makes training faster - and copy the result over an engine marked for deletion.
This mirrors the real-world optimisation of life: reproduction, mutation, selection. It will optimise towards a score. Each population will get stuck in a (good) local minimum, but crossbreeding among populations will find even better solutions.
Gradient descent is bad here, as it will 100% get stuck in one optimum but won't find a better one.
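The recipe above can be sketched in a few lines. This is a toy sketch, not the original code: `fitness` stands in for playing games with a set of eval weights and measuring the score, and every name and constant here is illustrative.

```python
import random

random.seed(0)

N_POPS, POP_SIZE, N_PARAMS = 5, 50, 8
ELITE = POP_SIZE // 10  # the best 10% survive each generation

def fitness(params):
    # Stand-in for "play games with these eval weights and measure the score".
    # Here: the closer to a hidden target vector, the fitter.
    target = [3.0, -1.5, 0.5, 2.0, -0.5, 1.0, 0.0, 4.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def mutate(params, sigma=0.1):
    # "Only slightly change some values again."
    return [p + random.gauss(0, sigma) for p in params]

def crossover(a, b):
    # Cross some of the "dna" (float values) of two parents.
    return [random.choice(pair) for pair in zip(a, b)]

# Generate 5 populations of N engines with randomised parameters.
pops = [[[random.uniform(-5, 5) for _ in range(N_PARAMS)]
         for _ in range(POP_SIZE)]
        for _ in range(N_POPS)]

for generation in range(200):
    for i, pop in enumerate(pops):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:ELITE]
        # "die" = copy the best 10% over the other 90%, slightly mutated.
        pops[i] = elite + [mutate(random.choice(elite))
                           for _ in range(POP_SIZE - ELITE)]
    # Crossbreeding among populations: cross the best engines of two
    # populations and copy the child over an engine marked for deletion.
    a, b = random.sample(range(N_POPS), 2)
    pops[a][-1] = crossover(pops[a][0], pops[b][0])

best = max((max(p, key=fitness) for p in pops), key=fitness)
```

Because the elites are carried over unchanged, the best individual never gets worse from one generation to the next; only the copies are mutated.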
I implemented this a while ago, in my first year of school:
https://www.codeproject.com/Articles/79 ... he-Unknown
The image should make clear what I mean by "stuck in one optimum". The image is in 3 dimensions, but in practice you search for an optimum in 100+ dimensions, so there are a lot more spots that look good - but are really weak compared to what could be done.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Reinforcement learning to tune handcrafted evals
Madeleine Birchfield wrote: ↑Wed Dec 01, 2021 4:18 pm Is it possible to use reinforcement learning algorithms such as Q-learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
Now here's one thing that blew my mind. The "handcrafted evaluation function", where you have a 64-slot array for pawns or knights and each square has its own handpicked value: it's mathematically identical to a neural network with 1 layer and no activation function.
So the first layer of a neural network is like the classical evaluation (but implemented better, in terms of an optimised GEMM).
Each consecutive layer can replace more code and find things like forks etc., and the activation function creates the non-linear behaviour that is needed to see more and more steps ahead with each layer.
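The equivalence is easy to check directly: summing piece-square-table entries over the occupied squares gives exactly the same number as a dot product (one linear layer, no activation) over a one-hot encoding of the board. A minimal sketch with made-up table values:

```python
import random

# A handpicked 64-slot table for one piece type (the values are made up).
random.seed(1)
knight_psqt = [random.randint(-20, 20) for _ in range(64)]

# Squares currently occupied by our knights.
knight_squares = [1, 18, 44]

# Classical handcrafted eval: look up and sum the table entries.
classical = sum(knight_psqt[sq] for sq in knight_squares)

# The same thing as a 1-layer "network" with no activation function:
# a one-hot input vector of length 64, with the PSQT as the weights.
x = [1.0 if sq in knight_squares else 0.0 for sq in range(64)]
linear_layer = sum(w * xi for w, xi in zip(knight_psqt, x))

assert classical == linear_layer
```

A full eval would concatenate such one-hot blocks for every piece type (e.g. 12 x 64 = 768 inputs), but the identity is the same: the table lookup is just a sparse dot product.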
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Reinforcement learning to tune handcrafted evals
But you'd better ask on Reddit; this forum here is relatively dead.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 879
- Joined: Mon Dec 15, 2008 11:45 am
Re: Reinforcement learning to tune handcrafted evals
dangi12012 wrote: ↑Wed Dec 01, 2021 4:48 pm Sure I did that already. ...
You did it already. With what? Your Perft Move Generator?

-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Reinforcement learning to tune handcrafted evals
Desperado wrote: ↑Thu Dec 02, 2021 8:24 pm You did it already. With what? Your Perft Move Generator?
Nah, I have been programming in different fields my whole life. https://github.com/Gigantua/Raytrace
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 391
- Joined: Tue Oct 08, 2019 11:39 pm
- Full name: Tomasz Sobczyk
Re: Reinforcement learning to tune handcrafted evals
dangi12012 wrote: ↑Thu Dec 02, 2021 9:30 pm Nah I have been programming in different fields my whole life. https://github.com/Gigantua/Raytrace
You're still incapable of understanding that your first link is irrelevant to chess? You're great at writing a lot of bullshit that no one cares about.
dangi12012 wrote: No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.
Maybe you copied your stockfish commits from someone else too?
I will look into that.
-
- Posts: 1062
- Joined: Tue Apr 28, 2020 10:03 pm
- Full name: Daniel Infuehr
Re: Reinforcement learning to tune handcrafted evals
Sopel wrote: ↑Fri Dec 03, 2021 1:06 am You're still incapable of understanding that your first link is irrelevant to chess? ...
I get asked a question. I answer the question. You take offence.
Great forum - but it needs stronger moderation.
Also, Sopel, you have embarrassed yourself enough in this thread: http://www.talkchess.com/forum3/viewtop ... =7&t=78798
So better not to continue that here. Please stop trolling. Thanks.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
-
- Posts: 263
- Joined: Wed Jun 16, 2021 2:08 am
- Location: Berlin
- Full name: Jost Triller
Re: Reinforcement learning to tune handcrafted evals
Madeleine Birchfield wrote: ↑Wed Dec 01, 2021 4:18 pm Is it possible to use reinforcement learning algorithms such as Q-learning or temporal difference learning to tune handcrafted evaluation functions? And if so, how does it compare to the existing tuning methods, such as Texel tuning or gradient descent?
Reinforcement learning is just another way to optimise some policy - namely, a process where the optimisation doesn't need external data. In our case, this policy would probably be the evaluation function (or maybe some function that scores moves for move ordering). You could argue that an engine which uses simple Texel tuning, but starts with random parameters, then plays some games, labels these games, uses these games to optimise the parameters, and then does all this again, is doing reinforcement learning.
Generally, reinforcement learning for an HCE is the same as for a neural network, as long as you can get a gradient for your handcrafted evaluation function. From the outside, the HCE and the NN look the same: a differentiable function that takes as input all the parameters and the position, and outputs a single value describing the winning chances.
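A minimal Texel-style sketch of that optimisation step: squash the eval through a sigmoid into an expected result, compare against game results, and follow the gradient of the squared error. Everything here is illustrative, not a reference implementation - the "eval" is a linear toy function, the labelled positions are synthetic, and K and the learning rate are made up:

```python
import math
import random

random.seed(0)
K = 400.0  # scaling constant relating centipawns to expected score

# Toy "positions": each is a small feature vector (think material counts or
# PSQT one-hots), labelled with a game result between 0.0 and 1.0.
# Labels are generated from a hidden weight vector so the demo converges.
true_w = [100.0, 300.0, -50.0]
data = []
for _ in range(500):
    feats = [random.uniform(-2.0, 2.0) for _ in range(3)]
    score = sum(w * f for w, f in zip(true_w, feats))
    result = 1.0 / (1.0 + math.exp(-score / K))
    data.append((feats, result))

w = [0.0, 0.0, 0.0]  # the handcrafted-eval parameters we tune

def predict(feats):
    eval_cp = sum(wi * f for wi, f in zip(w, feats))
    return 1.0 / (1.0 + math.exp(-eval_cp / K))  # eval -> expected result

lr = 1e5  # large because the sigmoid/K squashing shrinks the gradient
for epoch in range(1000):
    grad = [0.0, 0.0, 0.0]
    for feats, result in data:
        p = predict(feats)
        # d/dw_i of (p - result)^2, using dp/d(eval) = p * (1 - p) / K
        common = 2.0 * (p - result) * p * (1.0 - p) / K
        for i, f in enumerate(feats):
            grad[i] += common * f
    for i in range(3):
        w[i] -= lr * grad[i] / len(data)

mse = sum((predict(f) - r) ** 2 for f, r in data) / len(data)
```

The self-play flavour described above would replace the fixed `data` with fresh games played by the current parameters before each round of tuning; the gradient step itself stays the same whether the eval is handcrafted or a network.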
-
- Posts: 391
- Joined: Tue Oct 08, 2019 11:39 pm
- Full name: Tomasz Sobczyk
Re: Reinforcement learning to tune handcrafted evals
dangi12012 wrote: ↑Fri Dec 03, 2021 1:40 am I get asked a question. I answer the question. You take offence.
Great Forum - but needs stronger moderation.
Also sopel you have embarrassed yourself enough on this thread: http://www.talkchess.com/forum3/viewtop ... =7&t=78798
So better not continue that here. Please stop trolling. Thanks.
Dude, your EGO is off the scale.
dangi12012 wrote: No one wants to touch anything you have posted. That proves you now have negative reputations since everyone knows already you are a forum troll.
Maybe you copied your stockfish commits from someone else too?
I will look into that.