Reinforcement learning to tune handcrafted evals

Discussion of chess software programming and technical issues.

Moderator: Ras

mar
Posts: 2655
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Reinforcement learning to tune handcrafted evals

Post by mar »

dangi12012 wrote: Thu Dec 02, 2021 9:30 pm Nah I have been programming in different fields my whole life. https://github.com/Gigantua/Raytrace
lol, that is so underwhelming... so you are completely clueless even outside the domain of chess, interesting.
You overestimate yourself by several orders of magnitude; I'd say a textbook example of Dunning-Kruger.

(pushing binaries like pdb and obj to a github repo also speaks volumes)

btw calling a dumb raytracer a "game engine", good one!

so - you keep spamming the programming forum with nonsense, feeling the urge to comment on every thread - that is what I call trolling.

you called a forum member a racist for no reason at all (apparently that's a serious allegation), and you keep pointing fingers at others, calling them trolls,
when in reality you're the one trolling here - your behavior apparently annoys a lot of people.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Reinforcement learning to tune handcrafted evals

Post by dangi12012 »

mar wrote: Fri Dec 03, 2021 9:01 am
Salty mind. I feel sad for you.
Also, you didn't provide any information to the OP. You're just trying to derail the topic.

Moderators should enforce forum etiquette. It's so easy to criticize - while it's hard to build something yourself!

Back to topic: I gave advice and no one refuted it - so it is sound.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
j.t.
Posts: 263
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: Reinforcement learning to tune handcrafted evals

Post by j.t. »

dangi12012 wrote: Fri Dec 03, 2021 1:44 pm Back to topic: I gave advice and no one refuted it - so it is sound.
Regarding the genetic optimization: I don't think this is directly related to reinforcement learning. Sure, you can combine genetic algorithms with reinforcement learning, but I believe that for many HCEs the gradient descent approach may be more performant. There is a reason why, in practice, most neural networks are tuned using backpropagation and not genetic algorithms.
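For a handcrafted eval, the gradient-descent tuning mentioned above is often done Texel-style: fit the eval's weights so that a sigmoid of the score predicts game results. A minimal Python sketch under stated assumptions - the feature vectors, results, loss, and learning rate here are illustrative toys, not any engine's actual tuner:

```python
import math

def texel_loss(weights, positions, k=1.0):
    """Mean squared error between sigmoid(eval score) and game results.

    positions: list of (features, result), where features is a vector of
    handcrafted-eval feature counts and result is 0, 0.5, or 1.
    """
    loss = 0.0
    for features, result in positions:
        score = sum(w * f for w, f in zip(weights, features))
        predicted = 1.0 / (1.0 + math.exp(-k * score))
        loss += (predicted - result) ** 2
    return loss / len(positions)

def gradient_descent(weights, positions, lr=0.1, steps=200, k=1.0):
    """Tune eval weights by following the analytic gradient of the loss."""
    n = len(positions)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for features, result in positions:
            score = sum(w * f for w, f in zip(weights, features))
            p = 1.0 / (1.0 + math.exp(-k * score))
            # d/dw of (p - result)^2, via the sigmoid derivative p*(1-p)
            common = 2.0 * (p - result) * p * (1.0 - p) * k / n
            for i, f in enumerate(features):
                grad[i] += common * f
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights
```

Because the loss is differentiable in the weights, each step uses information from every training position at once, which is the usual argument for preferring it over purely random mutation.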
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Reinforcement learning to tune handcrafted evals

Post by dangi12012 »

j.t. wrote: Fri Dec 03, 2021 2:28 pm
dangi12012 wrote: Fri Dec 03, 2021 1:44 pm Back to topic: I gave advice and no one refuted it - so it is sound.
Regarding the genetic optimization: I don't think this is directly related to reinforcement learning. Sure, you can combine genetic algorithms with reinforcement learning, but I believe that for many HCEs the gradient descent approach may be more performant. There is a reason why, in practice, most neural networks are tuned using backpropagation and not genetic algorithms.
Yes, they all gradually descend into one of the very many local optima. It's like climbing a mountain where you don't reach the top because you get stuck on the very first hill.
You need a lot of random starting values, run gradient descent from each of them, and then pick the best out of the pool.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
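The multi-start idea above can be sketched in a few lines of Python. The 1-D objective here is a made-up stand-in for an engine's tuning loss (many local minima plus a global trend); the learning rate and start counts are likewise illustrative:

```python
import math
import random

def landscape(x):
    """A 1-D objective with many local minima (hypothetical stand-in
    for a tuning loss)."""
    return math.sin(5 * x) + 0.1 * x * x

def descend(x, lr=0.01, steps=500, eps=1e-6):
    """Plain gradient descent using a central-difference derivative."""
    for _ in range(steps):
        grad = (landscape(x + eps) - landscape(x - eps)) / (2 * eps)
        x -= lr * grad
    return x

def multi_start_descent(n_starts=20, lo=-5.0, hi=5.0, seed=42):
    """Run gradient descent from many random starting points and keep
    the best result, as described in the post above."""
    rng = random.Random(seed)
    candidates = [descend(rng.uniform(lo, hi)) for _ in range(n_starts)]
    return min(candidates, key=landscape)
```

A single descent from an unlucky start gets trapped in the first hill it rolls into; sampling many starts makes it likely at least one lands in a basin near the global minimum.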
Madeleine Birchfield
Posts: 512
Joined: Tue Sep 29, 2020 4:29 pm
Location: Dublin, Ireland
Full name: Madeleine Birchfield

Re: Reinforcement learning to tune handcrafted evals

Post by Madeleine Birchfield »

dangi12012 wrote: Wed Dec 01, 2021 4:48 pm Sure, I did that already. Normally you tune the weights of a neural network - but you can also tune the parameters of your algorithm via genetic optimisation.

It's really just a Darwinian approach with multiple populations (so as not to get stuck in a local minimum).

So you generate 5 populations of N engines with randomized seeds.
Then you find out which 10% performed best and let the rest die (die = copy the best 10% over the other 90% and slightly change some values again).
Then you repeat these steps and also enable crossover - where you take the best engine of each population, cross some of its "dna" (float values) with another's (this makes training faster), and copy the result over an engine marked for deletion.

This mirrors the real-world optimisation of life - reproduction, mutation, selection - and will optimise towards a score. Each population will get stuck in a (good) local minimum, but crossbreeding among populations will find even better solutions.
I'm already aware of what genetic algorithms and population-based training methods are, but they are not really related to the question at hand.
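For completeness, the quoted multi-population scheme (elitist selection, mutation, and crossover between population champions) can be sketched as follows. The fitness function, population sizes, mutation scale, and crossover schedule are all illustrative assumptions, not anyone's actual tuning setup:

```python
import random

def evolve(fitness, dim=4, n_pops=5, pop_size=20, generations=100,
           elite_frac=0.1, sigma=0.1, seed=1):
    """Multi-population genetic optimisation: keep the best 10% of each
    population, overwrite the rest with slightly mutated copies of the
    elites, and periodically cross the champions of different populations.
    """
    rng = random.Random(seed)
    pops = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
            for _ in range(n_pops)]
    n_elite = max(1, int(elite_frac * pop_size))
    for gen in range(generations):
        for pop in pops:
            pop.sort(key=fitness, reverse=True)  # best individuals first
            # selection + mutation: elites replace the rest, slightly perturbed
            for i in range(n_elite, pop_size):
                parent = pop[i % n_elite]
                pop[i] = [g + rng.gauss(0, sigma) for g in parent]
        if gen % 10 == 0:
            # crossover: mix each champion's "dna" with the next population's
            # champion and copy the child over an individual marked for deletion
            champs = [pop[0] for pop in pops]
            for pop, partner in zip(pops, champs[1:] + champs[:1]):
                child = [a if rng.random() < 0.5 else b
                         for a, b in zip(pop[0], partner)]
                pop[-1] = child
    return max((ind for pop in pops for ind in pop), key=fitness)
```

In an engine-tuning setting, `fitness` would be match performance of an engine built from the parameter vector, which is far more expensive to evaluate than the toy function used here.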