test positions for texel tuning
-
- Posts: 565
- Joined: Thu Nov 13, 2014 12:03 pm
Re: test positions for texel tuning
It all depends on the structure of the problem. Luckily for chess, that "deep dip" sort of solution-space landscape is very unlikely.
-
- Posts: 793
- Joined: Sun Aug 03, 2014 4:48 am
- Location: London, UK
Re: test positions for texel tuning
whereagles wrote:It all depends on the structure of the problem. Luckily for chess, that "deep dip" sort of solution-space landscape is very unlikely.
Why would you think that?
I am not saying I disagree with you. I haven't seen evidence or strong arguments either way.
I do know that I have had a lot of success with gradient descent, though: training a neural network evaluator from random initialization to state-of-the-art level. I have not seen any evidence suggesting that gradient descent leads to lower-quality minima than other optimization methods.
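For concreteness, since this thread is about texel tuning: below is a minimal sketch of gradient descent on the usual Texel objective, the mean squared error between game results and a sigmoid of the eval, assuming a linear evaluation over feature counts. The NumPy code, the learning rate, and the scaling constant K are illustrative assumptions, not anything from a specific engine.

```python
# Minimal sketch of Texel-style tuning by gradient descent.
# Assumptions (illustrative, not from any particular engine):
#   - linear eval: score(pos) = features @ w  (centipawn-like scale)
#   - labeled positions with game outcomes R in {0.0, 0.5, 1.0}
#   - objective: E(w) = mean((R - sigmoid(score))^2)
import numpy as np

K = 1.2  # scaling constant; in practice fitted to the engine's score scale

def sigmoid(s):
    # Map a score to an expected game result in (0, 1).
    return 1.0 / (1.0 + 10.0 ** (-K * s / 400.0))

def tune(features, results, lr=1e-4, epochs=1000):
    """features: (N, M) feature counts; results: (N,) game outcomes."""
    w = np.zeros(features.shape[1])  # random init also works
    for _ in range(epochs):
        p = sigmoid(features @ w)
        # d(sigmoid)/d(score) = ln(10) * K / 400 * p * (1 - p)
        dp_ds = np.log(10.0) * K / 400.0 * p * (1.0 - p)
        # dE/dw = -2/N * sum_n (R_n - p_n) * dp_ds_n * f_n
        grad = -2.0 / len(results) * features.T @ ((results - p) * dp_ds)
        w -= lr * grad
    return w
```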
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
-
- Posts: 565
- Joined: Thu Nov 13, 2014 12:03 pm
Re: test positions for texel tuning
matthewlai wrote:Why would you think that?
I am not saying I disagree with you. I haven't seen evidence or strong arguments either way.
I do know that I have had a lot of success with gradient descent, though: training a neural network evaluator from random initialization to state-of-the-art level. I have not seen any evidence suggesting that gradient descent leads to lower-quality minima than other optimization methods.
I said that just out of feeling. Although chess is highly non-linear, small parameter changes in the eval are reported to lead to small Elo gains/losses, so I would find it very surprising if a particular combination of parameter settings ended up in a "deep dip".
I see it as more likely that the landscape is a single clear hill (inverted, since we are minimizing) with small ups and downs near its top, i.e. local optima. Such a landscape is actually ideal for the classical particle swarm algorithm: the particles accumulate near the local optima and explore all or most of them in detail. Gradient descent will find the hill very quickly, but might have some difficulty getting to the right local optimum. The other state-of-the-art method, genetic algorithms, may work too, though my feeling is that they will converge more slowly than the swarm.
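For reference, here is a bare-bones sketch of that classical particle swarm update; the hyperparameters (inertia w, cognitive/social pulls c1 and c2) and the loss interface are illustrative assumptions, and loss could be the same Texel mean-squared error sketched earlier in the thread.

```python
# Bare-bones classical particle swarm optimization (minimization).
# Illustrative only: hyperparameters and bounds are assumptions.
import numpy as np

def pso(loss, dim, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, bounds=(-100.0, 100.0), seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))   # particle positions
    v = np.zeros((n_particles, dim))              # particle velocities
    pbest = x.copy()                              # per-particle best positions
    pbest_val = np.array([loss(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()      # swarm-wide best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + pull toward own best + pull toward swarm best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([loss(p) for p in x])
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest
```

Since the swarm only ever calls loss, it needs no gradients, which is part of why it tolerates the small local ups and downs near the top of the hill.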
Well, just rambling a bit, based on my experience with optimization methods.