test positions for texel tuning

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

whereagles
Posts: 565
Joined: Thu Nov 13, 2014 12:03 pm

Re: test positions for texel tuning

Post by whereagles »

it all depends on the structure of the problem... luckily for chess, a "deep dip" sort of solution-space landscape is very unlikely.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: test positions for texel tuning

Post by matthewlai »

whereagles wrote:it all depends on the structure of the problem... luckily for chess, a "deep dip" sort of solution-space landscape is very unlikely.
Why would you think that?

I am not saying I disagree with you. I haven't seen evidence or strong arguments either way.

I do know that I have had a lot of success with gradient descent, though, training a neural network evaluator from random initialization to state-of-the-art level. I have not seen any evidence suggesting that gradient descent leads to lower-quality minima than other optimization methods.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
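
To make the kind of tuning under discussion concrete, here is a minimal sketch of texel-style tuning by gradient descent over a set of test positions. The evaluate() function, the (position, result) training data, and the scaling constant K are placeholders for illustration, not any particular engine's implementation; real tuners typically use analytic or batched gradients rather than the finite differences shown here.

Code: Select all

def sigmoid(score, K=1.13):
    # Map a centipawn score to an expected game result in [0, 1];
    # K is the usual texel scaling constant (an assumed value here).
    return 1.0 / (1.0 + 10.0 ** (-K * score / 400.0))

def mean_sq_error(params, data, evaluate):
    # data: list of (position, result) pairs, result in {0.0, 0.5, 1.0}
    return sum((result - sigmoid(evaluate(pos, params))) ** 2
               for pos, result in data) / len(data)

def tune(params, data, evaluate, lr=1000.0, eps=1.0, iters=100):
    # Plain gradient descent using a numerical (central-difference) gradient.
    params = list(params)
    for _ in range(iters):
        grad = []
        for i in range(len(params)):
            params[i] += eps
            e_plus = mean_sq_error(params, data, evaluate)
            params[i] -= 2 * eps
            e_minus = mean_sq_error(params, data, evaluate)
            params[i] += eps
            grad.append((e_plus - e_minus) / (2 * eps))
        # Step downhill along the estimated gradient.
        params = [p - lr * g for p, g in zip(params, grad)]
    return params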
whereagles
Posts: 565
Joined: Thu Nov 13, 2014 12:03 pm

Re: test positions for texel tuning

Post by whereagles »

matthewlai wrote:
whereagles wrote:it all depends on the structure of the problem... luckily for chess, a "deep dip" sort of solution-space landscape is very unlikely.
Why would you think that?

I am not saying I disagree with you. I haven't seen evidence or strong arguments either way.

I do know that I have had a lot of success with gradient descent, though, training a neural network evaluator from random initialization to state-of-the-art level. I have not seen any evidence suggesting that gradient descent leads to lower-quality minima than other optimization methods.
I said that just as a gut feeling. Although chess is highly non-linear, reports suggest that small parameter changes in eval lead to small Elo gains/losses. I would find it very surprising if a particular combination of parameter settings ended up in a "deep dip".

I see it as more likely that the landscape is a clear (anti-)hill with small ups and downs near its top (local optima). Such a landscape is actually ideal for applying the classical particle swarm algorithm (the particles accumulate near the local optima, exploring all or most of them in detail). Gradient descent will find the hill very quickly, but might have some difficulty getting to the right local optimum. The other state-of-the-art method, genetic algorithms, may also work, though my feeling is that they will converge more slowly than the swarm.

Well, just rambling a bit, based on my experience in optimization methods.
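
To illustrate the particle swarm idea described above, here is a minimal classical PSO loop over a vector of evaluation weights. The eval_error() objective (for example, a texel-style mean squared error over test positions), the parameter bounds, and all hyperparameters are assumptions for illustration only, not a definitive implementation.

Code: Select all

import random

def pso(eval_error, dim, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, lo=-100.0, hi=100.0):
    # Initialise particle positions at random and velocities at zero.
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # each particle's best position so far
    pbest_err = [eval_error(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_err[i])
    gbest, gbest_err = pbest[g][:], pbest_err[g]   # swarm's best position so far

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Velocity update: inertia plus pulls towards the personal
                # and global best positions.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            err = eval_error(pos[i])
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = pos[i][:], err
                if err < gbest_err:
                    gbest, gbest_err = pos[i][:], err
    return gbest, gbest_err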