test positions for texel tuning

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

whereagles
Posts: 565
Joined: Thu Nov 13, 2014 12:03 pm

Re: test positions for texel tuning

Post by whereagles »

it all depends on the structure of the problem... luckily for chess, a "deep dip" sort of solution-space landscape is very unlikely.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: test positions for texel tuning

Post by matthewlai »

whereagles wrote:it all depends on the structure of the problem... luckily for chess, a "deep dip" sort of solution-space landscape is very unlikely.
Why would you think that?

I am not saying I disagree with you. I haven't seen evidence or strong arguments either way.

I do know that I have had a lot of success with gradient descent, though, training a neural network evaluator from random initialization to state-of-the-art level. I have not seen any evidence suggesting that gradient descent leads to lower-quality minima than other optimization methods.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
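
To make the kind of tuning under discussion concrete, here is a minimal sketch of texel-style tuning by gradient descent over a set of test positions. The evaluate() function, the (position, result) training data, and the scaling constant K are placeholders for illustration, not any particular engine's implementation; real tuners typically use analytic or batched gradients rather than the finite differences shown here.

Code: Select all

def sigmoid(score, K=1.13):
    # Map a centipawn score to an expected game result in [0, 1];
    # K is the usual texel scaling constant (an assumed value here).
    return 1.0 / (1.0 + 10.0 ** (-K * score / 400.0))

def mean_sq_error(params, data, evaluate):
    # data: list of (position, result) pairs, result in {0.0, 0.5, 1.0}
    return sum((result - sigmoid(evaluate(pos, params))) ** 2
               for pos, result in data) / len(data)

def tune(params, data, evaluate, lr=1000.0, eps=1.0, iters=100):
    # Plain gradient descent using a numerical (central-difference) gradient.
    params = list(params)
    for _ in range(iters):
        grad = []
        for i in range(len(params)):
            params[i] += eps
            e_plus = mean_sq_error(params, data, evaluate)
            params[i] -= 2 * eps
            e_minus = mean_sq_error(params, data, evaluate)
            params[i] += eps
            grad.append((e_plus - e_minus) / (2 * eps))
        # Step downhill along the estimated gradient.
        params = [p - lr * g for p, g in zip(params, grad)]
    return params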
whereagles
Posts: 565
Joined: Thu Nov 13, 2014 12:03 pm

Re: test positions for texel tuning

Post by whereagles »

matthewlai wrote:
whereagles wrote:it all depends on the structure of the problem... luckily for chess, a "deep dip" sort of solution-space landscape is very unlikely.
Why would you think that?

I am not saying I disagree with you. I haven't seen evidence or strong arguments either way.

I do know that I have had a lot of success with gradient descent, though, training a neural network evaluator from random initialization to state-of-the-art level. I have not seen any evidence suggesting that gradient descent leads to lower-quality minima than other optimization methods.
I said that just as a gut feeling. Although chess is highly non-linear, reports suggest that small parameter changes in eval lead to small Elo gains/losses. I would find it very surprising if a particular combination of parameter settings ended up in a "deep dip".

I see it as more likely that the landscape is a clear (anti-)hill with small ups and downs near its top (local optima). Such a landscape is actually ideal for applying the classical particle swarm algorithm (the particles accumulate near the local optima, exploring all or most of them in detail). Gradient descent will find the hill very quickly, but might have some difficulty getting to the right local optimum. The other state-of-the-art method, genetic algorithms, may also work, though my feeling is that they will converge more slowly than the swarm.

Well, just rambling a bit, based on my experience in optimization methods.
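
To illustrate the particle swarm idea described above, here is a minimal classical PSO loop over a vector of evaluation weights. The eval_error() objective (for example, a texel-style mean squared error over test positions), the parameter bounds, and all hyperparameters are assumptions for illustration only, not a definitive implementation.

Code: Select all

import random

def pso(eval_error, dim, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, lo=-100.0, hi=100.0):
    # Initialise particle positions at random and velocities at zero.
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                    # each particle's best position so far
    pbest_err = [eval_error(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_err[i])
    gbest, gbest_err = pbest[g][:], pbest_err[g]   # swarm's best position so far

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Velocity update: inertia plus pulls towards the personal
                # and global best positions.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            err = eval_error(pos[i])
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = pos[i][:], err
                if err < gbest_err:
                    gbest, gbest_err = pos[i][:], err
    return gbest, gbest_err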