test positions for texel tuning

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: test positions for texel tuning

Post by Evert »

matthewlai wrote: Gradient descent is indeed much more expensive. But it is also much more accurate.
Is it really more expensive though?
Each iteration will be more expensive, but you will require far fewer iterations to converge (if things converge at all...). It's not so clear to me that it will be slower in the end (but I haven't tried this yet for chess).
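To make the cost comparison concrete, here is a minimal sketch of one gradient-descent step on the Texel objective, assuming a purely linear evaluation so the gradient is analytic. X (feature vectors), R (game results in [0,1]), the scaling constant K and the learning rate lr are placeholders, not any particular engine's data structures:

Code:

    // One gradient-descent step on the Texel error E = mean((s_i - R_i)^2),
    // with s_i = sigmoid(dot(w, X_i)). Assumes a linear evaluation.
    #include <vector>
    #include <cmath>

    double sigmoid(double q, double K) {
        return 1.0 / (1.0 + std::pow(10.0, -K * q / 400.0));
    }

    void gradient_step(std::vector<double>& w,
                       const std::vector<std::vector<double>>& X,
                       const std::vector<double>& R,
                       double K, double lr) {
        const double c = std::log(10.0) * K / 400.0;     // d(sigmoid)/dq = c * s * (1 - s)
        std::vector<double> grad(w.size(), 0.0);
        for (std::size_t i = 0; i < X.size(); ++i) {
            double q = 0.0;
            for (std::size_t j = 0; j < w.size(); ++j) q += w[j] * X[i][j];
            double s = sigmoid(q, K);
            double d = 2.0 * (s - R[i]) * c * s * (1.0 - s);   // dE_i/dq
            for (std::size_t j = 0; j < w.size(); ++j) grad[j] += d * X[i][j];
        }
        for (std::size_t j = 0; j < w.size(); ++j)
            w[j] -= lr * grad[j] / X.size();             // step along the average gradient
    }

One such step touches every position and every parameter, which is why a single iteration costs more than nudging one parameter at a time; the trade-off is that all parameters move at once.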
Henk
Posts: 7220
Joined: Mon May 27, 2013 10:31 am

Re: test positions for texel tuning

Post by Henk »

My experience is that simulated annealing is often best. All these other algorithms only find local maxima; in effect they just sample the parameter space, and their sample points are local maxima, of which there can be quite a lot.
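For reference, a bare-bones sketch of what such an annealing loop can look like. The error function E, the neighbour move, and the cooling schedule here are all placeholders, not anyone's actual tuner:

Code:

    // Bare-bones simulated annealing over an integer parameter vector.
    // E() is whatever tuning error is being minimised (e.g. the Texel MSE).
    #include <vector>
    #include <random>
    #include <cmath>

    std::vector<int> anneal(std::vector<int> params,
                            double (*E)(const std::vector<int>&),
                            double T, double cooling, int steps) {
        std::mt19937 rng(12345);
        std::uniform_int_distribution<std::size_t> pick(0, params.size() - 1);
        std::uniform_int_distribution<int> delta(-2, 2);
        std::uniform_real_distribution<double> coin(0.0, 1.0);

        double err = E(params);
        for (int i = 0; i < steps; ++i, T *= cooling) {
            std::vector<int> cand = params;
            cand[pick(rng)] += delta(rng);          // perturb one parameter a little
            double e = E(cand);
            // Always accept improvements; accept worse points with probability
            // exp(-dE/T), which is what lets annealing climb out of local optima.
            if (e < err || coin(rng) < std::exp((err - e) / T)) {
                params = cand;
                err = e;
            }
        }
        return params;
    }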
ymatioun
Posts: 64
Joined: Fri Oct 18, 2013 11:40 pm
Location: New York

Re: test positions for texel tuning

Post by ymatioun »

Gradient descent can be fast if you capture the evaluation weights during the call to evaluate(). Those are the weights that are multiplied by the coefficients and summed up to get the evaluation (this only works for linear evaluation functions).

Then, after one pass through Qsearch(), you can construct the full correlation matrix and invert it using a gradient-style method such as the conjugate gradient method. This step takes almost no time since you are only operating on a 1000-by-1000 matrix.

This is what I do for Fizbo, and this way I can generate optimized evaluation coefficients in a couple of minutes.
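A rough sketch of how that scheme can be wired up (the helper names are illustrative, not Fizbo's actual code): each evaluate() call records the position's feature vector x, and for a linear evaluation the least-squares fit of the coefficients w to the targets y reduces to the normal equations (X^T X) w = X^T y, which conjugate gradients solves quickly because the matrix is only n_features by n_features:

Code:

    // Sketch of "capture features, then solve a ~1000x1000 system".
    #include <vector>
    #include <cmath>

    using Vec = std::vector<double>;
    using Mat = std::vector<Vec>;

    // Accumulate one position's feature vector x and target y into A = X^T X, b = X^T y.
    void accumulate(Mat& A, Vec& b, const Vec& x, double y) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            b[i] += x[i] * y;
            for (std::size_t j = 0; j < x.size(); ++j) A[i][j] += x[i] * x[j];
        }
    }

    // Plain conjugate gradient for the symmetric positive (semi)definite system A w = b.
    Vec solve_cg(const Mat& A, const Vec& b, int iters = 1000, double tol = 1e-10) {
        std::size_t n = b.size();
        Vec w(n, 0.0), r = b, p = r, Ap(n);
        double rs = 0.0;
        for (std::size_t i = 0; i < n; ++i) rs += r[i] * r[i];
        for (int k = 0; k < iters && rs > tol; ++k) {
            for (std::size_t i = 0; i < n; ++i) {
                Ap[i] = 0.0;
                for (std::size_t j = 0; j < n; ++j) Ap[i] += A[i][j] * p[j];
            }
            double pAp = 0.0;
            for (std::size_t i = 0; i < n; ++i) pAp += p[i] * Ap[i];
            double alpha = rs / pAp;
            for (std::size_t i = 0; i < n; ++i) { w[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
            double rs_new = 0.0;
            for (std::size_t i = 0; i < n; ++i) rs_new += r[i] * r[i];
            for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + (rs_new / rs) * p[i];
            rs = rs_new;
        }
        return w;
    }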
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: test positions for texel tuning

Post by matthewlai »

Evert wrote:
matthewlai wrote: Gradient descent is indeed much more expensive. But it is also much more accurate.
Is it really more expensive though?
Each iteration will be more expensive, but you will require far fewer iterations to converge (if things converge at all...). It's not so clear to me that it will be slower in the end (but I haven't tried this yet for chess).
It depends on the error landscape. For example, in the extreme case, if all features are completely independent, tuning parameter-by-parameter would be much faster.
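In that independent-features extreme, parameter-by-parameter tuning is essentially the classic Texel local search: nudge one weight at a time and keep the change only if the measured error drops. A minimal sketch, with E() standing in for whatever error measurement is used:

Code:

    // Classic local search: adjust one parameter at a time, keep only improvements.
    #include <vector>

    void local_search(std::vector<int>& w, double (*E)(const std::vector<int>&)) {
        double best = E(w);
        bool improved = true;
        while (improved) {
            improved = false;
            for (std::size_t i = 0; i < w.size(); ++i) {
                w[i] += 1;                            // try increasing this weight
                double e = E(w);
                if (e < best) { best = e; improved = true; }
                else {
                    w[i] -= 2;                        // try the other direction
                    e = E(w);
                    if (e < best) { best = e; improved = true; }
                    else w[i] += 1;                   // restore the original value
                }
            }
        }
    }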
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: test positions for texel tuning

Post by matthewlai »

ymatioun wrote: Gradient descent can be fast if you capture the evaluation weights during the call to evaluate(). Those are the weights that are multiplied by the coefficients and summed up to get the evaluation (this only works for linear evaluation functions).

Then, after one pass through Qsearch(), you can construct the full correlation matrix and invert it using a gradient-style method such as the conjugate gradient method. This step takes almost no time since you are only operating on a 1000-by-1000 matrix.

This is what I do for Fizbo, and this way I can generate optimized evaluation coefficients in a couple of minutes.
Yeah, if the function is differentiable (anything from a linear evaluation function to something as complex as a neural network), gradient descent is almost certainly a better bet.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
whereagles
Posts: 565
Joined: Thu Nov 13, 2014 12:03 pm

Re: test positions for texel tuning

Post by whereagles »

There are many optimization methods... gradient, genetic, annealing, swarm, ant colony, tabu, etc. Greedy ones like gradient descent often converge the fastest, but to a local optimum, not necessarily a global one.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: test positions for texel tuning

Post by matthewlai »

whereagles wrote: There are many optimization methods... gradient, genetic, annealing, swarm, ant colony, tabu, etc. Greedy ones like gradient descent often converge the fastest, but to a local optimum, not necessarily a global one.
There is nothing, short of exploring the entire parameter space, that can guarantee finding the global minimum.

Gradient descent, in practice, has been shown to find very good minima most of the time.

It has been a source of major worry in machine learning for a long time, but nowadays most people don't really worry about it anymore, because it's not really a big problem in practice.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Henk
Posts: 7220
Joined: Mon May 27, 2013 10:31 am

Re: test positions for texel tuning

Post by Henk »

If there are parameters with discrete values and a short domain, then you often get useless local maxima because one of the parameters ends up near the bound of its domain. Adding a penalty whenever a parameter goes out of its domain did not help much either.
Henk
Posts: 7220
Joined: Mon May 27, 2013 10:31 am

Re: test positions for texel tuning

Post by Henk »

whereagles wrote: There are many optimization methods... gradient, genetic, annealing, swarm, ant colony, tabu, etc. Greedy ones like gradient descent often converge the fastest, but to a local optimum, not necessarily a global one.
Gradient, genetic, and tabu search find local optima, as far as I remember. Perhaps genetic search has been modified to find the global optimum; I don't know.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: test positions for texel tuning

Post by matthewlai »

Henk wrote:
whereagles wrote: There are many optimization methods... gradient, genetic, annealing, swarm, ant colony, tabu, etc. Greedy ones like gradient descent often converge the fastest, but to a local optimum, not necessarily a global one.
Gradient, genetic, and tabu search find local optima, as far as I remember. Perhaps genetic search has been modified to find the global optimum; I don't know.
It is not theoretically possible.

Imagine a convex error surface in 2D (with only one local minimum, which is also the global minimum), then add a very deep minimum somewhere in a well that is infinitely narrow. That well is now the global minimum. There is no way to find it except by happening to sample that exact point, and that point could be anywhere. Therefore, an algorithm that guarantees finding the global minimum MUST sample every point in the space.
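As a toy illustration, take

$$ f_\varepsilon(x) = x^2 - (a^2 + 1)\, e^{-((x-a)/\varepsilon)^2}. $$

For any a, f_\varepsilon(a) = -1, so the global minimum sits inside the well at x = a, while everywhere outside an \varepsilon-neighbourhood of a the function is essentially the convex bowl x^2 with a harmless minimum near 0. As \varepsilon shrinks, the well remains the global minimum, but the chance of ever sampling inside it goes to zero.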
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.