test positions for texel tuning

Evert
Posts: 2929
Joined: Fri Jan 21, 2011 11:42 pm
Location: NL

Re: test positions for texel tuning

Post by Evert » Fri Jan 15, 2016 9:09 pm

matthewlai wrote: Gradient descent is indeed much more expensive. But it is also much more accurate.
Is it really more expensive though?
Each iteration will be more expensive, but you will require far fewer iterations to converge (if things converge at all...). It's not so clear to me that it will be slower in the end (but I haven't tried this yet for chess).

Henk
Posts: 6558
Joined: Mon May 27, 2013 8:31 am

Re: test positions for texel tuning

Post by Henk » Fri Jan 15, 2016 10:42 pm

My experience is that simulated annealing is often best. All these other algorithms only find local maxima, so in effect they sample the space and their sample points are local maxima, and there can be quite a lot of those.
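
For illustration, a minimal simulated-annealing sketch in Python, assuming a hypothetical texel_error(params) that returns the tuning error for an integer parameter vector:

import math, random

def simulated_annealing(params, texel_error, steps=100000, t0=1.0, t_end=0.001):
    # Randomly nudge one integer parameter at a time; always accept improvements,
    # accept worse candidates with a probability that shrinks as the temperature cools.
    cur = list(params)
    best = list(params)
    cur_err = best_err = texel_error(cur)
    for i in range(steps):
        t = t0 * (t_end / t0) ** (i / steps)        # geometric cooling schedule
        cand = list(cur)
        cand[random.randrange(len(cand))] += random.choice((-1, 1))
        err = texel_error(cand)
        if err < cur_err or random.random() < math.exp((cur_err - err) / t):
            cur, cur_err = cand, err
            if err < best_err:
                best, best_err = cand, err
    return best, best_err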

ymatioun
Posts: 64
Joined: Fri Oct 18, 2013 9:40 pm
Location: New York

Re: test positions for texel tuning

Post by ymatioun » Fri Jan 15, 2016 10:59 pm

Gradient descent can be fast if you capture the evaluation weights during the call to evaluate(); those are the values that are multiplied by the coefficients and summed up to get the evaluation (this only works for linear evaluation functions).

Then, after one pass through Qsearch(), you can construct the full correlation matrix and invert it using a form of gradient descent such as the conjugate gradient method. This step takes essentially no time since you are only operating on a 1000-by-1000 matrix.

This is what I do for Fizbo, and this way I can generate optimized evaluation coefficients in a couple of minutes.
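
A rough sketch of that approach (not Fizbo's actual code): with a linear evaluation eval = features . coefficients, one pass collects a feature row per position, and the least-squares fit reduces to solving the normal equations A^T A x = A^T b, for example with SciPy's conjugate-gradient solver:

import numpy as np
from scipy.sparse.linalg import cg

def fit_linear_eval(A, b):
    # A: one row of feature values per position (captured during evaluate()),
    # b: target score per position. Solve the least-squares normal equations
    # A^T A x = A^T b with conjugate gradient; A^T A is only ~1000 x 1000.
    AtA = A.T @ A
    Atb = A.T @ b
    coeffs, info = cg(AtA, Atb)
    if info != 0:
        raise RuntimeError("conjugate gradient did not converge")
    return coeffs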

matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 2:48 am
Location: London, UK

Re: test positions for texel tuning

Post by matthewlai » Sat Jan 16, 2016 7:09 am

Evert wrote:
matthewlai wrote: Gradient descent is indeed much more expensive. But it is also much more accurate.
Is it really more expensive though?
Each iteration will be more expensive, but you will require far fewer iterations to converge (if things converge at all...). It's not so clear to me that it will be slower in the end (but I haven't tried this yet for chess).
It depends on the error landscape. For example, in the extreme case, if all features are completely independent, tuning parameter-by-parameter would be much faster.
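
As a reference point, parameter-by-parameter tuning is roughly the original Texel local search; a minimal sketch, again assuming a hypothetical texel_error(params):

def local_search(params, texel_error):
    # Classic Texel-style local search: nudge one parameter at a time by +/-1
    # and keep any change that lowers the tuning error.
    best_err = texel_error(params)
    improved = True
    while improved:
        improved = False
        for i in range(len(params)):
            for delta in (1, -1):
                params[i] += delta
                err = texel_error(params)
                if err < best_err:
                    best_err, improved = err, True
                    break              # keep the change, go to the next parameter
                params[i] -= delta     # revert the change
    return params, best_err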
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.

matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 2:48 am
Location: London, UK

Re: test positions for texel tuning

Post by matthewlai » Sat Jan 16, 2016 7:11 am

ymatioun wrote:Gradient descent can be fast if you capture the evaluation weights during the call to evaluate(); those are the values that are multiplied by the coefficients and summed up to get the evaluation (this only works for linear evaluation functions).

Then, after one pass through Qsearch(), you can construct the full correlation matrix and invert it using a form of gradient descent such as the conjugate gradient method. This step takes essentially no time since you are only operating on a 1000-by-1000 matrix.

This is what I do for Fizbo, and this way I can generate optimized evaluation coefficients in a couple of minutes.
Yeah, if the function is differentiable (anything from a linear evaluation function to something as complex as a neural network), gradient descent is almost certainly a better bet.
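
A minimal sketch of that for the Texel tuning objective with a linear evaluation (the array names here are assumptions, not anyone's actual engine code): follow the analytic gradient of the mean squared error between game results and a sigmoid of the evaluation.

import numpy as np

K = 1.13  # sigmoid scaling constant; in practice fitted to the engine's eval scale

def tune(A, results, lr=1e-4, iters=1000):
    # A: one row of feature values per position; results: 0 / 0.5 / 1 per game outcome.
    w = np.zeros(A.shape[1])
    c = K * np.log(10.0) / 400.0
    for _ in range(iters):
        s = 1.0 / (1.0 + 10.0 ** (-K * (A @ w) / 400.0))   # predicted score
        dsig = s * (1.0 - s) * c                            # sigmoid derivative w.r.t. eval
        grad = -2.0 * (A.T @ ((results - s) * dsig)) / len(results)
        w -= lr * grad                                      # gradient descent step
    return w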
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.

whereagles
Posts: 562
Joined: Thu Nov 13, 2014 11:03 am

Re: test positions for texel tuning

Post by whereagles » Sat Jan 16, 2016 10:32 am

There are many optimization methods... gradient, genetic, annealing, swarm, ant colony, tabu, etc. Greedy ones like gradient descent often converge the fastest, but to a local optimum, not necessarily a global one.

matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 2:48 am
Location: London, UK

Re: test positions for texel tuning

Post by matthewlai » Sat Jan 16, 2016 10:37 am

whereagles wrote:There are many optimization methods... gradient, genetic, annealing, swarm, ant colony, tabu, etc. Greedy ones like gradient descent often converge the fastest, but to a local optimum, not necessarily a global one.
There is nothing, short of exploring the entire parameter space, that can guarantee finding the global minimum.

Gradient descent, in practice, has been shown to find very good minima most of the time.

It was a source of major worry in machine learning for a long time, but nowadays most people don't really worry about it anymore, because it's not a big problem in practice.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.

Henk
Posts: 6558
Joined: Mon May 27, 2013 8:31 am

Re: test positions for texel tuning

Post by Henk » Sat Jan 16, 2016 11:14 am

If there are parameters that take discrete values over a small domain, you often get useless local maxima because one of the parameters ends up near the bound of its domain. Adding a penalty when a parameter goes out of its domain did not help much either.

Henk
Posts: 6558
Joined: Mon May 27, 2013 8:31 am

Re: test positions for texel tuning

Post by Henk » Sat Jan 16, 2016 11:30 am

whereagles wrote:There are many optimization methods... gradient, genetic, annealing, swarm, ant colony, tabu, etc. Greedy ones like gradient descent often converge the fastest, but to a local optimum, not necessarily a global one.
Gradient, genetic, and tabu search find local optima; that's what I remember. Perhaps genetic search has been modified to find the global optimum, I don't know.

matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 2:48 am
Location: London, UK

Re: test positions for texel tuning

Post by matthewlai » Sat Jan 16, 2016 12:00 pm

Henk wrote:
whereagles wrote:There are many optimization methods... gradient, genetic, annealing, swarm, ant colony, tabu, etc. Greedy ones like gradient descent often converge the fastest, but to a local optimum, not necessarily a global one.
Gradient, genetic, and tabu search find local optima; that's what I remember. Perhaps genetic search has been modified to find the global optimum, I don't know.
It is not theoretically possible.

Imagine a convex space in 2D (only one local minimum, which is also the global minimum), then add a very deep minimum somewhere in an infinitely narrow well. That well is now the global minimum, but there is no way to find it except by happening to sample exactly that point, and the point can be anywhere. Therefore, an algorithm that guarantees finding the global minimum MUST sample all points in the space.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
