test positions for texel tuning

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: test positions for texel tuning

Post by Evert »

matthewlai wrote: Gradient descent is indeed much more expensive. But it is also much more accurate.
Is it really more expensive though?
Each iteration will be more expensive, but you will require far fewer iterations to converge (if things converge at all...). It's not so clear to me that it will be slower in the end (but I haven't tried this yet for chess).
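To make the cost comparison concrete, here is a minimal sketch of one gradient-descent step on the Texel objective, assuming a purely linear evaluation so the gradient is analytic. X (feature vectors), R (game results in [0,1]), the scaling constant K and the learning rate lr are placeholders, not any particular engine's data structures:

Code:

    // One gradient-descent step on the Texel error E = mean((s_i - R_i)^2),
    // with s_i = sigmoid(dot(w, X_i)). Assumes a linear evaluation.
    #include <vector>
    #include <cmath>

    double sigmoid(double q, double K) {
        return 1.0 / (1.0 + std::pow(10.0, -K * q / 400.0));
    }

    void gradient_step(std::vector<double>& w,
                       const std::vector<std::vector<double>>& X,
                       const std::vector<double>& R,
                       double K, double lr) {
        const double c = std::log(10.0) * K / 400.0;     // d(sigmoid)/dq = c * s * (1 - s)
        std::vector<double> grad(w.size(), 0.0);
        for (std::size_t i = 0; i < X.size(); ++i) {
            double q = 0.0;
            for (std::size_t j = 0; j < w.size(); ++j) q += w[j] * X[i][j];
            double s = sigmoid(q, K);
            double d = 2.0 * (s - R[i]) * c * s * (1.0 - s);   // dE_i/dq
            for (std::size_t j = 0; j < w.size(); ++j) grad[j] += d * X[i][j];
        }
        for (std::size_t j = 0; j < w.size(); ++j)
            w[j] -= lr * grad[j] / X.size();             // step along the average gradient
    }

One such step touches every position and every parameter, which is why a single iteration costs more than nudging one parameter at a time; the trade-off is that all parameters move at once.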
Henk
Posts: 7220
Joined: Mon May 27, 2013 10:31 am

Re: test positions for texel tuning

Post by Henk »

My experience is that simulated annealing is often best. All these other algorithms only find local maxima; in effect they just sample the parameter space, and their sample points are local maxima, of which there can be quite a lot.
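For reference, a bare-bones sketch of what such an annealing loop can look like. The error function E, the neighbour move, and the cooling schedule here are all placeholders, not anyone's actual tuner:

Code:

    // Bare-bones simulated annealing over an integer parameter vector.
    // E() is whatever tuning error is being minimised (e.g. the Texel MSE).
    #include <vector>
    #include <random>
    #include <cmath>

    std::vector<int> anneal(std::vector<int> params,
                            double (*E)(const std::vector<int>&),
                            double T, double cooling, int steps) {
        std::mt19937 rng(12345);
        std::uniform_int_distribution<std::size_t> pick(0, params.size() - 1);
        std::uniform_int_distribution<int> delta(-2, 2);
        std::uniform_real_distribution<double> coin(0.0, 1.0);

        double err = E(params);
        for (int i = 0; i < steps; ++i, T *= cooling) {
            std::vector<int> cand = params;
            cand[pick(rng)] += delta(rng);          // perturb one parameter a little
            double e = E(cand);
            // Always accept improvements; accept worse points with probability
            // exp(-dE/T), which is what lets annealing climb out of local optima.
            if (e < err || coin(rng) < std::exp((err - e) / T)) {
                params = cand;
                err = e;
            }
        }
        return params;
    }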
ymatioun
Posts: 64
Joined: Fri Oct 18, 2013 11:40 pm
Location: New York

Re: test positions for texel tuning

Post by ymatioun »

Gradient descent can be fast if you capture the evaluation weights during the call to evaluate(). Those are the weights that are multiplied by the coefficients and summed up to get the evaluation (this only works for linear evaluation functions).

Then, after one pass through Qsearch(), you can construct the full correlation matrix and invert it using a gradient-style method such as the conjugate gradient method. This step takes almost no time since you are only operating on a 1000-by-1000 matrix.

This is what I do for Fizbo, and this way I can generate optimized evaluation coefficients in a couple of minutes.
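A rough sketch of how that scheme can be wired up (the helper names are illustrative, not Fizbo's actual code): each evaluate() call records the position's feature vector x, and for a linear evaluation the least-squares fit of the coefficients w to the targets y reduces to the normal equations (X^T X) w = X^T y, which conjugate gradients solves quickly because the matrix is only n_features by n_features:

Code:

    // Sketch of "capture features, then solve a ~1000x1000 system".
    #include <vector>
    #include <cmath>

    using Vec = std::vector<double>;
    using Mat = std::vector<Vec>;

    // Accumulate one position's feature vector x and target y into A = X^T X, b = X^T y.
    void accumulate(Mat& A, Vec& b, const Vec& x, double y) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            b[i] += x[i] * y;
            for (std::size_t j = 0; j < x.size(); ++j) A[i][j] += x[i] * x[j];
        }
    }

    // Plain conjugate gradient for the symmetric positive (semi)definite system A w = b.
    Vec solve_cg(const Mat& A, const Vec& b, int iters = 1000, double tol = 1e-10) {
        std::size_t n = b.size();
        Vec w(n, 0.0), r = b, p = r, Ap(n);
        double rs = 0.0;
        for (std::size_t i = 0; i < n; ++i) rs += r[i] * r[i];
        for (int k = 0; k < iters && rs > tol; ++k) {
            for (std::size_t i = 0; i < n; ++i) {
                Ap[i] = 0.0;
                for (std::size_t j = 0; j < n; ++j) Ap[i] += A[i][j] * p[j];
            }
            double pAp = 0.0;
            for (std::size_t i = 0; i < n; ++i) pAp += p[i] * Ap[i];
            double alpha = rs / pAp;
            for (std::size_t i = 0; i < n; ++i) { w[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
            double rs_new = 0.0;
            for (std::size_t i = 0; i < n; ++i) rs_new += r[i] * r[i];
            for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + (rs_new / rs) * p[i];
            rs = rs_new;
        }
        return w;
    }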
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: test positions for texel tuning

Post by matthewlai »

Evert wrote:
matthewlai wrote: Gradient descent is indeed much more expensive. But it is also much more accurate.
Is it really more expensive though?
Each iteration will be more expensive, but you will require far fewer iterations to converge (if things converge at all...). It's not so clear to me that it will be slower in the end (but I haven't tried this yet for chess).
It depends on the error landscape. For example, in the extreme case, if all features are completely independent, tuning parameter-by-parameter would be much faster.
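In that independent-features extreme, parameter-by-parameter tuning is essentially the classic Texel local search: nudge one weight at a time and keep the change only if the measured error drops. A minimal sketch, with E() standing in for whatever error measurement is used:

Code:

    // Classic local search: adjust one parameter at a time, keep only improvements.
    #include <vector>

    void local_search(std::vector<int>& w, double (*E)(const std::vector<int>&)) {
        double best = E(w);
        bool improved = true;
        while (improved) {
            improved = false;
            for (std::size_t i = 0; i < w.size(); ++i) {
                w[i] += 1;                            // try increasing this weight
                double e = E(w);
                if (e < best) { best = e; improved = true; }
                else {
                    w[i] -= 2;                        // try the other direction
                    e = E(w);
                    if (e < best) { best = e; improved = true; }
                    else w[i] += 1;                   // restore the original value
                }
            }
        }
    }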
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: test positions for texel tuning

Post by matthewlai »

ymatioun wrote: Gradient descent can be fast if you capture the evaluation weights during the call to evaluate(). Those are the weights that are multiplied by the coefficients and summed up to get the evaluation (this only works for linear evaluation functions).

Then, after one pass through Qsearch(), you can construct the full correlation matrix and invert it using a gradient-style method such as the conjugate gradient method. This step takes almost no time since you are only operating on a 1000-by-1000 matrix.

This is what I do for Fizbo, and this way I can generate optimized evaluation coefficients in a couple of minutes.
Yeah, if the function is differentiable (anything from a linear evaluation function to something as complex as a neural network), gradient descent is almost certainly a better bet.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
whereagles
Posts: 565
Joined: Thu Nov 13, 2014 12:03 pm

Re: test positions for texel tuning

Post by whereagles »

There are many optimization methods... gradient, genetic, annealing, swarm, ant colony, tabu, etc. Greedy ones like gradient descent often converge the fastest, but to a local optimum, not necessarily a global one.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: test positions for texel tuning

Post by matthewlai »

whereagles wrote: There are many optimization methods... gradient, genetic, annealing, swarm, ant colony, tabu, etc. Greedy ones like gradient descent often converge the fastest, but to a local optimum, not necessarily a global one.
There is nothing, short of exploring the entire parameter space, that can guarantee finding the global minimum.

Gradient descent, in practice, has been shown to find very good minima most of the time.

It has been a source of major worry in machine learning for a long time, but nowadays most people don't really worry about it anymore, because it's not really a big problem in practice.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Henk
Posts: 7220
Joined: Mon May 27, 2013 10:31 am

Re: test positions for texel tuning

Post by Henk »

If there are parameters with discrete values and a short domain, then you often get useless local maxima because one of the parameters ends up near the bound of its domain. Adding a penalty whenever a parameter goes out of its domain did not help much either.
Henk
Posts: 7220
Joined: Mon May 27, 2013 10:31 am

Re: test positions for texel tuning

Post by Henk »

whereagles wrote: There are many optimization methods... gradient, genetic, annealing, swarm, ant colony, tabu, etc. Greedy ones like gradient descent often converge the fastest, but to a local optimum, not necessarily a global one.
Gradient, genetic, and tabu search find local optima, as far as I remember. Perhaps genetic search has been modified to find the global optimum; I don't know.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: test positions for texel tuning

Post by matthewlai »

Henk wrote:
whereagles wrote: There are many optimization methods... gradient, genetic, annealing, swarm, ant colony, tabu, etc. Greedy ones like gradient descent often converge the fastest, but to a local optimum, not necessarily a global one.
Gradient, genetic, and tabu search find local optima, as far as I remember. Perhaps genetic search has been modified to find the global optimum; I don't know.
It is not theoretically possible.

Imagine a convex error surface in 2D (with only one local minimum, which is also the global minimum), then add a very deep minimum somewhere in a well that is infinitely narrow. That well is now the global minimum. There is no way to find it except by happening to sample that exact point, and that point could be anywhere. Therefore, an algorithm that guarantees finding the global minimum MUST sample every point in the space.
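As a toy illustration, take

$$ f_\varepsilon(x) = x^2 - (a^2 + 1)\, e^{-((x-a)/\varepsilon)^2}. $$

For any a, f_\varepsilon(a) = -1, so the global minimum sits inside the well at x = a, while everywhere outside an \varepsilon-neighbourhood of a the function is essentially the convex bowl x^2 with a harmless minimum near 0. As \varepsilon shrinks, the well remains the global minimum, but the chance of ever sampling inside it goes to zero.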
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.