Poll: How Many "Weights" Needed To Play "Known" Chess Very Well?

hgm · Post by **hgm** » Wed Oct 23, 2019 5:35 pm

I am not sure what you mean by 'Linear Programming'. As far as I know LP is a method to optimize a linear function with respect to linear constraints. (Without constraints linear functions of course have no maximum.) Which seems as far from a chess evaluation function as anything I can imagine.

matthewlai · Post by **matthewlai** » Wed Oct 23, 2019 6:39 pm

A few things -

1. Saying an NN is equivalent to a linear system of equations is not quite true. It's true for the very first "neural networks" from the 1960s and 1970s, but that's actually what stopped them from progressing - they cannot model any non-linearity, and most real world decision functions (definitely including chess) require modeling non-linearity. In a fully-connected network (also known as a multilayer perceptron / MLP), each layer is a system of linear equations, with a non-linear "activation function" applied on the output, before going into the next layer (next system of equations). See Giraffe for an example of something like this.

2. But as it turned out, if you use raw inputs (like images), MLPs are highly inefficient (so you need a lot more weights and training time to encode the same amount of knowledge). AZ/LC0/etc all use convolutional neural networks. That's an entirely different architecture that involves pretty complicated mathematical operations (convolutions) that are most definitely non-linear.

3. Many optimization techniques have been tried on neural networks. Current state of the art is still gradient descent. Most classical optimization techniques are not feasible at this scale.

towforce · Post by **towforce** » Wed Oct 23, 2019 6:46 pm

hgm wrote: ↑Wed Oct 23, 2019 5:35 pm I am not sure what you mean by 'Linear Programming'. As far as I know LP is a method to optimize a linear function with respect to linear constraints. (Without constraints linear functions of course have no maximum.) Which seems as far from a chess evaluation function as anything I can imagine.

You'd be surprised at the range of challenges to which LP can be applied. In the Thinkers Forum, I have been able to solve a range of puzzles using LP, including a couple that nobody else was able to solve any other way.

Some LP languages (e.g. MathProg - my personal favourite) allow you to define variables, ranges and sets. You can also build the model by using a software library (e.g. Google OR Tools, which I have used).

Suppose we want an evaluation function (ev). The value it will produce (you could get multiple values - but let's keep it simple) will be the sum of a set of expressions.

We have a set of position/evaluation pairs. We want the ev to produce the same (or nearly the same) results as the evaluation given for each provided position.

We generate a set of expressions that relate to a chess position. Each value (V) in the position will be multiplied by a weight (W). The value comes from the chess position, the weight is calculated by the LP solver.

VW combinations from the raw position ("surface level VWs") are not likely to be enough to model chess. I would therefore start building expressions linking VWs. Each individual expression must be linear, so multiplying a variable by a variable is not permissible - but you can get polynomial shapes by using divided differences.

More layers of differences can be added as required.

The "objective function" (how you tell the model what to optimise) would be to minimise the differences between the provided position/evaluation pairs and the result the expressions would produce in each case.

As I think about it, the problem you may run into is that you might find yourself hitting the upper limit of the size of LP model that can be solved before you have an ev that can play near perfect chess. The team that solved draughts (checkers) said the main thing that made their job possible was advancing technology over the decades that they worked on the problem. The same thing could happen here: the upper limit of size of LP model that can be solved is growing rapidly at the current time.

towforce · Post by **towforce** » Wed Oct 23, 2019 6:56 pm

matthewlai wrote: ↑Wed Oct 23, 2019 6:39 pm A few things -

1. Saying an NN is equivalent to a linear system of equations is not quite true. It's true for the very first "neural networks" from the 1960s and 1970s, but that's actually what stopped them from progressing - they cannot model any non-linearity, and most real world decision functions (definitely including chess) require modeling non-linearity. In a fully-connected network (also known as a multilayer perceptron / MLP), each layer is a system of linear equations, with a non-linear "activation function" applied on the output, before going into the next layer (next system of equations). See Giraffe for an example of something like this.

2. But as it turned out, if you use raw inputs (like images), MLPs are highly inefficient (so you need a lot more weights and training time to encode the same amount of knowledge). AZ/LC0/etc all use convolutional neural networks. That's an entirely different architecture that involves pretty complicated mathematical operations (convolutions) that are most definitely non-linear.

3. Many optimization techniques have been tried on neural networks. Current state of the art is still gradient descent. Most classical optimization techniques are not feasible at this scale.

Thank you for an excellent and knowledgeable contribution!

Although all expression in an LP model must be linear, you can create the equivalent of polynomial expressions using divided differences. See the "Method Of Differences" section in the Wiki Difference Engine article (link) for a simple example of this - the table shows the generation of a polynomial by adding differences together.

matthewlai · Post by **matthewlai** » Wed Oct 23, 2019 7:21 pm

towforce wrote: ↑Wed Oct 23, 2019 6:56 pm
matthewlai wrote: ↑Wed Oct 23, 2019 6:39 pm A few things -

1. Saying an NN is equivalent to a linear system of equations is not quite true. It's true for the very first "neural networks" from the 1960s and 1970s, but that's actually what stopped them from progressing - they cannot model any non-linearity, and most real world decision functions (definitely including chess) require modeling non-linearity. In a fully-connected network (also known as a multilayer perceptron / MLP), each layer is a system of linear equations, with a non-linear "activation function" applied on the output, before going into the next layer (next system of equations). See Giraffe for an example of something like this.

2. But as it turned out, if you use raw inputs (like images), MLPs are highly inefficient (so you need a lot more weights and training time to encode the same amount of knowledge). AZ/LC0/etc all use convolutional neural networks. That's an entirely different architecture that involves pretty complicated mathematical operations (convolutions) that are most definitely non-linear.

3. Many optimization techniques have been tried on neural networks. Current state of the art is still gradient descent. Most classical optimization techniques are not feasible at this scale.

Thank you for an excellent and knowledgeable contribution!

Although all expression in an LP model must be linear, you can create the equivalent of polynomial expressions using divided differences. See the "Method Of Differences" section in the Wiki Difference Engine article (link) for a simple example of this - the table shows the generation of a polynomial by adding differences together.

It's true on a theoretical level - neural networks are universal approximators, and just like any other universal approximators (like polynomials), can in theory be used to model any function to any arbitrary accuracy. The only question is efficiency (both in training time and model size). We have spent a lot of time looking for efficient universal approximators for the types of functions we care about (in this case chess evaluation), neural networks are the most efficient approximators we have so far (especially convolutional neural networks when applied to 2D/3D/4D images), and that's why most of the cutting edge machine learning research is done with neural networks nowadays. There's no theoretical reason why another class of universal approxiamtors can be more efficient, but we haven't found them yet (and we have certainly spent a lot of time looking).

Poll: How Many "Weights" Needed To Play "Known" Chess Very Well?

How many weights needed to play known chess very well using definitions in first post?

Re: Poll: How Many "Weights" Needed To Play "Known" Chess Very Well?

Re: Poll: How Many "Weights" Needed To Play "Known" Chess Very Well?

Re: Poll: How Many "Weights" Needed To Play "Known" Chess Very Well?

Re: Poll: How Many "Weights" Needed To Play "Known" Chess Very Well?

Re: Poll: How Many "Weights" Needed To Play "Known" Chess Very Well?