Poll: How Many "Weights" Needed To Play "Known" Chess Very Well?
Moderators: Harvey Williamson, bob, hgm
 hgm
 Posts: 24956
 Joined: Fri Mar 10, 2006 9:06 am
 Location: Amsterdam
 Full name: H G Muller
 Contact:
Re: Poll: How Many "Weights" Needed To Play "Known" Chess Very Well?
I am not sure what you mean by 'Linear Programming'. As far as I know, LP is a method for optimizing a linear function subject to linear constraints. (Without constraints, a linear function of course has no maximum.) That seems as far from a chess evaluation function as anything I can imagine.
Get rid of the shit: vote for SHID!

 matthewlai
 Posts: 793
 Joined: Sun Aug 03, 2014 2:48 am
 Location: London, UK
 Contact:
Re: Poll: How Many "Weights" Needed To Play "Known" Chess Very Well?
A few things:
1. Saying an NN is equivalent to a linear system of equations is not quite true. It's true for the very first "neural networks" from the 1960s and 1970s, but that's actually what stopped them from progressing: they cannot model any nonlinearity, and most real-world decision functions (definitely including chess) require modeling nonlinearity. In a fully-connected network (also known as a multilayer perceptron, or MLP), each layer is a system of linear equations, with a nonlinear "activation function" applied to the output before it goes into the next layer (the next system of equations). See Giraffe for an example of something like this, and the sketch after this list.
2. But as it turned out, if you use raw inputs (like images), MLPs are highly inefficient (you need many more weights and much more training time to encode the same amount of knowledge). AZ/LC0/etc. all use convolutional neural networks. That's an entirely different architecture, built around fairly involved mathematical operations (convolutions) interleaved with nonlinear activations, and the resulting networks are most definitely nonlinear.
3. Many optimization techniques have been tried on neural networks. The current state of the art is still gradient descent. Most classical optimization techniques are not feasible at this scale.
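For concreteness, here is a minimal sketch of point 1, assuming NumPy; the layer sizes and random weights are purely illustrative, not Giraffe's actual architecture:

```python
# A minimal sketch (not Giraffe's actual code) of one fully-connected layer:
# a linear system of equations followed by a nonlinear activation.
import numpy as np

def dense_layer(x, W, b):
    """Compute activation(W @ x + b): a linear map, then an elementwise nonlinearity."""
    z = W @ x + b                # the linear system of equations
    return np.maximum(z, 0.0)    # ReLU activation: the nonlinearity

# Hypothetical sizes: 8 inputs mapped to 4 outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
W = rng.normal(size=(4, 8))
b = np.zeros(4)
print(dense_layer(x, W, b))
```

Without the activation line, stacking such layers would collapse back into a single linear map, which is exactly the limitation of the early networks described above.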
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.

 towforce
Re: Poll: How Many "Weights" Needed To Play "Known" Chess Very Well?
hgm wrote: ↑Wed Oct 23, 2019 3:35 pm
I am not sure what you mean by 'Linear Programming'. As far as I know, LP is a method for optimizing a linear function subject to linear constraints. (Without constraints, a linear function of course has no maximum.) That seems as far from a chess evaluation function as anything I can imagine.
You'd be surprised at the range of challenges to which LP can be applied. In the Thinkers Forum, I have solved a number of puzzles using LP, including a couple that nobody else was able to solve any other way.
Some LP languages (e.g. MathProg, my personal favourite) allow you to define variables, ranges and sets. You can also build the model using a software library (e.g. Google OR-Tools, which I have used).
Suppose we want an evaluation function (ev). The value it produces (you could get multiple values, but let's keep it simple) will be the sum of a set of expressions.
We have a set of position/evaluation pairs. We want the ev to produce the same (or nearly the same) result as the evaluation given for each provided position.
We generate a set of expressions that relate to a chess position. Each value (V) in the position is multiplied by a weight (W). The value comes from the chess position; the weight is calculated by the LP solver.
VW combinations from the raw position ("surface-level VWs") are not likely to be enough to model chess. I would therefore start building expressions linking VWs. Each individual expression must be linear, so multiplying a variable by a variable is not permissible, but you can get polynomial shapes by using divided differences.
More layers of differences can be added as required.
The "objective function" (how you tell the model what to optimise) would be to minimise the differences between the provided evaluations and the results the expressions produce for each position.
As I think about it, the problem you may run into is hitting the upper limit on the size of LP model that can be solved before you have an ev that plays near-perfect chess. The team that solved draughts (checkers) said the main thing that made their job possible was the technology advancing over the decades they worked on the problem. The same could happen here: the upper limit on the size of LP model that can be solved is growing rapidly at the moment.
In life, you can be mostly driven by love, or mostly driven by fear. Too many people are unaware of this basic choice, and default to the option that leads to a life that isn't as good.

 towforce
Re: Poll: How Many "Weights" Needed To Play "Known" Chess Very Well?
matthewlai wrote: ↑Wed Oct 23, 2019 4:39 pm
A few things:
1. Saying an NN is equivalent to a linear system of equations is not quite true. It's true for the very first "neural networks" from the 1960s and 1970s, but that's actually what stopped them from progressing: they cannot model any nonlinearity, and most real-world decision functions (definitely including chess) require modeling nonlinearity. In a fully-connected network (also known as a multilayer perceptron, or MLP), each layer is a system of linear equations, with a nonlinear "activation function" applied to the output before it goes into the next layer (the next system of equations). See Giraffe for an example of something like this.
2. But as it turned out, if you use raw inputs (like images), MLPs are highly inefficient (you need many more weights and much more training time to encode the same amount of knowledge). AZ/LC0/etc. all use convolutional neural networks. That's an entirely different architecture, built around fairly involved mathematical operations (convolutions) interleaved with nonlinear activations, and the resulting networks are most definitely nonlinear.
3. Many optimization techniques have been tried on neural networks. The current state of the art is still gradient descent. Most classical optimization techniques are not feasible at this scale.
Thank you for an excellent and knowledgeable contribution!
Although all expressions in an LP model must be linear, you can create the equivalent of polynomial expressions using divided differences. See the "Method Of Differences" section in the Wiki Difference Engine article (link) for a simple example of this: the table shows the generation of a polynomial by adding differences together.
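For readers who don't want to chase the link, here is a small self-contained illustration (my own, not taken from the article) of the idea: given a polynomial's initial value and initial differences, every subsequent value is produced by additions alone, which is exactly the kind of linear operation an LP model can express.

```python
# Method of differences, as used by the Difference Engine: polynomial values
# are generated purely by repeated addition of a difference table.
def values_from_differences(initial, n):
    """initial = [p(0), first difference, second difference, ...]."""
    diffs = list(initial)
    out = []
    for _ in range(n):
        out.append(diffs[0])
        # Propagate: each difference absorbs the one below it.
        for i in range(len(diffs) - 1):
            diffs[i] += diffs[i + 1]
    return out

# p(x) = x^2 + x + 1 at x = 0, 1, 2, ...: p(0) = 1, first difference
# p(1) - p(0) = 2, and the second difference is the constant 2.
print(values_from_differences([1, 2, 2], 6))  # [1, 3, 7, 13, 21, 31]
```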
In life, you can be mostly driven by love, or mostly driven by fear. Too many people are unaware of this basic choice, and default to the option that leads to a life that isn't as good.

 matthewlai
 Posts: 793
 Joined: Sun Aug 03, 2014 2:48 am
 Location: London, UK
 Contact:
Re: Poll: How Many "Weights" Needed To Play "Known" Chess Very Well?
towforce wrote: ↑Wed Oct 23, 2019 4:56 pm
Thank you for an excellent and knowledgeable contribution!
Although all expressions in an LP model must be linear, you can create the equivalent of polynomial expressions using divided differences. See the "Method Of Differences" section in the Wiki Difference Engine article (link) for a simple example of this: the table shows the generation of a polynomial by adding differences together.

It's true on a theoretical level: neural networks are universal approximators and, just like any other universal approximator (polynomials, for example), can in theory be used to model any function to arbitrary accuracy. The only question is efficiency (in both training time and model size). We have spent a lot of time looking for efficient universal approximators for the types of functions we care about (in this case chess evaluation); neural networks are the most efficient approximators we have found so far (especially convolutional neural networks applied to 2D/3D/4D images), and that's why most cutting-edge machine learning research is done with neural networks nowadays. There's no theoretical reason why another class of universal approximators couldn't be more efficient, but we haven't found one yet (and we have certainly spent a lot of time looking).
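To make both points concrete, here is a toy sketch (assuming NumPy; all sizes, the learning rate and the target function are arbitrary choices, and this is not DeepMind code): a one-hidden-layer network, already a universal approximator, trained by plain gradient descent to model a nonlinear function.

```python
# A toy universal-approximation demo: a small MLP fitted to sin(3x) on
# [-1, 1] by plain gradient descent on mean squared error.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y = np.sin(3.0 * x)                      # the nonlinear target function

W1 = rng.normal(scale=1.0, size=(1, 32))
b1 = np.zeros(32)
W2 = rng.normal(scale=0.1, size=(32, 1))
b2 = np.zeros(1)

lr = 0.05
for step in range(5000):
    h = np.tanh(x @ W1 + b1)             # hidden layer: linear map + tanh
    pred = h @ W2 + b2                   # output layer
    err = pred - y
    # Backpropagation: gradients of mean squared error w.r.t. each weight.
    g_pred = 2.0 * err / len(x)
    gW2 = h.T @ g_pred
    gb2 = g_pred.sum(axis=0)
    g_h = (g_pred @ W2.T) * (1.0 - h**2)  # tanh'(z) = 1 - tanh(z)^2
    gW1 = x.T @ g_h
    gb1 = g_h.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print(float(np.mean(err**2)))            # small final fitting error
```

The same recipe (linear layers, nonlinear activations, gradient descent) scales up to the networks discussed in this thread; what changes is the architecture and the amount of compute, not the underlying idea.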
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.