Poll: How Many "Weights" Needed To Play "Known" Chess Very Well?
Posted: Sun Oct 20, 2019 10:37 pm
Thanks to AlphaZero, we know that a good game of chess can be played by a NN with 50 million weights. As explained in other threads, I am interested in building is an evaluation function as a system of linear expressions rather than an NN (an NN is actually analogous to a system of linear expressions, and, as linked from another thread, it is possible to extract a set of expressions from an NN). As far as I know, Deepmind (the authors of AlphaZero) are not interested in making their NNs smaller having trained them, but, a system of linear expressions often can be made more compact while giving equally good results (see other thread for two ways of doing this). I believe that AlphaZero could almost certainly play equally well at a much smaller size if Deepmind were interested in doing this, so for me, 50 million weights is very much an upper limit for AlphaZero's standard of play. My opinion is that optimising linear expressions to reduce the number of expression would be MUCH easier than reducing the size of an NN and getting the same result.
The human brain has 10^11 neurons, and the average neuron has connections to 10,000 other neurons, for a total of 10^11x10^4=10^15 weights.
Using linear expressions instead of NNs, call V "values and W "weights", the expressions would look roughly like this:
Output 1 = V1*W1
Output 2 = V2*W2 + V3*W3 + V4*W4
The expression for output 1 has one weight, and the expression for output 2 has 3 weights, making a total of 4.
In order to model the complexity of chess, you'd need a large number of expressions in each layer, and multiple layers of NNs. As discussed in other threads, I think you'd similarly need multiple layers of linear expressions, each layer below the surface layer being divided differences (link) of the layer above to keep it linear (but, of course, these divided differences would have new weights of their own).
The target of all this optimisation would be to model a data set of of chess positions with known evaluations, and from this model create an evaluation function that would meet the following criteria:
1. differentiate a drawing position from a lost position, so that in a drawing position the program would avoid a losing move, and hopefully choose the drawing move which is the best position
2. in a winning position, give higher scores to move positions which are on the shortest path to checkmate
Chess is a very complicated game, but it's my opinion that the number of weights needed to play known positions nearly perfectly is probably less than most people would think.
Of course, people would find positions that the program would evaluate wrongly, and these positions could be added to the data set, and the system be "retrained". I would predict that, sooner than most people would expect, it would become very difficult to find positions which it evaluated wrongly - and hence you'd have a system that could play "known" chess almost perfectly almost instantaneously. The deep hidden secrets of chess would then be unlocked.
The human brain has 10^11 neurons, and the average neuron has connections to 10,000 other neurons, for a total of 10^11x10^4=10^15 weights.
Using linear expressions instead of NNs, call V "values and W "weights", the expressions would look roughly like this:
Output 1 = V1*W1
Output 2 = V2*W2 + V3*W3 + V4*W4
The expression for output 1 has one weight, and the expression for output 2 has 3 weights, making a total of 4.
In order to model the complexity of chess, you'd need a large number of expressions in each layer, and multiple layers of NNs. As discussed in other threads, I think you'd similarly need multiple layers of linear expressions, each layer below the surface layer being divided differences (link) of the layer above to keep it linear (but, of course, these divided differences would have new weights of their own).
The target of all this optimisation would be to model a data set of of chess positions with known evaluations, and from this model create an evaluation function that would meet the following criteria:
1. differentiate a drawing position from a lost position, so that in a drawing position the program would avoid a losing move, and hopefully choose the drawing move which is the best position
2. in a winning position, give higher scores to move positions which are on the shortest path to checkmate
Chess is a very complicated game, but it's my opinion that the number of weights needed to play known positions nearly perfectly is probably less than most people would think.
Of course, people would find positions that the program would evaluate wrongly, and these positions could be added to the data set, and the system be "retrained". I would predict that, sooner than most people would expect, it would become very difficult to find positions which it evaluated wrongly - and hence you'd have a system that could play "known" chess almost perfectly almost instantaneously. The deep hidden secrets of chess would then be unlocked.