syzygy wrote: It is MCTS without MC...

I'm not the first to complain about the naming:
http://talkchess.com/forum/viewtopic.ph ... 672#453672
syzygy wrote: I'm not the first to complain about the naming:
http://talkchess.com/forum/viewtopic.ph ... 672#453672

True. But for lack of a better term I still say MCTS (most of the time). It should be quite clear to anyone who has had a look at the AlphaGoZero/AlphaZero papers that this is basically MCTS without MC.
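To make "MCTS without MC" concrete: the AlphaZero-style search keeps the tree, selection, and backup machinery of MCTS, but the leaf is scored by one value-network call instead of a random Monte Carlo rollout. Below is a minimal sketch of the PUCT selection rule from the papers; the `Node` class, the field names, and the `c_puct` constant are illustrative choices, not DeepMind's actual code.

```python
import math

# Sketch of AlphaZero-style node statistics and PUCT selection.
# The Monte Carlo rollout is gone: value_sum is filled by backing up
# the value head's output from the leaf, not from random playouts.

class Node:
    def __init__(self, prior):
        self.prior = prior      # P(s, a) from the policy head
        self.visits = 0         # N(s, a)
        self.value_sum = 0.0    # W(s, a), accumulated value-net outputs
        self.children = {}      # move -> Node

    def q(self):
        """Mean action value Q(s, a)."""
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """PUCT: argmax over Q(s,a) + c * P(s,a) * sqrt(N(s)) / (1 + N(s,a))."""
    total = sum(ch.visits for ch in node.children.values())
    return max(
        node.children.items(),
        key=lambda kv: kv[1].q()
        + c_puct * kv[1].prior * math.sqrt(total) / (1 + kv[1].visits),
    )
```

An unvisited move with a high prior can outscore a visited one, which is why the policy head steers the search even before any value estimates arrive.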
CheckersGuy wrote: It would obviously be nice to deduce what the neural network has actually learned, but how would you go about that?

An obvious way would be to measure the correlation of cells with known, simple domain-specific knowledge.
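The correlation measurement hgm suggests is straightforward to sketch: record one cell's activation over a batch of positions, compute a hand-coded feature (say, material balance or a pin detector's output) over the same positions, and take Pearson's r. The pairing of activations with features here is hypothetical; only the statistic itself is standard.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between a hidden cell's activations (xs)
    and a hand-coded domain feature (ys) over the same positions."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    vy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

# Cells whose |r| is near 1 for some feature evidently track that
# piece of knowledge; cells near 0 for everything you try remain opaque.
```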
The reason why deep NNs are used is that they can learn complex/abstract functions.

Exactly. Because they can learn them. Not because they are very efficient at evaluating them.
What you describe isn't very far from a linear evaluation function, which is not very complex at all.

Not at all. There is absolutely nothing linear in detecting features like pins and X-rays. They are either there, or they aren't, which is a step function.
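The "either there or not" nature of such a feature is easy to illustrate with a toy pin detector. Assuming a simple ray representation (squares listed from the king outward along a slider line, each either empty or a (side, piece) pair; both the representation and the function name are hypothetical), the output is all-or-nothing:

```python
def find_pin(ray, slider_types=("R", "Q")):
    """Return the own piece pinned on this ray, or None.
    ray: squares from the king outward; each entry is None (empty)
    or a (side, piece) tuple with side in {"own", "opp"}."""
    blocker = None
    for sq in ray:
        if sq is None:
            continue
        side, piece = sq
        if side == "own":
            if blocker is not None:
                return None      # two own pieces shield the king: no pin
            blocker = piece
        else:
            if blocker is not None and piece in slider_types:
                return blocker   # exactly one own piece before an enemy slider
            return None          # enemy piece blocks or cannot pin
    return None
```

Moving any single piece on the ray flips the answer discontinuously, which is exactly the step-function behaviour a smooth linear term cannot express.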
The NN from AlphaZero probably learned more abstract concepts than we could explicitly code.

Than we would explicitly code. Anything can be coded, once you know what you need. But this is no problem at all, because you don't withhold any information from the NN that it gets now (which I understand is just the board position). You just help it on the way by making available on its inputs more information that could be enormously useful to it, and which it then doesn't have to learn to recognize in a comparatively very cumbersome way.
For example, Stockfish may need a 20-ply search to see that he is dead lost, and A0 only needs one call to the neural network because it has learned that those positions are lost.

This is just a wild guess, and I strongly doubt that there is any truth in it. For one, Stockfish usually has a very good idea whether it is dead lost; a quiescence search will do the trick. In positions where Stockfish would need 20 ply before it can see it is lost, e.g. whether an attack on its King fortress will be decisive or not, the NN will almost certainly not be able to recognize that either. Only when the loss is due to obvious strategic features that Stockfish is blind to, like trapping his own Bishop on g1/h2 behind blocked Pawns on f2/g3, would the NN (and human players) immediately spot it. Things that can go either way depending on deep tactical calculation will be outside the scope of a NN. Its win-probability prediction might not even be better than Stockfish's QS.
To get back to chess: you may think that you have covered all X-ray patterns or whatever evaluation term you have explicitly coded, just to find out some time later that you did a horrible job because there were many cases you did not cover.

Well, so then the NN will have to learn to recognize these features from the board position itself. Which it has to do now anyway.
It's not like everyone has to train their engine. If DeepMind decided to publish the AlphaZero engine, they would supply the weights for the neural network and no one would have to train it anymore. This obviously assumes that we have the hardware to run it on.

And that is exactly the point. The trained NN is large and cumbersome, containing large parts that are practically unused, but would have been useful for Go, Shogi, Monopoly, Tennis... You would need a whole lot less expensive hardware if you could cull all that out, and condense the parts that do trivial things in a cumbersome way to some dedicated goal. To show that a completely general NN, not using any domain-specific knowledge, can be trained to do the job is wonderful and shocking. But it doesn't mean that it is a smart way to achieve the goal.
hgm wrote: There is absolutely nothing linear in detecting features like pins, and X-rays. They are either there, or they aren't, which is a step function.

This is a misinterpretation of the meaning of "linear" in this context. Linear means that the evaluation function is linear in the bonus/penalty for such a feature. This is the case in almost all evaluation functions used by chess engines.
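Michel's distinction can be shown in two lines: the detectors may well be step functions, yet the evaluation is still "linear" in his sense because it is a weighted sum, linear in the bonuses/penalties (the weights). The feature values and weights below are made-up placeholders.

```python
# Each feature f_i may come from an all-or-nothing detector (pin present:
# 1, absent: 0), but score = sum_i w_i * f_i is linear in the weights w_i,
# which is what "linear evaluation" means for almost all chess engines.

def linear_eval(features, weights):
    """score = sum_i w_i * f_i(position); linear in the w_i."""
    return sum(w * f for f, w in zip(features, weights))
```

Tuning such a function (e.g. by logistic regression, as mentioned later in this thread for Scan) exploits exactly this linearity in the weights.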
hgm wrote: The trained NN is large and cumbersome, containing large parts that are practically unused.

Compressing NNs is an active topic of research: https://arxiv.org/abs/1710.09282
Michel wrote: This is a misinterpretation of the meaning of "linear" in this context. Linear means that the evaluation function is linear in the bonus/penalty for such a feature. This is the case in almost all evaluation functions used by chess engines.

So what exactly is the misinterpretation? I did in no way specify how the information about whether such features are present should be used, and in particular not that it should be used to define an additive bonus/penalty. I proposed to feed it to a NN, which is as different from a conventional, linear Chess-engine evaluation as day and night. The NN would of course use recognized pin patterns to eliminate the pinned pieces from its SEE calculations, etc. Something no Chess engine seems to do.
Rein Halbersma wrote: The main question is whether it is realistic to assume that it is possible to hand-code (or even automate) a series of chess-knowledge-intensive patterns and have a linear combination of those patterns cover 99% of the information in the NN.

I don't even see why this should be a question at all, as the answer so obviously seems to be a big fat "no". It also seems totally irrelevant. The whole idea of a linear combination makes it an immediate bust. There is nothing linear about Chess tactics. It is all Boolean logic.
Rein Halbersma wrote: However, in draughts Fabien Letouzey made a very nice contribution by scanning the board (hence his program's name Scan) in 4 x 4 regions (and some larger patterns) and computing the indices of each pattern into a large lookup table of weights. These weights were trained using logistic regression. So apart from the logistic mapping at the end, a linear combination of patterns and weights. The question is whether something like that works in chess.

Some time ago I started working on a new evaluation function for my chess engine; I decided to use something similar for pawn evaluation with 3 x 6 patterns. It is still in its infancy, but the first results look promising.
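The Scan-style scheme Rein describes is compact enough to sketch. Assuming three states per square (empty / own / opponent) and hypothetical weight tables, each region's squares are encoded base-3 into an index, the table weights are summed, and the logistic function at the end turns the sum into a win probability, i.e. logistic regression over one-hot pattern features:

```python
import math

def pattern_index(squares):
    """squares: list of cell states in {0, 1, 2} for one board region.
    Returns the base-3 index of this pattern into its weight table."""
    idx = 0
    for s in squares:
        idx = idx * 3 + s
    return idx

def evaluate(regions, tables):
    """Sum each region's table weight, then apply the logistic mapping.
    A 4x4 region needs a table of 3**16 weights; pawn patterns of
    3 x 6 squares would need 3**18 entries per table."""
    score = sum(table[pattern_index(region)]
                for region, table in zip(regions, tables))
    return 1.0 / (1.0 + math.exp(-score))  # win probability in (0, 1)
```

Everything before the final sigmoid is a linear combination of weights, so the whole model can be fitted by ordinary logistic regression on game results.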
hgm wrote: I don't even see why this should be a question at all, as the answer so obviously seems to be a big fat "no". It also seems totally irrelevant. The whole idea of a linear combination makes it an immediate bust. There is nothing linear about Chess tactics. It is all Boolean logic.

OK, so do you think that a Boolean logic function (e.g. a decision tree) can approximate a neural network chess evaluation function without loss in accuracy?
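Rein's question is empirically testable by distillation: label a set of positions with the network's evaluation, fit a tree to those labels, and measure the approximation error. As a toy stand-in for a real tree learner, here is a one-split regression stump on a single scalar feature; the feature and the labels would in practice come from positions and the (hypothetical) network.

```python
def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: find the threshold t minimizing
    squared error when predicting the mean of ys on each side of t.
    xs: one scalar feature per position; ys: the NN's evaluations."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # split must leave positions on both sides
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    return best  # (error, threshold, left_mean, right_mean)
```

A real experiment would grow deeper trees recursively and report how the residual error shrinks with depth; that error curve is the answer to hgm's question about lossless approximation.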