Google's AlphaGo team has been working on chess

syzygy · Post by **syzygy** » Tue Dec 12, 2017 12:55 am

syzygy wrote:It is MCTS without MC...

I'm not the first to complain about the naming:
http://talkchess.com/forum/viewtopic.ph ... 672#453672

CheckersGuy · Post by **CheckersGuy** » Tue Dec 12, 2017 12:58 am

syzygy wrote:
syzygy wrote:It is MCTS without MC...
I'm not the first to complain about the naming:
http://talkchess.com/forum/viewtopic.ph ... 672#453672

True. But for the lack of a better term I still say mcts ( most of the time). It should be quite clear for anyone that had a look at the alphaGoZero/AlphaZero papers that this is basically mcts without mc.
Maybe we should call it alphaZero-Tree-Search

Luis Babboni · Post by **Luis Babboni** » Tue Dec 12, 2017 2:53 am

Sorry if someone talk about it before, I tried to read all Alpha Zero´s comments but I´m not sure if I could.

I understand that from 100 games, Alpha zero won 25/50 with whites and 3/50 with blacks. None losses with any colour.

Could be possible that, learning playing only itself, A0 concluded that going for tie is the logical goal having blacks?
I mean, that did not won more with blacks cause did not tried it.

hgm · Post by **hgm** » Tue Dec 12, 2017 9:58 am

CheckersGuy wrote:It would obviously be nice to deduce what the neural network has actually learned but how would you go about that ?

An obvious way would be to measure correlation of cells with known simple domain-specific knowledge.

The reason why deep NN's are used is that they can learn complex/abstract functions.

Exactly. Because they can learn them. Not because they are very efficient for evaluating them.

What you describe isn't very far from a linear evaluation function which is not very complex at all.

Not at all. There is absolutely nothing linear in detecting features like pins, and X-rays. They are either there, or they aren't, which is a step function.

The NN from AlphaZero probably learned more abstract concepts than we could explicitly code.

Than we would explicitly code. Anything can be coded, once you kow what you need. But this is no problem at all, because you don't withold any information from the NN that it gets now. (Which I understand is just the board position.) You just help it o the way by making available on its inputs more information that could be enormously useful to it, and what it then doesn't have to learn to recognize in a comparatively very cumbersome way.

For example, stockfish may need a 20 ply search to see that he is dead lost and A0 only needs one call to the neural network because it has learned that those positions are lost.

This is just a wild guess, and I strongly doubt that there is any truth in it. For one, Stockfish usually has a very good idea whether it is dead lost. A quiescece search will do the trick. In positions where Stockfish would need 20 ply before it can see it is lost, e.g. whether an attack on its King fortress will be decisive or not, the NN will almost certainly be not able to recogize that either. Only when the loss is due to obvious strategic features that Stockfish is blind to, like trapping his own Bishop on g1/h2 behind blocked Pawns on f2/g3, the N (and human players) would immediately spot it. Things that can go either way depending on deep tactical calculation will be outside the scope of a NN. Its win-probabilty prediction might not even be better than Stockfish' QS.

But that is no problem, because it knows very well what it doesn't know, and can only be resolved by search. And then it orders the search, and the node will get expanded, and the precise tactics will get into view.
It would be very hard to explicitly encode this concept into the evaluation function (without doing a search).

The difference between AlphaZero and Stockfish is not that one sees things at zero ply that the other takes 20 ply to see. The difference is that to search the 20 ply needed to see it, Stockfish needs a million nodes, and Alpha Zero only a couple of hundred.

To get back to chess. You may think that you have covered all x-ray patterns or whatever evaluation term you have explicitly coded. Just to find out some time later that you did a horrible job because there were many cases you did not cover.

Well, so then the NN will have to learn to recognize these features from the board position itself. Which it has to do now anyway.

It's not like everyone has to train their engine If DeepMind decided to publish the AlphaZero-engine, they would supply the weights for the NeuralNetwork and no one would have to train it anymore. This obviously assumes that we have the hardware to run it on

And that is exactly the point. The trained NN is large and cumbersome, contaiing large parts that are practically uused, but would have been useful for Go, Shogi, Monopoly, Tennis.. You would need a whole lot less expensive hardware if you could cull all that out. And condense the parts that do trivial things in a cumbersome way to some dedicted goal. To show that a completely general NN, not using any domain-specific knowledge, can be trained to do the job is wonderful and shocking. But it doesn't mean that it is a smart way to achieve the goal.

Michel · Post by **Michel** » Tue Dec 12, 2017 10:16 am

There is absolutely nothing linear in detecting features like pins, and X-rays. They are either there, or they aren't, which is a step function.

This is a misinterpretation of the meaning of "linear" in this context. Linear means that the evaluation function is linear in the bonus/penalty for such a feature. This is the case in almost all evaluation functions used by chess engines.

In contrast NNs use a cascade of linear functions and nonlinear functions (the latter are logistic functions or some approximation of those).

Note that I do not want to claim that NNs are necessarily better (I don't know). But they are definitely more flexible than the evaluation functions typically used by chess engines.

Rein Halbersma · Post by **Rein Halbersma** » Tue Dec 12, 2017 10:35 am

hgm wrote:The trained NN is large and cumbersome, contaiing large parts that are practically uused

Compression NNs is an active topic of research: https://arxiv.org/abs/1710.09282

The main question is whether it is realistic to assume that it is possible to hand-code (or even automate) a series of chess-knowledge intensive patterns and have a linear combination of those patterns cover 99% of the information in the NN. The same observation about large unused knowledge is true for endgame databases, yet it has proven to be very difficult to find a small low-dimensional Boolean expression in terms of the piece locations that efficiently encodes all that knowledge (even with say 99% accuracy it is very hard to find a lossy compression that is fast to decode).

However, in draughts Fabien Letouzey made a very nice contribution by scanning the board (hence his program's name Scan) in 4 x 4 regions (and some larger patterns) and computed the indices of each pattern into a large lookup-table of weights. These weights were trained using logistic regression. So apart from the logistic mapping at the end, a linear combination of patterns and weights. The question is whether something like that works in chess.

hgm · Post by **hgm** » Tue Dec 12, 2017 11:54 am

Michel wrote:This is a misinterpretation of the meaning of "linear" in this context. Linear means that the evaluation function is linear in the bonus/penalty for such a feature. This is the case in almost all evaluation functions used by chess engines.

So what exactly is the misinterpretation? I did in no way specify how the information about that such features are present or not should be used, and in particular not that it should be used to define an additive bonus/penalty. I proposed to feed it to a NN. Which is as different from a conventional, linear Chess-engine evaluation as day and night. The NN would of course use recognized pin patterns to eliminate the pinned pieces from its SEE calculations, etc. Something no Chess engine seems to do.

hgm · Post by **hgm** » Tue Dec 12, 2017 11:59 am

Rein Halbersma wrote:The main question is whether it is realistic to assume that it is possible to hand-code (or even automate) a series of chess-knowledge intensive patterns and have a linear combination of those patterns cover 99% of the information in the NN.

I don't even see why this should be a question at all, as the aswer so obviously seems to be a big fat "no". It also seems totally irrelevant. The whole idea of a linear combination makes it an immediate bust. There is nothing linear about Chess tactics. It is all Boolean logic.

Joost Buijs · Post by **Joost Buijs** » Tue Dec 12, 2017 2:31 pm

Rein Halbersma wrote: However, in draughts Fabien Letouzey made a very nice contribution by scanning the board (hence his program's name Scan) in 4 x 4 regions (and some larger patterns) and computed the indices of each pattern into a large lookup-table of weights. These weights were trained using logistic regression. So apart from the logistic mapping at the end, a linear combination of patterns and weights. The question is whether something like that works in chess.

Some time ago I started working on a new evaluation function for my chess engine, I decided to use something similar for pawn evaluation with 3 x 6 patterns. It is still in it's infancy but the first results look promising.

Another advantage is that with BMI2 the pawn evaluator runs a lot faster than before. As it seems now I can remove the pawn evaluation cache which will make the program cleaner as well.

I don't know when I will have conclusive results, it will take a few months, right now I'm busy restructuring the whole engine because it became a terrible mess after all the things I tried a couple of years ago.

Rein Halbersma · Post by **Rein Halbersma** » Tue Dec 12, 2017 2:41 pm

hgm wrote:
Rein Halbersma wrote:The main question is whether it is realistic to assume that it is possible to hand-code (or even automate) a series of chess-knowledge intensive patterns and have a linear combination of those patterns cover 99% of the information in the NN.
I don't even see why this should be a question at all, as the aswer so obviously seems to be a big fat "no". It also seems totally irrelevant. The whole idea of a linear combination makes it an immediate bust. There is nothing linear about Chess tactics. It is all Boolean logic.

OK, so do you think that a Boolean logic function (e.g. a decision tree) can approximate a neural network chess evaluation function without loss in accuracy?

Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess