Google's AlphaGo team has been working on chess

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: Google's AlphaGo team has been working on chess

Post by CheckersGuy »

Michel wrote:
hgm wrote:So what do they mean by this?
At the end of the game, the terminal position s_T is scored according to the rules of the game to compute the game outcome z: −1 for a loss, 0 for a draw, and +1 for a win.
Maybe just that they obviously don't use their NN to score positions where the game has ended?
This. Maybe HG forgot that this paper isn't only about chess but about any board game. Therefore terminal positions are scored according to the rules of the game. This quote was about the self-play games, which obviously need to be scored!!!
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Google's AlphaGo team has been working on chess

Post by Sven »

CheckersGuy wrote:
Michel wrote:
hgm wrote:So what do they mean by this?
At the end of the game, the terminal position s_T is scored according to the rules of the game to compute the game outcome z: −1 for a loss, 0 for a draw, and +1 for a win.
Maybe just that they obviously don't use their NN to score positions where the game has ended?
This. Maybe HG forgot that this paper isn't only about chess but about any board game. Therefore terminal positions are scored according to the rules of the game. This quote was about the self-play games, which obviously need to be scored!!!
Strictly speaking, the Nature article is only about Go. But apart from that nitpicking you are right. The phrase quoted by HG is clearly related to terminal positions of the self-play games, not to leaf nodes of the MCTS search. It is crucial to understand the difference: training was done by playing a huge number of self-play games ("iterations"), and at each position of those games MCTS was used to calculate move probabilities which in turn were used to improve the NN. According to the section "Methods" an MCTS leaf node is reached after L "time-steps" which sounds like some fixed search depth. AlphaGo Zero (and thus also AlphaZero for chess) does not use Monte Carlo "playouts" or "rollouts". So even though they call their method MCTS-based, it is not like a standard MCTS due to the complete lack of random playouts.
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: Google's AlphaGo team has been working on chess

Post by CheckersGuy »

Sven wrote:
CheckersGuy wrote:
Michel wrote:
hgm wrote:So what do they mean by this?
At the end of the game, the terminal position s_T is scored according to the rules of the game to compute the game outcome z: −1 for a loss, 0 for a draw, and +1 for a win.
Maybe just that they obviously don't use their NN to score positions where the game has ended?
This. Maybe HG forgot that this paper isn't only about chess but about any board game. Therefore terminal positions are scored according to the rules of the game. This quote was about the self-play games, which obviously need to be scored!!!
Strictly speaking, the Nature article is only about Go. But apart from that nitpicking you are right. The phrase quoted by HG is clearly related to terminal positions of the self-play games, not to leaf nodes of the MCTS search. It is crucial to understand the difference: training was done by playing a huge number of self-play games ("iterations"), and at each position of those games MCTS was used to calculate move probabilities which in turn were used to improve the NN. According to the section "Methods" an MCTS leaf node is reached after L "time-steps" which sounds like some fixed search depth. AlphaGo Zero (and thus also AlphaZero for chess) does not use Monte Carlo "playouts" or "rollouts". So even though they call their method MCTS-based, it is not like a standard MCTS due to the complete lack of random playouts.
Yes.
I meant the AlphaZero paper and not the AlphaGo Zero Nature paper. I guess we will find out more about how A0 works once they publish the full paper.
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: Google's AlphaGo team has been working on chess

Post by syzygy »

Sven wrote:According to the section "Methods" an MCTS leaf node is reached after L "time-steps" which sounds like some fixed search depth.
As I understand it, they basically start with a tree consisting of just the root node. This node is at the same time the single leaf node and is expanded by running it through the NN. The move probabilities returned by the NN are assigned to the edges from the root node to the new leaf nodes. In the next step, one of these edges is chosen by applying the UCB1 strategy. We are then in a leaf node, which is expanded, etc. After every expansion, the move probabilities of the new leaf nodes are backed up to the root. So with 800 "simulations" (but you skip the simulations) you end up with a tree of 801 nodes. The most promising root move is then chosen and played.

One of the papers (probably the Nature paper) explains that the subtree for the move played is carried over to the next move, which obviously makes sense.

So there is no fixed search depth, but there is a fixed tree size (at least in self-play games for training).
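For concreteness, here is a rough toy sketch of that loop (a made-up Nim-like game and a stub standing in for the network; the names, the toy game, and the exploration constant are illustrative assumptions, not DeepMind's code). As discussed in the posts below, what gets backed up is the network's value output for the expanded leaf:

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior          # P(s, a) assigned when the parent was expanded
        self.children = {}          # move -> child Node
        self.visits = 0
        self.value_sum = 0.0        # sum of evaluations backed up through this edge

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

# Toy game: a pile of stones, take 1 or 2, whoever takes the last stone wins.
def legal_moves(stones):
    return [1, 2] if stones >= 2 else [1]

def terminal_value(stones):
    # no stones left: the side to move has lost (the opponent took the last one)
    return -1.0 if stones == 0 else None

def stub_net(stones):
    # stand-in for the two-headed network: uniform move priors, neutral value
    moves = legal_moves(stones)
    return {m: 1.0 / len(moves) for m in moves}, 0.0

def select_child(node, c_puct=1.5):
    # UCB1-style (PUCT) selection: exploit Q, explore in proportion to the prior
    def score(item):
        move, child = item
        u = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=score)

def search(root_stones, num_simulations=800):
    root = Node(prior=1.0)
    priors, _ = stub_net(root_stones)
    root.children = {m: Node(p) for m, p in priors.items()}   # expand the root
    for _ in range(num_simulations):                          # at most one new node per simulation
        node, stones, path = root, root_stones, [root]
        while node.children:                                   # descend to a leaf
            move, node = select_child(node)
            stones -= move
            path.append(node)
        value = terminal_value(stones)
        if value is None:                                      # expand the leaf via the "net"
            priors, value = stub_net(stones)
            node.children = {m: Node(p) for m, p in priors.items()}
        root.visits += 1
        for n in reversed(path[1:]):                           # back the value up to the root,
            value = -value                                     # flipping sides at every ply
            n.visits += 1
            n.value_sum += value
    # no random playouts anywhere; after 800 simulations the tree holds
    # about 801 expanded nodes (terminal leaves are never expanded)
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print("best move with 4 stones:", search(4), "(take 1, leaving a lost position)")
```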
AlphaGo Zero (and thus also AlphaZero for chess) does not use Monte Carlo "playouts" or "rollouts". So even though they call their method MCTS-based, it is not like a standard MCTS due to the complete lack of random playouts.
It is MCTS without MC...
User avatar
hgm
Posts: 27809
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Google's AlphaGo team has been working on chess

Post by hgm »

OK, I hadn't seen the Nature paper, just the one on AlphaZero, and they seem to have left this part out of there. That is why I asked.

I guess it makes sense: in fact the whole collection of training games at one setting of the NN parameters can be considered a sampling of a giant MCTS. The games themselves are the random playouts of that search.

For playing a game with the trained network they just simulate the MCTS using the sampling probabilities, which should not have any detrimental effects; it just spares you the sampling noise you would otherwise have. Nodes a few ply below the leaves will average their score over so many leaves that replacing an actual playout by the average outcome of such a playout is justified, even though the individual outcome predictions in the leaves will be pretty unreliable. (Because the NN will have difficulty seeing tactics, and only has the task of indicating what is useful to look at because it has sufficient potential to be good.)

This whole exercise proves that a search strategy that is highly selective (but takes great care in what it selects) can be competitive with brute-force search. (Which we of course already knew, because it is how humans play Chess.) The current research gives an upper bound on the complexity of a NN that can achieve the required accuracy.

Now we know that neural networks are quite an inefficient way to solve problems. Their only virtue is that they can be trained to do the job, which is very helpful when the programmer doesn't know how to do the job himself. Usually you need a large overkill in terms of neurons to make it reasonably sure that the connections you turn out to need will accidentally be in the network. So it is very likely that the neural network that came out can eventually be simplified by a few orders of magnitude through explicitly programmed calculation of the intermediate results it seems to use in its calculation, and now derives only at great cost. (Imagine how many neurons you would need to recognize that a Knight attacks a Queen somewhere on the board and to suggest that making the capture is a good idea, and how many operations it would require to 'run' such a network, compared to how long a program written to do just that would need for it.)
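To make the comparison concrete, a hand-written check for exactly that pattern is only a handful of bitboard operations. The square numbering (a1 = 0 .. h8 = 63) and the sample position below are just an illustration:

```python
# Detect "one of our knights attacks the enemy queen" with plain bitboards.
KNIGHT_STEPS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def knight_attacks(sq):
    # bitboard of squares a knight on 'sq' attacks
    f, r = sq % 8, sq // 8
    bb = 0
    for df, dr in KNIGHT_STEPS:
        nf, nr = f + df, r + dr
        if 0 <= nf < 8 and 0 <= nr < 8:
            bb |= 1 << (nr * 8 + nf)
    return bb

def knight_attacks_queen(knights_bb, queen_sq):
    # knight moves are symmetric, so attack the queen square "backwards"
    return (knight_attacks(queen_sq) & knights_bb) != 0

# white knight on f3 (square 21), black queen on e5 (square 36): attacked
print(knight_attacks_queen(1 << 21, 36))   # -> True
```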

So it is quite conceivable that such a highly selective search could in fact run on a normal PC, provided the NN can be replaced by dedicated code doing the same thing with much less effort, now that the training has revealed what it actually is that needs to be done.
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: Google's AlphaGo team has been working on chess

Post by CheckersGuy »

syzygy wrote:
Sven wrote:According to the section "Methods" an MCTS leaf node is reached after L "time-steps" which sounds like some fixed search depth.
As I understand it, they basically start with a tree consisting of just the root node. This node is at the same time the single leaf node and is expanded by running it through the NN. The move probabilities returned by the NN are assigned to the edges from the root node to the new leaf nodes. In the next step, one of these edges is chosen by applying the UCB1 strategy. We are then in a leaf node, which is expanded, etc. After every expansion, the move probabilities of the new leaf nodes are backed up to the root. So with 800 "simulations" (but you skip the simulations) you end up with a tree of 801 nodes. The most promising root move is then chosen and played.

One of the papers (probably the Nature paper) explains that the subtree for the move played is carried over to the next move, which obviously makes sense.

So there is no fixed search depth, but there is a fixed tree size (at least in self-play games for training).
AlphaGo Zero (and thus also AlphaZero for chess) does not use Monte Carlo "playouts" or "rollouts". So even though they call their method MCTS-based, it is not like a standard MCTS due to the complete lack of random playouts.
It is MCTS without MC...
It's not the move probabilities that are backed up but the winning probability of the leaf node according to the neural network.
The NN has two outputs. If you are in a state S, the neural network outputs a probability distribution over all moves a available in state S. Additionally, the NN also outputs a winning probability for the state S!
The winning probability takes the place of the random playout in normal MCTS and is therefore propagated up to the root. The move probabilities provide a bias in the selection phase of the MCTS.
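For illustration, a toy two-headed network along those lines (PyTorch, with made-up layer sizes; the 4672 policy slots follow the paper's 8x8x73 move encoding, but everything else here is just an example, not the real AlphaZero architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadedNet(nn.Module):
    def __init__(self, planes=17, board=8, channels=64, num_moves=4672):
        super().__init__()
        # tiny convolutional trunk (the real net is a deep residual tower)
        self.trunk = nn.Sequential(
            nn.Conv2d(planes, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.policy_head = nn.Linear(channels * board * board, num_moves)
        self.value_head = nn.Linear(channels * board * board, 1)

    def forward(self, x):                          # x: (batch, planes, 8, 8)
        h = self.trunk(x).flatten(1)
        p = F.softmax(self.policy_head(h), dim=1)  # move probabilities P(s, .)
                                                   # (illegal moves would be masked in practice)
        v = torch.tanh(self.value_head(h))         # winning-probability-style value in [-1, 1]
        return p, v

net = TwoHeadedNet()
p, v = net(torch.zeros(1, 17, 8, 8))
print(p.shape, v.item())   # policy over 4672 move slots, plus one scalar value
```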
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: Google's AlphaGo team has been working on chess

Post by Rein Halbersma »

hgm wrote: Now we know that neural networks are quite an inefficient way to solve problems. Their only virtue is that they can be trained to do the job, which is very helpful when the programmer doesn't know how to do the job himself. Usually you need a large overkill in terms of neurons to make it reasonably sure that the connections you turn out to need will accidentally be in the network. So it is very likely that the neural network that came out can eventually be simplified by a few orders of magnitude through explicitly programmed calculation of the intermediate results it seems to use in its calculation, and now derives only at great cost. (Imagine how many neurons you would need to recognize that a Knight attacks a Queen somewhere on the board and to suggest that making the capture is a good idea, and how many operations it would require to 'run' such a network, compared to how long a program written to do just that would need for it.)
The AlphaGo Zero network was huge, with 46 million parameters and 80 layers. But if you want whole-board interactions on a chess board to be modeled, you need at least a handful of layers (they're using 3x3 convolutions, so you need at least 4 convolutions to get every square connected to every other square in the top layer). This still requires 2.4 million parameters. And, at least for Go, even 20 vs 40 layers made a big difference in the strength that could be obtained. I'm not saying big savings can't be found, but it's not gonna be easy.
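For what it's worth, the 2.4 million figure is what you get if each of those 3x3 layers has 256 filters (the width used in the AlphaGo Zero tower); a quick back-of-the-envelope check, ignoring biases and the input layer:

```python
# Parameter count of stacked 3x3 convolutions, assuming 256 filters per layer.
filters, kernel = 256, 3

per_layer = kernel * kernel * filters * filters    # 589,824 weights per layer
print(f"one 3x3 layer : {per_layer:,}")
print(f"four layers   : {4 * per_layer:,}")        # ~2.36 million, i.e. ~2.4M
```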
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Google's AlphaGo team has been working on chess

Post by AlvaroBegue »

Rein Halbersma wrote:
hgm wrote: Now we know that neural networks are quite an inefficient way to solve problems. Their only virtue is that they can be trained to do the job, which is very helpful when the programmer doesn't know how to do the job himself. Usually you need a large overkill in terms of neurons to make it reasonably sure that the connections you turn out to need will accidentally be in the network. So it is very likely that the neural network that came out can eventually be simplified by a few orders of magnitude through explicitly programmed calculation of the intermediate results it seems to use in its calculation, and now derives only at great cost. (Imagine how many neurons you would need to recognize that a Knight attacks a Queen somewhere on the board and to suggest that making the capture is a good idea, and how many operations it would require to 'run' such a network, compared to how long a program written to do just that would need for it.)
The AlphaGo Zero network was huge, with 46 million parameters and 80 layers. But if you want whole-board interactions on a chess board to be modeled, you need at least a handful of layers (they're using 3x3 convolutions, so you need at least 4 convolutions to get every square connected to every other square in the top layer). This still requires 2.4 million parameters. And, at least for Go, even 20 vs 40 layers made a big difference in the strength that could be obtained. I'm not saying big savings can't be found, but it's not gonna be easy.
You can use dilation to make the receptive field much larger. WaveNet is a good example of this, and recently some people have been using dilation instead of pooling for image recognition too.

By using dilation, you can cover the board in logarithmically many layers instead of linearly many layers, as a function of board size. It's probably worth exploring.
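A quick back-of-the-envelope illustration (the dilation schedule 1, 2, 4 is just an assumed example): each 3x3 convolution with dilation d adds 2*d to the receptive field, so doubling the dilation per layer spans the board in far fewer layers:

```python
# Receptive field of stacked 3x3 convolutions (stride 1) with given dilations.
def receptive_field(dilations, kernel=3):
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

print("plain 3x3, 4 layers   :", receptive_field([1, 1, 1, 1]))  # 9
print("dilations 1, 2, 4     :", receptive_field([1, 2, 4]))      # 15, full corner-to-corner span on 8x8
```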
User avatar
hgm
Posts: 27809
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Google's AlphaGo team has been working on chess

Post by hgm »

Rein Halbersma wrote: The AlphaGo Zero network was huge, with 46 million parameters and 80 layers. But if you want whole-board interactions on a chess board to be modeled, you need at least a handful of layers (they're using 3x3 convolutions, so you need at least 4 convolutions to get every square connected to every other square in the top layer). This still requires 2.4 million parameters. And, at least for Go, even 20 vs 40 layers made a big difference in the strength that could be obtained. I'm not saying big savings can't be found, but it's not gonna be easy.
But the point is that I want to eliminate the NN altogether. It was nice to have it for training. But most people don't want an engine to train it, they want an engine to use it. So it would be fine to replace the NN by dedicated, immutable code that calculates numbers sufficiently similar to the outputs of the fully trained NN. Whole-board interactions (e.g. detecting (soft-)pins, X-ray attacks, skewers, discovered threats etc.) are not particularly difficult to calculate with dedicated, purposeful code. A NN is pretty dumb, and would basically have to examine every ray on the board for the sought pattern of piece types and empty squares, which can occur in many different positions along the ray, while the pattern might be possible in only one location at a time, due to the limited number of pieces (say a K and opposing Q with an N in between), which is trivial to find when you know where your pieces are located. A routine designed to detect pins would be orders of magnitude faster than the simulation of a neural net designed for that purpose on a normal CPU, let alone a random neural net large enough to be able to learn it through training.
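As an illustration of how cheap such a dedicated routine is, here is a rough ray-scan pin detector (the board representation and the sample position are made up for the example; a real engine would of course use bitboards):

```python
# Find own pieces pinned against the king by enemy sliders, via ray scans.
RAYS = {
    'rook':   [(1, 0), (-1, 0), (0, 1), (0, -1)],
    'bishop': [(1, 1), (1, -1), (-1, 1), (-1, -1)],
}

def pinned_pieces(board, king_sq, own, enemy_sliders):
    # board: {(file, rank): (color, piece)}; returns squares of pinned own pieces
    pins = []
    for kind, dirs in RAYS.items():
        for df, dr in dirs:
            f, r = king_sq
            blocker = None
            while True:
                f, r = f + df, r + dr
                if not (0 <= f < 8 and 0 <= r < 8):
                    break
                piece = board.get((f, r))
                if piece is None:
                    continue
                color, ptype = piece
                if color == own and blocker is None:
                    blocker = (f, r)        # first own piece along the ray
                elif color == own:
                    break                    # two own pieces: no pin on this ray
                else:
                    # enemy piece: only a pin if it slides along this ray type
                    if blocker and ptype in enemy_sliders[kind]:
                        pins.append(blocker)
                    break
    return pins

# white K on e1, white N on e4, black Q on e8: the knight is pinned
board = {(4, 0): ('w', 'K'), (4, 3): ('w', 'N'), (4, 7): ('b', 'Q')}
print(pinned_pieces(board, (4, 0), 'w', {'rook': {'Q', 'R'}, 'bishop': {'Q', 'B'}}))
```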

And we can assume that it is important to detect pins in order to reach the level of play AlphaZero does, and a host of other concepts from classical Chess strategy and tactics as well. As a good start we could try feeding a NN with efficiently extracted information that we suspect might be useful, such as Pawn structure, passers, squares attacked near the King, mobility, game phase, LxH capture threats, possible checking moves. That should allow a vastly smaller NN to compute the same as what the AlphaZero NN calculates from just the piece locations.
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: Google's AlphaGo team has been working on chess

Post by CheckersGuy »

hgm wrote:
Rein Halbersma wrote: The AlphaGo Zero network was huge, with 46 million parameters and 80 layers. But if you want whole-board interactions on a chess board to be modeled, you need at least a handful of layers (they're using 3x3 convolutions, so you need at least 4 convolutions to get every square connected to every other square in the top layer). This still requires 2.4 million parameters. And, at least for Go, even 20 vs 40 layers made a big difference in the strength that could be obtained. I'm not saying big savings can't be found, but it's not gonna be easy.
But the point is that I want to eliminate the NN altogether. It was nice to have it for training. But most people don't want an engine to train it, they want an engine to use it. So it would be fine to replace the NN by dedicated, immutable code that calculates numbers sufficiently similar to the outputs of the fully trained NN. Whole-board interactions (e.g. detecting (soft-)pins, X-ray attacks, skewers, discovered threats etc.) are not particularly difficult to calculate with dedicated, purposeful code. A NN is pretty dumb, and would basically have to examine every ray on the board for the sought pattern of piece types and empty squares, which can occur in many different positions along the ray, while the pattern might be possible in only one location at a time, due to the limited number of pieces (say a K and opposing Q with an N in between), which is trivial to find when you know where your pieces are located. A routine designed to detect pins would be orders of magnitude faster than the simulation of a neural net designed for that purpose on a normal CPU, let alone a random neural net large enough to be able to learn it through training.
It would obviously be nice to deduce what the neural network has actually learned, but how would you go about that? :lol: The reason why deep NNs are used is that they can learn complex/abstract functions. What you describe isn't very far from a linear evaluation function, which is not very complex at all. The NN from AlphaZero probably learned more abstract concepts than we could explicitly code. For example, Stockfish may need a 20-ply search to see that it is dead lost, while A0 only needs one call to the neural network because it has learned that those positions are lost.
It would be very hard to explicitly encode this concept into the evaluation function (without doing a search).

Let me give another example. People tried to detect objects in pictures by explicitly coding a routine that does so. (They relied on edge detection and some other things, but that's beside the point.) No matter what they tried, there were always some cases where the object detection simply didn't work.

To get back to chess: you may think that you have covered all x-ray patterns or whatever evaluation term you have explicitly coded, only to find out some time later that you did a horrible job because there were many cases you did not cover.
But the point is that I want to eliminate the NN altogether. It was nice to have it for training. But most people don't want an engine to train it, they want an engine to use it
It's not like everyone has to train their own engine :lol: If DeepMind decided to publish the AlphaZero engine, they would supply the weights for the neural network and no one would have to train it anymore. This obviously assumes that we have the hardware to run it on :cry: