Google's AlphaGo team has been working on chess

trulses · Post by **trulses** » Wed Dec 13, 2017 7:31 pm

You're not playing random moves, you're playing the moves given by a random search tree. At 800 nodes per search you typically will find mate in ones and play them. The quality of this play is much higher than just picking a random move.

This is under the assumption that your prior probabilities aren't extremely biased, which they won't be with proper initialization.

Milos · Post by **Milos** » Wed Dec 13, 2017 8:41 pm

trulses wrote:You're not playing random moves, you're playing the moves given by a random search tree. At 800 nodes per search you typically will find mate in ones and play them. The quality of this play is much higher than just picking a random move.

This is under the assumption that your prior probabilities aren't extremely biased, which they won't be with proper initialization.

It seems you got it wrong.
There are no 800 nodes per search, there is 1 evaluated and few tens traversed nodes per search (because number of different paths explored is large). Reached depth of single MCT search would than be typically smaller than what SF achieves, and only in very late endgame you'd reach mates.

CheckersGuy · Post by **CheckersGuy** » Wed Dec 13, 2017 8:48 pm

Milos wrote:
trulses wrote:You're not playing random moves, you're playing the moves given by a random search tree. At 800 nodes per search you typically will find mate in ones and play them. The quality of this play is much higher than just picking a random move.

This is under the assumption that your prior probabilities aren't extremely biased, which they won't be with proper initialization.
It seems you got it wrong.
There are no 800 nodes per search, there is 1 evaluated and few tens traversed nodes per search (because number of different paths explored is large). Reached depth of single MCT search would than be typically smaller than what SF achieves, and only in very late endgame you'd reach mates.

Currently we can not make many assumption about how "deep" AlphaZero evaluates. This completly depends on how well the neural network can predict the move probabilites. If in every position the NN gave (almost) 50% probility for the two candidate moves you can get quite deep searches

AlvaroBegue · Post by **AlvaroBegue** » Wed Dec 13, 2017 9:19 pm

Milos wrote:
trulses wrote:You're not playing random moves, you're playing the moves given by a random search tree. At 800 nodes per search you typically will find mate in ones and play them. The quality of this play is much higher than just picking a random move.

This is under the assumption that your prior probabilities aren't extremely biased, which they won't be with proper initialization.
It seems you got it wrong.

No, he is precisely correct.

There are no 800 nodes per search, there is 1 evaluated and few tens traversed nodes per search (because number of different paths explored is large). Reached depth of single MCT search would than be typically smaller than what SF achieves, and only in very late endgame you'd reach mates.

800 nodes are expanded per search. Or you can say that there are 800 playouts through the MCTS tree per search.

In any case, his comment about the quality of play being better than random moves does apply: If there is an immediate mate available, it will be found. That's enough to get the process bootstrapped, because starting from random evaluation you'll learn that positions with a bunch of white pieces on top of the black king tend to be white victories. Then you will play games where both players are trying to place a bunch of pieces on top of the enemy king, which will be of much much higher quality than the initial games. Etc.

syzygy · Post by **syzygy** » Wed Dec 13, 2017 9:36 pm

CheckersGuy wrote:
syzygy wrote:QS is an interesting point. It seems AlphaZero doesn't care whether the position being expanded and NN-evaluated is a quiet position or not.

But the QS point also shows that an evaluation function only works with a minimum of search. The evaluation function can know a lot about passed pawns, but if a relatively simple combination wins the pawn, that knowledge is not worth much. For the same reason I do not yet rule out that the strength of AlphaZero's evaluation function becomes more apparent as its search takes care of the immediate tactics that its NN cannot grasp.
I think you have something mixedup. The tree is only built up once per game and this tree won't be kept in the memory after the root position changes (your opponent or yourself made a move).

I think it is you who is mixing things up, since your second sentence bears absolutely no relation to what I wrote and you quoted.

AlphaZero actually does carry over the subtree of the move it plays to the next move, as you can read in the paper, but again, that has nothing to do with what I wrote.

Milos · Post by **Milos** » Wed Dec 13, 2017 9:43 pm

CheckersGuy wrote:Currently we can not make many assumption about how "deep" AlphaZero evaluates. This completly depends on how well the neural network can predict the move probabilites. If in every position the NN gave (almost) 50% probility for the two candidate moves you can get quite deep searches

You really don't get how UCT without rollouts operates, do you?
Try simulating it on the paper, it would help.
It has nothing to do with NN output, even with distribution of probabilities, because algorithm tries to get as even exploration of the tree as possible.
What I can immediately tell you that in training matches (800 MCTS per move) depth is hardly over 10, and actual games against SF expected (80000 MCTS per move) depth that was reached is up to 20 at most.

CheckersGuy · Post by **CheckersGuy** » Wed Dec 13, 2017 9:51 pm

Milos wrote:
CheckersGuy wrote:Currently we can not make many assumption about how "deep" AlphaZero evaluates. This completly depends on how well the neural network can predict the move probabilites. If in every position the NN gave (almost) 50% probility for the two candidate moves you can get quite deep searches
You really don't get how UCT without rollouts operates, do you?
Try simulating it on the paper, it would help.
It has nothing to do with NN output, even with distribution of probabilities, because algorithm tries to get as even exploration of the tree as possible.
What I can immediately tell you that in training matches (800 MCTS per move) depth is hardly over 10, and actual games against SF expected (80000 MCTS per move) depth that was reached is up to 20 at most.

I do get how it works but apparently you dont. The move predictions of the neural networks initially provide a strong bias. This bias gets less as the search traverses the tree more often to get to the even distribution you are talking about. This is similiar to what the uct policy does. Just look at the equations and not at the text and this is not really hard to understand

The even distribution only happens after many many traverses through the tree. If you only do a few searches the bias of the nn is very high. This concept is very similiar to RAVE and/or uct which is commonly used in mcts....

In the alphaZero paper the lower the probabilites the following way. P(s,a)/1+N(s,a) where N is the number of traverses through the node and P the move probability.

Now give the candiate move 99 % probabilty and draw the picture yourself. Takes some iterations to lower the probability to encourage searching other moves

Milos · Post by **Milos** » Wed Dec 13, 2017 9:57 pm

AlvaroBegue wrote:800 nodes are expanded per search. Or you can say that there are 800 playouts through the MCTS tree per search.

There are no playouts at all. First you need to get the terminology and it is not just a matter of semantics.
Even though he never mentioned "expanded nodes" (you introduced that totally randomly), expanded nodes is a wrong term, since a single leaf node can be expended in multiple new leaves so accurate term is evaluations, i.e. 800 nodes are evaluated.
There are 800 independent searches (or simulations how they call it in AGZ paper) per move.

In any case, his comment about the quality of play being better than random moves does apply: If there is an immediate mate available, it will be found. That's enough to get the process bootstrapped, because starting from random evaluation you'll learn that positions with a bunch of white pieces on top of the black king tend to be white victories. Then you will play games where both players are trying to place a bunch of pieces on top of the enemy king, which will be of much much higher quality than the initial games. Etc.

At which depth do you think mate is reached by search?
The rest of the paragraph is totally irrelevant since no-one actually mentioned comparison with any "random moves" algorithm.

AlvaroBegue · Post by **AlvaroBegue** » Wed Dec 13, 2017 10:19 pm

Milos wrote:
AlvaroBegue wrote:800 nodes are expanded per search. Or you can say that there are 800 playouts through the MCTS tree per search.
There are no playouts at all. First you need to get the terminology and it is not just a matter of semantics.
Even though he never mentioned "expanded nodes" (you introduced that totally randomly), expanded nodes is a wrong term, since a single leaf node can be expended in multiple new leaves so accurate term is evaluations, i.e. 800 nodes are evaluated.
There are 800 independent searches (or simulations how they call it in AGZ paper) per move.

In any case, his comment about the quality of play being better than random moves does apply: If there is an immediate mate available, it will be found. That's enough to get the process bootstrapped, because starting from random evaluation you'll learn that positions with a bunch of white pieces on top of the black king tend to be white victories. Then you will play games where both players are trying to place a bunch of pieces on top of the enemy king, which will be of much much higher quality than the initial games. Etc.
At which depth do you think mate is reached by search?
The rest of the paragraph is totally irrelevant since no-one actually mentioned comparison with any "random moves" algorithm.

Dude, take your pills. I am not interested in participating in a conversation with such poor tone.

Milos · Post by **Milos** » Wed Dec 13, 2017 10:23 pm

CheckersGuy wrote:I do get how it works but apparently you dont. The move predictions of the neural networks initially provide a strong bias. This bias gets less as the search traverses the tree more often to get to the even distribution you are talking about. This is similiar to what the uct policy does. Just look at the equations and not at the text and this is not really hard to understand

The even distribution only happens after many many traverses through the tree. If you only do a few searches the bias of the nn is very high. This concept is very similiar to RAVE and/or uct which is commonly used in mcts....

In the alphaZero paper the lower the probabilites the following way. P(s,a)/1+N(s,a) where N is the number of traverses through the node and P the move probability.

Now give the candiate move 99 % probabilty and draw the picture yourself. Takes some iterations to lower the probability to encourage searching other moves

So many wrong things.
In the beginning of the training NN weights are random so output probabilities are random and uniform which gives exactly the shallowest possible tree.
Even for highly selective NN (later in training), in each node there are at least few best moves that have similar probability 0.5-0.7 (and A0 uses UCT policy that selects other candidates much more often than original UCB1 - which you wrote up even though you omitted sqrt of sum of total visit count). 0.99 never happens unless NN actually sees a mate in few moves, so your example is totally irrelevant.

Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess