I do get how it works, but apparently you don't. The move predictions of the neural network initially provide a strong bias. That bias fades as the search traverses the tree more often, approaching the even distribution you are talking about. This is similar to what the UCT policy does. Just look at the equations rather than the text; this is not really hard to understand.

Milos wrote:
You really don't get how UCT without rollouts operates, do you?

CheckersGuy wrote:
Currently we cannot make many assumptions about how "deep" AlphaZero evaluates. This depends entirely on how well the neural network can predict the move probabilities. If in every position the NN gave (almost) 50% probability to the two candidate moves, you could get quite deep searches.
Try simulating it on paper; it would help.
It has nothing to do with the NN output, not even with the distribution of probabilities, because the algorithm tries to explore the tree as evenly as possible.
What I can immediately tell you is that in the training matches (800 MCTS simulations per move) the depth is hardly ever over 10, and in the actual games against SF (80000 MCTS simulations per move) the expected depth reached is 20 at most.
The even distribution only happens after many, many traverses through the tree. If you only do a few simulations, the bias from the NN is very strong. This concept is very similar to RAVE and/or UCT, which is commonly used in MCTS...
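For reference, here is a minimal sketch of the standard UCT selection rule (this is plain UCT, not AlphaZero's prior-weighted variant, and all names are illustrative). The exploration bonus shrinks as a child's visit count grows, which is the same mechanism that eventually evens out exploration:

```python
import math

# Hedged sketch of the standard UCT selection rule:
#     score_i = Q_i + c * sqrt(ln(N) / n_i)
# Q_i: mean value of child i, n_i: its visit count, N: parent visits.
# Function and variable names here are illustrative, not from any library.

def uct_select(children, c=1.4):
    """children: list of (q, n) pairs; returns the index of the child to visit."""
    total = sum(n for _, n in children)

    def score(q, n):
        if n == 0:
            return float("inf")  # unvisited children are always tried first
        return q + c * math.sqrt(math.log(total) / n)

    return max(range(len(children)), key=lambda i: score(*children[i]))

# A heavily visited child loses its exploration bonus:
# child 0 has the better mean value but 100 visits, child 1 only 2.
print(uct_select([(0.6, 100), (0.5, 2)]))  # selects index 1
```

Note how the bonus depends only on visit counts, not on any prior: this is the sense in which UCT "has nothing to do with NN output".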
In the AlphaZero paper the prior's influence is lowered the following way: P(s,a) / (1 + N(s,a)), where N(s,a) is the number of traverses through the node via move a and P(s,a) is the move probability from the network.
Now give the candidate move a 99% probability and draw the picture yourself: it takes quite a few iterations to lower that term enough to encourage searching other moves.
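That picture can be drawn in a few lines of code. This is a hedged toy simulation of just the P(s,a) / (1 + N(s,a)) term described above (the paper's full PUCT rule also multiplies in a constant and a square root of the parent visit count, and weighs in the value estimate Q; all names below are illustrative):

```python
# Toy simulation: a 99% prior versus a 1% prior under the
# dilution term P(s,a) / (1 + N(s,a)).  This isolates one term of
# the AlphaZero selection rule; it is a sketch, not the full PUCT.

priors = {"candidate": 0.99, "other": 0.01}
visits = {"candidate": 0, "other": 0}

def score(move):
    return priors[move] / (1 + visits[move])

first_other = None  # iteration at which "other" is first explored
for i in range(200):
    best = max(priors, key=score)
    visits[best] += 1
    if best == "other" and first_other is None:
        first_other = i

# The 99% move soaks up roughly the first 100 visits before the
# 1% move is tried even once.
print(visits, "first visit to 'other' at iteration", first_other)
```

This matches the point above: with only a few hundred simulations, the prior dominates where the search goes, and the "even distribution" only emerges much later.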