You're not playing random moves; you're playing the moves given by a random search tree. At 800 nodes per search you will typically find mates in one and play them. The quality of this play is much higher than just picking a random move.

This is under the assumption that your prior probabilities aren't extremely biased, which they won't be with proper initialization.
Google's AlphaGo team has been working on chess
trulses (Posts: 39, Joined: Wed Dec 06, 2017 5:34 pm)
Milos (Posts: 4190, Joined: Wed Nov 25, 2009 1:47 am):
It seems you got it wrong.

trulses wrote:
> You're not playing random moves; you're playing the moves given by a random search tree. At 800 nodes per search you will typically find mates in one and play them. The quality of this play is much higher than just picking a random move.
> This is under the assumption that your prior probabilities aren't extremely biased, which they won't be with proper initialization.

There are no 800 nodes per search; there is one node evaluated and a few tens of nodes traversed per search (because the number of different paths explored is large). The depth reached by a single MCTS search would then typically be smaller than what SF achieves, and only in a very late endgame would you reach mates.
CheckersGuy (Posts: 273, Joined: Wed Aug 24, 2016 9:49 pm):
Currently we cannot make many assumptions about how "deep" AlphaZero searches. This depends entirely on how well the neural network predicts the move probabilities. If in every position the NN gave (almost) 50% probability to each of two candidate moves, you could get quite deep searches.

Milos wrote:
> It seems you got it wrong. There are no 800 nodes per search; there is one node evaluated and a few tens of nodes traversed per search (because the number of different paths explored is large). The depth reached by a single MCTS search would then typically be smaller than what SF achieves, and only in a very late endgame would you reach mates.
Álvaro Begué, RuyDos (Posts: 931, Joined: Tue Mar 09, 2010 3:46 pm, Location: New York):
No, he is precisely correct.

Milos wrote:
> There are no 800 nodes per search; there is one node evaluated and a few tens of nodes traversed per search (because the number of different paths explored is large). The depth reached by a single MCTS search would then typically be smaller than what SF achieves, and only in a very late endgame would you reach mates.

800 nodes are expanded per search. Or you can say that there are 800 playouts through the MCTS tree per search.

In any case, his comment about the quality of play being better than random moves does apply: if there is an immediate mate available, it will be found. That's enough to get the process bootstrapped, because starting from a random evaluation you'll learn that positions with a bunch of white pieces on top of the black king tend to be white victories. Then you will play games where both players try to place a bunch of pieces on top of the enemy king, and those games will be of much higher quality than the initial ones. And so on.
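The bootstrap claim above is easy to check with a toy experiment. The sketch below is an illustration, not AlphaZero's actual code: the exploration constant c = 1.5, the 20-move toy position, and the fixed per-move values standing in for a randomly initialized network are all assumptions. It runs an A0-style PUCT bandit with uniform priors, where one move is a certain win (a "mate in one"), and shows that 800 simulations concentrate the visits on that move.

```python
import math
import random

def puct_search(priors, values, n_sims=800, c=1.5):
    """One-ply PUCT bandit: each simulation picks the child maximizing
    Q + c * P * sqrt(simulations so far) / (1 + N), then 'evaluates' it."""
    k = len(priors)
    n = [0] * k          # visit counts
    w = [0.0] * k        # accumulated values
    for t in range(1, n_sims + 1):
        scores = [(w[i] / n[i] if n[i] else 0.0)
                  + c * priors[i] * math.sqrt(t) / (1 + n[i]) for i in range(k)]
        best = max(range(k), key=scores.__getitem__)
        n[best] += 1
        w[best] += values[best]   # stand-in for a network evaluation
    return n

random.seed(0)
values = [random.uniform(-1, 1) for _ in range(20)]  # a randomly initialized "net"
values[7] = 1.0                                      # one move is a certain win
priors = [1 / 20] * 20                               # uniform priors (untrained policy)
visits = puct_search(priors, values)
print(visits.index(max(visits)))                     # → 7: the winning move gets the most visits
```

Even with random priors and a random value function, the certain win ends up with the large majority of the 800 visits, which is exactly what lets self-play training get off the ground.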
syzygy (Posts: 5557, Joined: Tue Feb 28, 2012 11:56 pm):
I think it is you who is mixing things up, since your second sentence bears absolutely no relation to what I wrote and you quoted.

CheckersGuy wrote:
> I think you have something mixed up. The tree is only built up once per game, and this tree won't be kept in memory after the root position changes (your opponent or you made a move).
>
> syzygy wrote:
> > QS is an interesting point. It seems AlphaZero doesn't care whether the position being expanded and NN-evaluated is a quiet position or not.

But the QS point also shows that an evaluation function only works with a minimum of search. The evaluation function can know a lot about passed pawns, but if a relatively simple combination wins the pawn, that knowledge is not worth much. For the same reason I do not yet rule out that the strength of AlphaZero's evaluation function will become more apparent once its search takes care of the immediate tactics that its NN cannot grasp.

AlphaZero actually does carry over the subtree of the move it plays to the next move, as you can read in the paper, but again, that has nothing to do with what I wrote.
Milos (Posts: 4190, Joined: Wed Nov 25, 2009 1:47 am):
You really don't get how UCT without rollouts operates, do you?

CheckersGuy wrote:
> Currently we cannot make many assumptions about how "deep" AlphaZero searches. This depends entirely on how well the neural network predicts the move probabilities. If in every position the NN gave (almost) 50% probability to each of two candidate moves, you could get quite deep searches.

Try simulating it on paper; it would help.

It has nothing to do with the NN output, even with the distribution of probabilities, because the algorithm tries to explore the tree as evenly as possible.

What I can tell you right away is that in the training games (800 MCTS simulations per move) the depth is hardly over 10, and in the actual games against SF (80,000 MCTS simulations per move) the depth reached is at most around 20.
CheckersGuy (Posts: 273, Joined: Wed Aug 24, 2016 9:49 pm):
I do get how it works, but apparently you don't. The move predictions of the neural network initially provide a strong bias. This bias shrinks as the search traverses the tree more often, approaching the even distribution you are talking about. This is similar to what the UCT policy does. Just look at the equations rather than the text; this is not really hard to understand.

Milos wrote:
> You really don't get how UCT without rollouts operates, do you? Try simulating it on paper; it would help. It has nothing to do with the NN output, even with the distribution of probabilities, because the algorithm tries to explore the tree as evenly as possible. What I can tell you right away is that in the training games (800 MCTS simulations per move) the depth is hardly over 10, and in the actual games against SF (80,000 MCTS simulations per move) the depth reached is at most around 20.

The even distribution only happens after many, many traversals of the tree. If you only do a few simulations, the bias of the NN is very strong. This concept is very similar to RAVE and/or UCT, which are commonly used in MCTS.

In the AlphaZero paper the prior term decays the following way: P(s,a)/(1+N(s,a)), where N is the number of traversals through the node and P is the move probability.

Now give the candidate move 99% probability and draw the picture yourself: it takes quite a few iterations to shrink that term enough to encourage searching other moves.
Last edited by CheckersGuy on Wed Dec 13, 2017 10:00 pm, edited 2 times in total.
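The "draw the picture yourself" exercise above can be done in a few lines. The sketch below is a toy two-move illustration, not AlphaZero's code: the priors (0.97 vs 0.03), the exploration constant c = 1.5, and the exact selection term, which here includes the square root of the parent visit count that the paper's rule also contains, are assumptions, and both Q values are held at zero as they would be with an untrained value head. It counts how many simulations pile onto the high-prior move before the low-prior alternative is selected even once.

```python
import math

def first_exploration(p_big=0.97, p_small=0.03, c=1.5):
    """Number of simulations taken by the high-prior move before the
    low-prior move is picked once, under the A0-style selection term
    U(s, a) = c * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))
    with all Q values at zero."""
    n_big = 0
    for t in range(1, 100_000):
        root_visits = math.sqrt(n_big) if n_big else 1.0
        u_big = c * p_big * root_visits / (1 + n_big)
        u_small = c * p_small * root_visits   # N = 0 for the unexplored move
        if u_small >= u_big:
            return t                          # first simulation that tries the other move
        n_big += 1
    return None

print(first_exploration())                    # → 33
```

So with a 0.97 prior the search spends its first 32 simulations on one move; only the denominator 1 + N eventually damps the prior enough to try the alternative. With equal priors the other move is tried immediately, which is the even exploration the argument is about.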
Milos (Posts: 4190, Joined: Wed Nov 25, 2009 1:47 am):
There are no playouts at all. First you need to get the terminology right, and it is not just a matter of semantics.

AlvaroBegue wrote:
> 800 nodes are expanded per search. Or you can say that there are 800 playouts through the MCTS tree per search.

Even though he never mentioned "expanded nodes" (you introduced that out of nowhere), "expanded nodes" is the wrong term, since a single leaf node can be expanded into multiple new leaves; the accurate term is evaluations, i.e. 800 nodes are evaluated. There are 800 independent searches (or simulations, as they are called in the AGZ paper) per move.

AlvaroBegue wrote:
> In any case, his comment about the quality of play being better than random moves does apply: If there is an immediate mate available, it will be found.

At which depth do you think a mate is reached by the search? The rest of the paragraph is totally irrelevant, since no one actually mentioned a comparison with any "random moves" algorithm.
Álvaro Begué, RuyDos (Posts: 931, Joined: Tue Mar 09, 2010 3:46 pm, Location: New York):
Dude, take your pills. I am not interested in participating in a conversation with such a poor tone.
Milos (Posts: 4190, Joined: Wed Nov 25, 2009 1:47 am):
So many wrong things.

CheckersGuy wrote:
> I do get how it works, but apparently you don't. The move predictions of the neural network initially provide a strong bias. This bias shrinks as the search traverses the tree more often, approaching the even distribution you are talking about. [...] In the AlphaZero paper the prior term decays as P(s,a)/(1+N(s,a)), where N is the number of traversals through the node and P is the move probability. Now give the candidate move 99% probability and draw the picture yourself: it takes quite a few iterations to shrink that term enough to encourage searching other moves.

In the beginning of training the NN weights are random, so the output probabilities are essentially random and close to uniform, which gives exactly the shallowest possible tree.

Even for a highly selective NN (later in training), in each node there are at least a few best moves with similar probability, 0.5-0.7. (And A0 uses a UCT policy that selects other candidates much more often than the original UCB1, which is what you wrote up, even though you omitted the sqrt of the total visit count.) A probability of 0.99 never happens unless the NN actually sees a mate in a few moves, so your example is totally irrelevant.
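The claim that uniform priors give the shallowest possible tree can be reproduced directly. The sketch below is a toy model, not AlphaZero's implementation: the branching factor of 30, the exploration constant c = 1.5, and the all-zero value head are assumptions. It grows a tree with A0-style selection under uniform priors, expanding one new leaf per simulation, and reports the maximum depth reached after 800 simulations.

```python
import math

BRANCHING = 30   # rough number of legal chess moves, an assumption

class Node:
    def __init__(self):
        self.n = [0] * BRANCHING   # per-child visit counts
        self.children = {}

def simulate(root, c=1.5):
    """One A0-style simulation with uniform priors and an all-zero value head:
    walk down by the selection rule, expand one new leaf, return its depth.
    With all Q = 0, selection degenerates to round-robin over the children."""
    node, depth = root, 0
    prior = 1 / BRANCHING
    while True:
        t = sum(node.n)
        scores = [c * prior * math.sqrt(t or 1) / (1 + ni) for ni in node.n]
        i = max(range(BRANCHING), key=scores.__getitem__)
        node.n[i] += 1
        depth += 1
        if i not in node.children:         # first visit: expand and evaluate here
            node.children[i] = Node()
            return depth
        node = node.children[i]            # revisit: descend one ply deeper

root = Node()
depths = [simulate(root) for _ in range(800)]
print(max(depths))                         # → 2: uniform priors keep 800 simulations very shallow
```

With nothing to break ties, the visits spread evenly and the tree grows breadth-first: 800 simulations over 30 moves never get past depth 2. Only a sharp policy or value signal lets the search concentrate simulations on one line and reach meaningful depths.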