Google's AlphaGo team has been working on chess

Milos · Post by **Milos** » Wed Dec 13, 2017 10:24 pm

AlvaroBegue wrote:Dude, take your pills. I am not interested in participating in a conversation with such poor tone.

Then don't. You are free to choose any excuse you prefer.
Regarding mentioning pills, it speaks volumes about your tone.

CheckersGuy · Post by **CheckersGuy** » Wed Dec 13, 2017 10:28 pm

Milos wrote:
CheckersGuy wrote:I do get how it works, but apparently you don't. The move predictions of the neural network initially provide a strong bias. This bias shrinks as the search traverses the tree more often, until it reaches the even distribution you are talking about. This is similar to what the UCT policy does. Just look at the equations and not at the text, and this is not really hard to understand.

The even distribution only happens after many, many traversals of the tree. If you only do a few searches, the bias of the NN is very high. This concept is very similar to RAVE and/or UCT, which is commonly used in MCTS.

In the AlphaZero paper they lower the probabilities in the following way: P(s,a)/(1+N(s,a)), where N is the number of traversals through the node and P is the move probability.

Now give the candidate move 99% probability and draw the picture yourself. It takes some iterations to lower the probability enough to encourage searching other moves.
So many wrong things.
At the beginning of training the NN weights are random, so the output probabilities are random and uniform, which gives exactly the shallowest possible tree.
Even for a highly selective NN (later in training), in each node there are at least a few best moves that have similar probability 0.5-0.7 (and A0 uses a UCT policy that selects other candidates much more often than the original UCB1, which you wrote up even though you omitted the sqrt of the total visit count). 0.99 never happens unless the NN actually sees a mate in a few moves, so your example is totally irrelevant.

I was obviously talking about the trained network, and in the first few iterations the move probabilities provide a stronger bias. 0.99 may be a little high. As for a few moves having 0.5-0.7 probability, I don't think so, because the sum of all the probabilities should be 1. If you had more than two moves with a probability of 0.5-0.7, this would already be wrong.
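
For reference, this is the selection rule being argued about, as written in the AlphaGo Zero paper whose search AlphaZero follows. Q is the mean backed-up value, N the visit count, P the network prior, and c_puct an exploration constant:

```latex
a_t = \arg\max_a \bigl( Q(s,a) + U(s,a) \bigr), \qquad
U(s,a) = c_{\text{puct}} \, P(s,a) \, \frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)}
```

The expression P(s,a)/(1+N(s,a)) quoted above is the U term without the constant and without the square-root factor that Milos points out.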

Milos · Post by **Milos** » Wed Dec 13, 2017 10:58 pm

CheckersGuy wrote:I was obviously talking about the trained network, and in the first few iterations the move probabilities provide a stronger bias. 0.99 may be a little high. As for a few moves having 0.5-0.7 probability, I don't think so, because the sum of all the probabilities should be 1. If you had more than two moves with a probability of 0.5-0.7, this would already be wrong.

First few iterations of what?
Previously you were talking about training games (because only there do you have 800 MCTS simulations per move; in the games against SF you have 80k).
If the network is already fully trained, the NN is selective, but not nearly as much as you are suggesting.
When I said a few moves at 0.5-0.7, I meant the best move has 0.5-0.7 and the others (say, 1-2 other moves) would have 0.3-0.5 combined.
So a realistic example for a selective NN would be something like P = [0.6 0.25 0.1 0.02 0.01 0.01 0.01].
So already after selecting the first move 3 times, the second move would be selected the fourth time, the third one the 8th time, etc.
So let's say each node has the same P vector, the number of simulations is large (80k), and the tree is already 19 deep along the longest path.
At the first simulation you extend it to 20. Next time you'd extend it after roughly 1/0.6^21 = 46k simulations, so it would be extended once more.
Other branches are proportionally shorter, so for the next move, if your opponent played the best expected response, you'd extend the depth by 1 to 22; otherwise you'd have a shorter longest branch that you'd extend by 1.
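
The two numbers in this example can be checked with a short sketch. It assumes, as the post does, that every node has the same prior vector P. It additionally assumes that all action values Q are zero, so selection reduces to maximizing P(a)/(1+N(a)); c_puct and the square root of the total visit count are common factors and do not change the argmax. The function names are illustrative, and ties are broken toward the lower index.

```python
from fractions import Fraction

# Prior vector from the post; exact fractions avoid floating-point ties.
P = [Fraction(60, 100), Fraction(25, 100), Fraction(10, 100),
     Fraction(2, 100), Fraction(1, 100), Fraction(1, 100), Fraction(1, 100)]


def first_visits(priors, simulations):
    """Return {move index: simulation number of its first selection}.

    Simplifying assumption: Q = 0 everywhere, so each simulation picks
    the move maximizing P(a) / (1 + N(a)); ties go to the lower index.
    """
    visits = [0] * len(priors)
    first = {}
    for sim in range(1, simulations + 1):
        scores = [p / (1 + n) for p, n in zip(priors, visits)]
        move = scores.index(max(scores))
        visits[move] += 1
        first.setdefault(move, sim)
    return first


def expected_sims_to_follow_main_line(p_best, plies):
    """Rough estimate used in the post: 1 / p_best ** plies."""
    return 1 / p_best ** plies


first = first_visits(P, 20)
print(first)
print(round(expected_sims_to_follow_main_line(0.6, 21)))
```

In this simplified model the second move is first picked on simulation 3 and the third move on simulation 9 (simulation 8 if the tie with the first move is broken the other way), which is close to the counts quoted above. The depth estimate comes out near 45.6k, consistent with the rounded 46k. Real searches also include the backed-up values Q, so actual visit orders will differ.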

CheckersGuy · Post by **CheckersGuy** » Wed Dec 13, 2017 11:06 pm

Milos wrote:
CheckersGuy wrote:I was obviously talking about the trained network, and in the first few iterations the move probabilities provide a stronger bias. 0.99 may be a little high. As for a few moves having 0.5-0.7 probability, I don't think so, because the sum of all the probabilities should be 1. If you had more than two moves with a probability of 0.5-0.7, this would already be wrong.
First few iterations of what?
Previously you were talking about training games (because only there do you have 800 MCTS simulations per move; in the games against SF you have 80k).
If the network is already fully trained, the NN is selective, but not nearly as much as you are suggesting.
When I said a few moves at 0.5-0.7, I meant the best move has 0.5-0.7 and the others (say, 1-2 other moves) would have 0.3-0.5 combined.
So a realistic example for a selective NN would be something like P = [0.6 0.25 0.1 0.02 0.01 0.01 0.01].
So already after selecting the first move 3 times, the second move would be selected the fourth time, the third one the 8th time, etc.

First few iterations of the algorithm obviously.

in each node there are at least a few best moves that have similar probability 0.5-0.7

This is what you said. If you call the probabilities you gave above "similar" to 0.5-0.7, then that's just bogus.

As for your example: sure, you will visit other moves, but those moves are more likely to be bad, and therefore have a bad action value, and won't be visited that often in subsequent iterations. Don't forget that the search isn't based entirely on the move probabilities; the value that gets backed up from the leaf nodes goes into the calculation as well.

Milos · Post by **Milos** » Wed Dec 13, 2017 11:15 pm

CheckersGuy wrote:First few iterations of the algorithm obviously.

Which algorithm? Training algorithm iteration (since iteration is what is used in training in the paper) or MCTS iteration?

As for your example: sure, you will visit other moves, but those moves are more likely to be bad, and therefore have a bad action value, and won't be visited that often in subsequent iterations. Don't forget that the search isn't based entirely on the move probabilities; the value that gets backed up from the leaf nodes goes into the calculation as well.

See the edit in my previous post. The point is that extending the depth is not as easy as you originally suggested, and even in the best branch the tree was not very deep, even in the games against SF.

trulses · Post by **trulses** » Wed Dec 13, 2017 11:38 pm

Milos wrote:
trulses wrote:You're not playing random moves, you're playing the moves given by a random search tree. At 800 nodes per search you typically will find mate in ones and play them. The quality of this play is much higher than just picking a random move.

This is under the assumption that your prior probabilities aren't extremely biased, which they won't be with proper initialization.
It seems you got it wrong.
There are not 800 nodes per search; there is 1 evaluated node and a few tens of traversed nodes per search (because the number of different paths explored is large). The depth reached by a single MCTS search would then typically be smaller than what SF achieves, and only in the very late endgame would you reach mates.

You seem a bit confused. When people say there are 800 nodes per search, they refer to the number of simulations per move; this should be fairly obvious if you read the paper.

The point still stands, with the MCTS hyper-parameters from A0 and a randomly initialized network you will typically find mate in one from the root position if your prior probabilities are roughly uniformly random.

To find a mate in one from the root node, you would have to be in a mate-in-one position. It's fairly obvious to me that most of those positions would not be in the early game, so your comment that these positions would be in the end game seems to be an indication that you're missing the point or you lack some understanding.
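
The claim about finding a mate in one with an untrained network can be illustrated with a toy model. This is only a sketch under loud assumptions, none of which come from the paper: the search is one ply deep (children are never expanded), priors are exactly uniform, c_puct is 1.0, the random network's value output is modelled as noise in [-0.1, 0.1], and the mating move leads to a terminal node worth +1. The function name and the choice of 35 root moves are hypothetical.

```python
import math
import random


def root_search(num_moves, mate_index, simulations=800, c_puct=1.0, seed=0):
    """One-ply PUCT search at the root with uniform priors.

    Assumptions (not from the paper): c_puct = 1.0, the untrained network's
    value output is modelled as noise in [-0.1, 0.1], and the mating move
    leads to a terminal node worth +1. Children are never expanded further.
    """
    rng = random.Random(seed)
    prior = 1.0 / num_moves
    visits = [0] * num_moves
    value_sum = [0.0] * num_moves

    for _ in range(simulations):
        total = sum(visits)
        best_move, best_score = 0, -math.inf
        for move in range(num_moves):
            q = value_sum[move] / visits[move] if visits[move] else 0.0
            u = c_puct * prior * math.sqrt(total) / (1 + visits[move])
            if q + u > best_score:
                best_move, best_score = move, q + u
        # Terminal win for the mating move, noisy "evaluation" otherwise.
        value = 1.0 if best_move == mate_index else rng.uniform(-0.1, 0.1)
        visits[best_move] += 1
        value_sum[best_move] += value
    return visits


visits = root_search(num_moves=35, mate_index=20)
print(visits.index(max(visits)), visits[20])
```

With these settings every root move is tried within the first several dozen simulations, and once the mating move has been visited its value of +1 outweighs the exploration bonus of every other move, so it collects the rest of the budget. A full tree search with a real position would behave less cleanly.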

hgm · Post by **hgm** » Wed Dec 13, 2017 11:41 pm

I am not very familiar with MCTS or simulations of it. But surely a search that expands 800 nodes should be aware if one of the root moves leads to a mate, and then only consider that move? If not, it seems to me that much could be improved. It is hard to believe that a mate close to the root would not affect the move choice at the root at all.

If totally random play already results in 15% checkmates, any preference for mate-in-1 moves could only drive up that number, as the purely random games that ended in a draw would surely contain positions where mate-in-1 was possible, but not played, and these would now all turn into wins.

CheckersGuy · Post by **CheckersGuy** » Wed Dec 13, 2017 11:51 pm

hgm wrote:I am not very familiar with MCTS or simulations of it. But surely a search that expands 800 nodes should be aware if one of the root moves leads to a mate, and then only consider that move? If not, it seems to me that much could be improved. It is hard to believe that a mate close to the root would not affect the move choice at the root at all.

If totally random play already results in 15% checkmates, any preference for mate-in-1 moves could only drive up that number, as the purely random games that ended in a draw would surely contain positions where mate-in-1 was possible, but not played, and these would now all turn into wins.

Yeah. Seems like a way to improve the search. However, DeepMind wasn't going for "the strongest possible chess engine" but for a system that can learn any game. Therefore they didn't use any domain knowledge.

I would really like to know how one could speed up the training process using domain knowledge.

Milos · Post by **Milos** » Thu Dec 14, 2017 12:00 am

trulses wrote:The point still stands, with the MCTS hyper-parameters from A0

What are "MCTS hyper-parameters"? Never heard of it, can you point me to a reference where that is mentioned?

To find a mate in one from the root node, you would have to be in a mate-in-one position. It's fairly obvious to me that most of those positions would not be in the early game, so your comment that these positions would be in the end game seems to be an indication that you're missing the point or you lack some understanding.

The only thing that I get from your writing is that you have quite some difficulty to express yourself clearly in English. That really doesn't help in the discussion.

trulses · Post by **trulses** » Thu Dec 14, 2017 12:09 am

Milos wrote:
trulses wrote:The point still stands, with the MCTS hyper-parameters from A0
What are "MCTS hyper-parameters"? Never heard of it, can you point me to a reference where that is mentioned?

To find a mate in one from the root node, you would have to be in a mate-in-one position. It's fairly obvious to me that most of those positions would not be in the early game, so your comment that these positions would be in the end game seems to be an indication that you're missing the point or you lack some understanding.
The only thing that I get from your writing is that you have quite some difficulty to express yourself clearly in English. That really doesn't help in the discussion.

MCTS hyper-parameters refer to the hyper-parameters of the search, like the value of the virtual loss, the Dirichlet noise alpha value, the c_puct exploration value, etc. Even the number of search threads. These things help shape the resulting search tree.
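
A minimal sketch of what such a set of search hyper-parameters could look like in code. The class and field names are illustrative. The 800 simulations per move and the Dirichlet alpha of 0.3 for chess are the values reported in the AlphaZero preprint, and the noise weight of 0.25 is from the AlphaGo Zero paper; the c_puct, virtual loss and thread values are placeholders, not published numbers.

```python
import random
from dataclasses import dataclass


@dataclass
class SearchConfig:
    """Search settings of the kind listed above (field names are illustrative)."""
    simulations_per_move: int = 800   # training setting reported for AlphaZero
    dirichlet_alpha: float = 0.3      # value reported for chess
    dirichlet_epsilon: float = 0.25   # noise weight from AlphaGo Zero
    c_puct: float = 1.0               # placeholder, not a published value
    virtual_loss: float = 1.0         # placeholder, not a published value
    search_threads: int = 1           # placeholder


def add_root_noise(priors, config, rng):
    """Mix Dirichlet noise into the root priors: (1 - eps) * p + eps * eta."""
    # A Dirichlet sample is a vector of Gamma(alpha, 1) draws, normalized.
    gammas = [rng.gammavariate(config.dirichlet_alpha, 1.0) for _ in priors]
    total = sum(gammas)
    eps = config.dirichlet_epsilon
    return [(1 - eps) * p + eps * g / total for p, g in zip(priors, gammas)]


config = SearchConfig()
noisy = add_root_noise([0.6, 0.25, 0.1, 0.02, 0.01, 0.01, 0.01],
                       config, random.Random(0))
print([round(p, 3) for p in noisy])
```

The noisy priors still sum to 1, but low-prior moves can receive a noticeable share, which is one way these settings shape the resulting tree.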

Milos wrote:The only thing that I get from your writing is that you have quite some difficulty to express yourself clearly in English.

That's cute.
