Google's AlphaGo team has been working on chess

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

nionita
Posts: 175
Joined: Fri Oct 22, 2010 9:47 pm
Location: Austria

Re: Google's AlphaGo team has been working on chess

Post by nionita »

hgm wrote:Note that AlphaZero's NN is trained to predict the visiting frequencies the nodes will get in an MCTS. That doesn't necessarily correspond to the evaluation score of the moves. Moves with complex tactics will need a lot of visits before they are resolved one way or the other, because the MCTS will need to construct a large QS tree. And it is very possible that the final outcome is that the move is bad, but you could not have seen that with just a few visits.
I think this is a very good point! In classical engines we do not express this idea, but just rely on good move sorting to reach positions which might be promising. But our evaluation does not give any information about how uncertain the static evaluation of the current position is, so that we could spend more nodes on such a position. The only (primitive) criterion we have for stopping the search is how far below alpha we are (and of course, in QS, whether we have reached a quiet position).

We would need an eval function which returns the static value plus a confidence interval for it (or something similar). This is what I tried in the beginning (I named this a multi-dimensional score), but I could not find a way to integrate it with the alpha-beta search, which needs a total order relation between scores.
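In pseudocode the idea might look something like this (just a sketch; the names and the ways of resolving the ordering problem are illustrative, not from an actual engine):

from dataclasses import dataclass

@dataclass
class Score:
    value: int    # the usual static evaluation, in centipawns
    margin: int   # half-width of a confidence interval around value

# Alpha-beta needs a total order on scores to compare them against the
# alpha/beta bounds, but intervals are only partially ordered: is
# (+30 +/- 200) better than (+50 +/- 10)? Collapsing to a pessimistic
# bound restores the total order, but discards exactly the uncertainty
# information we wanted to keep:
def pessimistic(s: Score) -> int:
    return s.value - s.margin

# Alternatively, the margin could steer search effort instead of the
# comparison, e.g. extend positions the eval is unsure about:
def extension(s: Score, threshold: int = 100) -> int:
    return 1 if s.margin > threshold else 0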
trulses
Posts: 39
Joined: Wed Dec 06, 2017 5:34 pm

Re: Google's AlphaGo team has been working on chess

Post by trulses »

hgm wrote:I am not very familiar with MCTS or simulations of it. But surely a search that expands 800 nodes should be aware if one of the root moves leads to a mate, and then only consider that move? If not, it seems to me that much could be improved. It is hard to believe that a mate close to the root would not affect the move choice at the root at all.

If totally random play already results in 15% checkmates, any preference for mate-in-1 moves could only drive up that number, as the purely random games that ended in a draw would surely contain positions where mate-in-1 was possible but not played, and these would now all turn into wins.
It's possible that even with 800 simulations some moves in the root node are not explored. This depends on how you do node selection, the hyper-parameters of the search, and the prior probabilities.

However, if you initialize the NN correctly, in the beginning the prior probabilities for any position will look like uniform random noise. This leads to the widest and shallowest possible search tree. This does indeed find mates in one, and it does recommend them.
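For reference, the selection step looks roughly like this (a sketch in the style of AlphaZero's PUCT rule; c_puct and the node layout are illustrative):

import math

def select_child(children, c_puct=1.5):
    # children: list of dicts with prior P, visit count N, total value W
    total_n = sum(ch["N"] for ch in children)
    def puct(ch):
        q = ch["W"] / ch["N"] if ch["N"] > 0 else 0.0              # exploitation
        u = c_puct * ch["P"] * math.sqrt(total_n) / (1 + ch["N"])  # exploration
        return q + u
    return max(children, key=puct)

With near-uniform priors the u term is the same for every unvisited child, so the search sweeps the root broadly (wide and shallow). As soon as one move starts returning mate scores, its q term dominates and the visits concentrate on it.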
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Google's AlphaGo team has been working on chess

Post by Milos »

trulses wrote:It's possible that even with 800 simulations some moves in the root node are not explored. This depends on how you do node selection, the hyper-parameters of the search, and the prior probabilities.

However, if you initialize the NN correctly, in the beginning the prior probabilities for any position will look like uniform random noise. This leads to the widest and shallowest possible search tree. This does indeed find mates in one, and it does recommend them.
OK, finally after four posts I understand what you were trying to say.
The claims are really quite trivial, so I was probably reading more into them than there was.
With uniform random prior probabilities you'd probably even find a mate-in-3 within 800 simulations, because there are only a few possible moves once you approach mate.
User avatar
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Google's AlphaGo team has been working on chess

Post by hgm »

CheckersGuy wrote:Yeah. Seems like a way to improve the search. However, DeepMind wasn't going for "the strongest possible chess engine" but for a system that can learn any game. Therefore they didn't use any domain knowledge.
The ban on domain knowledge did not extend to the rules, and what a checkmate is belongs to the rules. This is about the completely general issue of how the MCTS reacts to stumbling on a node that it judges to be a certain win (and cannot expand to change that conclusion).
I would really like to know how one could speed up the training process using domain knowledge.
I would really love to give that a try: write an MCTS program and equip it with a comparatively simple NN, to which you offer lots of domain-specific information you know to be useful in chess, like all attackers and protectors of every square, slider attacks that are blocked by a single piece, which moves would attack a given square, etc. And then try to train that NN.
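For example, feature planes of that kind are cheap to generate (a sketch using the python-chess library, which is just my choice of tool here):

import chess
import numpy as np

def attack_planes(board: chess.Board) -> np.ndarray:
    # Two 8x8 planes: the number of white/black attackers (and thus
    # also protectors) of each square.
    planes = np.zeros((2, 8, 8), dtype=np.float32)
    for sq in chess.SQUARES:
        r, f = chess.square_rank(sq), chess.square_file(sq)
        planes[0, r, f] = len(board.attackers(chess.WHITE, sq))
        planes[1, r, f] = len(board.attackers(chess.BLACK, sq))
    return planes

x = attack_planes(chess.Board())  # fed alongside the piece-placement planes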
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Google's AlphaGo team has been working on chess

Post by Pio »

I would love to see that engine. I think you would make a great one :)
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Google's AlphaGo team has been working on chess

Post by Henk »

The article says that they use a neural network that takes the board position s as an input and outputs a vector of move probabilities p.

Does that mean that the output vector contains 4,672 elements? And how does it know which move probability belongs to which move?
Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: Google's AlphaGo team has been working on chess

Post by Robert Pope »

Henk wrote:The article says that they use a neural network that takes the board position s as an input and outputs a vector of move probabilities p.

Does that mean that the output vector contains 4,672 elements? And how does it know which move probability belongs to which move?
Right - the vector of probabilities has every combination of possible moves in it, even ones that aren't pseudolegal in the position.
trulses
Posts: 39
Joined: Wed Dec 06, 2017 5:34 pm

Re: Google's AlphaGo team has been working on chess

Post by trulses »

Henk wrote:Does that mean that the output vector contains 4,672 elements?
Yeah, but they filter out the illegal moves and re-normalize so at the end you have a probability distribution over only the legal moves.
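Roughly like this (a sketch; shapes and names are illustrative):

import numpy as np

def legal_policy(p: np.ndarray, legal_indices) -> np.ndarray:
    # p: the raw 4,672-element policy vector from the net (post-softmax)
    mask = np.zeros_like(p)
    mask[legal_indices] = 1.0
    masked = p * mask
    return masked / masked.sum()  # a distribution over legal moves only

In practice you would typically mask the logits (set illegal entries to -inf) before the softmax, which amounts to the same thing.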
Henk wrote:And how does it know which move probability belongs to which move?
It's just an encoding you choose when you set up the net. For instance in a dog/cat classifier you could have two outputs where you decide that element 0 means probability of cat and element 1 means probability of dog.

The only thing that really matters is consistency in how you label things and how you interpret the output.
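To make that concrete for the 4,672 number, which the paper factors as 64 from-squares times 73 move-type planes (56 queen-like direction-distance combinations, 8 knight moves, 9 underpromotions): a mapping for the queen-like moves could look like this (the direction ordering is my own assumption, not necessarily the paper's):

# 8 ray directions; the order is arbitrary but must be used consistently
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def queen_move_index(from_sq: int, to_sq: int) -> int:
    # squares are 0..63 (rank*8 + file); knight moves and
    # underpromotions would use the remaining 8 + 9 planes
    dr = to_sq // 8 - from_sq // 8
    df = to_sq % 8 - from_sq % 8
    dist = max(abs(dr), abs(df))
    step = (dr // dist, df // dist)
    assert (step[0] * dist, step[1] * dist) == (dr, df), "not a ray move"
    plane = DIRS.index(step) * 7 + (dist - 1)   # 0..55
    return from_sq * 73 + plane                 # 0..4671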
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Google's AlphaGo team has been working on chess

Post by Henk »

4,672 output elements. Looks terribly expensive.

I remember implementing a perceptron with only one bit as output.

In my simple network a forward pass is already a computational bottleneck. Imagine how costly a pass through their network must be.

Also, the input contains the board positions of the last eight plies. So the input is enormous too.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Google's AlphaGo team has been working on chess

Post by AlvaroBegue »

Henk wrote:4,672 output elements. Looks terribly expensive.

[...]

Also, the input contains the board positions of the last eight plies. So the input is enormous too.
The outputs are 73 planes of size 8x8. The 8 previous chess positions are something like 96 planes of size 8x8. If the architecture they used is similar to the one in AlphaGo Zero, the hidden layers consist of 256 planes of size 8x8, and there are lots of them. So you wouldn't save much computation by trimming down the inputs or the outputs.
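A back-of-the-envelope check of that claim, using the plane counts above and assuming AlphaGo Zero's 20 residual blocks of 256 3x3 filters:

def conv_macs(c_in, c_out, k=3, squares=64):
    # multiply-accumulates for one convolution layer over an 8x8 board
    return c_in * c_out * k * k * squares

input_conv = conv_macs(96, 256)           # ~14 million MACs
tower = 20 * 2 * conv_macs(256, 256)      # ~1.5 billion MACs (2 convs per block)
print(input_conv / (input_conv + tower))  # ~0.009

So the input layer is under 1% of the work; even halving the input planes would barely register.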