Google's AlphaGo team has been working on chess

Henk · Post by **Henk** » Wed Dec 20, 2017 4:17 pm

I still don't understand. How do you get from 73 planes of size 8x8
a probability for each possible move ?

AlvaroBegue · Post by **AlvaroBegue** » Wed Dec 20, 2017 4:28 pm

Henk wrote:I still don't understand. How do you get from 73 planes of size 8x8
a probability for each possible move ?

First let's explain where the 73 comes from, in case anyone in this conversation doesn't know. Queens can move in 56 different ways (8 directions and up to 7 steps in each direction). This also contains the ways kings, bishops, rooks and pawns move. Then there are 8 ways knight move. Finally, we can under-promote in 9 ways ({capture left, capture right, move straight} x {promote to R, promote to B, promote to N}). So now we encode a move as the `from' square (64 possibilities) and a number from 1 to 73, indicating the way the piece is moving.

The neural network comes up with 73 planes of size 8x8, where each entry is a single real number. Now we generate the legal moves (by conventional means) for each move we look up the value assigned to it by the neural network, and we say the probability of the move is proportional to exp(value). Now these probabilities don't add up to 1, so you need to re-scale.

probability(m) = exp(value[m]) / sum_i(exp(value))

This operation is called SoftMax in the machine-learning world.

Henk · Post by **Henk** » Wed Dec 20, 2017 5:00 pm

Ok. Clear.

pilgrimdan · Post by **pilgrimdan** » Thu Dec 21, 2017 10:18 am

AlvaroBegue wrote:
Henk wrote:I still don't understand. How do you get from 73 planes of size 8x8
a probability for each possible move ?
First let's explain where the 73 comes from, in case anyone in this conversation doesn't know. Queens can move in 56 different ways (8 directions and up to 7 steps in each direction). This also contains the ways kings, bishops, rooks and pawns move. Then there are 8 ways knight move. Finally, we can under-promote in 9 ways ({capture left, capture right, move straight} x {promote to R, promote to B, promote to N}). So now we encode a move as the `from' square (64 possibilities) and a number from 1 to 73, indicating the way the piece is moving.

The neural network comes up with 73 planes of size 8x8, where each entry is a single real number. Now we generate the legal moves (by conventional means) for each move we look up the value assigned to it by the neural network, and we say the probability of the move is proportional to exp(value). Now these probabilities don't add up to 1, so you need to re-scale.

probability(m) = exp(value[m]) / sum_i(exp(value))

This operation is called SoftMax in the machine-learning world.

it sounds like to me ... that this takes a lot of memory to do...

how much memory is needed to do all of this stuff...

Henk · Post by **Henk** » Thu Dec 21, 2017 11:07 am

syzygy wrote:
CheckersGuy wrote:Additionally, I am against the term "book-building-approach". It's not like A0 zero stores the best move in a given position.
But I am looking at the (so-called MC) tree it builds, starting with a 1-node tree and growing it by successively expanding leaf nodes (whether by NN or some other means). That is how automatic book building works.

See e.g. Thomas Lincke's dissertation. Section 3.4 considers expansion strategies. The rule he came up with gives a high expansion priority to a move that is good or has a shallow subbook. It seems he was not aware of the multi-armed bandit theory.

PUCT:
http://www.ecmlpkdd2013.org/wp-content/ ... 07/456.pdf

Very easy.

AlvaroBegue · Post by **AlvaroBegue** » Thu Dec 21, 2017 11:09 am

pilgrimdan wrote:
AlvaroBegue wrote:
Henk wrote:I still don't understand. How do you get from 73 planes of size 8x8
a probability for each possible move ?
First let's explain where the 73 comes from, in case anyone in this conversation doesn't know. Queens can move in 56 different ways (8 directions and up to 7 steps in each direction). This also contains the ways kings, bishops, rooks and pawns move. Then there are 8 ways knight move. Finally, we can under-promote in 9 ways ({capture left, capture right, move straight} x {promote to R, promote to B, promote to N}). So now we encode a move as the `from' square (64 possibilities) and a number from 1 to 73, indicating the way the piece is moving.

The neural network comes up with 73 planes of size 8x8, where each entry is a single real number. Now we generate the legal moves (by conventional means) for each move we look up the value assigned to it by the neural network, and we say the probability of the move is proportional to exp(value). Now these probabilities don't add up to 1, so you need to re-scale.

probability(m) = exp(value[m]) / sum_i(exp(value))

This operation is called SoftMax in the machine-learning world.

it sounds like to me ... that this takes a lot of memory to do...

how much memory is needed to do all of this stuff...

Do you mean memory or storage for the training database? I can make guesses about both, but I am not sure what your concern is.

pilgrimdan · Post by **pilgrimdan** » Thu Dec 21, 2017 12:59 pm

AlvaroBegue wrote:
pilgrimdan wrote:
AlvaroBegue wrote:
Henk wrote:I still don't understand. How do you get from 73 planes of size 8x8
a probability for each possible move ?
First let's explain where the 73 comes from, in case anyone in this conversation doesn't know. Queens can move in 56 different ways (8 directions and up to 7 steps in each direction). This also contains the ways kings, bishops, rooks and pawns move. Then there are 8 ways knight move. Finally, we can under-promote in 9 ways ({capture left, capture right, move straight} x {promote to R, promote to B, promote to N}). So now we encode a move as the `from' square (64 possibilities) and a number from 1 to 73, indicating the way the piece is moving.

The neural network comes up with 73 planes of size 8x8, where each entry is a single real number. Now we generate the legal moves (by conventional means) for each move we look up the value assigned to it by the neural network, and we say the probability of the move is proportional to exp(value). Now these probabilities don't add up to 1, so you need to re-scale.

probability(m) = exp(value[m]) / sum_i(exp(value))

This operation is called SoftMax in the machine-learning world.

it sounds like to me ... that this takes a lot of memory to do...

how much memory is needed to do all of this stuff...

Do you mean memory or storage for the training database? I can make guesses about both, but I am not sure what your concern is.

okay ... yes ... storage is what I had in mine ...

AlvaroBegue · Post by **AlvaroBegue** » Thu Dec 21, 2017 2:14 pm

pilgrimdan wrote:
AlvaroBegue wrote:
pilgrimdan wrote:
AlvaroBegue wrote:
Henk wrote:I still don't understand. How do you get from 73 planes of size 8x8
a probability for each possible move ?
First let's explain where the 73 comes from, in case anyone in this conversation doesn't know. Queens can move in 56 different ways (8 directions and up to 7 steps in each direction). This also contains the ways kings, bishops, rooks and pawns move. Then there are 8 ways knight move. Finally, we can under-promote in 9 ways ({capture left, capture right, move straight} x {promote to R, promote to B, promote to N}). So now we encode a move as the `from' square (64 possibilities) and a number from 1 to 73, indicating the way the piece is moving.

The neural network comes up with 73 planes of size 8x8, where each entry is a single real number. Now we generate the legal moves (by conventional means) for each move we look up the value assigned to it by the neural network, and we say the probability of the move is proportional to exp(value). Now these probabilities don't add up to 1, so you need to re-scale.

probability(m) = exp(value[m]) / sum_i(exp(value))

This operation is called SoftMax in the machine-learning world.

it sounds like to me ... that this takes a lot of memory to do...

how much memory is needed to do all of this stuff...

Do you mean memory or storage for the training database? I can make guesses about both, but I am not sure what your concern is.

okay ... yes ... storage is what I had in mine ...

You don't need to store all that. If you have the result of the game and the list of moves, you can build the input and the desired outputs on the fly.

In the paper they indicate that they also store the distribution of playouts amongst the moves, since they use that as labels for the policy output. That would require an additional ~100 bytes per move.

pilgrimdan · Post by **pilgrimdan** » Thu Dec 21, 2017 2:21 pm

AlvaroBegue wrote:
pilgrimdan wrote:
AlvaroBegue wrote:
pilgrimdan wrote:
AlvaroBegue wrote:
Henk wrote:I still don't understand. How do you get from 73 planes of size 8x8
a probability for each possible move ?
First let's explain where the 73 comes from, in case anyone in this conversation doesn't know. Queens can move in 56 different ways (8 directions and up to 7 steps in each direction). This also contains the ways kings, bishops, rooks and pawns move. Then there are 8 ways knight move. Finally, we can under-promote in 9 ways ({capture left, capture right, move straight} x {promote to R, promote to B, promote to N}). So now we encode a move as the `from' square (64 possibilities) and a number from 1 to 73, indicating the way the piece is moving.

The neural network comes up with 73 planes of size 8x8, where each entry is a single real number. Now we generate the legal moves (by conventional means) for each move we look up the value assigned to it by the neural network, and we say the probability of the move is proportional to exp(value). Now these probabilities don't add up to 1, so you need to re-scale.

probability(m) = exp(value[m]) / sum_i(exp(value))

This operation is called SoftMax in the machine-learning world.

it sounds like to me ... that this takes a lot of memory to do...

how much memory is needed to do all of this stuff...

Do you mean memory or storage for the training database? I can make guesses about both, but I am not sure what your concern is.

okay ... yes ... storage is what I had in mine ...

You don't need to store all that. If you have the result of the game and the list of moves, you can build the input and the desired outputs on the fly.

In the paper they indicate that they also store the distribution of playouts amongst the moves, since they use that as labels for the policy output. That would require an additional ~100 bytes per move.

okay ... thanks ...

Michael Sherwin · Post by **Michael Sherwin** » Tue Dec 26, 2017 4:46 pm

mcostalba wrote:I have read the paper: result is impressive!

Honestly I didn't think it was possible because my understanding was that chess is more "computer friendly" than Go....I was wrong.

It is true, SF is not meant to play at its best without a book and especially 1 fixed minute per move cuts out the whole time management, it would be more natural to play with tournament conditions, but nevertheless I think these are secondary aspects, what has been accomplished is huge.

Marco, A0 did not win a match against SF. A0 with RL won a match against SF. Or said another way, A0 won a match against SF because SF does not have RL. Or thought of a different way, a group of programmers identified a deficiency that exist in a competitive field and took advantage of that deficiency by eliminating that deficiency in their entity. Or one can change that thought around and say RL does not belong in competitive chess because it covers up the underlying strength and correctness of the algorithm. In that case the A0 vs SF match is non sequitur and meaningless. Then there is the thought of the fan that wants RL but are ignored because they are not important and what the fan thinks or wants is not meaningful.

But, what you can't say is, "what has been accomplished is huge" in terms of a chess playing algorithm. You might say that what A0 has demonstrated in go, chess and shogi has accomplished a huge demonstration that a NN with RL may conquer hamanity some day. I won't argue against that. Concerning chess though the AB algorithm is not inferior to NN+MC. It is inferior to NN+MC+RL. AB+RL is far superior to NN+MC+RL.

And I said all that without mentioning RomiChess not even one time!

Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess

Re: Google's AlphaGo team has been working on chess