Google's AlphaGo team has been working on chess

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Google's AlphaGo team has been working on chess

Post by Milos »

Daniel Shawul wrote:There is not really a big need to use shallow fixed-depth searches like depth 8, which btw is pretty bad if you expand right after each visit. You will waste a lot of time scoring inferior moves with a shallow depth-8 search. With alpha-beta we use just qsearch to make the eval quiet, and I think we can use the same approach in MCTS, but we really need to expand faster with just qsearch. Also, expanding after each visit makes sense because now we have a deterministic eval, unlike in the previous case when rollouts were used.
I initially thought that 8 would be a good number since the Alpha0 NN also has "depth 8" in its weights, but after Harm pointed to Fig. 2, I realised that even super-shallow alpha-beta offers a much superior evaluation to the NN, so that practically just QS could be sufficient.
Expanding after each visit is also the only thing that makes sense if one doesn't do rollouts, coz running the eval of the same position multiple times is just one huge loss and totally pointless.
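
(To make the scheme under discussion concrete, here is a minimal MCTS sketch that expands one child per visit and scores the new leaf with a quiescence search instead of a rollout. The qsearch, legal_moves and make_move hooks are assumptions for illustration, not any particular engine's API.)

Code:

import math

class Node:
    def __init__(self, pos, move=None, parent=None):
        self.pos, self.move, self.parent = pos, move, parent
        self.children = []
        self.visits = 0
        self.total = 0.0

    def ucb_child(self, c=1.4):
        # standard UCT selection among already-visited children
        return max(self.children,
                   key=lambda ch: ch.total / ch.visits
                                  + c * math.sqrt(math.log(self.visits) / ch.visits))

def mcts_iteration(root, qsearch, legal_moves, make_move):
    """One visit: select, expand exactly one new child, score it with qsearch,
    and back the score up the tree (negamax convention)."""
    node = root
    # selection: descend while the node is fully expanded
    while node.children and len(node.children) == len(legal_moves(node.pos)):
        node = node.ucb_child()
    # expansion: add one child per visit, no rollout afterwards
    tried = {ch.move for ch in node.children}
    untried = [m for m in legal_moves(node.pos) if m not in tried]
    if untried:
        child = Node(make_move(node.pos, untried[0]), untried[0], node)
        node.children.append(child)
        node = child
    # evaluation: qsearch gives a quiet, deterministic score, so each
    # position is evaluated exactly once, when its node is created
    score = qsearch(node.pos)
    # backpropagation, flipping the sign at every ply
    while node is not None:
        node.visits += 1
        node.total += score
        score = -score
        node = node.parent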
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: Google's AlphaGo team has been working on chess

Post by CheckersGuy »

syzygy wrote:
hgm wrote:
syzygy wrote:But you seem to assume that AlphaZero's NN isn't worth much, or at least not worth more than 1000 nodes in SF's tree.
But that is not just an assumption, we know that from a simple extrapolation, don't we? In Fig. 2a of the AlphaZero paper we see that the Elo of AlphaZero drops below that of Stockfish at ~300ms/move (i.e. 24k AlphaZero nodes), and is some 120 Elo worse at 40ms/move (3.2k nodes). That is 120 Elo for a factor-8 reduction in tree size, and there are nearly 4 more factors of 8 to go before we are at a tree size of a single node. By that time Stockfish's 1000 nodes would be about 600 Elo stronger than AlphaZero's single node. If we assume 70 Elo per doubling for Stockfish, we could reduce its tree size to about 2-3 nodes to get it on par with AlphaZero's single node.

And that makes sense. I don't believe a NN could be much more accurate than QS, when the situation gets complex.
QS is an interesting point. It seems AlphaZero doesn't care whether the position being expanded and NN-evaluated is a quiet position or not.

But the QS point also shows that an evaluation function only works with a minimum of search. The evaluation function can know a lot about passed pawns, but if a relatively simple combination wins the pawn, that knowledge is not worth much. For the same reason I do not yet rule out that the strength of AlphaZero's evaluation function becomes more apparent as its search takes care of the immediate tactics that its NN cannot grasp.
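
(As an aside, the extrapolation hgm sketches in the quote above can be spelled out as back-of-the-envelope arithmetic; the 120 Elo per factor of 8 and 70 Elo per doubling are his reading of Fig. 2a, not figures stated in the paper's text.)

Code:

import math

# Reading of Fig. 2a: the curves cross at ~24k AlphaZero nodes, and at
# 3.2k nodes (one factor of 8 less) AlphaZero is ~120 Elo behind.
elo_per_factor8 = 120
gap_at_3200 = 120

# From 3.2k nodes down to a single node is log8(3200) ~ 3.9 more factors
# of 8, so the linearly extrapolated gap at a single NN evaluation is
gap_at_one_node = gap_at_3200 + math.log(3200, 8) * elo_per_factor8   # ~585, i.e. "about 600"

# If Stockfish gains ~70 Elo per doubling of its tree, it can give back
# that many Elo worth of doublings starting from 1000 nodes:
doublings = gap_at_one_node / 70                 # ~8.4
sf_nodes_on_par = 1000 / 2 ** doublings          # ~3, the "2-3 nodes" in the quote

print(round(gap_at_one_node), round(sf_nodes_on_par, 1))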
I think you have something mixed up. The tree is only built up once per game, and it won't be kept in memory after the root position changes (your opponent or you made a move).

If you look at the explanation on Wikipedia, this is actually pretty clear.
Milos wrote:
Daniel Shawul wrote:There is not really a big need to use shallow fixed-depth searches like depth 8, which btw is pretty bad if you expand right after each visit. You will waste a lot of time scoring inferior moves with a shallow depth-8 search. With alpha-beta we use just qsearch to make the eval quiet, and I think we can use the same approach in MCTS, but we really need to expand faster with just qsearch. Also, expanding after each visit makes sense because now we have a deterministic eval, unlike in the previous case when rollouts were used.
I initially thought that 8 would be a good number since the Alpha0 NN also has "depth 8" in its weights, but after Harm pointed to Fig. 2, I realised that even super-shallow alpha-beta offers a much superior evaluation to the NN, so that practically just QS could be sufficient.
Expanding after each visit is also the only thing that makes sense if one doesn't do rollouts, coz running the eval of the same position multiple times is just one huge loss and totally pointless.
Where do you have the "depth 8" figure from? There is nothing in the paper about that.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Google's AlphaGo team has been working on chess

Post by Milos »

CheckersGuy wrote:
syzygy wrote:
hgm wrote:
syzygy wrote:But you seem to assume that AlphaZero's NN isn't worth much, or at least not worth more than 1000 nodes in SF's tree.
But that is not just an assumption, we know that from a simple extrapolation, don't we? In Fig. 2a of the AlphaZero paper we see that the Elo of AlphaZero drops below that of Stockfish at ~300ms/move (i.e. 24k AlphaZero nodes), and is some 120 Elo worse at 40ms/move (3.2k nodes). That is 120 Elo for a factor-8 reduction in tree size, and there are nearly 4 more factors of 8 to go before we are at a tree size of a single node. By that time Stockfish's 1000 nodes would be about 600 Elo stronger than AlphaZero's single node. If we assume 70 Elo per doubling for Stockfish, we could reduce its tree size to about 2-3 nodes to get it on par with AlphaZero's single node.

And that makes sense. I don't believe a NN could be much more accurate than QS, when the situation gets complex.
QS is an interesting point. It seems AlphaZero doesn't care whether the position being expanded and NN-evaluated is a quiet position or not.

But the QS point also shows that an evaluation function only works with a minimum of search. The evaluation function can know a lot about passed pawns, but if a relatively simple combination wins the pawn, that knowledge is not worth much. For the same reason I do not yet rule out that the strength of AlphaZero's evaluation function becomes more apparent as its search takes care of the immediate tactics that its NN cannot grasp.
I think you have something mixed up. The tree is only built up once per game, and it won't be kept in memory after the root position changes (your opponent or you made a move).

If you look at the explanation on Wikipedia, this is actually pretty clear.
Sorry, but you're the one who mixed it up.
Play (Figure 2d). At the end of the search AlphaGo Zero selects a move a to play in the root position s0, proportional to its exponentiated visit count, π(a|s0) = N(s0,a)^(1/τ) / Σ_b N(s0,b)^(1/τ), where τ is a temperature parameter that controls the level of exploration. The search tree is reused at subsequent time-steps: the child node corresponding to the played action becomes the new root node; the subtree below this child is retained along with all its statistics, while the remainder of the tree is discarded.
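
(For readers who prefer code to formulas, the "Play" step quoted above amounts to something like the following sketch; the Node layout with a children dict and visit counts is an assumption, only the π(a|s0) ∝ N(s0,a)^(1/τ) selection and the subtree reuse come from the quote.)

Code:

import random

def play_and_reuse(root, tau=1.0):
    """Pick a root move with probability N(s0,a)^(1/tau) / sum_b N(s0,b)^(1/tau),
    then keep the chosen child's subtree as the new root."""
    moves = list(root.children)                       # children keyed by move
    weights = [root.children[m].visits ** (1.0 / tau) for m in moves]
    move = random.choices(moves, weights=weights, k=1)[0]

    # the child of the played move becomes the new root; its subtree and
    # statistics are retained, the rest of the old tree is simply dropped
    new_root = root.children[move]
    new_root.parent = None
    return move, new_root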
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Google's AlphaGo team has been working on chess

Post by Milos »

CheckersGuy wrote:Where do you have the "depth 8" figure from? There is nothing in the paper about that.
Again, you need to read the papers carefully (AGZ Nature and latest AZ one) coz you keep missing things.
Table S1: Input features used by AlphaZero in Go, Chess and Shogi respectively. The first set of features are repeated for each position in a T = 8-step history.
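
(To be concrete about what that T = 8 history is: the network's input stacks the last eight positions of the game as feature planes. A rough sketch, where the plane count per position and the position_to_planes helper are assumptions for illustration only.)

Code:

import numpy as np

PLANES_PER_POSITION = 14   # assumed: 6 piece types x 2 colours + 2 repetition planes

def encode_input(history, planes_per_pos=PLANES_PER_POSITION, T=8):
    """Stack the last T positions of the game as one input tensor."""
    stack = [position_to_planes(pos) for pos in history[-T:]]   # hypothetical helper
    # pad with zero planes if the game is shorter than T plies
    while len(stack) < T:
        stack.insert(0, np.zeros((planes_per_pos, 8, 8)))
    return np.concatenate(stack, axis=0)    # shape (T * planes_per_pos, 8, 8)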
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: Google's AlphaGo team has been working on chess

Post by CheckersGuy »

Milos wrote:
CheckersGuy wrote:
syzygy wrote:
hgm wrote:
syzygy wrote:But you seem to assume that AlphaZero's NN isn't worth much, or at least not worth more than 1000 nodes in SF's tree.
But that is not just an assumption, we know that from a simple extrapolation, don't we? In Fig. 2a of the AlphaZero paper we see that the Elo of AlphaZero drops below that of Stockfish at ~300ms/move (i.e. 24k AlphaZero nodes), and is some 120 Elo worse at 40ms/move (3.2k nodes). That is 120 Elo for a factor-8 reduction in tree size, and there are nearly 4 more factors of 8 to go before we are at a tree size of a single node. By that time Stockfish's 1000 nodes would be about 600 Elo stronger than AlphaZero's single node. If we assume 70 Elo per doubling for Stockfish, we could reduce its tree size to about 2-3 nodes to get it on par with AlphaZero's single node.

And that makes sense. I don't believe a NN could be much more accurate than QS, when the situation gets complex.
QS is an interesting point. It seems AlphaZero doesn't care whether the position being expanded and NN-evaluated is a quiet position or not.

But the QS point also shows that an evaluation function only works with a minimum of search. The evaluation function can know a lot about passed pawns, but if a relatively simple combination wins the pawn, that knowledge is not worth much. For the same reason I do not yet rule out that the strength of AlphaZero's evaluation function becomes more apparent as its search takes care of the immediate tactics that its NN cannot grasp.
I think you have something mixed up. The tree is only built up once per game, and it won't be kept in memory after the root position changes (your opponent or you made a move).

If you look at the explanation on Wikipedia, this is actually pretty clear.
Sorry, but you're the one who mixed it up.
Play (Figure 2d). At the end of the search AlphaGo Zero selects a move a to play in the root position s0, proportional to its exponentiated visit count, π(a|s0) = N(s0,a)^(1/τ) / Σ_b N(s0,b)^(1/τ), where τ is a temperature parameter that controls the level of exploration. The search tree is reused at subsequent time-steps: the child node corresponding to the played action becomes the new root node; the subtree below this child is retained along with all its statistics, while the remainder of the tree is discarded.
The tree is still not kept in memory once the game is over. So where is the book learning?

As for the 8-step history: where do you get that the 8-step history is equivalent to an 8-ply search?
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Google's AlphaGo team has been working on chess

Post by Milos »

CheckersGuy wrote:The tree is still not kept in memory once the game is over. So where is the book learning?
I addressed your obviously wrong comment:
it won't be kept in memory after the root position changes (your opponent or you made a move).
not whether training the weights is in some way analogous to book learning (which I believe it is, coz the weights learned in training contain information about the best move in certain positions, the same way a book does, just much less accurately).
As for the 8-step history: where do you get that the 8-step history is equivalent to an 8-ply search?
I didn't; it's like asking where you got that a fixed-depth alpha-beta search could perform like the NN. It was an assumption based on what the horizon of the NN evaluation is, i.e. how deep it can see tactics. Since the NN is clearly not a very efficient way to do the evaluation, we kind of came to the new conclusion that its strength is probably not more than that of just qsearch, extrapolating performance from Fig. 2 of the paper.
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: Google's AlphaGo team has been working on chess

Post by Henk »

At the start of training only random moves will be played. So that means all games will end in a fifty-move draw. So how do they get the move probability distribution right in the very first stage of training?

Or did they use some endgame knowledge?
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: Google's AlphaGo team has been working on chess

Post by CheckersGuy »

Henk wrote:At the start of training only random moves will be played. So that means all games will end in a fifty-move draw. So how do they get the move probability distribution right in the very first stage of training?

Or did they use some endgame knowledge?
Why do you think that all of those games will end by the fifty-move rule? Some of those will be losses and (very) few will be wins. If the AI learns that a given line leads to a draw, it might try something else that either works or doesn't.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Google's AlphaGo team has been working on chess

Post by AlvaroBegue »

CheckersGuy wrote:
Henk wrote:At the start of training only random moves will be played. So that means all games will end in a fifty-move draw. So how do they get the move probability distribution right in the very first stage of training?

Or did they use some endgame knowledge?
Why do you think that all of those games will end by the fifty-move rule? Some of those will be losses and (very) few will be wins. If the AI learns that a given line leads to a draw, it might try something else that either works or doesn't.
I am not sure where the loss-win asymmetry comes from, or even what it means in the context of an engine playing itself.

A few years ago (I believe it was 2013) I trained a neural network to compute an evaluation function for Spanish checkers, starting from random moves and using reinforcement learning. This worked very well, but I did have to implement smallish (6-men) EGTBs for the process to work, because the search I was using to generate games wasn't strong enough to discover some important facts about how much advantage is enough to win the game.
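
(For context, the kind of training loop being described looks roughly like this TD-style sketch; all names here are made up for illustration, and this is not Álvaro's actual code nor a claim about how the EGTB adjudication was wired in.)

Code:

def train_eval(net, play_selfplay_game, n_games=100000, alpha=0.01):
    """Crude TD(0)-style loop: nudge each position's value toward the value
    of the next position (or the final result at the end of the game).

    play_selfplay_game(net) is assumed to return (positions, result), where
    result is +1/0/-1 from White's point of view, possibly adjudicated by
    small EGTBs; net.value and net.update are assumed hooks."""
    for _ in range(n_games):
        positions, result = play_selfplay_game(net)
        for i, pos in enumerate(positions):
            target = result if i == len(positions) - 1 else net.value(positions[i + 1])
            error = target - net.value(pos)        # TD error, all values White-relative
            net.update(pos, alpha * error)         # small step toward the target
    return net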
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Google's AlphaGo team has been working on chess

Post by hgm »

I would think there are as many losses as wins in self-play.

I dug up a posting from the late Steven Edwards, on random games:
sje wrote:Random game mating probabilities

Some data from 24,478,109 games, each made from randomly generated moves:

There were 3,747,489 checkmates (15.31%)
Of the checkmates, 1,872,426 were White getting checkmated (49.96%).
Of the checkmates, 1,875,063 were Black getting checkmated (50.04%).

There were 1,499,382 stalemates (6.13%)
Of the stalemates, 754,025 were White getting stalemated (50.29%).
Of the stalemates, 745,357 were Black getting stalemated (49.71%).
And another one:
sje wrote:One billion random games:

Code:

0.153051   checkmate
0.193435   fiftymoves
0.56713   insufficient
0.0251883   repetition
0.0611956   stalemate

mean length: 334.354
limit: 1000000000
usage: 185920
frequency: 5378.64
period: 0.00018592
15% wins is a good starting point. The NN would probably quickly develop a preference for pushing Pawns, as promotions would give it more material and thus a larger chance to accidentally checkmate. This tendency would strongly suppress the 50-move draws in positions that aren't really draws.
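
(Edwards' numbers are easy to reproduce in outline; a sketch using the python-chess library as an assumed convenience, with the draw claims applied as soon as they become available. Exact fractions will of course differ from his billion-game run.)

Code:

import random
from collections import Counter
import chess   # python-chess

def random_game_outcome():
    """Play one game of uniformly random moves and report how it ended."""
    board = chess.Board()
    while True:
        if board.is_checkmate():
            return "checkmate"
        if board.is_stalemate():
            return "stalemate"
        if board.is_insufficient_material():
            return "insufficient"
        if board.can_claim_fifty_moves():
            return "fifty_moves"
        if board.can_claim_threefold_repetition():
            return "repetition"
        board.push(random.choice(list(board.legal_moves)))

tally = Counter(random_game_outcome() for _ in range(10000))
print({outcome: count / 10000 for outcome, count in tally.items()})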
Last edited by hgm on Wed Dec 13, 2017 7:40 pm, edited 1 time in total.