Google's AlphaGo team has been working on chess

CheckersGuy · Post by **CheckersGuy** » Wed Dec 13, 2017 3:50 pm

Milos wrote:
CheckersGuy wrote:
syzygy wrote:
hgm wrote:
syzygy wrote:But you seem to assume that AlphaZero's NN isn't worth much, or at least not worth more than 1000 nodes in SF's tree.
But that is not just an assumption, we know that from a simple extrapolation, don't we? In fig. 2a of the AlphaZero paper we see that the Elo of AlphaZero drops below that of Stockfish at ~300ms/move (i.e. 24k AlphaZero nodes), and is some 120 Elo worse at 40ms/move (3.2k nodes). That is 120 Elo for a factor 8 reduction in tree size, and there are nearly 4 more factors of 8 to go before we are at a tree size of a single node. By that time Stockfish's 1000 nodes would be about 600 Elo stronger than AlphaZero's single node. If we assume 70 Elo per doubling for Stockfish, we could reduce its tree size to about 2-3 nodes to get it on par with AlphaZero's single node.

And that makes sense. I don't believe a NN could be much more accurate than QS, when the situation gets complex.
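
The extrapolation quoted above can be checked with a few lines of arithmetic. This is only a sketch of the reasoning as stated in the post: the 120 Elo per factor of 8, the 3.2k nodes at 40ms/move, the 1000 Stockfish nodes per AlphaZero node and the 70 Elo per doubling are all taken from the text, and the variable names are made up for illustration.

```python
import math

# Figures as read off fig. 2a in the quoted post (assumptions, not measurements).
ELO_PER_FACTOR_8 = 120        # Elo lost per 8x reduction of AlphaZero's tree
AZ_NODES_AT_40MS = 3200       # AlphaZero nodes at 40ms/move, already 120 Elo behind
SF_NODES_PER_AZ_NODE = 1000   # Stockfish nodes searched in the time of one AlphaZero node
SF_ELO_PER_DOUBLING = 70      # assumed Elo gain per doubling of Stockfish's tree

# How many more factors of 8 until AlphaZero's tree is a single node?
factors_left = math.log(AZ_NODES_AT_40MS, 8)            # about 3.9, "nearly 4"

# Deficit at a single node: the 120 Elo already lost plus 120 per remaining factor.
deficit = ELO_PER_FACTOR_8 * (1 + factors_left)         # about 586, "about 600"

# Halve Stockfish's tree until that deficit is gone.
doublings = deficit / SF_ELO_PER_DOUBLING
sf_nodes_on_par = SF_NODES_PER_AZ_NODE / 2 ** doublings

print(f"factors of 8 left: {factors_left:.2f}")
print(f"Elo deficit at one node: {deficit:.0f}")
print(f"Stockfish nodes on par with one AlphaZero node: {sf_nodes_on_par:.1f}")
```

With the unrounded logarithm the result lands at roughly 3 nodes; rounding the deficit up to 600 Elo, as the post does, gives a value between 2 and 3.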
QS is an interesting point. It seems AlphaZero doesn't care whether the position being expanded and NN-evaluated is a quiet position or not.

But the QS point also shows that an evaluation function only works with a minimum of search. The evaluation function can know a lot about passed pawns, but if a relatively simple combination wins the pawn, that knowledge is not worth much. For the same reason I do not yet rule out that the strength of AlphaZero's evaluation function becomes more apparent as its search takes care of the immediate tactics that its NN cannot grasp.
I think you have something mixed up. The tree is only built up once per game and this tree won't be kept in memory after the root position changes (your opponent or yourself made a move).

If you look at the explanation on Wikipedia, this is actually pretty clear.
Sorry, but you're the one who mixed it up.
Play (Figure 2d). At the end of the search AlphaGo Zero selects a move a to play in the root position s0, proportional to its exponentiated visit count, π(a|s0) = N(s0,a)^(1/τ) / Σ_b N(s0,b)^(1/τ), where τ is a temperature parameter that controls the level of exploration. The search tree is reused at subsequent time-steps: the child node corresponding to the played action becomes the new root node; the subtree below this child is retained along with all its statistics, while the remainder of the tree is discarded.
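
The selection rule and the tree reuse described in the passage above can be sketched as follows. This is a minimal illustration, not DeepMind's code: `Node`, `move_probabilities` and `advance_root` are hypothetical names, and the visit counts in the usage lines are invented.

```python
class Node:
    """Hypothetical search-tree node holding only what this sketch needs."""
    def __init__(self):
        self.visits = 0
        self.children = {}   # move -> Node


def move_probabilities(visit_counts, tau=1.0):
    """pi(a|s0) = N(s0,a)^(1/tau) / sum_b N(s0,b)^(1/tau)."""
    powered = {move: n ** (1.0 / tau) for move, n in visit_counts.items()}
    total = sum(powered.values())
    return {move: p / total for move, p in powered.items()}


def advance_root(root, played_move):
    """Keep the subtree below the played move; the rest of the tree is dropped."""
    # If the move was never expanded, start from a fresh node.
    return root.children.get(played_move, Node())


# Usage with invented visit counts: a lower temperature sharpens the choice.
counts = {"e4": 600, "d4": 150, "Nf3": 50}
print(move_probabilities(counts, tau=1.0))   # proportional to the raw counts
print(move_probabilities(counts, tau=0.5))   # concentrates on the most visited move

root = Node()
root.children["e4"] = Node()
root.children["e4"].visits = 600
new_root = advance_root(root, "e4")          # statistics below e4 are retained
```

Note that the reuse happens between moves of one game; nothing in this rule keeps the tree once the game is over.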

The tree is still not kept in memory once the game is over. So where is the book learning?

As for the 8-step history: where do you get that the 8-step history is equivalent to an 8-ply search?

Milos · Post by **Milos** » Wed Dec 13, 2017 4:46 pm

CheckersGuy wrote:The tree is still not kept in memory once the game is over. So where is the book learning?

I addressed your obviously wrong comment:

tree won't be kept in the memory after the root position changes (your opponent or yourself made a move).

Not if training the weights is in some way analogous to book learning (which I believe it is, because the weights learned in training contain information about the best move in certain positions, the same as a book does, just much less accurately).

As for the 8-step history: where do you get that the 8-step history is equivalent to an 8-ply search?

I didn't. It's like asking where you got the idea that a fixed-depth alpha-beta search could perform like an NN. It was an assumption about the horizon of the NN evaluation, i.e. how deep it can see tactics. Since an NN is clearly not a very efficient way to do the evaluation, we more or less came to the new conclusion, extrapolating performance from Fig. 2 of the paper, that its strength is probably no more than that of just a qsearch.

Henk · Post by **Henk** » Wed Dec 13, 2017 6:16 pm

At the start of training only random moves will be played. So that means all games will end in a fifty-move draw. So how do they get the move probability distribution right in the very first stage of training?

Or did they use some endgame knowledge?

CheckersGuy · Post by **CheckersGuy** » Wed Dec 13, 2017 7:07 pm

Henk wrote:At the start of training only random moves will be played. So that means all games will end in a fifty-move draw. So how do they get the move probability distribution right in the very first stage of training?

Or did they use some endgame knowledge?

Why do you think that all of those games will end by the 50-move rule? Some of those will be losses and (very) few will be wins. If the AI learns that a given line leads to a draw, it might try something else that either works or doesn't.

AlvaroBegue · Post by **AlvaroBegue** » Wed Dec 13, 2017 7:18 pm

CheckersGuy wrote:
Henk wrote:At the start of training only random moves will be played. So that means all games will end in a fifty-move draw. So how do they get the move probability distribution right in the very first stage of training?

Or did they use some endgame knowledge?
Why do you think that all of those games will end by the 50-move rule? Some of those will be losses and (very) few will be wins. If the AI learns that a given line leads to a draw, it might try something else that either works or doesn't.

I am not sure where the loss-win asymmetry comes from, or even what it means in the context of an engine playing itself.

A few years ago (I believe it was 2013) I trained a neural network that would compute an evaluation function for Spanish checkers, starting from random moves and using reinforcement learning to learn. This worked very well, but I did have to implement smallish (6-men) EGTBs for the process to work well, because the search I was using to generate games wasn't strong enough to discover some important facts about how much advantage is enough advantage to win the game.

hgm · Post by **hgm** » Wed Dec 13, 2017 7:18 pm

I would think there are as many losses as wins, in self-play.

I dug up a posting from the late Steven Edwards, on random games:

sje wrote:Random game mating probabilities

Some data from 24,478,109 games, each made from randomly generated moves:

There were 3,747,489 checkmates (15.31%)
Of the checkmates, 1,872,426 were White getting checkmated (49.96%).
Of the checkmates, 1,875,063 were Black getting checkmated (50.04%).

There were 1,499,382 stalemates (6.13%)
Of the stalemates, 754,025 were White getting stalemated (50.29%).
Of the stalemates, 745,357 were Black getting stalemated (49.71%).

And another one:

sje wrote:One billion random games:

0.153051    checkmate
0.193435    fiftymoves
0.56713     insufficient
0.0251883   repetition
0.0611956   stalemate

mean length: 334.354
limit: 1000000000
usage: 185920
frequency: 5378.64
period: 0.00018592

15% wins is a good starting point. The NN would probably quickly develop a preference for pushing Pawns, as promotions would give it more material and thus a larger chance to accidentally checkmate. This tendency would strongly suppress the 50-move draws in positions that aren't really draws.
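
As a quick consistency check on the two sets of figures quoted above, the sketch below recomputes the checkmate share from the first post and sums the termination fractions from the second. The numbers are copied from the quoted posts; the variable names are only illustrative.

```python
# First post: 24,478,109 random games.
games = 24_478_109
checkmates = 3_747_489
stalemates = 1_499_382
checkmate_pct = round(100 * checkmates / games, 2)
stalemate_pct = round(100 * stalemates / games, 2)

# Second post: one billion random games, fraction per termination type.
fractions = {
    "checkmate": 0.153051,
    "fiftymoves": 0.193435,
    "insufficient": 0.56713,
    "repetition": 0.0251883,
    "stalemate": 0.0611956,
}
total = sum(fractions.values())
decisive = fractions["checkmate"]   # the only decisive outcome in the list
drawn = total - decisive            # every other termination is a draw

print(f"checkmates: {checkmate_pct}%  stalemates: {stalemate_pct}%")
print(f"fractions sum to {total:.7f}; decisive {decisive:.3f}, drawn {drawn:.3f}")
```

Both samples agree on roughly 15% decisive games, and only about 19% of random games end by the fifty-move rule.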

trulses · Post by **trulses** » Wed Dec 13, 2017 7:31 pm

You're not playing random moves, you're playing the moves given by a random search tree. At 800 nodes per search you typically will find mate in ones and play them. The quality of this play is much higher than just picking a random move.

This is under the assumption that your prior probabilities aren't extremely biased, which they won't be with proper initialization.

Milos · Post by **Milos** » Wed Dec 13, 2017 8:41 pm

trulses wrote:You're not playing random moves, you're playing the moves given by a random search tree. At 800 nodes per search you typically will find mate in ones and play them. The quality of this play is much higher than just picking a random move.

This is under the assumption that your prior probabilities aren't extremely biased, which they won't be with proper initialization.

It seems you got it wrong.
There are no 800 nodes per search; there is 1 evaluated node and a few tens of traversed nodes per search (because the number of different paths explored is large). The depth reached by a single MCT search would then typically be smaller than what SF achieves, and only in the very late endgame would you reach mates.

CheckersGuy · Post by **CheckersGuy** » Wed Dec 13, 2017 8:48 pm

Milos wrote:
trulses wrote:You're not playing random moves, you're playing the moves given by a random search tree. At 800 nodes per search you typically will find mate in ones and play them. The quality of this play is much higher than just picking a random move.

This is under the assumption that your prior probabilities aren't extremely biased, which they won't be with proper initialization.
It seems you got it wrong.
There are no 800 nodes per search; there is 1 evaluated node and a few tens of traversed nodes per search (because the number of different paths explored is large). The depth reached by a single MCT search would then typically be smaller than what SF achieves, and only in the very late endgame would you reach mates.

Currently we cannot make many assumptions about how "deep" AlphaZero evaluates. This completely depends on how well the neural network can predict the move probabilities. If in every position the NN gave (almost) 50% probability to each of two candidate moves, you could get quite deep searches.

AlvaroBegue · Post by **AlvaroBegue** » Wed Dec 13, 2017 9:19 pm

Milos wrote:
trulses wrote:You're not playing random moves, you're playing the moves given by a random search tree. At 800 nodes per search you typically will find mate in ones and play them. The quality of this play is much higher than just picking a random move.

This is under the assumption that your prior probabilities aren't extremely biased, which they won't be with proper initialization.
It seems you got it wrong.

No, he is precisely correct.

There are no 800 nodes per search; there is 1 evaluated node and a few tens of traversed nodes per search (because the number of different paths explored is large). The depth reached by a single MCT search would then typically be smaller than what SF achieves, and only in the very late endgame would you reach mates.

800 nodes are expanded per search. Or you can say that there are 800 playouts through the MCTS tree per search.

In any case, his comment about the quality of play being better than random moves does apply: If there is an immediate mate available, it will be found. That's enough to get the process bootstrapped, because starting from random evaluation you'll learn that positions with a bunch of white pieces on top of the black king tend to be white victories. Then you will play games where both players are trying to place a bunch of pieces on top of the enemy king, which will be of much much higher quality than the initial games. Etc.
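
A toy version of this bootstrapping argument, reduced to the root position only: 800 playouts are distributed over the root moves with a PUCT-style rule, the priors are uniform (an untrained network), one move is an immediate mate worth +1, and every other move gets a small fixed random value standing in for an untrained evaluation. All names and constants (`puct_root_search`, `c_puct=1.5`, the noise range) are assumptions for illustration; a real search would also descend below the root.

```python
import math
import random


def puct_root_search(num_moves, mating_move, simulations=800, c_puct=1.5, seed=0):
    """Distribute playouts over the root moves and return the visit counts."""
    rng = random.Random(seed)
    prior = 1.0 / num_moves                      # uniform priors, untrained net
    # Small fixed noise as a stand-in for an untrained value head.
    leaf_value = [rng.uniform(-0.2, 0.2) for _ in range(num_moves)]
    visits = [0] * num_moves
    value_sum = [0.0] * num_moves

    for _ in range(simulations):
        total = sum(visits)

        def score(move):
            # Mean value so far plus an exploration bonus that shrinks with visits.
            q = value_sum[move] / visits[move] if visits[move] else 0.0
            u = c_puct * prior * math.sqrt(total) / (1 + visits[move])
            return q + u

        move = max(range(num_moves), key=score)
        # A terminal mate returns the true game result instead of a net value.
        value = 1.0 if move == mating_move else leaf_value[move]
        visits[move] += 1
        value_sum[move] += value

    return visits


visits = puct_root_search(num_moves=30, mating_move=17)
best = max(range(len(visits)), key=visits.__getitem__)
print(f"most visited move: {best} with {visits[best]} of {sum(visits)} playouts")
```

Once the mating move has been tried a single time, its value of +1 outweighs the exploration bonus of the alternatives, so the bulk of the playouts, and therefore the move actually played, goes to the mate even though the evaluation itself is random.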
