Alpha Zero vs Stockfish 8 tournament conditions.

Dariusz Orzechowski · Sun Dec 10, 2017 12:43 am

syzygy wrote:I wonder why they don't use MCTS also to calculate a more accurate expected game outcome and use that for adjusting the weights instead of z. After all, the final game outcome depends on the choices made later in the game and may not be the best reflection of the winning chances at position t.

They use MCTS, but instead of MC rollouts they call a network and take v_t as a game outcome prediction. In previous versions they used both rollouts and v_t (taken from a separate "value" network; now it's the same network for both p_t and v_t).
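
For reference, a minimal sketch of the training target under discussion, assuming the loss given in the paper: squared error between the game outcome z and the predicted value v, plus cross-entropy between the search probabilities π and the network's move probabilities p. The L2 weight penalty is left out, and the function name and the `eps` guard are illustrative, not DeepMind's code.

```python
import math


def training_loss(z, v, pi, p, eps=1e-12):
    """Loss for one position: (z - v)^2 - sum_a pi(a) * log p(a).

    z  : final game outcome from the side to move (+1, 0 or -1)
    v  : value predicted by the network for this position
    pi : search probabilities per move (from MCTS visit counts)
    p  : move probabilities predicted by the network
    """
    value_loss = (z - v) ** 2
    # Cross-entropy; moves with pi == 0 contribute nothing and are skipped.
    policy_loss = -sum(t * math.log(q + eps) for t, q in zip(pi, p) if t > 0)
    return value_loss + policy_loss


# A position that was eventually won (z = 1) but that the network rated as level (v = 0).
print(training_loss(1.0, 0.0, [0.5, 0.5], [0.5, 0.5]))
```

So z only enters through the value term, while the search itself feeds back through π in the policy term.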

syzygy wrote:Btw, what do they mean exactly by "move probability"? The probability that a move is played is a circular definition. I suppose the more "probable" moves are supposed to be the better moves?

It's basically just the winning probability after playing a move (normalized), or the probability that a move is the best one in a given position. In Go it has a nice representation as a heatmap on the board.

syzygy · Sun Dec 10, 2017 12:47 am

Jesse Gersenson wrote:
Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs (15) to generate self-play games and 64 second-generation TPUs to train the neural networks
That seems to describe the data set they generate and then work from. I don't know what "mini-batches of size 4,096" means, but 4096 x 700000 = 2,867,200,000.

I think each mini-batch corresponds to 4096 moves.

syzygy · Sun Dec 10, 2017 1:03 am

Dariusz Orzechowski wrote:
syzygy wrote:I wonder why they don't use MCTS also to calculate a more accurate expected game outcome and use that for adjusting the weights instead of z. After all, the final game outcome depends on the choices made later in the game and may not be the best reflection of the winning chances at position t.
They use MCTS, but instead of MC rollouts they call a network and take v_t as a game outcome prediction. In previous versions they used both rollouts and v_t (taken from a separate "value" network; now it's the same network for both p_t and v_t).

OK, so "leaf position" does not mean "terminal position".

Do I understand this correctly:
- The MCTS starts with a tree with the current (root) position as a single node.
- Each simulation traverses the tree until a leaf node is reached. This leaf node is expanded and evaluated with the neural network.
- The move probabilities outputted by the neural network are used to choose paths through the tree for the next simulation.

Milos · Sun Dec 10, 2017 1:07 am

syzygy wrote:OK, so "leaf position" does not mean "terminal position".

Do I understand this correctly:
- The MCTS starts with a tree with the current (root) position as a single node.
- Each simulation traverses the tree until a leaf node is reached. This leaf node is expanded and evaluated with the neural network.
- The move probabilities outputted by the neural network are used to choose paths through the tree for the next simulation.

Yes, you are correct. Just look at Figure 2 of the AlphaGo Zero Nature paper.
The MCTS is actually standard UCT, just without rollouts.
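
The three steps confirmed above can be sketched roughly like this. It is a minimal illustration, not DeepMind's implementation: `evaluate` stands in for the neural network (returning move priors and a value for the side to move), the toy game and all names are hypothetical, and the `total + 1` under the square root is a small tweak so that the priors matter before any child has been visited.

```python
import math


class Node:
    def __init__(self, prior):
        self.prior = prior      # move probability from the network
        self.visits = 0         # how often this move was selected
        self.value_sum = 0.0    # sum of backed-up values
        self.children = {}      # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct=1.5):
    """Prefer moves with high value, high prior and low visit count."""
    total = sum(child.visits for child in node.children.values())

    def score(item):
        _, child = item
        u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + u

    return max(node.children.items(), key=score)


def run_mcts(root_state, evaluate, apply_move, simulations=400):
    root = Node(prior=1.0)  # the tree starts as a single root node
    for _ in range(simulations):
        node, state, path = root, root_state, []
        # 1. Traverse the tree until a leaf (a node without children).
        while node.children:
            move, node = select_child(node)
            state = apply_move(state, move)
            path.append(node)
        # 2. Expand and evaluate the leaf with the network; no rollout.
        priors, value = evaluate(state)
        node.children = {m: Node(p) for m, p in priors.items()}
        # 3. Back up the value, flipping the sign at every ply.
        for visited in reversed(path):
            value = -value
            visited.visits += 1
            visited.value_sum += value
    return {m: child.visits for m, child in root.children.items()}


# Toy stand-in for the network: a pile of stones, take 1 or 2,
# whoever takes the last stone wins. Uniform priors, neutral value.
def toy_evaluate(pile):
    if pile == 0:
        return {}, -1.0  # side to move has nothing to take: already lost
    moves = [m for m in (1, 2) if m <= pile]
    return {m: 1.0 / len(moves) for m in moves}, 0.0


def toy_apply(pile, move):
    return pile - move


print(run_mcts(4, toy_evaluate, toy_apply))
```

On this toy game the search should end up visiting the winning move (take 1 from a pile of 4) far more often than the alternative, even though the stand-in "network" knows nothing beyond terminal positions.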

Dariusz Orzechowski · Sun Dec 10, 2017 1:50 am

I see Milos answered already. If I may suggest something, read the mentioned Nature paper, i.e. "Mastering the Game of Go without Human Knowledge"; it's really good. Unlike the current paper, which frankly looks like a draft from a scrapbook, is painfully lacking in detail and is almost void of any scientific rigour.

pilgrimdan · Sun Dec 10, 2017 2:02 am

Jesse Gersenson wrote:
syzygy wrote:
syzygy wrote:
clumma wrote:
Ras wrote:You are completely mistaken if you think that Google threw half a computing centre at the match. In the self-training, yes, that was a different story, but so was Stockfish's development (e.g. fishtest).
In particular, they played 44 million games in 9 hours. Fishtest does that in about 3 weeks.
That means that fewer than 20 million games (4 hours of training) were sufficient to get a superb evaluation function.

20 million is an amazingly small number if you realise that they use only the result (win/draw/loss) of each game to adjust the weights. That is less than 4MB of information.
This is probably not entirely correct. They must somehow take into account the moves played when feeding back the game result into the neural net. But still it is difficult to understand how enough information can be extracted from 20 million games.
Read their pdf, the good stuff starts at the end of page 2:
https://arxiv.org/pdf/1712.01815.pdf
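
The figures in the quoted exchange can be checked with a few lines. This is only a sketch under two illustrative assumptions: games were generated at a constant rate over the 9 hours, and each game result carries log2(3) bits (one of win, draw or loss). The function names are made up for the example.

```python
import math


def games_after(hours, total_games=44_000_000, total_hours=9):
    """Games played after `hours`, assuming a constant generation rate."""
    return total_games * hours / total_hours


def result_information_megabytes(games):
    """Size of the bare win/draw/loss results, in megabytes (10^6 bytes)."""
    bits = games * math.log2(3)
    return bits / 8 / 1_000_000


print(round(games_after(4) / 1e6, 1))                     # millions of games after 4 hours
print(round(result_information_megabytes(20_000_000), 2)) # MB for 20 million results
```

This gives roughly 19.6 million games after 4 hours and about 3.96 MB for the results of 20 million games, consistent with "fewer than 20 million" and "less than 4MB".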

from the above pdf ...

"Instead of an alpha-beta search with domain-specific enhancements, AlphaZero uses a general-purpose Monte-Carlo tree search (MCTS) algorithm. Each search consists of a series of simulated games of self-play that traverse a tree from root s_root to leaf. Each simulation proceeds by selecting in each state s a move a with low visit count, high move probability and high value (averaged over the leaf states of simulations that selected a from s) according to the current neural network f_θ. The search returns a vector π representing a probability distribution over moves, either proportionally or greedily with respect to the visit counts at the root state."
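
The last sentence of the quote, turning root visit counts into the vector π, might look like this in code. It is a minimal sketch: the function name, the `greedy` flag and the example visit counts are invented for illustration.

```python
def search_policy(visit_counts, greedy=False):
    """Turn root visit counts into a probability distribution over moves.

    greedy=False: probabilities proportional to the visit counts.
    greedy=True : all probability on the most-visited move.
    """
    if greedy:
        best = max(visit_counts, key=visit_counts.get)
        return {move: 1.0 if move == best else 0.0 for move in visit_counts}
    total = sum(visit_counts.values())
    return {move: n / total for move, n in visit_counts.items()}


counts = {"e4": 600, "d4": 200}  # hypothetical visit counts at the root
print(search_policy(counts))
print(search_policy(counts, greedy=True))
```

With these made-up counts the proportional version gives 0.75 and 0.25, and the greedy version puts everything on e4.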

syzygy · Sun Dec 10, 2017 2:13 am

Dariusz Orzechowski wrote:I see Milos answered already. If I may suggest something, read the mentioned Nature paper, i.e. "Mastering the Game of Go without Human Knowledge"; it's really good. Unlike the current paper, which frankly looks like a draft from a scrapbook, is painfully lacking in detail and is almost void of any scientific rigour.

Yes, the Nature paper helped me to understand that I was misreading a few things.

It would be very interesting to know to roughly what Stockfish search depth AlphaZero's evaluation corresponds. If the real trick (for chess) is the MCTS and the parallelism it allows, then that can be replicated with Stockfish.

Milos · Sun Dec 10, 2017 2:22 am

syzygy wrote:
Dariusz Orzechowski wrote:I see Milos answered already. If I may suggest something, read the mentioned Nature paper, i.e. "Mastering the Game of Go without Human Knowledge"; it's really good. Unlike the current paper, which frankly looks like a draft from a scrapbook, is painfully lacking in detail and is almost void of any scientific rigour.
Yes, the Nature paper helped me to understand that I was misreading a few things.

It would be very interesting to know to roughly what Stockfish search depth AlphaZero's evaluation corresponds. If the real trick (for chess) is the MCTS and the parallelism it allows, then that can be replicated with Stockfish.

I believe the sweet number is around 8. A depth 8 mpv search with 2-3 root moves takes around 10ms on a modern Intel core.
So an easy test would be a single-core match of 800s/move MC SF vs. 1s/move standard SF.
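
One plausible reading of the 800:1 ratio, offered as an assumption rather than as the poster's stated reasoning: the AlphaZero paper reports about 80,000 positions searched per second in chess, so if each network evaluation is replaced by a depth 8 search costing about 10ms, one second of AlphaZero-style search costs about 800 seconds of Stockfish time. The function name is hypothetical.

```python
def mc_stockfish_seconds(evals_per_second, ms_per_eval, az_seconds=1):
    """Seconds an MCTS Stockfish would need to match `az_seconds` of search
    if every leaf evaluation is a shallow search taking `ms_per_eval` ms."""
    return evals_per_second * ms_per_eval * az_seconds / 1000


# 80,000 evaluations per second (figure from the AlphaZero paper),
# 10 ms per depth 8 search (the estimate given above).
print(mc_stockfish_seconds(80_000, 10))
```

Under those two numbers the result is 800 seconds per move, against 1 second per move for standard Stockfish.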

Jesse Gersenson · Sun Dec 10, 2017 1:21 pm

Dariusz Orzechowski wrote:"Mastering the Game of Go without Human Knowledge"

https://deepmind.com/documents/119/agz_ ... nature.pdf

Milos wrote:I believe the sweet number is around 8. A depth 8 mpv search with 2-3 root moves takes around 10ms on a modern Intel core.
So an easy test would be a single-core match of 800s/move MC SF vs. 1s/move standard SF.

Hi Milos,
Could you explain this a little? I don't understand the assumption behind the conclusion that 800s vs 1s is reasonable.

Three root moves, each reaching depth 8, takes 10ms. That I understand. But the relationship between MC and SF's regular search I don't understand (perhaps because I don't know how SF search works).

Jesse Gersenson · Sun Dec 10, 2017 5:01 pm

syzygy wrote:
Jesse Gersenson wrote:
Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs (15) to generate self-play games and 64 second-generation TPUs to train the neural networks
That seems to describe the data set they generate and then work from. I don't know what "mini-batches of size 4,096" means, but 4096 x 700000 = 2,867,200,000.
I think each mini-batch corresponds to 4096 moves.

Mini-batches are made up of positions, 4,096 per batch here. See p. 10 of the Nature pdf:

"Over the course of training, 29 million games of self-play were generated. Parameters were updated from 3.1 million mini-batches of 2,048 positions each."
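
To put the two sets of numbers side by side, here is a quick check of how many positions were sampled in each training run. The function name is made up, and the per-game figure assumes the 44 million chess games mentioned earlier in the thread.

```python
def positions_sampled(steps, batch_size):
    """Total positions drawn for training: one mini-batch per step."""
    return steps * batch_size


alphazero_chess = positions_sampled(700_000, 4_096)   # AlphaZero paper
alphago_zero = positions_sampled(3_100_000, 2_048)    # Nature paper

print(alphazero_chess)
print(alphago_zero)
# Average number of sampled positions per self-play game in chess.
print(round(alphazero_chess / 44_000_000, 1))
```

That is 2,867,200,000 sampled positions for chess, or roughly 65 per self-play game, against 6,348,800,000 for the AlphaGo Zero run.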
