Alpha Zero vs Stockfish 8 tournament conditions.


Alpha Zero vs Stockfish 8 tournament conditions.

Poll ended at Fri Dec 08, 2017 5:15 am

The time per move and hardware etc was fair.
27
52%
Google set it up to give Alpha Zero an edge.
25
48%
 
Total votes: 52

Dariusz Orzechowski
Posts: 44
Joined: Thu May 02, 2013 5:23 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Dariusz Orzechowski »

syzygy wrote:I wonder why they don't use MCTS also to calculate a more accurate expected game outcome and use that for adjusting the weights instead of z. After all, the final game outcome depends on the choices made later in the game and may not be the best reflection of the winning chances at position t.
They use MCTS, but instead of MC rollouts they call a network and take v_t as the game outcome prediction. In previous versions they used both rollouts and v_t (taken from a separate "value" network; now the same network produces both p_t and v_t).
syzygy wrote:Btw, what do they mean exactly by "move probability"? The probability that a move is played is a circular definition. I suppose the more "probable" moves are supposed to be the better moves?
It's basically just the winning probability after playing a move (normalized), or the probability that a move is the best one in a given position. In Go it has a nice representation as a heatmap on the board.
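To make the "normalized" part concrete, here is a minimal sketch of how raw per-move scores could be turned into move probabilities over legal moves only. The function name, the scores and the legality mask are made up for illustration; this is not code from the paper, just a masked softmax under those assumptions.

```python
import math

def masked_softmax(logits, legal):
    """Turn raw per-move scores into probabilities over legal moves only.

    `logits` and `legal` are hypothetical inputs: one score and one
    legality flag per candidate move.
    """
    # Subtract the largest legal score for numerical stability.
    top = max(l for l, ok in zip(logits, legal) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, legal)]
    total = sum(exps)
    # Illegal moves keep probability 0; the rest sum to 1.
    return [e / total for e in exps]

# Made-up scores for four candidate moves; the third one is illegal.
probs = masked_softmax([2.0, 1.0, 5.0, 0.0], [True, True, False, True])
print([round(p, 3) for p in probs])
```

The illegal move gets zero probability even though it has the highest raw score, and the remaining values can be drawn as the heatmap mentioned above.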
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by syzygy »

Jesse Gersenson wrote:
Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs (15) to generate self-play games and 64 second-generation TPUs to train the neural networks
That seems to describe the data set they generate and then work from. I don't know what "mini-batches of size 4,096" means, but 4096 x 700000 = 2,867,200,000.
I think each mini-batch corresponds to 4096 moves.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by syzygy »

Dariusz Orzechowski wrote:
syzygy wrote:I wonder why they don't use MCTS also to calculate a more accurate expected game outcome and use that for adjusting the weights instead of z. After all, the final game outcome depends on the choices made later in the game and may not be the best reflection of the winning chances at position t.
They use MCTS, but instead of MC rollouts they call a network and take v_t as the game outcome prediction. In previous versions they used both rollouts and v_t (taken from a separate "value" network; now the same network produces both p_t and v_t).
OK, so "leaf position" does not mean "terminal position".

Do I understand this correctly:
- The MCTS starts with a tree with the current (root) position as a single node.
- Each simulation traverses the tree until a leaf node is reached. This leaf node is expanded and evaluated with the neural network.
- The move probabilities outputted by the neural network are used to choose paths through the tree for the next simulation.
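The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: toy_network is a hypothetical stand-in for the real neural network, the "game" is a toy with three moves in every position and no terminal states, and the constant c_puct = 1.5 is an arbitrary choice, not a value from the paper.

```python
import math
import random

class Node:
    def __init__(self, prior):
        self.prior = prior      # network move probability for the move leading here
        self.visits = 0         # how many simulations passed through this node
        self.value_sum = 0.0    # summed values, from the side to move at this node
        self.children = {}      # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def toy_network(state):
    """Hypothetical stand-in for the network: uniform priors, random value in [-1, 1]."""
    moves = [0, 1, 2]
    return {m: 1.0 / len(moves) for m in moves}, random.uniform(-1.0, 1.0)

def select_child(node, c_puct=1.5):
    """PUCT-style choice: prefer high value, high prior, low visit count."""
    sqrt_n = math.sqrt(node.visits)
    def score(item):
        _, child = item
        # child.q() is from the opponent's viewpoint, so negate it.
        return -child.q() + c_puct * child.prior * sqrt_n / (1 + child.visits)
    return max(node.children.items(), key=score)

def simulate(root, root_state, play, evaluate):
    node, state, path = root, root_state, [root]
    # 1. Traverse the tree until a leaf (a node without children) is reached.
    while node.children:
        move, node = select_child(node)
        state = play(state, move)
        path.append(node)
    # 2. Expand the leaf and evaluate it with the network; no rollout is played.
    priors, value = evaluate(state)
    for move, p in priors.items():
        node.children[move] = Node(p)
    # 3. Back the value up the path, flipping its sign at every ply.
    for n in reversed(path):
        n.visits += 1
        n.value_sum += value
        value = -value

def search(root_state, simulations=200):
    root = Node(prior=1.0)                      # tree starts as a single node
    play = lambda state, move: state + (move,)  # toy game: a state is its move list
    for _ in range(simulations):
        simulate(root, root_state, play, toy_network)
    return root

random.seed(0)
root = search(root_state=())
print({m: c.visits for m, c in root.children.items()})
```

The very first simulation only expands the root, so the children's visit counts add up to one less than the number of simulations.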
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Milos »

syzygy wrote:OK, so "leaf position" does not mean "terminal position".

Do I understand this correctly:
- The MCTS starts with a tree with the current (root) position as a single node.
- Each simulation traverses the tree until a leaf node is reached. This leaf node is expanded and evaluated with the neural network.
- The move probabilities outputted by the neural network are used to choose paths through the tree for the next simulation.
Yes, you are correct. Just look at Figure 2 of the AlphaGo Zero Nature paper.
Their MCTS is actually standard UCT, just without rollouts.
Dariusz Orzechowski
Posts: 44
Joined: Thu May 02, 2013 5:23 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Dariusz Orzechowski »

I see Milos has already answered. If I may suggest something, read the Nature paper he mentioned, i.e. "Mastering the Game of Go without Human Knowledge". It's really good, unlike the current paper, which frankly looks like a draft from a scrapbook, is painfully lacking in detail and is almost devoid of scientific rigour.
pilgrimdan
Posts: 405
Joined: Sat Jul 02, 2011 10:49 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by pilgrimdan »

Jesse Gersenson wrote:
syzygy wrote:
syzygy wrote:
clumma wrote:
Ras wrote:You are completely mistaken if you think that Google threw half a computing centre at the match. In the self-training, yes, that was a different story, but so was Stockfish's development (e.g. fishtest).
In particular, they played 44 million games in 9 hours. Fishtest does that in about 3 weeks.
That means that fewer than 20 million games (4 hours of training) were sufficient to get a superb evaluation function.

20 million is an amazingly small number if you realise that they use only the result (win/draw/loss) of each game to adjust the weights. That is less than 4MB of information.
This is probably not entirely correct. They must somehow take into account the moves played when feeding back the game result into the neural net. But still it is difficult to understand how enough information can be extracted from 20 million games.
Read their pdf, the good stuff starts at the end of page 2:
https://arxiv.org/pdf/1712.01815.pdf
from the above pdf ...

"Instead of an alpha-beta search with domain-specific enhancements, AlphaZero uses a general-purpose Monte-Carlo tree search (MCTS) algorithm. Each search consists of a series of simulated games of self-play that traverse a tree from root s_root to leaf. Each simulation proceeds by selecting in each state s a move a with low visit count, high move probability and high value (averaged over the leaf states of simulations that selected a from s) according to the current neural network f_θ. The search returns a vector π representing a probability distribution over moves, either proportionally or greedily with respect to the visit counts at the root state."
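The last sentence of that passage (π taken "proportionally or greedily" from the root visit counts) could look something like the sketch below. The function name and the visit counts are invented for illustration, and the temperature parameter is an assumption borrowed from the way the Go paper describes it (π proportional to N^(1/temperature)); the chess preprint does not spell this out.

```python
def visit_policy(visit_counts, temperature=1.0):
    """Map root visit counts {move: N} to a move distribution pi."""
    if temperature == 0:
        # Greedy: all probability on the most-visited move.
        best = max(visit_counts, key=visit_counts.get)
        return {m: (1.0 if m == best else 0.0) for m in visit_counts}
    # Proportional: pi(a) is N(a)^(1/temperature), normalized to sum to 1.
    weights = {m: n ** (1.0 / temperature) for m, n in visit_counts.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

counts = {"e2e4": 500, "d2d4": 300, "g1f3": 0}  # made-up visit counts
print(visit_policy(counts))       # proportional to visit counts
print(visit_policy(counts, 0))    # greedy
```

With temperature 1 the distribution simply mirrors the share of simulations each root move received; with temperature 0 only the most-visited move survives.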
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by syzygy »

Dariusz Orzechowski wrote:I see Milos has already answered. If I may suggest something, read the Nature paper he mentioned, i.e. "Mastering the Game of Go without Human Knowledge". It's really good, unlike the current paper, which frankly looks like a draft from a scrapbook, is painfully lacking in detail and is almost devoid of scientific rigour.
Yes, the Nature paper helped me to understand that I was misreading a few things.

It would be very interesting to know to roughly what Stockfish search depth AlphaZero's evaluation corresponds. If the real trick (for chess) is the MCTS and the parallelism it allows, then that can be replicated with Stockfish.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Milos »

syzygy wrote:
Dariusz Orzechowski wrote:I see Milos has already answered. If I may suggest something, read the Nature paper he mentioned, i.e. "Mastering the Game of Go without Human Knowledge". It's really good, unlike the current paper, which frankly looks like a draft from a scrapbook, is painfully lacking in detail and is almost devoid of scientific rigour.
Yes, the Nature paper helped me to understand that I was misreading a few things.

It would be very interesting to know to roughly what Stockfish search depth AlphaZero's evaluation corresponds. If the real trick (for chess) is the MCTS and the parallelism it allows, then that can be replicated with Stockfish.
I believe the sweet number is around 8. A depth-8 multi-PV search with 2-3 root moves takes around 10 ms on a modern Intel core.
So an easy test would be a single-core match of 800 s/move MC SF vs. 1 s/move standard SF.
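For what it's worth, the arithmetic behind those two numbers works out as below, assuming each MC simulation costs exactly one depth-8 search of about 10 ms. That assumption is only one possible reading of the proposal. The preprint reports about 80 thousand positions per second for AlphaZero in chess, which may be where the figure comes from, but that link is a guess.

```python
eval_ms = 10        # estimate from the post: one depth-8 multi-PV search
mc_seconds = 800    # proposed time per move for the MC version of SF

# Number of depth-8 "leaf evaluations" the MC version could afford per move.
leaf_evals_per_move = mc_seconds * 1000 // eval_ms
print(leaf_evals_per_move)   # 80000
```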
Jesse Gersenson
Posts: 593
Joined: Sat Aug 20, 2011 9:43 am

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Jesse Gersenson »

Dariusz Orzechowski wrote:"Mastering the Game of Go without Human Knowledge"
https://deepmind.com/documents/119/agz_ ... nature.pdf
Milos wrote:I believe the sweet number is around 8. A depth-8 multi-PV search with 2-3 root moves takes around 10 ms on a modern Intel core.
So an easy test would be a single-core match of 800 s/move MC SF vs. 1 s/move standard SF.
Hi Milos,
Could you explain this a little? I don't understand the assumption that makes 800s vs 1s a reasonable comparison.

Three root moves, each reaching depth 8, take 10ms. That I understand. But I don't understand the relationship between MC and SF's regular search (perhaps because I don't know how SF's search works).
Jesse Gersenson
Posts: 593
Joined: Sat Aug 20, 2011 9:43 am

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Jesse Gersenson »

syzygy wrote:
Jesse Gersenson wrote:
Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs (15) to generate self-play games and 64 second-generation TPUs to train the neural networks
That seems to describe the data set they generate and then work from. I don't know what "mini-batches of size 4,096" means, but 4096 x 700000 = 2,867,200,000.
I think each mini-batch corresponds to 4096 moves.
Mini-batches are 4096 positions each. For comparison, see p. 10 of the Nature pdf, where the mini-batches are 2,048 positions:
"Over the course of training, 29 million games of self-play were generated. Parameters were updated from 3.1 million mini-batches of 2,048 positions each."
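A quick check of the totals, using only the figures quoted in this thread. The variable names are mine, and the last line is just a ratio: positions can be sampled more than once, so it is not the number of distinct positions per game.

```python
# AlphaZero preprint figures quoted in this thread
az_steps, az_batch = 700_000, 4_096
# AlphaGo Zero (Nature) figures quoted just above
agz_steps, agz_batch, agz_games = 3_100_000, 2_048, 29_000_000

az_samples = az_steps * az_batch      # positions fed to training, AlphaZero
agz_samples = agz_steps * agz_batch   # positions fed to training, AlphaGo Zero

print(az_samples)                # 2867200000
print(agz_samples)               # 6348800000
print(agz_samples / agz_games)   # training samples drawn per generated game
```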