Alpha Zero vs Stockfish 8 tournament conditions.

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Alpha Zero vs Stockfish 8 tournament conditions.

Poll ended at Fri Dec 08, 2017 5:15 am

The time per move and hardware etc. was fair: 27 votes (52%)
Google set it up to give Alpha Zero an edge: 25 votes (48%)

Total votes: 52

syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by syzygy »

clumma wrote:
Ras wrote:You are completely mistaken if you think that Google threw half a computing centre at the match. In the self-training, yes, that was a different story, but so was Stockfish's development (e.g. fishtest).
In particular, they played 44 million games in 9 hours. Fishtest does that in about 3 weeks.
That means that fewer than 20 million games (4 hours of training) were sufficient to get a superb evaluation function.

20 million is an amazingly small number if you realise that they use only the result (win/draw/loss) of each game to adjust the weights. That is less than 4MB of information.
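A quick back-of-the-envelope check of that figure, assuming the 20 million game estimate above and that each win/draw/loss result carries at most log2(3) bits (three possible outcomes):

```python
import math

games = 20_000_000               # training games, per the estimate above
bits_per_result = math.log2(3)   # a win/draw/loss result carries at most ~1.585 bits

total_bytes = games * bits_per_result / 8
print(f"{total_bytes / 1_000_000:.2f} MB")  # prints 3.96 MB, i.e. just under 4 MB
```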
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Uri Blass »

clumma wrote:
Ras wrote:You are completely mistaken if you think that Google threw half a computing centre at the match. In the self-training, yes, that was a different story, but so was Stockfish's development (e.g. fishtest).
In particular, they played 44 million games in 9 hours. Fishtest does that in about 3 weeks.

-Carl
Stockfish learns nothing from the games it plays in fishtest, and we do not even have a PGN of all the games that Stockfish plays there.

Stockfish is also not a logical engine from my point of view; the developers accept illogical changes only because they gain Elo.

For example, they decided not to stop the search in obviously drawn positions such as KB vs K or KN vs K, only because continuing to search is a code simplification and does not lose Elo: after the change Stockfish spends maybe 0.5% of its time searching such positions, and telling it not to search them would make it only 0.6% faster (the 0.5% and 0.6% figures are invented and serve only to demonstrate the idea).

It does not make sense from my point of view for Stockfish to become slower only because of knowledge that every human uses in his own search.

Maybe the reason is that the Stockfish code is too small: if one adds a lot of ifs, each individual if reduces the speed of Stockfish by a smaller factor, so knowledge that KB vs K or KN vs K is a draw could still be productive. The Stockfish team will never add it, though, because they do not care about microscopic Elo improvements. That is a mistake from my point of view, because many 0.1-Elo improvements may add up to a measurable Elo improvement in the future, and not everything you add needs testing in games.
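A minimal sketch of the kind of knowledge being discussed here (a hypothetical helper, not actual Stockfish code; it assumes each side's non-king material is available as a list of piece letters):

```python
def is_trivial_material_draw(white_pieces, black_pieces):
    """True for bare-king endings that are trivially drawn: K vs K,
    KB vs K and KN vs K. Inputs are the non-king pieces of each side,
    e.g. white_pieces=['B'], black_pieces=[] for KB vs K."""
    for lone, other in ((white_pieces, black_pieces),
                        (black_pieces, white_pieces)):
        if not other and lone in ([], ['B'], ['N']):
            return True
    return False
```

An engine with such a check could return a draw score immediately instead of searching; the point under debate is that the extra branch itself costs time on every node, so the net effect has to be measured.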

I wonder if AlphaZero is smarter than Stockfish and knows not to search obviously drawn positions.
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Uri Blass »

syzygy wrote:
clumma wrote:
Ras wrote:You are completely mistaken if you think that Google threw half a computing centre at the match. In the self-training, yes, that was a different story, but so was Stockfish's development (e.g. fishtest).
In particular, they played 44 million games in 9 hours. Fishtest does that in about 3 weeks.
That means that fewer than 20 million games (4 hours of training) were sufficient to get a superb evaluation function.

20 million is an amazingly small number if you realise that they use only the result (win/draw/loss) of each game to adjust the weights. That is less than 4MB of information.
I think it may be interesting to see the first 10 games that they played, together with an explanation of exactly what was learned from them.

Maybe I could learn from that how to learn to play better.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by syzygy »

syzygy wrote:
clumma wrote:
Ras wrote:You are completely mistaken if you think that Google threw half a computing centre at the match. In the self-training, yes, that was a different story, but so was Stockfish's development (e.g. fishtest).
In particular, they played 44 million games in 9 hours. Fishtest does that in about 3 weeks.
That means that fewer than 20 million games (4 hours of training) were sufficient to get a superb evaluation function.

20 million is an amazingly small number if you realise that they use only the result (win/draw/loss) of each game to adjust the weights. That is less than 4MB of information.
This is probably not entirely correct. They must somehow take into account the moves played when feeding back the game result into the neural net. But still it is difficult to understand how enough information can be extracted from 20 million games.
Jesse Gersenson
Posts: 593
Joined: Sat Aug 20, 2011 9:43 am

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Jesse Gersenson »

syzygy wrote:
syzygy wrote:
clumma wrote:
Ras wrote:You are completely mistaken if you think that Google threw half a computing centre at the match. In the self-training, yes, that was a different story, but so was Stockfish's development (e.g. fishtest).
In particular, they played 44 million games in 9 hours. Fishtest does that in about 3 weeks.
That means that fewer than 20 million games (4 hours of training) were sufficient to get a superb evaluation function.

20 million is an amazingly small number if you realise that they use only the result (win/draw/loss) of each game to adjust the weights. That is less than 4MB of information.
This is probably not entirely correct. They must somehow take into account the moves played when feeding back the game result into the neural net. But still it is difficult to understand how enough information can be extracted from 20 million games.
Read their pdf, the good stuff starts at the end of page 2:
https://arxiv.org/pdf/1712.01815.pdf
Dariusz Orzechowski
Posts: 44
Joined: Thu May 02, 2013 5:23 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Dariusz Orzechowski »

syzygy wrote:
syzygy wrote:
clumma wrote:
Ras wrote:You are completely mistaken if you think that Google threw half a computing centre at the match. In the self-training, yes, that was a different story, but so was Stockfish's development (e.g. fishtest).
In particular, they played 44 million games in 9 hours. Fishtest does that in about 3 weeks.
That means that fewer than 20 million games (4 hours of training) were sufficient to get a superb evaluation function.

20 million is an amazingly small number if you realise that they use only the result (win/draw/loss) of each game to adjust the weights. That is less than 4MB of information.
This is probably not entirely correct. They must somehow take into account the moves played when feeding back the game result into the neural net. But still it is difficult to understand how enough information can be extracted from 20 million games.
Training data is actually huge and contains three things for every move: the position, the game result, and a vector of search probabilities for every candidate move in that position. They don't just train on the played move. This is described in the previous paper, "Mastering the Game of Go without Human Knowledge".
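In other words, each training example is roughly a triple like the following (a sketch of the data layout described in the paper, with invented field names):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingExample:
    position: List[float]  # encoded board state s_t (the paper uses stacks of planes)
    pi: List[float]        # MCTS search probabilities over all candidate moves
    z: float               # final game result from this player's view: +1, 0 or -1
```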
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by syzygy »

Jesse Gersenson wrote:Read their pdf, the good stuff starts at the end of page 2:
https://arxiv.org/pdf/1712.01815.pdf
Yes, I looked at it again. Apparently they're training their network to predict the result of their MCTS searches on every move. So the game outcome is hardly relevant (although still specifically mentioned, somewhat curiously). Far more important must be the values of the terminal nodes in which the MCTS simulations end.

I guess the only reason for playing full games during training is to get a reasonable variation of training positions.
Dariusz Orzechowski
Posts: 44
Joined: Thu May 02, 2013 5:23 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Dariusz Orzechowski »

syzygy wrote:
Jesse Gersenson wrote:Read their pdf, the good stuff starts at the end of page 2:
https://arxiv.org/pdf/1712.01815.pdf
Yes, I looked at it again. Apparently they're training their network to predict the result of their MCTS searches on every move. So the game outcome is hardly relevant (although still specifically mentioned, somewhat curiously). Far more important must be the values of the terminal nodes in which the MCTS simulations end.

I guess the only reason for playing full games during training is to get a reasonable variation of training positions.
Game result is relevant because they train a network with dual output: move prediction (policy head) and game outcome (value head).
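A toy sketch of such a dual-output network (sizes are invented except the 4,672 move slots, which follow the paper's 8x8x73 chess move encoding; the paper's actual body is a deep residual convolutional network):

```python
import torch
import torch.nn as nn

class DualHeadNet(nn.Module):
    """Toy dual-output network: a shared body feeding a policy head
    (move prediction) and a value head (game outcome)."""
    def __init__(self, n_inputs=64, n_moves=4672, hidden=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_inputs, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_moves)                    # logits over moves
        self.value_head = nn.Sequential(nn.Linear(hidden, 1), nn.Tanh())  # outcome in [-1, 1]

    def forward(self, x):
        h = self.body(x)
        return self.policy_head(h), self.value_head(h)

net = DualHeadNet()
policy_logits, value = net(torch.zeros(1, 64))  # one dummy position
```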
Jesse Gersenson
Posts: 593
Joined: Sat Aug 20, 2011 9:43 am

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Jesse Gersenson »

Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs to generate self-play games and 64 second-generation TPUs to train the neural networks.
That seems to describe the data set they generate and then work from. I don't know what "mini-batches of size 4,096" means, but 4,096 × 700,000 = 2,867,200,000.
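For what it's worth, a mini-batch of size 4,096 means that each of the 700,000 gradient steps averages the training loss over 4,096 positions sampled from recent self-play games, so that product counts position samples (positions can be reused across batches), not games:

```python
steps = 700_000     # optimisation steps, per the quoted paper
batch_size = 4_096  # positions per mini-batch

print(f"{steps * batch_size:,}")  # 2,867,200,000 position samples in total
```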
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by syzygy »

Dariusz Orzechowski wrote:
syzygy wrote:
Jesse Gersenson wrote:Read their pdf, the good stuff starts at the end of page 2:
https://arxiv.org/pdf/1712.01815.pdf
Yes, I looked at it again. Apparently they're training their network to predict the result of their MCTS searches on every move. So the game outcome is hardly relevant (although still specifically mentioned, somewhat curiously). Far more important must be the values of the terminal nodes in which the MCTS simulations end.

I guess the only reason for playing full games during training is to get a reasonable variation of training positions.
Game result is relevant because they train a network with dual output: move prediction (policy head) and game outcome (value head).
I see now. So at position t they use the network to generate move "probabilities" p_t and an expected outcome v_t. They then use MCTS to generate a more accurate move-probability vector pi_t. At the end of the game they use these more accurate move-probability vectors and the game outcome z to train the network by adjusting the weights such that the vectors and predictions will be more accurate.
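For reference, the paper trains both outputs together with a single loss (in its notation; c is an L2 regularisation constant):

```latex
l = (z - v)^2 - \pi^\top \log \mathbf{p} + c \lVert \theta \rVert^2
```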

I wonder why they don't also use MCTS to calculate a more accurate expected game outcome and use that, instead of z, for adjusting the weights. After all, the final game outcome depends on choices made later in the game and may not be the best reflection of the winning chances at position t.

Btw, what do they mean exactly by "move probability"? The probability that a move is played is a circular definition. I suppose the more "probable" moves are supposed to be the better moves?
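For what it's worth, the Go paper mentioned earlier defines these search probabilities from MCTS visit counts rather than from any circular notion of being played: moves the search spends more simulations on get more weight, with a temperature parameter tau controlling how sharp the distribution is:

```latex
\pi(a \mid s) = \frac{N(s,a)^{1/\tau}}{\sum_b N(s,b)^{1/\tau}}
```

So the more "probable" moves are indeed the ones the search judged better.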