Alpha Zero vs Stockfish 8 tournament conditions.

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Alpha Zero vs Stockfish 8 tournament conditions.

Poll ended at Fri Dec 08, 2017 5:15 am

The time per move and hardware etc. was fair: 27 votes (52%)
Google set it up to give Alpha Zero an edge: 25 votes (48%)

Total votes: 52

syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by syzygy »

clumma wrote:
Ras wrote:You are completely mistaken if you think that Google threw half a computing centre at the match. In the self-training, yes, that was a different story, but so was Stockfish's development (e.g. fishtest).
In particular, they played 44 million games in 9 hours. Fishtest does that in about 3 weeks.
That means that fewer than 20 million games (4 hours of training) were sufficient to get a superb evaluation function.

20 million is an amazingly small number if you realise that they use only the result (win/draw/loss) of each game to adjust the weights. That is less than 4MB of information.
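A quick back-of-the-envelope check of that figure, assuming the 20 million game estimate above and that each win/draw/loss result carries at most log2(3) bits (three possible outcomes):

```python
import math

games = 20_000_000               # training games, per the estimate above
bits_per_result = math.log2(3)   # a win/draw/loss result carries at most ~1.585 bits

total_bytes = games * bits_per_result / 8
print(f"{total_bytes / 1_000_000:.2f} MB")  # prints 3.96 MB, i.e. just under 4 MB
```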
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Uri Blass »

clumma wrote:
Ras wrote:You are completely mistaken if you think that Google threw half a computing centre at the match. In the self-training, yes, that was a different story, but so was Stockfish's development (e.g. fishtest).
In particular, they played 44 million games in 9 hours. Fishtest does that in about 3 weeks.

-Carl
Stockfish learns nothing from the games it plays in fishtest, and we do not even have a PGN of all the games that Stockfish plays there.

Stockfish is also not a logical engine from my point of view; the developers accept illogical changes only because they gain Elo.

For example, they decided not to stop the search in obviously drawn positions such as KB vs K or KN vs K, only because continuing to search is a code simplification and does not lose Elo: after the change Stockfish spends maybe 0.5% of its time searching such positions, and telling it not to search them would make it only 0.6% faster (the 0.5% and 0.6% figures are invented and serve only to demonstrate the idea).

It does not make sense from my point of view for Stockfish to become slower only because of knowledge that every human uses in his own search.

Maybe the reason is that the Stockfish code is too small: if one adds a lot of ifs, each individual if reduces the speed of Stockfish by a smaller factor, so knowledge that KB vs K or KN vs K is a draw could still be productive. The Stockfish team will never add it, though, because they do not care about microscopic Elo improvements. That is a mistake from my point of view, because many 0.1-Elo improvements may add up to a measurable Elo improvement in the future, and not everything you add needs testing in games.
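A minimal sketch of the kind of knowledge being discussed here (a hypothetical helper, not actual Stockfish code; it assumes each side's non-king material is available as a list of piece letters):

```python
def is_trivial_material_draw(white_pieces, black_pieces):
    """True for bare-king endings that are trivially drawn: K vs K,
    KB vs K and KN vs K. Inputs are the non-king pieces of each side,
    e.g. white_pieces=['B'], black_pieces=[] for KB vs K."""
    for lone, other in ((white_pieces, black_pieces),
                        (black_pieces, white_pieces)):
        if not other and lone in ([], ['B'], ['N']):
            return True
    return False
```

An engine with such a check could return a draw score immediately instead of searching; the point under debate is that the extra branch itself costs time on every node, so the net effect has to be measured.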

I wonder if AlphaZero is smarter than Stockfish and knows not to search obviously drawn positions.
Uri Blass
Posts: 10267
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Uri Blass »

syzygy wrote:
clumma wrote:
Ras wrote:You are completely mistaken if you think that Google threw half a computing centre at the match. In the self-training, yes, that was a different story, but so was Stockfish's development (e.g. fishtest).
In particular, they played 44 million games in 9 hours. Fishtest does that in about 3 weeks.
That means that fewer than 20 million games (4 hours of training) were sufficient to get a superb evaluation function.

20 million is an amazingly small number if you realise that they use only the result (win/draw/loss) of each game to adjust the weights. That is less than 4MB of information.
I think it may be interesting to see the first 10 games that they played, together with an explanation of exactly what was learned from them.

Maybe I could learn from that how to learn to play better.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by syzygy »

syzygy wrote:
clumma wrote:
Ras wrote:You are completely mistaken if you think that Google threw half a computing centre at the match. In the self-training, yes, that was a different story, but so was Stockfish's development (e.g. fishtest).
In particular, they played 44 million games in 9 hours. Fishtest does that in about 3 weeks.
That means that fewer than 20 million games (4 hours of training) were sufficient to get a superb evaluation function.

20 million is an amazingly small number if you realise that they use only the result (win/draw/loss) of each game to adjust the weights. That is less than 4MB of information.
This is probably not entirely correct. They must somehow take into account the moves played when feeding back the game result into the neural net. But still it is difficult to understand how enough information can be extracted from 20 million games.
Jesse Gersenson
Posts: 593
Joined: Sat Aug 20, 2011 9:43 am

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Jesse Gersenson »

syzygy wrote:
syzygy wrote:
clumma wrote:
Ras wrote:You are completely mistaken if you think that Google threw half a computing centre at the match. In the self-training, yes, that was a different story, but so was Stockfish's development (e.g. fishtest).
In particular, they played 44 million games in 9 hours. Fishtest does that in about 3 weeks.
That means that fewer than 20 million games (4 hours of training) were sufficient to get a superb evaluation function.

20 million is an amazingly small number if you realise that they use only the result (win/draw/loss) of each game to adjust the weights. That is less than 4MB of information.
This is probably not entirely correct. They must somehow take into account the moves played when feeding back the game result into the neural net. But still it is difficult to understand how enough information can be extracted from 20 million games.
Read their pdf, the good stuff starts at the end of page 2:
https://arxiv.org/pdf/1712.01815.pdf
Dariusz Orzechowski
Posts: 44
Joined: Thu May 02, 2013 5:23 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Dariusz Orzechowski »

syzygy wrote:
syzygy wrote:
clumma wrote:
Ras wrote:You are completely mistaken if you think that Google threw half a computing centre at the match. In the self-training, yes, that was a different story, but so was Stockfish's development (e.g. fishtest).
In particular, they played 44 million games in 9 hours. Fishtest does that in about 3 weeks.
That means that fewer than 20 million games (4 hours of training) were sufficient to get a superb evaluation function.

20 million is an amazingly small number if you realise that they use only the result (win/draw/loss) of each game to adjust the weights. That is less than 4MB of information.
This is probably not entirely correct. They must somehow take into account the moves played when feeding back the game result into the neural net. But still it is difficult to understand how enough information can be extracted from 20 million games.
Training data is actually huge and contains three things for every move: the position, the game result, and a vector of search probabilities for every candidate move in that position. They don't just train on the played move. This is described in the previous paper, "Mastering the Game of Go without Human Knowledge".
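In other words, each training example is roughly a triple like the following (a sketch of the data layout described in the paper, with invented field names):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingExample:
    position: List[float]  # encoded board state s_t (the paper uses stacks of planes)
    pi: List[float]        # MCTS search probabilities over all candidate moves
    z: float               # final game result from this player's view: +1, 0 or -1
```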
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by syzygy »

Jesse Gersenson wrote:Read their pdf, the good stuff starts at the end of page 2:
https://arxiv.org/pdf/1712.01815.pdf
Yes, I looked at it again. Apparently they're training their network to predict the result of their MCTS searches on every move. So the game outcome is hardly relevant (although still specifically mentioned, somewhat curiously). Far more important must be the values of the terminal nodes in which the MCTS simulations end.

I guess the only reason for playing full games during training is to get a reasonable variation of training positions.
Dariusz Orzechowski
Posts: 44
Joined: Thu May 02, 2013 5:23 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Dariusz Orzechowski »

syzygy wrote:
Jesse Gersenson wrote:Read their pdf, the good stuff starts at the end of page 2:
https://arxiv.org/pdf/1712.01815.pdf
Yes, I looked at it again. Apparently they're training their network to predict the result of their MCTS searches on every move. So the game outcome is hardly relevant (although still specifically mentioned, somewhat curiously). Far more important must be the values of the terminal nodes in which the MCTS simulations end.

I guess the only reason for playing full games during training is to get a reasonable variation of training positions.
Game result is relevant because they train a network with dual output: move prediction (policy head) and game outcome (value head).
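A toy sketch of such a dual-output network (sizes are invented except the 4,672 move slots, which follow the paper's 8x8x73 chess move encoding; the paper's actual body is a deep residual convolutional network):

```python
import torch
import torch.nn as nn

class DualHeadNet(nn.Module):
    """Toy dual-output network: a shared body feeding a policy head
    (move prediction) and a value head (game outcome)."""
    def __init__(self, n_inputs=64, n_moves=4672, hidden=256):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_inputs, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_moves)                    # logits over moves
        self.value_head = nn.Sequential(nn.Linear(hidden, 1), nn.Tanh())  # outcome in [-1, 1]

    def forward(self, x):
        h = self.body(x)
        return self.policy_head(h), self.value_head(h)

net = DualHeadNet()
policy_logits, value = net(torch.zeros(1, 64))  # one dummy position
```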
Jesse Gersenson
Posts: 593
Joined: Sat Aug 20, 2011 9:43 am

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by Jesse Gersenson »

Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs to generate self-play games and 64 second-generation TPUs to train the neural networks.
That seems to describe the data set they generate and then work from. I don't know what "mini-batches of size 4,096" means, but 4,096 × 700,000 = 2,867,200,000.
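For what it's worth, a mini-batch of size 4,096 means that each of the 700,000 gradient steps averages the training loss over 4,096 positions sampled from recent self-play games, so that product counts position samples (positions can be reused across batches), not games:

```python
steps = 700_000     # optimisation steps, per the quoted paper
batch_size = 4_096  # positions per mini-batch

print(f"{steps * batch_size:,}")  # 2,867,200,000 position samples in total
```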
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Alpha Zero vs Stockfish 8 tournament conditions.

Post by syzygy »

Dariusz Orzechowski wrote:
syzygy wrote:
Jesse Gersenson wrote:Read their pdf, the good stuff starts at the end of page 2:
https://arxiv.org/pdf/1712.01815.pdf
Yes, I looked at it again. Apparently they're training their network to predict the result of their MCTS searches on every move. So the game outcome is hardly relevant (although still specifically mentioned, somewhat curiously). Far more important must be the values of the terminal nodes in which the MCTS simulations end.

I guess the only reason for playing full games during training is to get a reasonable variation of training positions.
Game result is relevant because they train a network with dual output: move prediction (policy head) and game outcome (value head).
I see now. So at position t they use the network to generate move "probabilities" p_t and an expected outcome v_t. They then use MCTS to generate a more accurate move-probability vector pi_t. At the end of the game they use these more accurate move-probability vectors and the game outcome z to train the network by adjusting the weights such that the vectors and predictions will be more accurate.
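For reference, the paper trains both outputs together with a single loss (in its notation; c is an L2 regularisation constant):

```latex
l = (z - v)^2 - \pi^\top \log \mathbf{p} + c \lVert \theta \rVert^2
```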

I wonder why they don't also use MCTS to calculate a more accurate expected game outcome and use that, instead of z, for adjusting the weights. After all, the final game outcome depends on choices made later in the game and may not be the best reflection of the winning chances at position t.

Btw, what do they mean exactly by "move probability"? The probability that a move is played is a circular definition. I suppose the more "probable" moves are supposed to be the better moves?
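For what it's worth, the Go paper mentioned earlier defines these search probabilities from MCTS visit counts rather than from any circular notion of being played: moves the search spends more simulations on get more weight, with a temperature parameter tau controlling how sharp the distribution is:

```latex
\pi(a \mid s) = \frac{N(s,a)^{1/\tau}}{\sum_b N(s,b)^{1/\tau}}
```

So the more "probable" moves are indeed the ones the search judged better.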