Google's AlphaGo team has been working on chess

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Google's AlphaGo team has been working on chess

Post by Milos »

Joost Buijs wrote:
CheckersGuy wrote:
Joost Buijs wrote:I don't think it will take that long. Google is talking about 8-bit tera-ops, which is not the same as teraflops. According to Google, their 1st-gen TPU runs approximately 35 times faster than the Xeon E5-2699 v3. (Using all cores?)
5000 * 9 * 35 == 1575000 hours. 1575000/24/365 == 180 years.
Still a considerable amount of time though.
180 years sounds like a lot but it actually isn't. Those are only the first generation of specialized AI chips, which are pretty much still in their infancy. Coupled with future die shrinks, we will see a 10,000-fold increase like we have seen with CPUs and GPUs. Then 180 years on the current state of the art doesn't sound like much anymore :lol:
A 10,000-fold increase seems very optimistic; you can't go on with die shrinks forever.

Next year nVidia will release more affordable versions of their Volta GPU, which runs at approximately the same speed as a gen-1 TPU. Stacking 4 of these in a workstation seems feasible; this is probably the closest you can get in the near future.
Not really. With the new GV100 you have a realistic (non-boost) 12 TFLOPS of FP32, and in the best case double that in FP16.
In training, most of the hardware resources are needed for the self-play games. For those, 5000 first-gen TPUs were used, so mostly inference, i.e. int8 multiplication.
A first-gen TPU has a performance of 92T int8 OPs, so you'd need at least 4 GV100s to get the performance of 1 TPU.
However, the GV100 also has tensor cores with a boost performance of 110 TFLOPS.
The problem with the tensor cores is that inference used 3x3 convolution kernels, not 4x4, meaning each tensor unit would only work at partial bandwidth, i.e. around 40% of full performance.
So the maximum combined performance of a GV100 would be around half the TPU's performance. Therefore 5000 TPUs ~ 10000 GV100s, i.e. it would take 10,000 times as long to play all 44 million games on a single GV100 as what Google used, i.e. around 10 years.
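A quick sanity check of these estimates, using only the rough figures quoted in this thread (none of them are official benchmarks, and the int8-ops vs. FP16-flops comparison is the thread's own assumption):

Code: Select all

# Back-of-the-envelope arithmetic for the numbers discussed above.
# All inputs are the rough figures quoted in this thread, not measured values.

TPU_COUNT    = 5000   # first-gen TPUs used for self-play
WALL_HOURS   = 9      # quoted training wall-clock time
XEON_SPEEDUP = 35     # Google's claimed TPU-vs-Xeon factor

xeon_hours = TPU_COUNT * WALL_HOURS * XEON_SPEEDUP
print(xeon_hours / 24 / 365)             # ~180 years on a single Xeon

TPU_INT8_TOPS     = 92        # first-gen TPU, int8 ops
GV100_TENSOR_TOPS = 110 * 0.4 # tensor cores at ~40% utilisation (3x3 kernels)

print(TPU_INT8_TOPS / GV100_TENSOR_TOPS) # ~2 GV100s per TPU by this estimate
print(10000 * WALL_HOURS / 24 / 365)     # ~10 years on a single GV100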
User avatar
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Google's AlphaGo team has been working on chess

Post by hgm »

syzygy wrote:They are not playing out any games "to the very end". And the randomness of the selection is also quite limited (if not completely absent - the paper states that the edge is selected that maximizes an "upper confidence bound").
Where do you read that? When I was glancing through the paper, I got the impression that what they do is normal MCTS, with the caveat that 'random' in the move selection for the playouts does not mean 'uniformly distributed probabilities', but probabilities according to the policy of their NN. In the absence of scoring at the game ends (e.g. declaring every game a draw) that would make the MCTS a self-fulfilling prophecy, generating a tree that has exactly the same statistics as the NN prediction. But the actual game results will steer the MCTS away from poor moves, and will make it focus on good moves. And the NN will then be tuned to already produce such focusing on good moves for the initial visiting probabilities, etc.

So you get progressively more realistic 'random' playouts, which will define more and more narrow MCTS trees in every position of the test games.
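To make that concrete, here is a minimal sketch of the playout scheme described above, where the 'random' moves are sampled from the network's policy instead of a uniform distribution. The policy_net and position interfaces are hypothetical stand-ins, not DeepMind's actual code:

Code: Select all

import random

def policy_playout(position, policy_net, max_plies=512):
    # Play the game out from `position`, sampling each move from the
    # network's move probabilities rather than uniformly at random.
    for _ in range(max_plies):
        if position.is_terminal():
            return position.score()   # +1 / 0 / -1 from the game rules
        moves = position.legal_moves()
        probs = policy_net.move_probabilities(position, moves)
        position = position.play(random.choices(moves, weights=probs)[0])
    return 0  # treat an unfinished playout as a draw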
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Google's AlphaGo team has been working on chess

Post by Milos »

hgm wrote:
syzygy wrote:They are not playing out any games "to the very end". And the randomness of the selection is also quite limited (if not completely absent - the paper states that the edge is selected that maximizes an "upper confidence bound").
Where do you read that? When I was glancing through the paper, I got the impression that what they do is normal MCTS, with the caveat that 'random' in the move selection for the playouts does not mean 'uniformly distributed probabilities', but probabilities according to the policy of their NN. In the absence of scoring at the game ends (e.g. declaring every game a draw) that would make the MCTS a self-fulfilling prophecy, generating a tree that has exactly the same statistics as the NN prediction. But the actual game results will steer the MCTS away from poor moves, and will make it focus on good moves. And the NN will then be tuned to already produce such focusing on good moves for the initial visiting probabilities, etc.

So you get progressively more realistic 'random' playouts, which will define more and more narrow MCTS trees in every position of the test games.
There are no playouts. Did you even read the AGZ Nature paper???
Compared to the MCTS in AlphaGo Fan and AlphaGo Lee, the principal differences are that AlphaGo Zero does not use any rollouts; it uses a single neural network instead of separate policy and value networks; leaf nodes are always expanded, rather than using dynamic expansion; each search thread simply waits for the neural network evaluation, rather than performing evaluation and backup asynchronously; and there is no tree policy.
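Read literally, the quoted passage amounts to something like the following leaf handling; this is a rough sketch with hypothetical node and network interfaces, not the paper's pseudocode:

Code: Select all

def evaluate_leaf(node, net):
    # No rollout: a leaf is always expanded and scored by a single
    # network that returns both move priors and a value estimate.
    if node.is_terminal():
        return node.terminal_score()       # scored by the rules of the game
    priors, value = net.evaluate(node.position)  # one net, policy + value head
    for move, p in priors.items():
        node.add_child(move, prior=p)      # leaf nodes are always expanded
    return value                           # backed up in place of a rollout result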
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: Google's AlphaGo team has been working on chess

Post by CheckersGuy »

hgm wrote:
syzygy wrote:They are not playing out any games "to the very end". And the randomness of the selection is also quite limited (if not completely absent - the paper states that the edge is selected that maximizes an "upper confidence bound").
Where do you read that? When I was glancing through the paper, I got the impression that what they do is normal MCTS, with the caveat that 'random' in the move selection for the playouts does not mean 'uniformly distributed probabilities', but probabilities according to the policy of their NN. In the absence of scoring at the game ends (e.g. declaring every game a draw) that would make the MCTS a self-fulfilling prophecy, generating a tree that has exactly the same statistics as the NN prediction. But the actual game results will steer the MCTS away from poor moves, and will make it focus on good moves. And the NN will then be tuned to already produce such focusing on good moves for the initial visiting probabilities, etc.

So you get progressively more realistic 'random' playouts, which will define more and more narrow MCTS trees in every position of the test games.
They don't do random playouts. When you reach a leaf node, you give the position to the NN to get a win probability, instead of playing a random game out to the very end.
User avatar
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Google's AlphaGo team has been working on chess

Post by hgm »

So what do they mean by this?
At the end of the game, the terminal position s_T is scored according to the rules of the game to compute the game outcome z: −1 for a loss, 0 for a draw, and +1 for a win.
Albert Silver
Posts: 3019
Joined: Wed Mar 08, 2006 9:57 pm
Location: Rio de Janeiro, Brazil

Re: Google's AlphaGo team has been working on chess

Post by Albert Silver »

Milos wrote:
hgm wrote:
syzygy wrote:They are not playing out any games "to the very end". And the randomness of the selection is also quite limited (if not completely absent - the paper states that the edge is selected that maximizes an "upper confidence bound").
Where do you read that? When I was glancing through the paper, I got the impression that what they do is normal MCTS, with the caveat that 'random' in the move selection for the playouts does not mean 'uniformly distributed probabilities', but probabilities according to the policy of their NN. In the absence of scoring at the game ends (e.g. declaring every game a draw) that would make the MCTS a self-fulfilling prophecy, generating a tree that has exactly the same statistics as the NN prediction. But the actual game results will steer the MCTS away from poor moves, and will make it focus on good moves. And the NN will then be tuned to already produce such focusing on good moves for the initial visiting probabilities, etc.

So you get progressively more realistic 'random' playouts, which will define more and more narrow MCTS trees in every position of the test games.
There are no playouts. Did you even read the AGZ Nature paper???
Compared to the MCTS in AlphaGo Fan and AlphaGo Lee, the principal differences are that AlphaGo Zero does not use any rollouts; it uses a single neural network instead of separate policy and value networks; leaf nodes are always expanded, rather than using dynamic expansion; each search thread simply waits for the neural network evaluation, rather than performing evaluation and backup asynchronously; and there is no tree policy.
Here is a link to the Nature article (given on the DeepMind site):

https://goo.gl/4SbJh1
"Tactics are the bricks and sticks that make up a game, but positional play is the architectural blueprint."
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Google's AlphaGo team has been working on chess

Post by Michel »

hgm wrote:So what do they mean by this?
At the end of the game, the terminal position s_T is scored according to the rules of the game to compute the game outcome z: −1 for a loss, 0 for a draw, and +1 for a win.
Maybe just that they obviously don't use their NN to score positions where the game has ended?
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: Google's AlphaGo team has been working on chess

Post by CheckersGuy »

Michel wrote:
hgm wrote:So what do they mean by this?
At the end of the game, the terminal position s_T is scored according to the rules of the game to compute the game outcome z: −1 for a loss, 0 for a draw, and +1 for a win.
Maybe just that they obviously don't use their NN to score positions where the game has ended?
This. Maybe HG forgot that this paper isn't only about chess but about board games in general. Terminal positions are therefore scored according to the rules of the game. The quote was about the self-play games, which obviously need to be scored!!!
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Google's AlphaGo team has been working on chess

Post by Sven »

CheckersGuy wrote:
Michel wrote:
hgm wrote:So what do they mean by this?
At the end of the game, the terminal position s_T is scored according to the rules of the game to compute the game outcome z: −1 for a loss, 0 for a draw, and +1 for a win.
Maybe just that they obviously don't use their NN to score positions where the game has ended?
This. Maybe HG forgot that this paper isn't only about chess but about board games in general. Terminal positions are therefore scored according to the rules of the game. The quote was about the self-play games, which obviously need to be scored!!!
Strictly speaking, the Nature article is only about Go. But apart from that nitpicking you are right. The phrase quoted by HG clearly relates to terminal positions of the self-play games, not to leaf nodes of the MCTS search. It is crucial to understand the difference: training was done by playing a huge number of self-play games ("iterations"), and at each position of those games MCTS was used to calculate move probabilities which in turn were used to improve the NN. According to the "Methods" section, an MCTS leaf node is reached after L "time-steps", which sounds like some fixed search depth. AlphaGo Zero (and thus also AlphaZero for chess) does not use Monte Carlo "playouts" or "rollouts". So even though they call their method MCTS-based, it is not a standard MCTS, due to the complete lack of random playouts.
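In pseudo-Python, one such self-play iteration looks roughly like this; every name here (Position, run_mcts, sample_move, net) is a hypothetical placeholder for the paper's components, not code from the paper:

Code: Select all

def self_play_game(net, simulations=800):
    # At every position an MCTS search (guided by `net`, no rollouts)
    # produces move probabilities; the finished game's result z then
    # labels all visited positions as training targets for the NN.
    position = Position.initial()
    examples = []                                  # (position, search_probs)
    while not position.is_terminal():
        pi = run_mcts(position, net, simulations)  # visit-count distribution
        examples.append((position, pi))
        position = position.play(sample_move(pi))
    z = position.terminal_score()                  # -1 / 0 / +1 by the game rules
    # label each position with z from the side to move's perspective
    return [(pos, pi, z if pos.white_to_move() else -z)
            for pos, pi in examples]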
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: Google's AlphaGo team has been working on chess

Post by CheckersGuy »

Sven wrote:
CheckersGuy wrote:
Michel wrote:
hgm wrote:So what do they mean by this?
At the end of the game, the terminal position s_T is scored according to the rules of the game to compute the game outcome z: −1 for a loss, 0 for a draw, and +1 for a win.
Maybe just that they obviously don't use their NN to score positions where the game has ended?
This. Maybe HG forgot that this paper isn't only about chess but about board games in general. Terminal positions are therefore scored according to the rules of the game. The quote was about the self-play games, which obviously need to be scored!!!
Strictly speaking, the Nature article is only about Go. But apart from that nitpicking you are right. The phrase quoted by HG clearly relates to terminal positions of the self-play games, not to leaf nodes of the MCTS search. It is crucial to understand the difference: training was done by playing a huge number of self-play games ("iterations"), and at each position of those games MCTS was used to calculate move probabilities which in turn were used to improve the NN. According to the "Methods" section, an MCTS leaf node is reached after L "time-steps", which sounds like some fixed search depth. AlphaGo Zero (and thus also AlphaZero for chess) does not use Monte Carlo "playouts" or "rollouts". So even though they call their method MCTS-based, it is not a standard MCTS, due to the complete lack of random playouts.
Yes.
I meant the AlphaZero paper, not the AlphaGo Zero Nature paper. I guess we will find out more about how A0 works once they publish the entire paper.