AlphaGo and Stockfish played on similar hardware


vvarkey
Posts: 88
Joined: Fri Mar 10, 2006 11:20 am
Location: Bangalore India

AlphaGo and Stockfish played on similar hardware

Post by vvarkey »

Ignore all the training that went into AlphaZero for a second.

Per the paper, for the 100 games:
AlphaZero and the previous AlphaGo Zero used a single machine with 4 TPUs. Stockfish and Elmo played at their strongest skill level using 64 threads and a hash size of 1GB.
According to https://cloud.google.com/blog/big-data/ ... g-unit-tpu:
We announced the TPU last year and recently followed up with a detailed study of its performance and architecture. In short, we found that the TPU delivered 15–30X higher performance and 30–80X higher performance-per-watt than contemporary CPUs and GPUs.
So, a single machine with 4 TPUs (15x4 = 60) is somewhat comparable to 64 CPU threads.

Now, for training AlphaGo, DeepMind really did use tons of hardware: 5,000 Gen 1 TPUs to generate the games for training and 64 Gen 2 TPUs for training the neural nets.

But for comparing playing strengths, these numbers are as relevant as counting how many man-hours went into the development of Stockfish.
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: AlphaGo and Stockfish played on similar hardware

Post by hgm »

I am not sure that with 'contemporary CPUs' they mean 'single cores'. It is also unclear whether Stockfish was using 64 cores, or just 64 hyperthreads on 32 cores.

However, your basic claim seems to be correct. TPUs and (multi-core) CPUs are comparable hardware. They just do very different things, and what one is good at, the other can do only very poorly, or not at all. It is probably quite easy to find tasks that TPUs would do much slower than a CPU. Of course you would seldom see those mentioned in promotional material for TPUs.

One could argue that the TPUs are specifically adapted to run neural networks, and that Stockfish had to run on hardware not specially designed to run Stockfish, but on a general-purpose CPU equally suitable for many tasks. OTOH, the TPUs are not specifically designed for running the AlphaZero network; 'neural network' is still a pretty general application as well.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: AlphaGo and Stockfish played on similar hardware

Post by syzygy »

AlphaZero probably ran on the same PC as Stockfish: a PC with 32 or 64 general-purpose cores and 4 TPUs.

So the hardware was identical. Stockfish just chose not to make use of the TPUs. Or at least, that is one way of looking at it.

If the TPUs are indeed first-generation TPUs, they apparently consume 28-40 Watt each. 160 Watt is less than what the 32/64 cores will use. And these first generation TPUs are manufactured using 28nm technology from 2010/2011.
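A quick back-of-envelope on that (just a sketch; the CPU-side TDP is an assumed typical figure, not the actual machine's):

Code:
# Rough power tally from the figures above (first-gen TPUs at 28-40 W each).
# The CPU-side number is an assumed TDP for a typical dual-socket server,
# not a measured value for the machine DeepMind actually used.
tpu_watts = (28, 40)
tpus = 4
print(f"4 TPUs: {tpus * tpu_watts[0]}-{tpus * tpu_watts[1]} W")    # 112-160 W

assumed_tdp_per_socket = 145   # hypothetical high-core-count Xeon figure
print(f"two CPU sockets: ~{2 * assumed_tdp_per_socket} W")         # ~290 W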
vvarkey
Posts: 88
Joined: Fri Mar 10, 2006 11:20 am
Location: Bangalore India

Re: AlphaGo and Stockfish played on similar hardware

Post by vvarkey »

hgm wrote:I am not sure that with 'contemporary CPUs' they mean 'single cores'.
Oops. In the actual TPU paper https://arxiv.org/pdf/1704.04760.pdf:
The traditional CPU server is represented by an 18-core, dual-socket Haswell processor from Intel. The GPU accelerator is the Nvidia K80.
So 4 TPUs = 18x15x4 = at least 1080 cores (threads)
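As a quick sketch, taking the blog's 15x lower bound and the paper's 18-core chip at face value (the same reading as the 18x15x4 above):

Code:
# Back-of-envelope only: treats "15x" as one TPU versus one 18-core Haswell chip.
speedup_lower_bound = 15   # low end of Google's 15-30x claim
baseline_cores = 18        # cores per chip in the TPU paper's Haswell baseline
tpus = 4

core_equivalents = tpus * speedup_lower_bound * baseline_cores
print(core_equivalents, "Haswell-core equivalents")                  # 1080
print(round(core_equivalents / 64, 1), "x Stockfish's 64 threads")   # ~16.9x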

I think Stockfish was using 64 actual cores, since Google's Compute Engine offers 64-core CPUs.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: AlphaGo and Stockfish played on similar hardware

Post by Milos »

syzygy wrote:AlphaZero probably ran on the same PC as Stockfish: a PC with 32 or 64 general-purpose cores and 4 TPUs.

So the hardware was identical. Stockfish just chose not to make use of the TPUs. Or at least, that is one way of looking at it.

If the TPUs are indeed first-generation TPUs, they apparently consume 28-40 Watt each. 160 Watt is less than what the 32/64 cores will use. And these first generation TPUs are manufactured using 28nm technology from 2010/2011.
TSMC 28nm process (the first 28nm process ever) from late 2011. But the actual TPUs were fabricated in 2015.
P.S. Stockfish didn't have any choice, nor did the SF authors for that matter. DeepMind used it in the way they liked. We don't even know which compile was used: the official one, a BMI-capable one, or whether they compiled it themselves.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo and Stockfish played on similar hardware

Post by Michael Sherwin »

vvarkey wrote:Ignore all the training that went into AlphaZero for a second.
You really cannot do that, not even for a millisecond. The training (did I read 44 million games?) is worth more than 1,000 Elo and probably much more. The learning was guided by the NN to focus on the most promising lines, thus narrowing the field. A0 could have gotten winning positions against SF without ever leaving its learn file. The rest of the positions were so good that the chess-playing algorithm of A0 could then get a win or at least a draw. Believe me, I know, as I've seen RomiChess play entire games from its learn file. Even if the learn file does not produce a move to play immediately, the fact that the whole subtree of the current position, with its learned values, is loaded into the hash causes the search to return much stronger moves on average. You can't ignore the training; it is 90% of the strength of A0.
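For what it's worth, here is a minimal sketch of that hash-preloading idea (not RomiChess's actual code; every name and number below is made up for illustration):

Code:
# Sketch of the idea only: learned scores for positions below the current root
# are copied into the transposition table before the search starts, so plain
# alpha-beta sees them as already-searched entries and is steered toward the
# lines the learning liked.

transposition_table = {}   # zobrist key -> (score_cp, depth, bound)

learn_file = {             # zobrist key -> (score_cp, depth) from earlier games
    0x9D39247E33776D41: (+35, 12),
    0x2AF7398005AAA5C7: (-20, 10),
}

def preload_learned_subtree(reachable_keys):
    """Seed the hash with learned values for positions reachable from the root,
    so the search returns stronger moves even when there is no book move."""
    for key in reachable_keys:
        if key in learn_file:
            score, depth = learn_file[key]
            transposition_table[key] = (score, depth, "exact")

# Before searching the current position:
preload_learned_subtree(learn_file.keys())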
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Re: AlphaGo and Stockfish played on similar hardware

Post by Dirt »

Michael Sherwin wrote:You really cannot do that, not even for a millisecond. The training (did I read 44 million games?) is worth more than 1,000 Elo and probably much more. The learning was guided by the NN to focus on the most promising lines, thus narrowing the field. A0 could have gotten winning positions against SF without ever leaving its learn file. The rest of the positions were so good that the chess-playing algorithm of A0 could then get a win or at least a draw. Believe me, I know, as I've seen RomiChess play entire games from its learn file. Even if the learn file does not produce a move to play immediately, the fact that the whole subtree of the current position, with its learned values, is loaded into the hash causes the search to return much stronger moves on average. You can't ignore the training; it is 90% of the strength of A0.
You could train AlphaZero on chess and then make it play Fischer Random. To eliminate the development time you could even limit it to those FRC positions (5?) where castling doesn't change.

I'm not sure what that would tell us but I'd find it interesting.
Deasil is the right way to go.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo and Stockfish played on similar hardware

Post by Michael Sherwin »

Dirt wrote:
Michael Sherwin wrote:You really cannot do that, not even for a millisecond. The training (did I read 44 million games?) is worth more than 1,000 Elo and probably much more. The learning was guided by the NN to focus on the most promising lines, thus narrowing the field. A0 could have gotten winning positions against SF without ever leaving its learn file. The rest of the positions were so good that the chess-playing algorithm of A0 could then get a win or at least a draw. Believe me, I know, as I've seen RomiChess play entire games from its learn file. Even if the learn file does not produce a move to play immediately, the fact that the whole subtree of the current position, with its learned values, is loaded into the hash causes the search to return much stronger moves on average. You can't ignore the training; it is 90% of the strength of A0.
You could train AlphaZero on chess and then make it play Fischer Random. To eliminate the development time you could even limit it to those FRC positions (5?) where castling doesn't change.

I'm not sure what that would tell us but I'd find it interesting.
If I understand A0's learning approach, and I think that I do, then all the pretraining at classic chess would be useless against Fischer Random, unless it transposes somehow and the A0 learned tree can handle transpositions. However, A0 could train 44 million games on all FR positions with the same effect.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
mjlef
Posts: 1494
Joined: Thu Mar 30, 2006 2:08 pm

Re: AlphaGo and Stockfish played on similar hardware

Post by mjlef »

I do not think so. Looking at the Tensor Processing Unit (second gen) specs here:

https://en.wikipedia.org/wiki/Tensor_processing_unit

it says "45 TFLOPS".

Intel, in typical literature for a 72-core machine:

https://www.intel.com/content/www/us/en ... ssors.html

says "With up to 72 out-of-order cores, the new Intel® Xeon Phi™ processor delivers over 3 teraFLOPS "

It is a bit unclear how many of the chips they cite go into one second-generation TPU.

The quotes about power per TFLOP do not really tell us much, but the above confirms that TPUs are much faster at neural nets than a standard Intel chip.
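Putting the two quoted numbers side by side (peak figures only, so take it as a rough indication):

Code:
# Peak-throughput ratio from the figures quoted above. Both are peak/marketing
# numbers for dense arithmetic, so this is only indicative for NN workloads.
tpu_gen2_tflops = 45.0    # per second-gen TPU, per the Wikipedia figure above
xeon_phi_tflops = 3.0     # 72-core Xeon Phi, per the Intel figure above

print(f"one TPU ~ {tpu_gen2_tflops / xeon_phi_tflops:.0f}x a 72-core Xeon Phi")   # ~15x
print(f"four TPUs ~ {4 * tpu_gen2_tflops / xeon_phi_tflops:.0f}x")                # ~60x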
Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Re: AlphaGo and Stockfish played on similar hardware

Post by Dirt »

Michael Sherwin wrote:If I understand A0's learning approach and I think that I do then all the pretraining at classic chess would be useless against Fischer Random unless it transposes somehow and the A0 learned tree can handle transpositions. However, A0 could train 44 million games on all FR positions with the same effect.
How well AlphaZero handles FRC positions without specific training for them is the question I was getting at. We disagree on how well it would do, and without a way to do the actual test I see no way to know for sure which of us is correct.
Deasil is the right way to go.