AlphaGo and Stockfish played on similar hardware

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: AlphaGo and Stockfish played on similar hardware

Post by Lyudmil Tsvetkov »

vvarkey wrote:Ignore all the training that went into AlphaZero for a second.

Per the paper, for the 100 games:
AlphaZero and the previous AlphaGo Zero used a single machine with 4 TPUs. Stockfish and Elmo played at their strongest skill level using 64 threads and a hash size of 1GB.
According to https://cloud.google.com/blog/big-data/ ... g-unit-tpu:
We announced the TPU last year and recently followed up with a detailed study of its performance and architecture. In short, we found that the TPU delivered 15–30X higher performance and 30–80X higher performance-per-watt than contemporary CPUs and GPUs.
So, a single machine with 4 TPUs (15x4 = 60) is somewhat comparable to 64 CPU threads.
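The parity estimate above can be sketched directly from the quoted Google figures. Treating each unit of the 15–30x speedup factor as "one CPU thread's worth" is the poster's loose assumption, not anything from the paper:

```python
# Rough hardware-parity estimate using the quoted Google blog figures.
# Assumption (poster's, not DeepMind's): 1x CPU speedup ~ 1 CPU thread.
tpu_speedup_low, tpu_speedup_high = 15, 30   # quoted TPU-vs-CPU range
num_tpus = 4                                  # AlphaZero's playing machine
stockfish_threads = 64                        # Stockfish's match setting

equiv_low = num_tpus * tpu_speedup_low    # 4 * 15 = 60
equiv_high = num_tpus * tpu_speedup_high  # 4 * 30 = 120

print(f"4 TPUs ~ {equiv_low}-{equiv_high} thread-equivalents "
      f"vs {stockfish_threads} Stockfish threads")
```

On the low end of the quoted range the two setups come out roughly even, which is the comparison being made in the post.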

Now, for training AlphaZero, DeepMind really did use tons of hardware: 5,000 Gen 1 TPUs to generate the games for training and 64 Gen 2 TPUs for training the neural nets.

But for comparing playing strengths, these numbers are as relevant as counting how many man-hours went into the development of Stockfish.
That has already been discussed; just carefully read the threads.
User avatar
hgm
Posts: 28354
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: AlphaGo and Stockfish played on similar hardware

Post by hgm »

Milos wrote:This is so wrong on so many levels.
Do you understand that the TPU has basically only 5 instructions: load weights, load inputs, multiply, activate, and write results to the CPU?
Of course, I have read the paper describing it. But when the CPU says 'multiply', it can order 64K multiplications in one stroke. That certainly helps.
Every higher-level DCNN operation has to be handled by the CPU, all the data manipulation for inference through the layers. And this is around 1GB of weight manipulations 80,000 times per second!!!
The TPU has an internal weight memory, and can be ordered to load weights from there into the shift buffer that will finally be copied to the multiplier array. Again not individually, but in bulk. The TPU also has a DMA controller to directly load data in bulk from CPU memory, without CPU involvement.
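The "64K multiplications in one stroke" figure follows from the first-gen TPU's 256x256 systolic array; a back-of-envelope sketch, taking the 700 MHz clock rate from Google's published description of the chip:

```python
# Back-of-envelope peak throughput of the first-generation TPU.
# 256x256 grid of 8-bit MAC units; 700 MHz clock (Google's published spec).
array_dim = 256
macs_per_cycle = array_dim * array_dim          # 65536 multiplies per "stroke"
clock_hz = 700e6
peak_ops_per_s = 2 * macs_per_cycle * clock_hz  # multiply + accumulate = 2 ops

print(f"{macs_per_cycle} MACs/cycle, "
      f"{peak_ops_per_s / 1e12:.0f} TOPS peak (8-bit)")  # ~92 TOPS
```

That ~92 TOPS peak is why one bulk 'multiply' instruction does so much work: the CPU only issues the command, and the systolic array streams the data through.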
Plus, on top of that, traversing the whole tree: you start from the root, perform the action calculation for each move at every node, and once you reach a leaf node you expand it, prepare all the data for the TPU, and once the priors are calculated you backpropagate the action value to the root.
And you think that could be done on an 80486 CPU????
Well, so perhaps I am guilty of some poetic exaggeration. But a low-performance CPU, as I finally mention, seems good enough.
It really makes no sense discussing this stuff with you, when you don't have even the basic knowledge.
And yet you never tire of doing it.
Again, they didn't even bother to write in the paper that they used a heavy CPU for that, most probably the same CPU they used to run SF on. So they had 4 TPUs extra. Just a tiny bit of difference.
The way they wrote that paper is shameless. If that is really the version they submitted for peer review, it would be just outrageous if it got accepted.
This is pure guessing on your part. You cannot possibly know that. You have not seen their code. It is all driven by paranoia and conspiracy theories.
Whether the hardware can be bought on the free market or not should not be relevant for the question of whether it is similar. What is not for sale now could be for sale tomorrow, and then what was 20 times as powerful would suddenly be only half as powerful when it is offered cheaply? I don't think so...
This is pretty irrelevant and pointless.
What you write is just wishful thinking. Tomorrow we could have a 256-core x86 CPU for $1000, so what?
What is relevant is that if you can't buy it, you can only make an estimate based on what you can buy. And in that respect 1 TPU is like 2x GV100, which cost $3k per piece. So 4 TPUs = 8 x $3k = $24k. Or if you like, 1 TPU = 10x 1080 Ti. A 1080 Ti is $600. 4 TPUs = 40 x $600 = $24k, again the same math.
For $24k you can easily buy 4 top-of-the-range FPGA boards + 2x 32-core x86 CPUs.
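The price-parity arithmetic in the post above can be checked directly. The equivalence factors (1 TPU ~ 2x V100 ~ 10x 1080 Ti) and the prices are the poster's claims, not measured figures; the point of the sketch is only that both routes land on the same total:

```python
# The poster's price-parity arithmetic: costing 4 TPUs via two GPU routes.
# Equivalence factors and prices are the poster's assumptions, not data.
num_tpus = 4
via_v100 = num_tpus * 2 * 3000     # 1 TPU ~ 2x V100 at $3,000 each -> 8 GPUs
via_1080ti = num_tpus * 10 * 600   # 1 TPU ~ 10x 1080 Ti at $600 each -> 40 GPUs

assert via_v100 == via_1080ti == 24000
print(f"either route: ${via_v100:,}")  # either route: $24,000
```

Both estimates agreeing at $24k is what the poster means by "again the same math"; it is consistency between two rough conversions, not independent confirmation of either one.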