Branko Radovanovic wrote: Price would, of course, be a good criterion for comparison, but from an engineering standpoint it would be interesting to compare power consumption of these two systems. If Google's TPU system does not draw (much) more power than a 64-core PC, then it might be argued that it is not (much) more powerful, both in a literal and in a metaphorical sense. Does anyone know the figures?
One Google TPU draws around 50W. You basically have four of them plus a Haswell CPU to run the actual MCTS, so around 300W in total.
SF's hardware was most probably two 32-core CPUs at 150W each, so also around 300W.
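For anyone who wants to check the arithmetic, here is a minimal sketch using only the rough wattages quoted above. The Haswell host's draw is not stated anywhere in this post, so the value below is simply what is left of the "around 300W" total, which is an assumption.

```python
# Rough figures quoted in the post; nothing here is measured.
TPU_WATTS = 50
TPU_COUNT = 4
SF_CPU_WATTS = 150
SF_CPU_COUNT = 2
QUOTED_TOTAL_WATTS = 300  # "around 300W" for both systems

tpu_total = TPU_WATTS * TPU_COUNT
sf_total = SF_CPU_WATTS * SF_CPU_COUNT

# Assumption: the remainder of the ~300W budget is the Haswell host.
implied_host_watts = QUOTED_TOTAL_WATTS - tpu_total

print(f"TPUs: {tpu_total}W, implied Haswell host: {implied_host_watts}W")
print(f"SF CPUs: {sf_total}W")
```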
However, even if they were at the same wattage, the problem is that Alpha0 was running on specialized hardware while SF was running on general-purpose hardware. That in itself makes the comparison totally unfair.
To make it fair, one could run SF's search on a smaller Haswell (the same one they used for Alpha0) and use 10 Xilinx UltraScale+ FPGA chips for evaluation, each chip consuming 20W and running the move generator and 100 evaluation terms in parallel at something like 300MHz, in let's say 10 clock cycles (including movegen). This would be a Deep Blue-style effort, but on today's cutting-edge hardware and software. That way the SF system would still consume 300W but have a performance of 30Bnps on 10 cores. I can immediately tell you that it would be at least 400-500 Elo stronger than current SF.
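A back-of-the-envelope sketch of the throughput side of that proposal, using only the numbers quoted above. The post does not say how many evaluation units would sit on each chip, so the sketch assumes one unit finishes one node every 10 cycles with no pipelining, and works out how many such units per chip the 30Bnps figure would require. The names are made up for illustration.

```python
# Figures from the post; all of them are rough estimates.
CHIPS = 10
WATTS_PER_CHIP = 20
CLOCK_HZ = 300_000_000          # "like 300MHz"
CYCLES_PER_NODE = 10            # movegen + evaluation
TARGET_NPS = 30_000_000_000     # "30Bnps"


def nodes_per_second(clock_hz: int, cycles_per_node: int, units: int = 1) -> int:
    """Throughput of `units` independent, non-pipelined evaluation units."""
    return units * clock_hz // cycles_per_node


# Assumption: one unit produces one node every CYCLES_PER_NODE cycles.
nps_per_unit = nodes_per_second(CLOCK_HZ, CYCLES_PER_NODE)

# Units each chip would need to host to reach the quoted total.
units_per_chip = TARGET_NPS // (CHIPS * nps_per_unit)

fpga_watts = CHIPS * WATTS_PER_CHIP

print(f"one unit: {nps_per_unit / 1e6:.0f} Mnps")
print(f"units per chip needed for the target: {units_per_chip}")
print(f"FPGA power: {fpga_watts}W")
```

Under those assumptions the FPGAs take 200W of the budget, which leaves about 100W for the Haswell doing the search.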
And building such a system would cost less than what was spent on electricity alone to train Alpha0.
And it would certainly require fewer working hours than have been invested in Alpha0.
So sorry, but Alpha0 is not the holy grail or the best way to make a chess machine. It is just the most hyped one atm.
It's not just that AlphaZero is stronger - it's by what margin, and I'm not sure this could be explained simply by having more powerful hardware.
Talking about the margin: in the B40 Sicilian the difference is only 38.5 Elo. That is 20 wins for Alpha0 vs 9 wins for SF8, which was outdated and had 1GB hash, no TBs, no opening books and a ridiculous TC, each of these points taking away at least 15-20 Elo from SF. And you still think it's a big margin???
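For reference, this is how a win/loss/draw count turns into an Elo difference with the usual logistic formula. The post only gives the wins for each side; the 100 games in that opening is an assumption made here, with the remaining games counted as draws. With that assumption the result lands within a fraction of a point of the figure quoted above.

```python
import math


def elo_diff(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match score (standard logistic model)."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return 400 * math.log10(score / (1 - score))


WINS, LOSSES = 20, 9            # from the post
GAMES = 100                     # assumption: not stated in the post
DRAWS = GAMES - WINS - LOSSES   # everything else counted as a draw

diff = elo_diff(WINS, LOSSES, DRAWS)
print(f"Alpha0 - SF8 in B40: {diff:.1f} Elo")
```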
Have you seen the training diagram (Fig.1 in the paper)???
After the first 4 hours of training, over the next 8 hours they improved by only a lousy 30 Elo until they totally saturated. They could continue training for months and would most probably just make it worse, not an inch better.
If they really had a comfortable margin, they wouldn't rely on such lousy tricks, essentially crippling SF just to win. You think these ppl at Google are stupid and don't know what 1GB of hash on a 64-core machine means? Or a normal TC, or an opening book???