Milos wrote: A Google TPU is around 50W; basically you have 4 of them plus another Haswell to run the actual MCTS on top, so around 300W.
SF's hardware was most probably two 32-core CPUs at 150W each, so around 300W as well.
However, even if they were on the same wattage, the problem is that AlphaZero was running on specialized hardware while SF was running on general-purpose hardware. That in itself is a totally unfair comparison.
To make it fair, one could run SF's search on a smaller Haswell (the same one they used for AlphaZero) and use 10 Xilinx UltraScale+ FPGA chips for evaluation, each chip consuming 20W, running the move generator and 100 evaluation terms in parallel at around 300MHz in, let's say, 10 clock cycles (including movegen). This would be the Deep Blue approach, but on today's cutting-edge hardware and software. That SF system would still consume 300W, but deliver 30Bnps on 10 cores. I can tell you right away that it would be at least 400-500 Elo stronger than current SF.
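As a back-of-envelope sanity check on the nps figure (the pipelining assumptions here are mine, not from the post): if each eval unit is fully pipelined, the 10-cycle latency doesn't limit throughput, only the clock rate and the number of parallel pipelines do.

```python
# Rough FPGA eval-throughput estimate (assumed: fully pipelined eval units,
# so one finished evaluation per pipeline per clock cycle once the pipe is full).
CHIPS = 10            # 10 UltraScale+ chips, as in the post
CLOCK_HZ = 300e6      # 300 MHz, as in the post

def nps(pipelines_per_chip):
    """Nodes per second across all chips under the pipelining assumption."""
    return CHIPS * CLOCK_HZ * pipelines_per_chip

# One pipeline per chip already gives 3 Bnps; the quoted 30 Bnps
# would need on the order of 10 parallel eval pipelines per chip.
print(f"{nps(1):.1e} nps with 1 pipeline/chip")
print(f"{nps(10):.1e} nps with 10 pipelines/chip")
```

Whether a single UltraScale+ chip can actually hold 10 full eval pipelines at 300MHz is a separate resource-budget question, but the arithmetic shows the claim hinges on pipelining, not raw clock rate.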
And building such a system would cost less than what was spent just on electricity to train Alpha0.
And certainly would require less working hours than what has been invested in Alpha0.
So sorry, but AlphaZero is not the holy grail or the best way to build a chess machine. It is just the most hyped one atm.
Talking about the margin: in the B40 Sicilian the difference is only 38.5 Elo. 20 wins for AlphaZero vs 9 wins for SF8, which was outdated, with 1GB hash, no TBs, no opening book, and a ridiculous TC, each of these points taking away at least 15-20 Elo from SF. And you still think it's a big margin???
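For reference, that Elo figure follows directly from the match score under the standard logistic Elo model (assuming a 100-game match, i.e. the remaining 71 games drawn, which is my reading of the result):

```python
import math

# Elo difference implied by a match score (standard logistic Elo model).
# Assumed result: 20 AlphaZero wins, 9 SF8 wins, 71 draws out of 100 games.
wins, losses, draws = 20, 9, 71
games = wins + losses + draws
score = (wins + 0.5 * draws) / games          # 0.555 for AlphaZero
elo_diff = 400 * math.log10(score / (1 - score))
print(round(elo_diff, 1))                     # ~38.4 Elo
```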
Have you seen the training diagram (Fig.1 in the paper)???
After the first 4 hours of training, over the next 8 hours they improved by only a lousy 30 Elo before saturating completely. They could keep training for months and most probably wouldn't gain an inch; it might even get worse.
If they really had a comfortable margin, they wouldn't rely on such lousy tricks, essentially crippling SF just to win. Do you think these people at Google are stupid and don't know what 1GB of hash means for a 64-core machine? Or a normal TC, or an opening book???
Thanks for the power consumption info - tried to find it but had no luck...
Why do you think SF should have played with an opening book and an EGTB? AlphaZero had none. As I gather, it used actual computation for all its moves rather than relying on instant lookup.
SF surely lost at least 20-30 Elo due to the fixed move time, but given that AlphaZero itself played under the same conditions, there is no reason to believe the outcome would have been any different with dynamic time management given to both sides. (If there is any mention of AlphaZero's time management in the paper, I've missed it.)
AlphaZero's hardware is better described as novel than specialized. There is no doubt it will become commonplace, perhaps within a few years, while FPGAs will remain exotic, however powerful and energy-efficient they may be.
If I had to name just one reason why AlphaZero playing chess the way it plays is a monumental achievement, it is this: in the section titled "Anatomy of a Computer Chess Program" the authors list a number of computer chess techniques (alpha-beta search, material imbalance tables, PSQT, mobility, pawn structure eval, king safety, QS, pruning, extensions, history, SEE, heuristics, TT, etc.) and conclude that
AlphaZero uses none of them.
Not one. It appears that the only thing SF and AlphaZero have in common is the move generator. That's mind-boggling, and it remains mind-boggling even if Google's results do appear to be somewhat oversold.