AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Uri Blass · Post by **Uri Blass** » Thu Dec 07, 2017 2:30 am

Ras wrote:
Leo wrote:It is still unsatisfying to use that old of a SF.
It's really not Google's fault that the Stockfish team hasn't had enough confidence to release Stockfish 9 for more than a full year.

Stockfish releases a new version every few days that everybody can download, and I see no reason to care about the name of the new version.

Tests show maybe a 40 Elo advantage for the latest version, and certainly not 100 Elo at long time control, so AlphaZero could also beat the new version of Stockfish.

MikeB · Post by **MikeB** » Thu Dec 07, 2017 3:28 am

Uri Blass wrote:
Ras wrote:
Leo wrote:It is still unsatisfying to use that old of a SF.
It's really not Google's fault that the Stockfish team hasn't had enough confidence to release Stockfish 9 for more than a full year.
Stockfish releases a new version every few days that everybody can download, and I see no reason to care about the name of the new version.

Tests show maybe a 40 Elo advantage for the latest version, and certainly not 100 Elo at long time control, so AlphaZero could also beat the new version of Stockfish.

That's true, but I believe Google purposely chose a weaker version for impact. This is not about chess; this is about selling their AI packages to cities and governments. The more dominating the result, the bigger the headline: free advertising. Even so, the result is not as dominating as it sounds, as the relatively high draw rate keeps the Elo difference under 80 Elo. I added one win for SF, otherwise Bayeselo does not compute the Elo difference correctly. Current SF would have lost too, but probably would have been within 30 or 40 Elo.

Code:

ResultSet-EloRating>x
ResultSet>reset
ResultSet>rp /Users/michaelbyrne/cluster.mfb/12052200.txt 
101 game(s) loaded
ResultSet>elo
ResultSet-EloRating>mm 
00:00:00,00
ResultSet-EloRating>confidence 0.95
0.9
ResultSet-EloRating>r
Rank Name                 Rating   Δ     +    -     #     Σ    Σ%     W    L    D   W%    =%   OppR 
---------------------------------------------------------------------------------------------------------
   1 Stronger Engine       3132   0.0   50   50   101   64.0  63.4   28    1   72  27.7  71.3  3068 
   2 Weaker Engine         3068  64.3   50   50   101   37.0  36.6    1   28   72   1.0  71.3  3132 
---------------------------------------------------------------------------------------------------------
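
As a rough cross-check on the table above, the plain logistic Elo formula can be applied to the raw match score. This is only a sketch under stated assumptions: it is not the model Bayeselo fits (Bayeselo handles draws explicitly and applies a prior, which would pull its estimate lower), and the function name `elo_diff_from_score` is made up for illustration. The 28/0/72 result is the published match score; 28/1/72 is the padded result set from the table.

```python
import math


def elo_diff_from_score(wins: int, losses: int, draws: int) -> float:
    """Elo difference implied by a match score under the logistic model.

    Hypothetical helper for illustration; not what Bayeselo computes.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fraction of points scored
    if score <= 0.0 or score >= 1.0:
        raise ValueError("a 0% or 100% score implies an unbounded Elo difference")
    return 400.0 * math.log10(score / (1.0 - score))


# Published match result: 28 wins, 0 losses, 72 draws.
reported = elo_diff_from_score(28, 0, 72)
# Result set from the table, with one extra win credited to the weaker engine.
padded = elo_diff_from_score(28, 1, 72)

print(f"reported result: {reported:.1f} Elo")
print(f"padded result:   {padded:.1f} Elo")
```

The simple formula lands noticeably above the 64 Elo that Bayeselo reports, so the size of the gap depends a good deal on which rating model one trusts for a 100-game match with this many draws.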

jhellis3 · Post by **jhellis3** » Thu Dec 07, 2017 3:38 am

I would say the result is much more dominating than the Elo difference would suggest. If one looks at the games, it becomes quite clear how efficient it is at exploiting holes in conventional programs' evaluation functions, especially toward the late midgame / early endgame.

MikeB · Post by **MikeB** » Thu Dec 07, 2017 3:54 am

jhellis3 wrote:I would say the result is much more dominating than the Elo difference would suggest. If one looks at the games, it becomes quite clear how efficient it is at exploiting holes in conventional programs' evaluation functions, especially toward the late midgame / early endgame.

I wrote that before going through the games, and I have only gone through two of them, but as you suggest, it does appear more dominating than the Elo would suggest.

MikeB · Post by **MikeB** » Thu Dec 07, 2017 4:00 am

clumma wrote:A truly stunning result. Matthew Lai is a coauthor!

https://arxiv.org/pdf/1712.01815.pdf

-Carl

Ok Google - let's share the wealth - make AlphaZero open source 😎

also - congratulations Matthew Lai!! well done 👊 👍

Albert Silver · Post by **Albert Silver** » Thu Dec 07, 2017 4:35 am

Rémi Coulom wrote:
Milos wrote:
Rémi Coulom wrote:1080 ti is 11.3 TFLOPS:
https://www.anandtech.com/show/11172/nv ... t-week-699

A TPU is 45 TOPS:
https://arstechnica.com/information-tec ... ute-cloud/
1st gen TPU is 92 TOPS and an OP is an 8bit int multiplication.
Let's cut this crap of comparing apples and oranges. Please take a look at:
https://arxiv.org/abs/1704.04760

The actual comparison (not the apples and oranges stuff you mention) you can see in Table 6, where typical ML applications are compared (MLP and CNN).
Factor between first gen TPU and K80 (that is 3-5x faster for ML compared to 1080) is between 15 and 60 averaging around 25x.
The GTX 1080 should be faster than a K80. For instance, this is a deep learning benchmark where it is 4x faster:
https://medium.com/initialized-capital/ ... bd85fe5d58
They have roughly the same number of cores, but the clock speed of the 1080 is 3x the clock speed of the K80, and it is 16 nm vs 28 nm technology. The 1080 is definitely faster.

The reason I used 5x in my initial formula is that I believed you meant in your message that a 1080 is 5x slower than a TPU (5x slower than a K80 cannot be correct).

Anyway, whether a TPU is 5x or 10x faster than a 1080 does not much change the fact that DeepMind's experiment can be replicated in a few months of distributed computation with ~100 participants, which should be less than the effort that has gone into Stockfish so far.

If they added these TPUs to the Cloud, can't you just rent them?

EvgeniyZh · Post by **EvgeniyZh** » Thu Dec 07, 2017 5:01 am

Milos wrote:
clumma wrote:
Milos wrote:4 hours my ass (pardon my french).
Far fewer transistors and joules were used training AlphaZero than have been used training Stockfish. You can soon rent those TPUs on Google's cloud, or apply for free access now, so stop complaining. Furthermore it's an experimental project in early days and performance is obviously not optimal, so all the 'but-but-but 30 Elo because they used SF 8 instead of SF 8.00194' sounds really dumb.

Days of alpha-beta engines have come to an abrupt end.

-Carl
Sorry, that is pretty childish rent.
Google is obviously comparing apples and oranges and again doing marketing stunt and ppl are falling for it.
Days of Alpha0 on normal hardware are years away. But keep on dreaming, no one can take that from you.

P.S. Just as a small comparison: the Leela Zero open-source project, which is trying to replicate AlphaGo Zero in Go, took 1 month to get the same number of games that AG0 got in 3 hours, and that with a constant 1000 volunteers.
For chess it would take even more.

Training AlphaZero would take tons of time, just like creating SF from scratch. However, running it took 4 TPUs, which is comparable to what's available to (rich) consumers: you can get 6-8 NVIDIA V100s, which would give you similar performance.

lkaufman · Post by **lkaufman** » Thu Dec 07, 2017 6:57 am

EvgeniyZh wrote:
Milos wrote:
clumma wrote:
Milos wrote:4 hours my ass (pardon my french).
Far fewer transistors and joules were used training AlphaZero than have been used training Stockfish. You can soon rent those TPUs on Google's cloud, or apply for free access now, so stop complaining. Furthermore it's an experimental project in early days and performance is obviously not optimal, so all the 'but-but-but 30 Elo because they used SF 8 instead of SF 8.00194' sounds really dumb.

Days of alpha-beta engines have come to an abrupt end.

-Carl
Sorry, that is pretty childish rent.
Google is obviously comparing apples and oranges and again doing marketing stunt and ppl are falling for it.
Days of Alpha0 on normal hardware are years away. But keep on dreaming, no one can take that from you.

P.S. Just as a small comparison: the Leela Zero open-source project, which is trying to replicate AlphaGo Zero in Go, took 1 month to get the same number of games that AG0 got in 3 hours, and that with a constant 1000 volunteers.
For chess it would take even more.
Training AlphaZero would take tons of time, just like creating SF from scratch. However, running it took 4 TPUs, which is comparable to what's available to (rich) consumers: you can get 6-8 NVIDIA V100s, which would give you similar performance.

To me this is the most informative post in the whole thread, assuming it is accurate (I know nothing about TPUs). The only reasonable comparison I can think of between the AlphaZero hardware and the Stockfish hardware is cost of equivalent machines. It doesn't matter to me how much hardware was used to reach the current level of strength for both engines, just whether the playing conditions were fair. You seem to be implying that comparable hardware to the 4 TPUs would cost no more (maybe much less?) than the sixty-four core machine used by SF. Is this correct? I'm asking to learn, not making a claim myself either way.

The other conditions were of course not "fair", but reasonable given that AlphaZero only trained for a few hours. I suppose if Stockfish used a good book, was allowed to use its time management as if the time limit were pure increment, and used the latest dev. version, the match would have been much closer, but probably (judging by the infinite win to loss ratio and the actual games) SF would have still lost. The games were amazing.

Bottom line, assuming the comparable cost claim is accurate: If Google wants to optimize the software for a few weeks and sell it, rent it, or give it away, we have a revolution in computer chess. But my guess is that they won't do this, in which case the revolution may be delayed a couple years or so.

EvgeniyZh · Post by **EvgeniyZh** » Thu Dec 07, 2017 7:21 am

lkaufman wrote:
EvgeniyZh wrote:
Milos wrote:
clumma wrote:
Milos wrote:4 hours my ass (pardon my french).
Far fewer transistors and joules were used training AlphaZero than have been used training Stockfish. You can soon rent those TPUs on Google's cloud, or apply for free access now, so stop complaining. Furthermore it's an experimental project in early days and performance is obviously not optimal, so all the 'but-but-but 30 Elo because they used SF 8 instead of SF 8.00194' sounds really dumb.

Days of alpha-beta engines have come to an abrupt end.

-Carl
Sorry, that is pretty childish rent.
Google is obviously comparing apples and oranges and again doing marketing stunt and ppl are falling for it.
Days of Alpha0 on normal hardware are years away. But keep on dreaming, no one can take that from you.

P.S. Just as a small comparison: the Leela Zero open-source project, which is trying to replicate AlphaGo Zero in Go, took 1 month to get the same number of games that AG0 got in 3 hours, and that with a constant 1000 volunteers.
For chess it would take even more.
Training AlphaZero would take tons of time, just like creating SF from scratch. However, running it took 4 TPUs, which is comparable to what's available to (rich) consumers: you can get 6-8 NVIDIA V100s, which would give you similar performance.
To me this is the most informative post in the whole thread, assuming it is accurate (I know nothing about TPUs). The only reasonable comparison I can think of between the AlphaZero hardware and the Stockfish hardware is cost of equivalent machines. It doesn't matter to me how much hardware was used to reach the current level of strength for both engines, just whether the playing conditions were fair. You seem to be implying that comparable hardware to the 4 TPUs would cost no more (maybe much less?) than the sixty-four core machine used by SF. Is this correct? I'm asking to learn, not making a claim myself either way.

The info on TPUs is vague, but a TPU is said to deliver ~45 TFLOPS (probably half precision). For example see here. That would mean AlphaZero ran on a 180 TFLOPS system. The 1080 Ti is believed to be roughly cost-optimal for DL, and you'd need 16-18 of them to match that performance (you may round up to 20). That's not what you'd put at home, but many DL researchers have that amount of resources. I'd roughly estimate it at around $60k for the whole thing, give or take. With next-generation GPUs you can probably fit the whole thing in one node.
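
The arithmetic behind that estimate can be sketched in a few lines. The inputs are the figures quoted in this thread (~45 TFLOPS per TPU, 4 TPUs, 11.3 TFLOPS for a 1080 Ti); the variable names are illustrative, and matching on peak TFLOPS is an assumption that ignores precision differences and real-world utilisation.

```python
import math

# Figures quoted in the thread; treat them as rough peak numbers.
TPU_TFLOPS = 45.0            # per TPU, precision not clearly stated
TPUS_USED = 4                # TPUs AlphaZero ran on during the match
GTX_1080_TI_TFLOPS = 11.3    # peak figure quoted earlier in the thread

# Total peak throughput of the AlphaZero playing machine.
total_tflops = TPU_TFLOPS * TPUS_USED

# Number of 1080 Ti cards needed to match it on peak TFLOPS alone.
gpus_needed = math.ceil(total_tflops / GTX_1080_TI_TFLOPS)

print(f"total: {total_tflops:.0f} TFLOPS, 1080 Ti cards needed: {gpus_needed}")
```

Matching on peak numbers gives the low end of the 16-18 range; allowing for scaling losses across many cards is a plausible reason to round up.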

lkaufman wrote: The other conditions were of course not "fair", but reasonable given that AlphaZero only trained for a few hours. I suppose if Stockfish used a good book, was allowed to use its time management as if the time limit were pure increment, and used the latest dev. version, the match would have been much closer, but probably (judging by the infinite win to loss ratio and the actual games) SF would have still lost. The games were amazing.

Bottom line, assuming the comparable cost claim is accurate: If Google wants to optimize the software for a few weeks and sell it, rent it, or give it away, we have a revolution in computer chess. But my guess is that they won't do this, in which case the revolution may be delayed a couple years or so.

Agreed, even if Stockfish were in its best condition, it probably wouldn't win. Also, what is more interesting, at least for me, is both engines in their best conditions.

The reaction of computer chess people here reminds me of the reactions of computer vision people a couple of years ago. They also argued that NNs have disadvantages that wouldn't allow them to be widely used.

cdani · Post by **cdani** » Thu Dec 07, 2017 7:23 am

Also, the fact that AlphaZero ran at only 80 Knodes/sec suggests something that experienced human players know: some positions can be recognized as intrinsically safe to play without needing to verify whether they contain tactical problems.

So AlphaZero probably spends most of its nodes selecting between good positions. Stockfish-like engines instead use most of their nodes verifying that the selected moves are correct, or use them to go deeper, and of course current engines are not optimized to find the best move.

It will be very interesting to know the typical depth reached by AlphaZero. I bet it was much less than Stockfish's. If so, an engine much stronger than AlphaZero could be achieved not only by training it longer, but also by making it search deeper.
