AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Milos · Post by **Milos** » Wed Dec 06, 2017 7:32 pm

Rémi Coulom wrote:1080 ti is 11.3 TFLOPS:
https://www.anandtech.com/show/11172/nv ... t-week-699

A TPU is 45 TOPS:
https://arstechnica.com/information-tec ... ute-cloud/

1st gen TPU is 92 TOPS, and an OP is an 8-bit int multiplication.
Let's cut this crap of comparing apples and oranges. Please take a look at:
https://arxiv.org/abs/1704.04760

The actual comparison (not the apples-and-oranges stuff you mention) can be seen in Table 6, where typical ML applications (MLP and CNN) are compared.
The factor between the first-gen TPU and the K80 (which is 3-5x faster for ML than the 1080) is between 15 and 60, averaging around 25x.

Rémi Coulom · Post by **Rémi Coulom** » Wed Dec 06, 2017 7:51 pm

Milos wrote:
Rémi Coulom wrote:1080 ti is 11.3 TFLOPS:
https://www.anandtech.com/show/11172/nv ... t-week-699

A TPU is 45 TOPS:
https://arstechnica.com/information-tec ... ute-cloud/

1st gen TPU is 92 TOPS, and an OP is an 8-bit int multiplication.
Let's cut this crap of comparing apples and oranges. Please take a look at:
https://arxiv.org/abs/1704.04760

The actual comparison (not the apples-and-oranges stuff you mention) can be seen in Table 6, where typical ML applications (MLP and CNN) are compared.
The factor between the first-gen TPU and the K80 (which is 3-5x faster for ML than the 1080) is between 15 and 60, averaging around 25x.

The GTX 1080 should be faster than a K80. For instance, this is a deep learning benchmark where it is 4x faster:
https://medium.com/initialized-capital/ ... bd85fe5d58
They have roughly the same number of cores, but the clock speed of the 1080 is 3x the clock speed of the K80. 16 nm vs 28 nm technology. The 1080 is definitely faster.

The reason I used 5x in my initial formula is that I believed you meant in your message that a 1080 is 5x slower than a TPU (5x slower than a K80 cannot be correct).

Anyway, whether a TPU is 5x or 10x faster than a 1080 does not much change the fact that DeepMind's experiment can be replicated in a few months of distributed computation with ~100 participants, which should be less than the effort that has gone into Stockfish so far.

Milos · Post by **Milos** » Wed Dec 06, 2017 8:05 pm

Rémi Coulom wrote:Anyway, whether a TPU is 5x or 10x faster than a 1080 does not much change the fact that DeepMind's experiment can be replicated in a few months of distributed computation with ~100 participants, which should be less than the effort that has gone into Stockfish so far.

It took Leela Zero 1 month to generate the same number of games that AG0 got in 3 hours, and that was with a constant 1000 volunteers.
What makes you think you could do the same in chess with only 100 participants?
The minimum time to train a network to SF8 level would be at least a year with a constant 100 volunteers.
And in terms of power burned, I really don't think it would be anywhere near fishtest; it would be much higher. Power per core of a modern CPU is 10-15 W. A 1080 is more like 250 W.
Most people in fishtest donate just a few cores, and most people don't have 10-series GTX cards but older ones, which are far less powerful and far more power hungry.
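
As a rough back-of-the-envelope check of the figures in this post, here is a minimal sketch. It assumes that volunteer throughput scales linearly with headcount and that an hour of DeepMind self-play costs the same in chess as in Go; neither is established in this thread. The helper names are hypothetical, and the 4-hour target is the training time disputed elsewhere in the thread.

```python
def volunteer_months(ref_months, ref_volunteers, ref_hours, target_hours, volunteers):
    """Months needed by `volunteers` to match `target_hours` of reference self-play.

    Assumes linear scaling in both volunteer count and reference hours.
    """
    volunteer_months_per_hour = ref_months * ref_volunteers / ref_hours
    return volunteer_months_per_hour * target_hours / volunteers


def gpu_in_cpu_cores(gpu_watts, watts_per_core):
    """Number of CPU cores that draw the same power as one GPU."""
    return gpu_watts / watts_per_core


# Figures from the post: 1 month, 1000 volunteers, 3 hours of AG0 self-play.
months = volunteer_months(1, 1000, 3, target_hours=4, volunteers=100)
print(f"{months:.1f} months with 100 volunteers")

# 250 W for a 1080 against 10-15 W per CPU core.
low = gpu_in_cpu_cores(250, 15)
high = gpu_in_cpu_cores(250, 10)
print(f"one 1080 draws as much power as {low:.0f}-{high:.0f} CPU cores")
```

Under these assumptions the estimate comes out at a little over a year, and a single 1080 draws about as much power as 17 to 25 CPU cores. Changing the assumed per-device speed by 5x or 10x moves the time estimate proportionally.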

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 8:30 pm

shrapnel wrote:This really seems revolutionary!
Beating Stockfish 10-0 is no joke.
I wonder when such a program would be made available to customers at a reasonable price?

Never.
It will go the way of Deep Blue, serving only commercial interests.
And it is of comparable strength to Deep Blue back then, that is, far from the top.
I don't know why it is so difficult to understand that it is all hardware.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 8:36 pm

Rémi Coulom wrote:
Milos wrote:
Rémi Coulom wrote:1080 ti is 11.3 TFLOPS:
https://www.anandtech.com/show/11172/nv ... t-week-699

A TPU is 45 TOPS:
https://arstechnica.com/information-tec ... ute-cloud/

1st gen TPU is 92 TOPS, and an OP is an 8-bit int multiplication.
Let's cut this crap of comparing apples and oranges. Please take a look at:
https://arxiv.org/abs/1704.04760

The actual comparison (not the apples-and-oranges stuff you mention) can be seen in Table 6, where typical ML applications (MLP and CNN) are compared.
The factor between the first-gen TPU and the K80 (which is 3-5x faster for ML than the 1080) is between 15 and 60, averaging around 25x.

The GTX 1080 should be faster than a K80. For instance, this is a deep learning benchmark where it is 4x faster:
https://medium.com/initialized-capital/ ... bd85fe5d58
They have roughly the same number of cores, but the clock speed of the 1080 is 3x the clock speed of the K80. 16 nm vs 28 nm technology. The 1080 is definitely faster.

The reason I used 5x in my initial formula is that I believed you meant in your message that a 1080 is 5x slower than a TPU (5x slower than a K80 cannot be correct).

Anyway, whether a TPU is 5x or 10x faster than a 1080 does not much change the fact that DeepMind's experiment can be replicated in a few months of distributed computation with ~100 participants, which should be less than the effort that has gone into Stockfish so far.

The only thing is that they are still tuning at a much lower level, quite probably around 2900 Elo or even lower.
It will not be that easy going forward, as optimal lines get subtler and subtler.
Stockfish also averaged around 150 Elo of improvement in its first year.
At that level, it is easy. Let's see what they do from now on; my prediction is: very little.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 8:40 pm

clumma wrote:
Milos wrote:4 hours my ass (pardon my french).
Far fewer transistors and joules were used training AlphaZero than have been used training Stockfish. You can soon rent those TPUs on Google's cloud, or apply for free access now, so stop complaining. Furthermore it's an experimental project in early days and performance is obviously not optimal, so all the 'but-but-but 30 Elo because they used SF 8 instead of SF 8.00194' sounds really dumb.

Days of alpha-beta engines have come to an abrupt end.

-Carl

Oops, aren't they doing alpha-beta too?
There is a single approach to playing chess: picking the best move. Whether you call it alpha-beta, Monte Carlo or Las Vegas does not matter at all.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 8:45 pm

jdart wrote:Good point re the book - AlphaZero effectively has one. But winning against SF is still no small achievement, even with unequal conditions. Many of us would like to see a more equal test, though.

And I have been wondering why Alpha gains an advantage very early in most games.
Stockfish's opening play seems normal for its level, but that thing plays like a beast.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 8:55 pm

MikeGL wrote:
Lyudmil Tsvetkov wrote:
MikeGL wrote:
clumma wrote:
Lyudmil Tsvetkov wrote:Alpha had considerable hardware advantage
... but that has got to be near saturation.

-Carl
Excellent point.

I mean, on that hardware, there won't be any fluctuation in SF8's best candidate move anyway.
Alpha's hardware equivalent was somewhere around 1024 standard cores.
How do 1024 cores compare with 64 cores?
How scientific is that?

I don't know what saturation you are talking about. From what I read, without fully understanding it, the TPUs are a very different architecture and are affected quite differently by general computer chess concepts.
How can you make such claims when there is not enough data in the PDF file?
The TPU instruction set and benchmarks were not properly published.
The PDF report only claimed that SF8 was on 64 threads (but gave no clock speed). It was discussed years ago on this forum that clock speed, including that of the buses, would trump number of cores.

I would choose 1 core at 4.0 GHz over 8 cores running at 2.0 GHz with a buggy SMP implementation of the engine.

I did not read the whole PDF carefully, because from the start I saw it was unreadable.
How can they use 'The Modern Chess Instructor' by Steinitz to improve Alpha?
However, I read a number of sites on the Internet, from which it became clear that:
- TPUs are the equivalent of at least 256 normal cores
- TPUs are highly efficient for calculations and mostly lack the diminishing-returns problem of SMP implementations, so that while SF has been using 64 cores and probably losing half of that hardware to SMP inefficiencies at larger core counts, Alpha lost almost nothing from its tremendous hardware

So, in reality, the hardware difference is not 16/1, as I thought initially, but more like 30/1.
Add to this the early opening advantage Alpha gets due to the simulated book, and the test has been fully meaningless.
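
The 16/1 and roughly 30/1 figures can be reproduced from the numbers claimed in this thread. The following is only a sketch of that arithmetic: the 1024-core equivalence and the 50% SMP loss are the poster's estimates, not measured values, and `effective_ratio` is a hypothetical helper.

```python
def effective_ratio(alpha_cores, sf_cores, sf_smp_efficiency=1.0, alpha_efficiency=1.0):
    """Ratio of effective core-equivalents, Alpha over Stockfish.

    Each side's core count is scaled by the fraction of it assumed to do useful work.
    """
    return (alpha_cores * alpha_efficiency) / (sf_cores * sf_smp_efficiency)


# Claimed 1024 core-equivalents for Alpha against 64 cores for SF, no losses.
naive = effective_ratio(1024, 64)

# Same, but assuming SF loses half of its 64 cores to SMP inefficiency.
adjusted = effective_ratio(1024, 64, sf_smp_efficiency=0.5)

print(f"naive ratio: {naive:.0f}/1, adjusted ratio: {adjusted:.0f}/1")
```

With these inputs the ratio goes from 16/1 to 32/1, close to the "30/1" quoted. The result depends entirely on the assumed core equivalence and efficiency values.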

Alpha would play no stronger than 1850 on a single core.
Why would I care for such an engine?

jhellis3 · Post by **jhellis3** » Wed Dec 06, 2017 8:57 pm

Lyudmil Tsvetkov wrote:Why would I care for such an engine?

The better question is why the engine (or anyone) would care whether you care.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 9:00 pm

jhellis3 wrote:As has been mentioned previously, one cannot really make direct core count comparisons in this case.

The most "fair" metric I can think of using is system power consumption, and I would guess that SF was at a bit of a disadvantage in this regard. Regardless, the writing is clearly on the wall...

Your last sentence is enigmatic, to say the least.
What is on the wall?

What kind of metric is system power consumption, when TPUs are geared towards extremely low power consumption?
