Daniel Shawul wrote:
mhull wrote:
Daniel Shawul wrote:
George Tsavdaris wrote:
Milos wrote:
A0 was not better than SF8 even with 4 TPUs. Rigged tests and cherry-picked results mean nothing.
Deepmind's paper states that out of the 1300 games A0 played against Stockfish, the result was +318 =958 -24 in favor of A0, so how is A0 not much, much better than SF?
Do you accuse Deepmind, a research company (one that revolutionized the Go world and the computer-Go world), of fabricating the data in its papers?
If they hadn't given us the three papers and hadn't changed the Go world with their astonishing results, we might all dismiss their claims as nonsensical, but as it stands there is little doubt that they are 100% genuine.
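For what it's worth, the quoted score implies an Elo gap that can be checked with the standard logistic Elo formula (a rough sketch; the win/draw/loss figures are the ones quoted above):

```python
import math

# Quoted match result: +318 =958 -24 for A0 over 1300 games.
wins, draws, losses = 318, 958, 24
games = wins + draws + losses            # 1300

score = (wins + 0.5 * draws) / games     # A0's score fraction, ~0.613

# Standard logistic Elo model: diff = 400 * log10(p / (1 - p)).
elo_diff = 400 * math.log10(score / (1 - score))
print(f"score {score:.3f}, Elo diff ~{elo_diff:.0f}")
```

So under the usual model, the quoted result corresponds to roughly an 80-Elo edge.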
I think Deepmind should have used the same 64-core CPU they used for Stockfish. There would be no question of fairness had they done that. The claim that it can't run on, or isn't designed for, the CPU is bullshit. They get about 180 TFlops on the 4 TPUs, compared to the 64 cores, which (assuming a 4-core i7 gives 70 GFlops) would be about 1 TFlops. That is something like a 180x hardware advantage... I could be completely off, but you get my point, i.e. the best way would have been to use those 64 cores for A0 in the match. TensorFlow training and inference can both run on the CPU.
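The back-of-the-envelope arithmetic above can be written out explicitly; the 180 TFlops and 70 GFlops figures are the post's own rough assumptions, not measurements:

```python
# Rough figures from the post above (assumptions, not measurements).
tpu_tflops = 180.0             # claimed throughput of 4 TPUs
i7_4core_gflops = 70.0         # assumed throughput of a 4-core i7

gflops_per_core = i7_4core_gflops / 4          # 17.5 GFlops per core
cpu_tflops = 64 * gflops_per_core / 1000.0     # 64 cores -> ~1.12 TFlops

ratio = tpu_tflops / cpu_tflops                # raw-flops advantage
print(f"CPU ~{cpu_tflops:.2f} TFlops, TPU advantage ~{ratio:.0f}x")
```

By these numbers the raw-flops gap comes out around 160x, the same order of magnitude as the ~180x quoted.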
So we're back to the "uniform platform" school of computer-chess competition again. However, limiting A0 to 64 scalar CPU cores is completely arbitrary and biased toward scalar-optimized chess projects. Demanding that a non-scalar-optimized project un-optimize itself "to make it fair" is not fair.
They are not competing in the CCC, are they? From the perspective of scientific transparency, which I believe is their goal, they should make it unambiguously clear how the result was achieved.
Issues I have with their paper:
a) MCTS sucks at tactics, and there is no mention of this in the paper. People are starting to understand the severity of this with L0 now.
MCTS eventually doesn't suck once the NN gets smart enough. Your concern seems to be with "non-equivalent path-length to solution", a.k.a. non-uniform hardware competition.
My view is that MCTS is neutral, human-intervention-wise. IOW, A0/L0 don't rely on human intelligence in their search strategy, in which many nuances could otherwise be tried. But then A0 wouldn't be learning everything; some of the secret knowledge would reside with the human programmers.
Daniel Shawul wrote:
b) Cherry-picking. Given (a), I would imagine a 3500-elo Stockfish will win once every 100 games. Even though they did 1000 games against it, it is hard to find a consecutive 100 games with a non-stockfish win.
I assume you meant "Stockfish win". Yes, they failed to include Stockfish victories, which might have revealed A0 weaknesses.
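The cherry-picking worry can actually be quantified: if Stockfish wins any given game with probability 1/100 (the post's own estimate), the chance that a random 100-game stretch contains no Stockfish win follows directly, assuming independent games (a sketch, not anything from the paper):

```python
p_sf_win = 0.01          # assumed per-game Stockfish win probability
n = 100                  # length of the published stretch

# Probability that 100 consecutive independent games contain no SF win.
p_clean_stretch = (1 - p_sf_win) ** n
print(f"P(no Stockfish win in {n} games) ~ {p_clean_stretch:.2f}")
```

That comes out to about 0.37, so finding one hand-picked clean 100-game window inside a 1000-game match is quite plausible.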
Daniel Shawul wrote:
c) The minimum hardware needed to get A0's performance is now at least 4 TPUs, which is out of reach for many people, or one has to use a 1-year + 1-month time control to get 3500-elo performance.
True, but so were the multi-million-dollar supercomputers of the 1980s and '90s. Yet the programmers of micros were eager to compete against them because they were actually competitive, as the CCCs proved.
Daniel Shawul wrote:
d) Hardware differences. It has become very clear to me that this result was achieved via massive hardware acceleration of a very slow eval. Theoretically, Deep Blue could also have achieved this result with its FPGAs. Admittedly, their approach is cost-effective given that the future lies in cheap manycore architectures like GPUs.
e) A minor issue: the scalability of alpha-beta engines is not that good.
Daniel
And scalability is a limitation of scalar projects but a boon to non-scalar ones. I therefore think your main objection is the uniform-platform objection, which is what I call the "path-length-to-Elo" (PLTE) objection. In your view, the PLTE is longer for A0/L0 vis-a-vis Stockfish and other top alpha-beta searcher projects; therefore the comparisons ignore the computing path-length to Elo achievement.
IOW, it was unfair to compare Cray Blitz to a Mephisto project back in the day.