Daniel Shawul wrote:
mhull wrote:
Daniel Shawul wrote:
George Tsavdaris wrote:
Milos wrote:
A0 was not better than SF8 even with 4 TPUs. Rigged tests and cherry-picked results mean nothing.
Deepmind's paper states that out of the 1300 games A0 played against Stockfish, the result was +318 =958 -24 in favor of A0, so how is A0 not much, much better than SF?
Do you accuse Deepmind, a research company (one that revolutionized the Go world and the computer-Go world), of fabricating the data in its papers?
If they hadn't given us the three papers and hadn't changed the Go world with their astonishing results, we might all dismiss their claims as nonsensical, but as it stands there is little doubt that they are 100% genuine.
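For what it's worth, the quoted score implies an Elo gap that can be checked with the standard logistic Elo formula (a rough sketch; the win/draw/loss figures are the ones quoted above):

```python
import math

# Quoted match result: +318 =958 -24 for A0 over 1300 games.
wins, draws, losses = 318, 958, 24
games = wins + draws + losses            # 1300

score = (wins + 0.5 * draws) / games     # A0's score fraction, ~0.613

# Standard logistic Elo model: diff = 400 * log10(p / (1 - p)).
elo_diff = 400 * math.log10(score / (1 - score))
print(f"score {score:.3f}, Elo diff ~{elo_diff:.0f}")
```

So under the usual model, the quoted result corresponds to roughly an 80-Elo edge.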
I think Deepmind should have used the same 64-core CPU they used for Stockfish. There would be no question of fairness had they done that. The claim that it can't run on, or isn't designed for, the CPU is bullshit. They get about 180 TFlops on the 4 TPUs, compared to the 64 cores, which (assuming a 4-core i7 gives 70 GFlops) would be about 1 TFlops. That is something like a 180x hardware advantage... I could be completely off, but you get my point, i.e. the best way would have been to use those 64 cores for A0 in the match. TensorFlow training and inference can both run on the CPU.
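The back-of-the-envelope arithmetic above can be written out explicitly; the 180 TFlops and 70 GFlops figures are the post's own rough assumptions, not measurements:

```python
# Rough figures from the post above (assumptions, not measurements).
tpu_tflops = 180.0             # claimed throughput of 4 TPUs
i7_4core_gflops = 70.0         # assumed throughput of a 4-core i7

gflops_per_core = i7_4core_gflops / 4          # 17.5 GFlops per core
cpu_tflops = 64 * gflops_per_core / 1000.0     # 64 cores -> ~1.12 TFlops

ratio = tpu_tflops / cpu_tflops                # raw-flops advantage
print(f"CPU ~{cpu_tflops:.2f} TFlops, TPU advantage ~{ratio:.0f}x")
```

By these numbers the raw-flops gap comes out around 160x, the same order of magnitude as the ~180x quoted.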
So we're back to the "uniform platform" school of computer-chess competition again. However, limiting A0 to 64 scalar CPU cores is completely arbitrary and biased toward scalar-optimized chess projects. Demanding that a non-scalar-optimized project un-optimize itself "to make it fair" is not fair.
They are not competing in the CCC, are they? From the perspective of scientific transparency, which I believe is their goal, they should make it unambiguously clear how the result was achieved.
Issues I have with their paper:
a) MCTS sucks at tactics, and there is no mention of this in the paper. People are starting to understand the severity of this with L0 now.
MCTS eventually doesn't suck once the NN gets smart enough. Your concern seems to be with "non-equivalent path-length to solution", a.k.a. non-uniform hardware competition.
My view is that MCTS is neutral, human-intervention-wise. IOW, A0/L0 don't rely on human intelligence in their search strategy, in which many nuances could otherwise be tried. But then A0 wouldn't be learning everything; some of the secret knowledge would reside with the human programmers.
Daniel Shawul wrote:
b) Cherry-picking. Given (a), I would imagine a 3500-elo Stockfish will win once every 100 games. Even though they did 1000 games against it, it is hard to find a consecutive 100 games with a non-stockfish win.
I assume you meant "Stockfish win". Yes, they failed to include Stockfish victories, which might have revealed A0 weaknesses.
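The cherry-picking worry can actually be quantified: if Stockfish wins any given game with probability 1/100 (the post's own estimate), the chance that a random 100-game stretch contains no Stockfish win follows directly, assuming independent games (a sketch, not anything from the paper):

```python
p_sf_win = 0.01          # assumed per-game Stockfish win probability
n = 100                  # length of the published stretch

# Probability that 100 consecutive independent games contain no SF win.
p_clean_stretch = (1 - p_sf_win) ** n
print(f"P(no Stockfish win in {n} games) ~ {p_clean_stretch:.2f}")
```

That comes out to about 0.37, so finding one hand-picked clean 100-game window inside a 1000-game match is quite plausible.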
Daniel Shawul wrote:
c) The minimum hardware needed to get A0's performance is now at least 4 TPUs, which is out of reach for many people, or one has to use a 1-year + 1-month time control to get 3500-elo performance.
True, but so were the multi-million-dollar supercomputers of the 1980s and '90s. Yet the programmers of micros were eager to compete against them because they were actually competitive, as the CCCs proved.
Daniel Shawul wrote:
d) Hardware differences. It has become very clear to me that this result was achieved via massive hardware acceleration of a very slow eval. Theoretically, Deep Blue could also have achieved this result with its FPGAs. Admittedly, their approach is cost-effective given that the future lies in cheap manycore architectures like GPUs.
e) A minor issue: the scalability of alpha-beta engines is not that good.
Daniel
And scalability is a limitation of scalar projects but a boon to non-scalar ones. I therefore think your main objection is the uniform-platform objection, which is what I call the "path-length-to-Elo" (PLTE) objection. In your view, the PLTE is longer for A0/L0 vis-a-vis Stockfish and other top alpha-beta searcher projects; therefore the comparisons ignore the computing path-length to Elo achievement.
IOW, it was unfair to compare Cray Blitz to a Mephisto project back in the day.