AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Jhoravi · Post by **Jhoravi** » Thu Dec 07, 2017 2:44 pm

Do you all think that AlphaZero, starting from scratch, knows only the rules of chess and nothing more? What about piece values, like 1 for a pawn and 3 for a knight? I would be so amazed if it learned all the piece values from self-play.

EvgeniyZh · Post by **EvgeniyZh** » Thu Dec 07, 2017 2:45 pm

Lyudmil Tsvetkov wrote:
Uri Blass wrote:
Lyudmil Tsvetkov wrote:
Uri Blass wrote:
Lyudmil Tsvetkov wrote:
kranium wrote:
Lyudmil Tsvetkov wrote:
clumma wrote:
Lyudmil Tsvetkov wrote: Alpha had a considerable hardware advantage.
That comparison is not straightforward, but this claim does not seem to be true. SF had 64 threads. I'm not up on the latest scaling behavior of the engine, but that has got to be near saturation.

-Carl
From what I gleaned from hardware comparisons, the advantage is 16/1.
Why would anyone want to run such a grossly unfair match?
Only one thing comes to mind: that the company will want to advertise its colossal breakthrough with TPUs and artificial intelligence and then sell its products.

But then, the achievement is not there.
The fact that Google has created a chess playing entity that crushes SF is notable (and fascinating).

TPUs are not for sale, and (at the moment) are applied only to Google's deep learning and research projects, except when Google donates them to research for free.

https://techcrunch.com/2017/05/17/the-t ... cientists/
What would be the score between SF on 64 cores and SF on 1024 cores out of 100 games?
You think the bigger-hardware SF would score less than 64 points?
I guess at least 80.

So what is so new?
They applied some big hardware; that is all.
The real strength of Alpha is 2850, so around spot 97 or so among engines.
97 is not such a bad achievement, after all.
I doubt that SF on 1024 cores would score even 50%.
Maybe after some point more cores become counterproductive for Stockfish.

I also doubt that it is possible to score at least 80 points against Stockfish with 64 cores at 1 minute per move.
Why not?
How much would SF on 16 cores score against SF on a single core? That is easily reproducible.
The experts claim the TPUs lack any SMP inefficiencies.
If you give the engines 10 minutes per move, they play almost perfect chess, so I guess that you will get less than 80%, and 1 minute per move on 64 cores is probably stronger than 10 minutes per move on 1 core.
Actually, the hardware advantage plus the simulated opening book (very important, as SF lost most games already in the opening) was close to 50/1, so how would SF on 50 cores fare against SF on a single core?

Certainly somewhere 95% or so.

Are you aware that the dependency of TTD (time to depth) vs. core count is sublinear? 64 cores to 1 is not like 4096 to 64. Moreover, even if it were, the dependency of strength of play vs. TTD is also sublinear and, certainly, practically bounded. It is actually demonstrated in the paper, page 7. Did you read it?
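A toy model can make the two sublinear effects concrete. The sketch below uses Amdahl's law as a stand-in for parallel search speedup and assumes a fixed Elo gain per doubling of effective speed. The parallel fraction `P` and `ELO_PER_DOUBLING` are hypothetical placeholders, not measured Stockfish figures, and real SMP search scaling is messier than Amdahl's law.

```python
import math

def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Amdahl's law: speedup over one core when only part of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

def elo_gain(speedup: float, elo_per_doubling: float) -> float:
    """Toy model: a fixed Elo gain for each doubling of effective search speed."""
    return elo_per_doubling * math.log2(speedup)

# Hypothetical parameters, chosen only to illustrate the shape of the curves.
P = 0.95                  # assumed parallelizable fraction of the search
ELO_PER_DOUBLING = 50.0   # assumed, not a measured Stockfish figure

for lo, hi in [(1, 64), (64, 4096)]:
    # Speedup of the larger machine relative to the smaller one.
    ratio = amdahl_speedup(hi, P) / amdahl_speedup(lo, P)
    print(f"{lo:>4} -> {hi:>4} cores: speedup x{ratio:.2f}, "
          f"~{elo_gain(ratio, ELO_PER_DOUBLING):.0f} Elo")
```

With these assumed parameters, the step from 1 to 64 cores gives a speedup of about 15, while the equally large 64-fold step from 64 to 4096 cores adds only about 1.3, which is the sense in which 64 cores to 1 is not like 4096 to 64.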

Robert Pope · Post by **Robert Pope** » Thu Dec 07, 2017 4:46 pm

Jhoravi wrote: Do you all think that AlphaZero, starting from scratch, knows only the rules of chess and nothing more? What about piece values, like 1 for a pawn and 3 for a knight? I would be so amazed if it learned all the piece values from self-play.

Yes. That's the whole point of what makes this so incredible. The fact that they can take nothing more than the rules and definitions of game outcomes and create a superhuman player means this can be applied far outside the chess domain. Except that it isn't even bothering to learn piece values. It is learning which moves are better in different positions without depending on crutches like that.

Also, learning something as simple as piece values from self-play is no big deal and has been easily accomplished by other machine learning programs.
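As an illustration of that last point, here is a minimal sketch that recovers classical piece values from game results with plain logistic regression on material imbalances, similar in spirit to the "Texel tuning" approach some engine authors use. It is not how AlphaZero works (AlphaZero learns a neural-network evaluation and never represents piece values explicitly). The games are synthetic: `TRUE_VALUES` and the logistic scale `SCALE` are assumptions used only to generate outcomes, draws are ignored, and the fit then tries to rediscover those values with the pawn normalized to 1.

```python
import math
import random
from collections import Counter

PIECES = ["pawn", "knight", "bishop", "rook", "queen"]
# Assumed "true" values, used ONLY to generate synthetic training data.
TRUE_VALUES = [1.0, 3.0, 3.0, 5.0, 9.0]
SCALE = 0.3  # assumed logistic scale: pawns of advantage -> log-odds of a win

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def make_synthetic_games(n: int, rng: random.Random) -> Counter:
    """Synthetic 'games': a random material imbalance (white minus black,
    each piece type in -1..1) and a sampled win/loss for white.
    Returns counts keyed by (imbalance, white_won)."""
    games = Counter()
    for _ in range(n):
        diff = tuple(rng.randint(-1, 1) for _ in PIECES)
        p_win = sigmoid(SCALE * sum(v * d for v, d in zip(TRUE_VALUES, diff)))
        games[(diff, rng.random() < p_win)] += 1
    return games

def fit_piece_values(games: Counter, steps: int = 300, lr: float = 2.0) -> list:
    """Logistic regression by batch gradient ascent on the log-likelihood
    (equivalently, descent on the log-loss); returns weights rescaled so
    that a pawn is worth exactly 1."""
    total = sum(games.values())
    w = [0.0] * len(PIECES)
    for _ in range(steps):
        grad = [0.0] * len(PIECES)
        for (diff, won), count in games.items():
            p = sigmoid(sum(wi * d for wi, d in zip(w, diff)))
            err = count * ((1.0 if won else 0.0) - p)
            for i, d in enumerate(diff):
                grad[i] += err * d
        w = [wi + lr * g / total for wi, g in zip(w, grad)]
    return [wi / w[0] for wi in w]

values = fit_piece_values(make_synthetic_games(200_000, random.Random(0)))
for name, v in zip(PIECES, values):
    print(f"{name:>6}: {v:.2f}")
```

With 200,000 synthetic games the fitted values should land close to 1/3/3/5/9; with real self-play games, draws, positional factors, and correlated imbalances make the picture noisier.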

lkaufman · Post by **lkaufman** » Thu Dec 07, 2017 5:08 pm

Lyudmil Tsvetkov wrote:
lkaufman wrote:
EvgeniyZh wrote:
Milos wrote:
clumma wrote:
Milos wrote: 4 hours my ass (pardon my French).
Far fewer transistors and joules were used training AlphaZero than have been used training Stockfish. You can soon rent those TPUs on Google's cloud, or apply for free access now, so stop complaining. Furthermore it's an experimental project in early days and performance is obviously not optimal, so all the 'but-but-but 30 Elo because they used SF 8 instead of SF 8.00194' sounds really dumb.

Days of alpha-beta engines have come to an abrupt end.

-Carl
Sorry, that is a pretty childish rant.
Google is obviously comparing apples and oranges and again pulling a marketing stunt, and people are falling for it.
Days of Alpha0 on normal hardware are years away. But keep on dreaming, no one can take that from you.

P.S. Just as a small comparison: the Leela Zero open-source project, which is trying to replicate AlphaGo Zero in Go, took 1 month to generate as many games as AG0 did in 3 hours, and that with a constant 1000 volunteers.
For chess it would take even more.
Training AlphaZero would take tons of time, just like creating SF from scratch. However, running it took 4 TPUs, which is comparable to what's available to (rich) consumers: you can get 6-8 NVIDIA V100s, which would give you similar performance.
To me this is the most informative post in the whole thread, assuming it is accurate (I know nothing about TPUs). The only reasonable comparison I can think of between the AlphaZero hardware and the Stockfish hardware is cost of equivalent machines. It doesn't matter to me how much hardware was used to reach the current level of strength for both engines, just whether the playing conditions were fair. You seem to be implying that comparable hardware to the 4 TPUs would cost no more (maybe much less?) than the sixty-four core machine used by SF. Is this correct? I'm asking to learn, not making a claim myself either way.

The other conditions were of course not "fair", but reasonable given that AlphaZero only trained for a few hours. I suppose if Stockfish used a good book, was allowed to use its time management as if the time limit were pure increment, and used the latest dev. version, the match would have been much closer, but probably (judging by the infinite win to loss ratio and the actual games) SF would have still lost. The games were amazing.

Bottom line, assuming the comparable cost claim is accurate: If Google wants to optimize the software for a few weeks and sell it, rent it, or give it away, we have a revolution in computer chess. But my guess is that they won't do this, in which case the revolution may be delayed a couple years or so.
Larry, what kind of revolution? This is a 30/1 hardware advantage.
Alpha is currently at 2850 level.

Based on the estimated $60k price for equivalent hardware vs. maybe $20k for the 64-core SF machine (my guess), it would be 3 to 1, not 30. The actual hardware used by Alpha would be useless for SF, so you can't compare the hardware any other way than by price, I think. It sounds like the cost of the type of hardware needed for Alpha is expected to plummet, while the cost of normal CPUs just trends slightly lower. We already see the same thing in Go: Leela (the top Go engine for the PC, and free like SF) plays a stone or so stronger with even a cheap GPU than without one. So while Alpha might only play at a 2850 level on your laptop, it might be super-strong in a year or so on something many people could afford. But if Google doesn't release it, that won't happen.

tmokonen · Post by **tmokonen** » Thu Dec 07, 2017 6:17 pm

Lyudmil Tsvetkov wrote: Eelco, how can you buy into the SCAM too?
The hardware advantage was 50/1.
It plays 1850-elo chess on a single core.

The diagram above is already way, way won for Black; Stockfish blundered in the opening with Nce5, and the game is already lost.

Alpha beating me? Gosh, I will shred it to pieces.
It understands absolutely nothing of closed positions; none were encountered in the sample.

It is all about the hardware. There were 2 or 3 beautiful games: one with the d4-e5-f6 chain outperforming a whole black minor piece, one great attack on the bare SF king, and one more; all the rest is just superior computation.
Nothing special about its eval.

Just another crappy, misinformed, false bravado post from a guy who quit his job to pursue the quixotic dream of finding the perfect old-school end point evaluation for an alpha beta searcher. He can't accept the fact that his years of painstaking effort have been rendered moot by a project that was just a "meh, let's spend a few hours and see what happens" lark by the team that already conquered Go, a much more complex game than chess.

Milos · Post by **Milos** » Thu Dec 07, 2017 6:27 pm

EvgeniyZh wrote: The info on TPUs is vague, but each one is said to deliver ~45 TFLOPS (probably half precision). For example, see here. That would mean that AlphaZero ran on a 180 TFLOPS system. The 1080 Ti is believed to be roughly cost-optimal for DL, and you'd need 16-18 of them to match that performance (you may round up to 20). That's not what you'd put at home, but many DL researchers have that amount of resources. I'd roughly estimate around $60k for the whole thing, give or take. With the next generation of GPUs, you can probably fit the whole thing in one node.
lkaufman wrote: The other conditions were of course not "fair", but reasonable given that AlphaZero only trained for a few hours. I suppose if Stockfish used a good book, was allowed to use its time management as if the time limit were pure increment, and used the latest dev. version, the match would have been much closer, but probably (judging by the infinite win to loss ratio and the actual games) SF would have still lost. The games were amazing.

Bottom line, assuming the comparable cost claim is accurate: If Google wants to optimize the software for a few weeks and sell it, rent it, or give it away, we have a revolution in computer chess. But my guess is that they won't do this, in which case the revolution may be delayed a couple years or so.

First you have to understand what TPU is. There is enough material on that, published by no one else but Google.
https://arxiv.org/abs/1704.04760
Second, it is not 45 TFLOPS but 92 TOPS, and that is the first-generation TPU. They don't say explicitly in the paper which TPU generation they used for inference (they state it only for training), but logic kind of tells us the second generation is more probable.
Second-generation TPU performance is 180 TOPS.
It is int8 multiplication, not the single- or double-precision floating-point operations you are used to from common GPUs and NVIDIA in general, and it is certainly not "tensor FLOPS" (a stupid marketing term by NVIDIA that has zero meaning in reality).
The V100 has 15 TFLOPS in single precision; that is the most you can get if you use single-precision floating point as a replacement for integer multiplication. So you would need 6 V100s for one first-generation TPU, and 12 for a second-generation one.
Alpha0 used 4 TPUs for running games, so at best 24 V100s, at worst 48 V100s.
A V100 will cost at best $10k, so $250k to half a million bucks just to run Alpha0, and you think there are chess enthusiasts who can afford that???

And give me a break with the theoretical GP102 performance (1080Ti). I work with them for ML, and that is pure BS, so much BS that NVIDIA never actually published the figure; it is just what people compute as num_cores x frequency x 2, which is totally detached from reality.
In reality, if you run int multiplications on it, you'd see the performance is not even 1 TOPS (for int multiplication).
You think NVIDIA is so stupid as to sell the V100 for >$10k while offering almost the same performance as a 1080Ti that costs $600???
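The disagreement here is mostly arithmetic on peak throughput, so a small sketch helps. It divides the total quoted TPU throughput by the V100 figure quoted above (15 TFLOPS single precision) and prices the result at the $10k estimate. All numbers are the thread's own claims, not verified specifications, and comparing int8 TOPS with fp32 TFLOPS is not like-for-like; the sketch only makes each side's assumption explicit.

```python
import math

def gpus_needed(tpu_count: int, tpu_throughput: float, gpu_throughput: float) -> int:
    """Number of GPUs whose summed peak throughput matches the TPUs' summed peak."""
    return math.ceil(tpu_count * tpu_throughput / gpu_throughput)

V100_FP32_TFLOPS = 15.0   # figure quoted in the thread
V100_PRICE_USD = 10_000   # estimate quoted in the thread

# Per-TPU peak figures quoted in the thread (note the mixed units: TOPS vs TFLOPS).
scenarios = {
    "45 TFLOPS per TPU (EvgeniyZh)": 45.0,
    "92 TOPS per first-gen TPU (Milos)": 92.0,
    "180 TOPS per TPU (Milos, second gen)": 180.0,
}

for label, per_tpu in scenarios.items():
    n = gpus_needed(4, per_tpu, V100_FP32_TFLOPS)  # AlphaZero ran on 4 TPUs
    print(f"{label}: {n} x V100, about ${n * V100_PRICE_USD:,}")
```

It gives 12 V100s if a 4-TPU setup delivers 180 TFLOPS in total (45 per TPU), versus 25 to 48 if each TPU delivers 92 to 180 TOPS (Milos rounds 92/15 down to 6 per TPU, hence his 24). The whole dispute reduces to whether 180 is a per-chip or a per-4-chip figure.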

IanO · Post by **IanO** » Thu Dec 07, 2017 6:30 pm

The computer shogi community is expressing similar concerns about the selection of opponent (Elmo), engine configuration, and match methodology:

Some concerns on the matching conditions between AlphaZero and Shogi engine

December 6, 2017

After the publication of the paper (D. Silver et al., "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm", arXiv:1712.01815), a few concerns appeared in the community of computer shogi programmers about the match conditions between AlphaZero and the shogi engine “elmo”. Here I summarize the points with some explanations. (Information will be updated if errors are found.)

1. The resignation threshold seems too narrow. In recent software, the evaluation tends to give larger values than chess programs do. Many people feel that -900 centipawns is too small for shogi programs. I guess that an acceptable value would be -3000 to -5000. In official matches such as the World Computer Shogi Competition (http://www2.computer-shogi.org/index_e.html), no resignation threshold is set; they wait until the program resigns. After 256 plies, the game is judged a draw even if the evaluation is one-sided.

2. It is strange to set "EnteringKingRule" to "NoEnteringKing". In recent matches between shogi programs, entering king occurs frequently, and its treatment is critical to the match results. When both kings enter the other's territory, YaneuraOu counts the pieces and declares a win if it has enough. It is not clear whether AlphaZero has this functionality. I think it would be preferable to use the default setting, "CSARule27".

3. The hash size may be too low, and the settings are tricky. In YaneuraOu 2017 Early, there are two hash size settings: "Hash", which defaults to 16 MB, and "USI Hash", whose default is 1024 MB. In YaneuraOu, the latter value is not used; the former is the one that matters. If "Hash" is kept at the default value, I observe that the program becomes very weak. Under the match conditions (35M nodes per move), even 1 GB may be too low. It would be more appropriate to set it to a bigger value.
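Points 1 and 2 are easy to misread, so here is a hypothetical sketch of both: resignation versus the 256-ply draw rule, and an entering-king declaration check. The function names, the evaluation trace, and the piece encoding are invented for illustration. The declaration check follows the 27-point rule as it is commonly described (king in the enemy camp, at least 10 other pieces there, not in check, rook and bishop worth 5 points and other pieces 1, counting pieces in the camp and in hand, with 28 points needed for sente and 27 for gote); it omits the side-to-move and clock conditions and is not YaneuraOu's actual code.

```python
from typing import List, Optional

def adjudicate(evals_cp: List[int], resign_cp: Optional[int] = None,
               max_plies: int = 256) -> str:
    """Hypothetical adjudicator. evals_cp[i] is engine A's evaluation in
    centipawns (from A's point of view) after ply i + 1. resign_cp=None means
    no resignation threshold, as in the official matches described in point 1."""
    for ply, score in enumerate(evals_cp, start=1):
        if resign_cp is not None and score <= resign_cp:
            return f"A resigns at ply {ply}"
        if ply >= max_plies:
            return "draw (ply limit)"
    return "undecided"

# Rook, bishop, and their promoted forms count 5 points; everything else counts 1.
MAJOR_PIECES = {"R", "B", "+R", "+B"}

def can_declare_win(is_sente: bool, pieces_in_camp: List[str],
                    pieces_in_hand: List[str], king_in_camp: bool,
                    in_check: bool) -> bool:
    """Sketch of a 27-point entering-king declaration check.
    pieces_in_camp excludes the king; side-to-move and clock checks are omitted."""
    if not king_in_camp or in_check or len(pieces_in_camp) < 10:
        return False
    points = sum(5 if p in MAJOR_PIECES else 1
                 for p in pieces_in_camp + pieces_in_hand)
    return points >= (28 if is_sente else 27)

# Made-up evaluation trace: level play, a brief deep dip, then recovery.
trace = [0] * 100 + [-1200] * 20 + [300] * 180
print(adjudicate(trace, resign_cp=-900))   # A resigns at ply 101
print(adjudicate(trace, resign_cp=-3000))  # draw (ply limit)

# 10 pieces in camp plus pieces in hand: 5 + 9 + 5 + 5 + 3 = 27 points.
camp = ["+R"] + ["P"] * 9
hand = ["B", "R", "P", "P", "P"]
print(can_declare_win(False, camp, hand, True, False))  # True (gote needs 27)
print(can_declare_win(True, camp, hand, True, False))   # False (sente needs 28)
```

On this made-up trace, a -900 threshold ends the game at the momentary dip, while a -3000 threshold lets it run to the 256-ply draw, which illustrates why the threshold choice can change the reported result.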

Finally, I would like to mention that 2017 is a dog year for shogi engines, and we have plenty of programs that are much stronger than elmo. For instance, the winning program "Heisei shogi gassen ponpoko" ("ponpoko" for short) in the Shogi Denno Tournament (http://denou.jp/tournament2017/) outrates elmo by R150. This program is available at https://github.com/nodchip/hakubishin-/releases as "tanuki-sdt5-2017-11-16". It is also known that Apery_sdt5 has an even stronger evaluation file (available at https://t.co/S7q7XlW4dG), R200 stronger than elmo. Currently the strongest evaluation file is "aperypaq", an improvement of Apery_sdt5 (available at http://qhapaq.hatenablog.com/entry/2017/11/28/195426), R250 stronger than elmo. These should be combined with YaneuraOu. I hope the authors will test these programs before declaring that AlphaZero beats currently available shogi programs.
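To put the quoted rating gaps in perspective, the sketch below converts them to expected scores. It assumes "R" means Elo-style rating points on the usual logistic scale with a 400-point divisor; the post does not say which rating model was used.

```python
def expected_score(rating_diff: float) -> float:
    """Logistic Elo model: expected score of the side that is rating_diff points stronger."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

# Rating gaps over elmo as quoted in the post above.
for name, diff in [("tanuki-sdt5", 150), ("Apery_sdt5 eval", 200), ("aperypaq eval", 250)]:
    print(f"{name}: +{diff} over elmo -> expected score {expected_score(diff):.2f}")
```

Under that assumption, gaps of 150, 200, and 250 points correspond to expected scores against elmo of roughly 70%, 76%, and 81%.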

Source: http://www.uuunuuun.com/single-post/201 ... ogi-engine

Milos · Post by **Milos** » Thu Dec 07, 2017 6:32 pm

EvgeniyZh wrote: 64 cores to 1 is not like 4096 to 64. Moreover, even if it were, the dependency of strength of play vs. TTD is also sublinear and, certainly, practically bounded. It is actually demonstrated in the paper, page 7. Did you read it?

Figure 2 is completely bogus. I wish Google had actually cited a reference showing that SF's Elo increase when going from 10 s to 1 min/move is under 20 Elo.

EvgeniyZh · Post by **EvgeniyZh** » Thu Dec 07, 2017 7:38 pm

Milos wrote: First you have to understand what TPU is. There is enough material on that, published by no one else but Google.
https://arxiv.org/abs/1704.04760

I do know enough about TPUs, and I have even developed a similar system myself.

Milos wrote: Second, it is not 45 TFLOPS but 92 TOPS, and that is the first-generation TPU.

Exactly. While the first generation works with INT8, the second generation works with floating point and gives 180 TFLOPS per 4-TPU pod: https://www.blog.google/topics/google-c ... -learning/

Milos wrote: Second-generation TPU performance is 180 TOPS.

TFLOPS, not TOPS, and per pod (4 TPUs).

Milos wrote: It is int8 multiplication, not the single- or double-precision floating-point operations you are used to from common GPUs and NVIDIA in general

NVIDIA has GPUs supporting half-precision FP and INT8.

So basically every word you are writing is misleading BS that has nothing to do with the truth, since you are either not willing to spend two minutes googling (pun intended) or just jealous and trying to mislead others.

EvgeniyZh · Post by **EvgeniyZh** » Thu Dec 07, 2017 7:39 pm

Milos wrote:
EvgeniyZh wrote: 64 cores to 1 is not like 4096 to 64. Moreover, even if it were, the dependency of strength of play vs. TTD is also sublinear and, certainly, practically bounded. It is actually demonstrated in the paper, page 7. Did you read it?
Figure 2 is completely bogus. I wish Google had actually cited a reference showing that SF's Elo increase when going from 10 s to 1 min/move is under 20 Elo.

So you are playing the expert and don't even understand the meaning of relative Elo?
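The "relative Elo" point can be illustrated by inverting the logistic Elo model: a match score only fixes a rating difference, so any curve labelled "relative Elo" has an arbitrary zero point. This is a minimal sketch assuming the standard logistic model with a 400-point divisor; it is not necessarily the rating procedure used in the paper.

```python
import math

def elo_diff_from_score(score: float) -> float:
    """Rating difference implied by a match score fraction under the logistic
    Elo model. Only the difference is determined, never an absolute rating."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# Score fractions mentioned earlier in the thread (50, 80, and 95 points out of 100).
for s in (0.50, 0.80, 0.95):
    print(f"score {s:.2f} -> {elo_diff_from_score(s):+.0f} Elo relative to the opponent")
```

An 80% score corresponds to about +241 Elo and 95% to about +512; shifting every rating by the same constant leaves these differences, and hence any relative-Elo plot, unchanged.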
