AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Thu Dec 07, 2017 1:03 pm

chessmobile wrote:Seems to play a mean game of chess. The endgame is where it excels. Many games looked equal to the naked eye but Alpha went on to win. If this thing follows the Go project then expect in a few months a monster that will beat its current version quite easily.

Again, 80% of games were already decided in the early opening.
Due to the opening book Alpha unfairly used.
In terms of evaluation, chess is 1000 times more complex than Go, so we will simply never see big advances with this approach.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Thu Dec 07, 2017 1:10 pm

Eelco de Groot wrote:
chessmobile wrote:Seems to play a mean game of chess. The endgame is where it excels. Many games looked equal to the naked eye but Alpha went on to win. If this thing follows the Go project then expect in a few months a monster that will beat its current version quite easily.
I am just looking at the first game where Alpha Zero wins with Black, but it seems to me that it excels specifically in showing big holes in Stockfish's eval. That this happens in the endgame is not a big surprise.

For instance, this position from that game is lost, but even Kaissa needs a long time to see that. It knows a little bit about the power of the bishop pair but is apparently still blind, and this is after going backwards from about move 40...

[D]5bk1/r5p1/7p/2p1N3/4PP2/1P1P2Pb/2P4P/6RK w - -
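
As a rough cross-check of the point about the bishop pair, the material in this position can be counted straight from the FEN. This is only a minimal sketch: the 1/3/3/5/9 piece values and the function name `material_from_fen` are assumptions for illustration, not anything Kaissa or Stockfish actually uses.

```python
# Conventional textbook piece values (an assumption for illustration only).
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}


def material_from_fen(fen):
    """Sum piece values and count bishops per side from a FEN string."""
    board = fen.split()[0]  # the first FEN field is the piece placement
    summary = {
        "white": {"material": 0, "bishops": 0},
        "black": {"material": 0, "bishops": 0},
    }
    for ch in board:
        if ch == "/" or ch.isdigit():
            continue  # rank separators and empty-square counts
        side = "white" if ch.isupper() else "black"
        summary[side]["material"] += PIECE_VALUES[ch.lower()]
        if ch.lower() == "b":
            summary[side]["bishops"] += 1
    return summary


position = "5bk1/r5p1/7p/2p1N3/4PP2/1P1P2Pb/2P4P/6RK w - -"
summary = material_from_fen(position)
print(summary)
```

With these values White is nominally a point ahead (rook, knight and seven pawns against rook, two bishops and three pawns), so a plain material count points the opposite way from the deep search results in the log.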

Engine: Kaissa HT (512 MB)
by T. Romstad, M. Costalba, J. Kiiski, G. Linscott

23/34 0:01 -0.13 35.Rc1 g5 36.Kg1 Bg7 37.Kf2 Bxe5
38.fxe5 Kf7 39.Ke3 Ke6 40.Rb1 Kxe5
41.b4 Rb7 42.c3 Ke6 43.b5 Bg4 44.d4 c4
45.Rb4 Kd6 46.Rxc4 Rxb5 47.Rb4 Bd7
48.h4 (16.663.353) 11013

24/37 0:02 -0.08 35.Rc1 g5 36.Kg1 Bg7 37.Kf2 Bxe5
38.fxe5 Kf7 39.Ke3 Ke6 40.Rb1 Kxe5
41.b4 Rb7 42.c3 Ke6 43.b5 Bg4 44.d4 c4
45.Rb4 Kd6 46.Rxc4 Rxb5 47.Rb4 Bd7
48.h4 (23.026.784) 10991

25/47 0:02 -0.08 35.Rc1 g5 36.Kg1 Bg7 37.Kf2 Bxe5
38.fxe5 Kf7 39.Ke3 Ke6 40.Rb1 Kxe5
41.b4 Rb7 42.c3 h5 43.Ra1 cxb4
44.Ra6 Be6 45.d4+ Kf6 46.cxb4 Ke7
47.Ra5 Rxb4 48.Rxg5 (28.983.490) 10966

26/43 0:03 -0.08 35.Rc1 g5 36.Kg1 Bg7 37.Kf2 Bxe5
38.fxe5 Kf7 39.Ke3 Ke6 40.Rb1 Kxe5
41.b4 Rb7 42.c3 h5 43.Ra1 cxb4
44.Ra6 Be6 45.d4+ Kf6 46.cxb4 Ke7
47.Ra5 Rxb4 48.Rxg5 (35.577.502) 10866

27/42 0:03 -0.15-- 35.Rc1 g5 (36.923.800) 10847

27/42 0:03 -0.16 35.Rc1 g5 36.Kg1 Bg7 37.Kf2 Bxe5
38.fxe5 Kf7 39.Ke3 Ke6 40.Rb1 Kxe5
41.b4 Rb7 42.c3 Ke6 43.b5 Kd6 44.d4 Ra7
45.Kd3 Rf7 46.Ke3 Bg4 47.Kd3 Be6
48.d5 (37.708.832) 10807

28/34 0:04 -0.09++ 35.Rc1 (51.000.093) 10644

28/46 0:05 -0.19 35.Rc1 g5 36.Kg1 Bg7 37.Kf2 Bxe5
38.fxe5 Kf7 39.Ke3 Ke6 40.Rb1 Kxe5
41.b4 Rb7 42.c3 Ke6 43.b5 Ra7 44.d4 Kd6
45.Rb2 c4 46.e5+ Kd5 47.b6 Rb7
48.Rb5+ (53.393.487) 10653

29/48 0:06 -0.12++ 35.Rc1 (68.805.142) 10682

29/48 0:06 -0.23 35.Rc1 g5 36.Kg1 Bg7 37.Kf2 Bxe5
38.fxe5 Kf7 39.Ke3 Ke6 40.Rb1 Kxe5
41.b4 Rb7 42.c3 Ke6 43.b5 Ra7 44.d4 Kd6
45.Rb2 cxd4+ 46.Kxd4 Ra4+ 47.Rb4 Rxb4+
48.cxb4 (74.412.423) 10676

30/43 0:07 -0.16++ 35.Rc1 (82.421.949) 10673

30/46 0:08 -0.19 35.Rc1 g5 36.Kg1 Bg7 37.Kf2 Bxe5
38.fxe5 Kf7 39.Ke3 Ke6 40.Rb1 Kxe5
41.b4 Rb7 42.c3 Ke6 43.b5 Ra7 44.Rb2 Kd6
45.Rf2 Be6 46.b6 Rb7 47.Rf6 Rxb6
48.d4 (93.753.062) 10713

31/49 0:13 -0.12++ 35.Rc1 (139.848.293) 10743

31/49 0:23 -0.08 35.Rc1 Bd6 36.Nc4 Bc7 37.Kg1 Kf7
38.Kf2 Ra2 39.Ke3 Be6 40.e5 g6 41.d4 cxd4+
42.Kxd4 Ra8 43.Kc3 Rc8 44.Kd3 Rd8+
45.Kc3 Ra8 46.Kd4 Ke7 47.Kc3 Rc8
48.Kd3 (244.712.523) 10606
.
.
.
36/56 4:49 -1.11 35.Ng6 Bd6 36.Re1 Kf7 37.f5 Ra2
38.Rc1 c4 39.dxc4 Bc5 40.Nf4 Be3
41.Re1 Bxf4 42.gxf4 Rxc2 43.c5 Rxc5
44.Kg1 Bg4 45.Re3 Rb5 46.Rg3 Bd1
47.Rc3 Rb7 48.Kf2 (3.200.571.460) 11070
.
.
.
41/57 42:36 -1.62 35.Ng6 Bd6 36.Re1 Kf7 37.f5 Ra2
38.Rc1 c4 39.dxc4 h5 40.Nf4 Bg4
41.Kg2 Ba3 42.Rf1 Bc5 43.h3 Rxc2+
44.Kh1 Be2 45.Nxe2 Rxe2 46.Rd1 Rxe4
47.Rd7+ Kf6 48.Kg2 (28.571.692.284) 11175

42/53 43:11 -1.55++ 35.Ng6 (28.964.033.275) 11177

42/54 43:40 -1.57 35.Ng6 Bd6 36.Re1 Kf7 37.f5 Ra2
38.Rc1 c4 39.dxc4 h5 40.Nf4 Bg4
41.Kg2 Ba3 42.Rf1 Bc5 43.h3 Rxc2+
44.Kh1 Be2 45.Nxe2 Rxe2 46.Rd1 Rxe4
47.Rd7+ Kf6 48.Kg2 (29.291.863.573) 11179

43/58 44:39 -1.64-- 35.Ng6 Bd6 (29.972.203.243) 11186

43/64 58:05 -1.57++ 35.Nc4 (38.982.302.684) 11183

43/64 61:17 -1.45++ 35.Nc4 (41.056.108.916) 11164

43/67 65:26 -1.42 35.Nc4 g5 36.Rc1 Bg7 37.Ne5 Bxe5
38.fxe5 Kf7 39.Kg1 Ke6 40.Kf2 Kxe5
41.Ke3 Ra2 42.Kf3 Be6 43.Ke3 Rb2
44.h4 g4 45.b4 Rxb4 46.Ra1 Rb2
47.Kd2 Kd4 48.Ra8 (43.816.279.380) 11159

44/60 69:13 -1.50-- 35.Nc4 g5 (46.318.745.058) 11150

44/63 72:37 -1.57-- 35.Nc4 g5 (48.608.234.304) 11156

44/63 84:14 -1.69-- 35.Nc4 g5 (56.236.277.156) 11126
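
For readers unfamiliar with this log layout, here is a small sketch of how the columns appear to relate. It assumes (this is not stated in the post) that the clock is minutes:seconds, that the parenthesised figure is the total node count with dots as thousands separators, and that the last column is speed in thousands of nodes per second. The helper names are made up for this example.

```python
def parse_clock(clock):
    """Convert a 'mm:ss' (or 'h:mm:ss') clock string to seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_nodes(nodes):
    """Convert '(28.571.692.284)' to the integer 28571692284."""
    return int(nodes.strip("()").replace(".", ""))


def knps(clock, nodes):
    """Average speed in thousands of nodes per second."""
    return parse_nodes(nodes) / parse_clock(clock) / 1000


# Two lines taken from the log above: (clock, nodes, reported speed).
samples = [
    ("42:36", "(28.571.692.284)", 11175),
    ("84:14", "(56.236.277.156)", 11126),
]
for clock, nodes, reported in samples:
    print(clock, round(knps(clock, nodes)), "computed vs", reported, "reported")
```

Under those assumptions the computed speed lands within a fraction of a percent of the last column, which supports reading it as roughly 11 million nodes per second throughout the search.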

Lyudmil should appreciate that it is specifically playing Anti-Stockfish chess. Opponent modeling. If it would play against Lyudmil, in 4 hours it would not just beat him but show him where to improve his game. I think Chessbase would love to have a tool like this for sale.

I'm not sure the team is willing to pursue chess, I have not read much of the paper but I understood they are not interested in chess? After Deep Blue beat Kasparov it was no longer interesting to get stronger. And not much to learn from humans anymore...

Eelco, how can you buy into the SCAM too?
The hardware advantage was 50/1.
It plays 1850-elo chess on a single core.

Above diagram is already way way won for black; Stockfish blundered already in the opening with Nce5, this is already lost.

Alpha beating me? Gosh, I will shred it to pieces.

It understands absolutely nothing of closed positions, no such were encountered in the sample.

It is all about the hardware, 2 or 3 beautiful games, with the d4-e5-f6 chain outperforming a whole black minor piece, one great attack on the bare SF king and one more, all the rest is just exceeding computations.
Nothing special about its eval.

BeyondCritics · Post by **BeyondCritics** » Thu Dec 07, 2017 2:04 pm

cdani wrote:
It will be very interesting to know the typical depth achieved by AlphaZero. I bet it is much less than Stockfish's.

You lost that bet

AlphaZero uses MCTS (https://en.wikipedia.org/wiki/Monte_Carlo_tree_search). From the paper (https://www.arxiv-vanity.com/papers/1712.01815v1/):

Instead of an alpha-beta search with domain-specific enhancements, AlphaZero uses a general-purpose Monte-Carlo tree search (MCTS) algorithm. Each search consists of a series of simulated games of self-play that traverse a tree from root to leaf.

...

At the end of the game, the terminal position is scored according to the rules of the game to compute the game outcome: -1 for a loss, 0 for a draw, and +1 for a win.
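
To make the scoring convention in the quoted passage concrete, here is a minimal sketch of how a -1/0/+1 result can be backed up along a root-to-leaf path in a tree search, flipping sign at each ply because the players alternate. It is not AlphaZero's implementation; the `Node` class and the `outcome_for` and `backup` helpers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Search statistics for one position (hypothetical minimal node)."""
    visits: int = 0
    value_sum: float = 0.0

    @property
    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def outcome_for(winner, player):
    """Score a finished game for `player`: +1 win, 0 draw, -1 loss.

    `winner` is None for a draw.
    """
    if winner is None:
        return 0
    return 1 if winner == player else -1


def backup(path, leaf_value):
    """Propagate a value from the leaf back to the root.

    `path` lists nodes from root to leaf; `leaf_value` is from the point of
    view of the side the leaf node's statistics are kept for. The sign flips
    at every ply because the sides alternate.
    """
    value = leaf_value
    for node in reversed(path):
        node.visits += 1
        node.value_sum += value
        value = -value


path = [Node(), Node(), Node()]  # root, child, leaf
backup(path, outcome_for("white", "white"))
print([n.mean_value for n in path])
```

After many such simulations the mean value stored at each child of the root is what a search of this kind would use to compare candidate moves.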

kranium · Post by **kranium** » Thu Dec 07, 2017 2:26 pm

Lyudmil Tsvetkov wrote:
chessmobile wrote:Seems to play a mean game of chess. The endgame is where it excels. Many games looked equal to the naked eye but Alpha went on to win. If this thing follows the Go project then expect in a few months a monster that will beat its current version quite easily.
Again, 80% of games were already decided in the early opening.
Due to the opening book Alpha unfairly used.

What opening book?

Jhoravi · Post by **Jhoravi** » Thu Dec 07, 2017 2:44 pm

Do you all think that AlphaZero from scratch only knows the rules of chess and nothing more? How about piece values, like 1 for a pawn and 3 for a knight? I would be so amazed if it learned all the piece values from self-play.

EvgeniyZh · Post by **EvgeniyZh** » Thu Dec 07, 2017 2:45 pm

Lyudmil Tsvetkov wrote:Alpha had considerable hardware advantage

clumma wrote:
That comparison is not straightforward, but this claim does not seem to be true. SF had 64 threads. I'm not up on the latest scaling behavior of the engine but that has got to be near saturation.

-Carl

Lyudmil Tsvetkov wrote:
From what I gleaned from hardware comparisons, the advantage is 16/1.
Why would one want to run such an unfair match?
Only one thing comes to mind: that the company will want to advertise its colossal breakthrough with TPUs and artificial intelligence and then sell its products.

But then, the achievement is not there.

kranium wrote:
The fact that Google has created a chess playing entity that crushes SF is notable (and fascinating).

TPUs are not for sale, and (at the moment) are applied only to Google's deep learning and research projects,
except when Google donates them to research for free.

https://techcrunch.com/2017/05/17/the-t ... cientists/

Lyudmil Tsvetkov wrote:
What would be the score between SF on 64 cores and SF on 1024 cores out of 100 games?
You think the bigger-hardware SF would score less than 64 points?
I guess at least 80.

So what is so new?
They applied some big hardware, that is all.
The real strength of Alpha is 2850, so around spot 97 or so among engines.
97 is not such a bad achievement, after all.

Uri Blass wrote:
I doubt if SF on 1024 cores is going to score even 50%
Maybe after some point more cores are counterproductive for Stockfish.

I also doubt if it is possible to get at least 80 points against Stockfish with 64 cores at 1 minute per move.

Lyudmil Tsvetkov wrote:
Why not?
How much would SF 16-cores vs SF single core score, that is easily reproducible.
The experts claim the TPUs lack any SMP inefficiencies.

Uri Blass wrote:
If you give the engines 10 minutes per move, so that they play almost perfect chess, then I guess that you will get less than 80%, and 1 minute per move for 64 cores is probably stronger than 10 minutes per move for 1 core.

Lyudmil Tsvetkov wrote:
Actually, the hardware advantage + simulated opening book (very important, as SF lost most games already in the opening) was close to 50/1, so how will SF 50 cores fare against SF single core?

Certainly somewhere 95% or so.

Are you aware that the dependency of time to depth (TTD) on core count is sublinear? Going from 1 core to 64 is not like going from 64 to 4096. Moreover, even if it were linear, the dependency of playing strength on TTD is also sublinear and, in practice, bounded. This is actually demonstrated in the paper, page 7. Did you read it?
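
To put the percentages traded in this exchange on a common scale, the standard logistic Elo model converts an expected score into a rating difference. The sketch below is just that textbook formula; the function names are made up for this example, and it says nothing about how strength actually scales with cores.

```python
import math


def elo_diff(score):
    """Rating difference implied by an expected score in (0, 1)."""
    return 400 * math.log10(score / (1 - score))


def expected_score(diff):
    """Expected score for the side that is `diff` Elo points stronger."""
    return 1 / (1 + 10 ** (-diff / 400))


# Scores mentioned in the thread: 64/100, 80%, and 95%.
for score in (0.64, 0.80, 0.95):
    print(f"{score:.0%} -> {elo_diff(score):+.0f} Elo")
```

Under this model a 64% score is worth roughly 100 Elo, 80% roughly 240 Elo, and 95% a bit over 500 Elo, so the gap between the scores being argued about is large.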

Robert Pope · Post by **Robert Pope** » Thu Dec 07, 2017 4:46 pm

Jhoravi wrote:Do you all think that AlphaZero from scratch only knows the rules of chess and nothing more? How about piece values, like 1 for a pawn and 3 for a knight? I would be so amazed if it learned all the piece values from self-play.

Yes. That's the whole point of what makes this so incredible. The fact that they can take nothing more than the rules and definitions for game outcomes and create a superhuman player makes this something that can be applied far outside the chess domain. Except, it isn't even bothering to learn piece values. It is learning which moves are better in different positions without depending on crutches like that.

Also, learning something as simple as piece values from self play is no big deal and has been easily accomplished by other machine learning programs.

lkaufman · Post by **lkaufman** » Thu Dec 07, 2017 5:08 pm

Milos wrote:4 hours my ass (pardon my French).

clumma wrote:
Far fewer transistors and joules were used training AlphaZero than have been used training Stockfish. You can soon rent those TPUs on Google's cloud, or apply for free access now, so stop complaining. Furthermore it's an experimental project in early days and performance is obviously not optimal, so all the 'but-but-but 30 Elo because they used SF 8 instead of SF 8.00194' sounds really dumb.

Days of alpha-beta engines have come to an abrupt end.

-Carl

Milos wrote:
Sorry, that is a pretty childish rant.
Google is obviously comparing apples and oranges and again doing a marketing stunt and ppl are falling for it.
Days of Alpha0 on normal hardware are years away. But keep on dreaming, no one can take that from you.

P.S. Just as a small comparison. The leelazero open source project, trying to replicate Alpha0 in Go, took 1 month to get the same number of games as AG0 got in 3 hours, and that with a constant 1000 volunteers.
For chess it would take even more.

EvgeniyZh wrote:
Training AlphaZero would take tons of time. Just like creating SF from 0. However, running it took 4 TPUs, which is comparable to what's available to (rich) consumers - you can get 6-8 NVIDIA V100 which would get you similar performance.

lkaufman wrote:
To me this is the most informative post in the whole thread, assuming it is accurate (I know nothing about TPUs). The only reasonable comparison I can think of between the AlphaZero hardware and the Stockfish hardware is cost of equivalent machines. It doesn't matter to me how much hardware was used to reach the current level of strength for both engines, just whether the playing conditions were fair. You seem to be implying that comparable hardware to the 4 TPUs would cost no more (maybe much less?) than the sixty-four core machine used by SF. Is this correct? I'm asking to learn, not making a claim myself either way.

The other conditions were of course not "fair", but reasonable given that AlphaZero only trained for a few hours. I suppose if Stockfish used a good book, was allowed to use its time management as if the time limit were pure increment, and used the latest dev. version, the match would have been much closer, but probably (judging by the infinite win to loss ratio and the actual games) SF would have still lost. The games were amazing.

Bottom line, assuming the comparable cost claim is accurate: If Google wants to optimize the software for a few weeks and sell it, rent it, or give it away, we have a revolution in computer chess. But my guess is that they won't do this, in which case the revolution may be delayed a couple years or so.

Lyudmil Tsvetkov wrote:
Larry, what kind of revolution, this is 30/1 hardware advantage.
Alpha is currently at 2850 level.

Based on the estimated $60k price for equivalent hardware vs. maybe $20k for the 64 core SF machine (my guess) it would be 3 to 1, not 30. The actual hardware used by Alpha would be useless for SF, so you can't compare the hardware any other way than by price, I think. It sounds like the cost of the type of hardware needed for Alpha is expected to plummet while the cost of normal CPUs just trends slightly lower. We already see the same thing in Go. Leela (top Go engine for PC, and free like SF) plays a stone or so stronger with even a cheap GPU than without one. So while Alpha might only play at 2850 level on your laptop, it might be super-strong in a year or so on something many people could afford. But if Google doesn't release it, that won't happen.

tmokonen · Post by **tmokonen** » Thu Dec 07, 2017 6:17 pm

Lyudmil Tsvetkov wrote:Eelco, how can you buy into the SCAM too?
The hardware advantage was 50/1.
It plays 1850-elo chess on a single core.

Above diagram is already way way won for black; Stockfish blundered already in the opening with Nce5, this is already lost.

Alpha beating me? Gosh, I will shred it to pieces.
It understands absolutely nothing of closed positions, no such were encountered in the sample.

It is all about the hardware, 2 or 3 beautiful games, with the d4-e5-f6 chain outperforming a whole black minor piece, one great attack on the bare SF king and one more, all the rest is just exceeding computations.
Nothing special about its eval.

Just another crappy, misinformed, false bravado post from a guy who quit his job to pursue the quixotic dream of finding the perfect old-school end point evaluation for an alpha beta searcher. He can't accept the fact that his years of painstaking effort have been rendered moot by a project that was just a "meh, let's spend a few hours and see what happens" lark by the team that already conquered Go, a much more complex game than chess.

Milos · Post by **Milos** » Thu Dec 07, 2017 6:27 pm

EvgeniyZh wrote:The info on TPUs is vague, but it's said to have ~45 TFLOPs (half precision probably). For example see here. That would mean that AlphaZero ran 180 TFLOPs system. It's believed 1080 Ti is kinda cost-optimal for DL, and you'd need 16-18 of them to match performance (you may round up to 20). That's not what you'd put at home, but many DL researchers have that amount of resources. I'd roughly approximate it around $60k for the whole thing, give or take. With next generation GPU you probably can fit the whole thing in one node.
lkaufman wrote: The other conditions were of course not "fair", but reasonable given that AlphaZero only trained for a few hours. I suppose if Stockfish used a good book, was allowed to use its time management as if the time limit were pure increment, and used the latest dev. version, the match would have been much closer, but probably (judging by the infinite win to loss ratio and the actual games) SF would have still lost. The games were amazing.

Bottom line, assuming the comparable cost claim is accurate: If Google wants to optimize the software for a few weeks and sell it, rent it, or give it away, we have a revolution in computer chess. But my guess is that they won't do this, in which case the revolution may be delayed a couple years or so.

First you have to understand what TPU is. There is enough material on that, published by no one else but Google.
https://arxiv.org/abs/1704.04760
Second it is not 45 TFLOPs but 92 TOPS and that is first generation TPU. They don't say explicitly in the paper which generation TPU they used for inference (they say it just for training) but logic kind of tells us second generation is more probable.
Second-generation TPU performance is 180 TOPS.
It is int8 multiplication not single or double floating point precision operations you are used to from common GPUs and NVIDIA in general and it is certainly not tensor FLOPS (stupid marketing term by NVIDIA that has zero meaning in reality).
V100 has 15 TFLOPS single precision, that is the most you can get if you use single precision floating point as a replacement for integer multiplication. So you would need 6 V100 for one first generation TPU, and 12 for second generation one.
Alpha0 used 4 TPUs for running games, so at best 24 V100, at worst 48 V100.
A V100 will at best cost 10k$, so that is 250k$ to half a million bucks just to run Alpha0, and you think there would be chess enthusiasts who could afford it???

And give me a break with theoretical GP102 performance (1080Ti). I work with them for ML and that is pure BS, so much BS that NVIDIA actually never published the figure, but instead what ppl compute as num_cores x frequency x 2 which is totally detached from reality.
In reality if you run int multiplications on it you'd see the performance is not even 1 TOPS (for int multiplication).
You think NVIDIA is so stupid to sell V100 for >10k$ offering almost the same performance as 1080Ti that costs 600$???
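
The V100-equivalent estimate earlier in this post can be reproduced with a few lines of arithmetic. The sketch below takes the post's own figures at face value (92 and 180 TOPS per TPU generation, 15 TFLOPS and 10k$ per V100, 4 TPUs) and treats one int8 TOPS as interchangeable with one single-precision TFLOPS, which is the post's assumption rather than an established equivalence. The variable and function names are invented for the example.

```python
# Figures as quoted in the post; none of them are independently verified here.
TPU_TOPS = {"first generation": 92, "second generation": 180}
V100_TFLOPS = 15        # single precision, per the post
V100_PRICE_USD = 10_000  # "at best", per the post
TPUS_USED = 4


def v100_equivalents(tpu_tops, n_tpus=1):
    """Number of V100 cards matching n_tpus TPUs under the post's assumption."""
    return n_tpus * tpu_tops / V100_TFLOPS


for generation, tops in TPU_TOPS.items():
    cards = v100_equivalents(tops, TPUS_USED)
    cost = cards * V100_PRICE_USD
    print(f"{generation}: {cards:.1f} V100, about ${cost:,.0f}")
```

The per-TPU ratio for the first generation comes out slightly above 6, which the post rounds down, so the totals land at about 24 to 48 cards and roughly a quarter to half a million dollars.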
