AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

clumma
Posts: 186
Joined: Fri Oct 10, 2014 10:05 pm
Location: Berkeley, CA

AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by clumma »

A truly stunning result. Matthew Lai is a coauthor!

https://arxiv.org/pdf/1712.01815.pdf

-Carl
MikeGL
Posts: 1010
Joined: Thu Sep 01, 2011 2:49 pm

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by MikeGL »

clumma wrote:A truly stunning result. Matthew Lai is a coauthor!

https://arxiv.org/pdf/1712.01815.pdf

-Carl
Thanks for the PDF file. I watched two of the games included in your PDF, where AlphaZero won as Black. Quite impressive games, actually.

[pgn]
[Event "?"]
[Site "?"]
[Date "?"]
[Round "?"]
[White "SF8"]
[Black "AlphaZero"]
[Result "0-1"]
[TimeControl "1min per move"]
[Termination "unterminated"]
[PlyCount "134"]
[WhiteType "Engine"]
[BlackType "Engine"]
1. e4 e5 2. Nf3 Nc6 3. Bb5 Nf6 4. d3 Bc5 5. Bxc6 dxc6 6. O-O Nd7 7. Nbd2
O-O 8. Qe1 f6 9. Nc4 Rf7 10. a4 Bf8 11. Kh1 Nc5 12. a5 Ne6 13. Ncxe5 fxe5
14. Nxe5 Rf6 15. Ng4 Rf7 16. Ne5 Re7 17. a6 c5 18. f4 Qe8 19. axb7 Bxb7 20.
Qa5 Nd4 21. Qc3 Re6 22. Be3 Rb6 23. Nc4 Rb4 24. b3 a5 25. Rxa5 Rxa5 26.
Nxa5 Ba6 27. Bxd4 Rxd4 28. Nc4 Rd8 29. g3 h6 30. Qa5 Bc8 31. Qxc7 Bh3 32.
Rg1 Rd7 33. Qe5 Qxe5 34. Nxe5 Ra7 35. Nc4 g5 36. Rc1 Bg7 37. Ne5 Ra8 38.
Nf3 Bb2 39. Rb1 Bc3 40. Ng1 Bd7 41. Ne2 Bd2 42. Rd1 Be3 43. Kg2 Bg4 44. Re1
Bd2 45. Rf1 Ra2 46. h3 Bxe2 47. Rf2 Bxf4 48. Rxe2 Be5 49. Rf2 Kg7 50. g4
Bd4 51. Re2 Kf6 52. e5+ Bxe5 53. Kf3 Ra1 54. Rf2 Re1 55. Kg2+ Bf4 56. c3
Rc1 57. d4 Rxc3 58. dxc5 Rxc5 59. b4 Rc3 60. h4 Ke5 61. hxg5 hxg5 62. Re2+
Kf6 63. Kf2 Be5 64. Ra2 Rc4 65. Ra6+ Ke7 66. Ra5 Ke6 67. Ra6+ Bd6 0-1
[/pgn]

[pgn]
[Event "?"]
[Site "?"]
[Date "?"]
[Round "?"]
[White "SF8"]
[Black "AlphaZero"]
[Result "0-1"]
[TimeControl "1min per move"]
[Termination "unterminated"]
[PlyCount "174"]
[WhiteType "Engine"]
[BlackType "Engine"]
1. e4 e5 2. Nf3 Nc6 3. Bb5 Nf6 4. d3 Bc5 5. Bxc6 dxc6 6. O-O Nd7 7. c3 O-O
8. d4 Bd6 9. Bg5 Qe8 10. Re1 f6 11. Bh4 Qf7 12. Nbd2 a5 13. Bg3 Re8 14. Qc2
Nf8 15. c4 c5 16. d5 b6 17. Nh4 g6 18. Nhf3 Bd7 19. Rad1 Re7 20. h3 Qg7 21.
Qc3 Rae8 22. a3 h6 23. Bh4 Rf7 24. Bg3 Rfe7 25. Bh4 Rf7 26. Bg3 a4 27. Kh1
Rfe7 28. Bh4 Rf7 29. Bg3 Rfe7 30. Bh4 g5 31. Bg3 Ng6 32. Nf1 Rf7 33. Ne3
Ne7 34. Qd3 h5 35. h4 Nc8 36. Re2 g4 37. Nd2 Qh7 38. Kg1 Bf8 39. Nb1 Nd6
40. Nc3 Bh6 41. Rf1 Ra8 42. Kh2 Kf8 43. Kg1 Qg6 44. f4 gxf3 45. Rxf3 Bxe3+
46. Rfxe3 Ke7 47. Be1 Qh7 48. Rg3 Rg7 49. Rxg7+ Qxg7 50. Re3 Rg8 51. Rg3
Qh8 52. Nb1 Rxg3 53. Bxg3 Qh6 54. Nd2 Bg4 55. Kh2 Kd7 56. b3 axb3 57. Nxb3
Qg6 58. Nd2 Bd1 59. Nf3 Ba4 60. Nd2 Ke7 61. Bf2 Qg4 62. Qf3 Bd1 63. Qxg4
Bxg4 64. a4 Nb7 65. Nb1 Na5 66. Be3 Nxc4 67. Bc1 Bd7 68. Nc3 c6 69. Kg1
cxd5 70. exd5 Bf5 71. Kf2 Nd6 72. Be3 Ne4+ 73. Nxe4 Bxe4 74. a5 bxa5 75.
Bxc5+ Kd7 76. d6 Bf5 77. Ba3 Kc6 78. Ke1 Kd5 79. Kd2 Ke4 80. Bb2 Kf4 81.
Bc1 Kg3 82. Ke2 a4 83. Kf1 Kxh4 84. Kf2 Kg4 85. Ba3 Bd7 86. Bc1 Kf5 87. Ke3 Ke6 0-1
[/pgn]

Some moves made by AlphaZero are difficult for SF8 to find.
I will analyze the other games once I have some extra time.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by Lyudmil Tsvetkov »

- Alpha had considerable hardware advantage
- SF played with version 8
- what was the code/software/evaluation base used for the first Alpha chess version, an advanced engine evaluation and search software or otherwise?
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by Lyudmil Tsvetkov »

How is this different from self-tuning software, as widely used for autotuning engines, applied on a very large scale with tremendous hardware?
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by Daniel Shawul »

What is different is that AlphaZero's evaluation selects the features of the eval by itself (via a neural network), while in the standard approach the programmer selects the features (e.g. passed pawns, king safety, rook on open file, etc.) and just tunes the weights. The downside of the neural-network approach is that you may not understand why it does what it does.
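
A minimal sketch of that distinction in Python. The feature names come from the post above; the weight values, the four-element input, and the function names are made up for illustration and are not how AlphaZero or any real engine is implemented.

```python
import math

# Hand-crafted evaluation: the programmer picks the features,
# and tuning only adjusts these weights (values are illustrative).
HANDCRAFTED_WEIGHTS = {
    "passed_pawns": 0.5,
    "king_safety": 0.3,
    "rook_on_open_file": 0.25,
}


def handcrafted_eval(features):
    """Weighted sum of programmer-chosen feature values."""
    return sum(HANDCRAFTED_WEIGHTS[name] * value
               for name, value in features.items())


def network_eval(raw_inputs, hidden_weights, output_weights):
    """Tiny one-hidden-layer network applied to raw board inputs.

    Each row of hidden_weights defines one learned "feature". Nobody
    names or chooses it, which is why it can be hard to interpret.
    """
    hidden = [math.tanh(sum(w * x for w, x in zip(row, raw_inputs)))
              for row in hidden_weights]
    return math.tanh(sum(w * h for w, h in zip(output_weights, hidden)))


print(handcrafted_eval({"passed_pawns": 2, "king_safety": -1,
                        "rook_on_open_file": 1}))
print(network_eval([1, 0, -1, 0],
                   [[0.2, -0.1, 0.4, 0.0], [0.0, 0.3, -0.2, 0.1]],
                   [0.5, -0.5]))
```

In the first function, training can only move three numbers. In the second, training shapes the hidden rows themselves, so the "features" are whatever the learning process finds useful.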

Daniel
kranium
Posts: 2129
Joined: Thu May 29, 2008 10:43 am

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by kranium »

Lyudmil Tsvetkov wrote:- Alpha had considerable hardware advantage
- SF played with version 8
- what was the code/software/evaluation base used for the first Alpha chess version, an advanced engine evaluation and search software or otherwise?
As Daniel explains: no hard-coded evaluation (software)... its game play is based on learning (experience) from previous self-play games, applied to a neural network

5,000 first-generation TPUs to generate self-play games
and 64 second-generation TPUs to train the neural networks

The hardware advantage is not such an important factor during gameplay as one would imagine.
kranium
Posts: 2129
Joined: Thu May 29, 2008 10:43 am

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by kranium »

Program     Chess      Shogi      Go
AlphaZero   80k        40k        16k
Stockfish   70,000k
Elmo                   35,000k

Table S4: Evaluation speed (positions/second) of AlphaZero, Stockfish, and Elmo in chess, shogi and Go.
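
To put the chess column in perspective, here is a rough back-of-the-envelope calculation in Python. The two speeds are the Table S4 figures; the assumption that each engine searches for the full minute on every move (the games posted in this thread were played at 1 minute per move) is mine, as are the variable and function names.

```python
# Evaluation speeds from Table S4 (positions per second, chess).
ALPHAZERO_CHESS_NPS = 80_000
STOCKFISH_CHESS_NPS = 70_000_000

# Assumption: the whole minute is spent searching on every move.
SECONDS_PER_MOVE = 60


def positions_per_move(nps, seconds=SECONDS_PER_MOVE):
    """Total positions evaluated during one move at a fixed speed."""
    return nps * seconds


ratio = STOCKFISH_CHESS_NPS / ALPHAZERO_CHESS_NPS
print(f"Stockfish evaluates {ratio:.0f}x as many positions per second")
print(f"AlphaZero per move: {positions_per_move(ALPHAZERO_CHESS_NPS):,}")
print(f"Stockfish per move: {positions_per_move(STOCKFISH_CHESS_NPS):,}")
```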
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by Milos »

kranium wrote:
Lyudmil Tsvetkov wrote:- Alpha had considerable hardware advantage
- SF played with version 8
- what was the code/software/evaluation base used for the first Alpha chess version, an advanced engine evaluation and search software or otherwise?
As Daniel explains: no hard-coded evaluation (software)... its game play is based on learning (experience) from previous self-play games, applied to a neural network

5,000 first-generation TPUs to generate self-play games
and 64 second-generation TPUs to train the neural networks

The hardware advantage is not such an important factor during gameplay as one would imagine.
It actually is: instead of the 4 TPUs required to run Alpha0, on x64 hardware one would need around 2000 Haswell cores to achieve the same NN speed (80k patterns evaluated per second). Since the NNs are huge, with smaller resources the matrix multiplication would have to be broken into smaller sub-matrices, which would slow down the calculation considerably.
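
For readers unfamiliar with the idea, this is roughly what "breaking a matrix multiplication into sub-matrices" means, sketched in plain Python. The function names and the tiny 3x3 inputs are illustrative only. The tiled version performs the same multiply-adds as the plain one, just grouped by tile; it does not model memory traffic or parallelism, so it says nothing about how large the slowdown on smaller hardware would be in practice.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def blocked_matmul(a, b, block):
    """Same product, accumulated one block x block tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, block):          # tile row of the result
        for j0 in range(0, m, block):      # tile column of the result
            for p0 in range(0, k, block):  # tile along the shared dimension
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        for p in range(p0, min(p0 + block, k)):
                            c[i][j] += a[i][p] * b[p][j]
    return c


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
assert blocked_matmul(a, b, block=2) == matmul(a, b)
```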
kranium
Posts: 2129
Joined: Thu May 29, 2008 10:43 am

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by kranium »

Milos wrote:
kranium wrote:
Lyudmil Tsvetkov wrote:- Alpha had considerable hardware advantage
- SF played with version 8
- what was the code/software/evaluation base used for the first Alpha chess version, an advanced engine evaluation and search software or otherwise?
As Daniel explains: no hard-coded evaluation (software)... its game play is based on learning (experience) from previous self-play games, applied to a neural network

5,000 first-generation TPUs to generate self-play games
and 64 second-generation TPUs to train the neural networks

The hardware advantage is not such an important factor during gameplay as one would imagine.
It actually is: instead of the 4 TPUs required to run Alpha0, on x64 hardware one would need around 2000 Haswell cores to achieve the same NN speed (80k patterns evaluated per second). Since the NNs are huge, with smaller resources the matrix multiplication would have to be broken into smaller sub-matrices, which would slow down the calculation considerably.

AlphaZero is very selectively evaluating 80k positions/sec vs. Stockfish's 70,000k, probably achieving tremendous depths at such speeds,
but I'd guess it's the deep (learned) positional eval which is primarily adding strength...
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: AlphaZero beats AlphaGo Zero, Stockfish, and Elmo

Post by Milos »

kranium wrote:AlphaZero is very selectively evaluating 80k positions/sec vs. Stockfish's 70,000k, probably achieving tremendous depths at such speeds,
but I'd guess it's the deep (learned) positional eval which is primarily adding strength...
Alpha0 is basically behaving like a huge, highly selective opening book.
However, besides the hardware, other things in this work are highly questionable.
I guess people are a bit intimidated to ask questions because it is Google, but many things are fishy and unfavourable to SF.
One big disadvantage was the TC: 1 min/move means SF spent only 1 minute on each of the opening moves, while at a normal TC like 40/40 it would easily spend 5-10 minutes on each of them. That made it much weaker, by 20, maybe even 30 Elo, since most of SF's losses already happen in the opening.
Second is the no-book play, where Alpha0 mainly forces the openings and lines that it spent most of its time training on, while SF had no help from a book whatsoever. To make it at least a bit more fair, one should use a strong book such as Cerebellum as support for SF.
Starting from 12 typical human openings (only 4 moves deep at most), the gap Alpha0 had over SF shrank from 100 to 77 Elo, as can be seen from the paper.
Third, even though they used last year's TCEC winner, SF8 has untested behaviour on 64 cores, and on that hardware it is at least 30 Elo, if not more, weaker than the current SFdev.
So, taking everything into consideration, it is pretty safe to assume that the latest Brainfish at a normal TC like 40/40 would be at least on par with, if not stronger than, Alpha0. And all that on much weaker hardware.
If they really wanted to make a fair comparison, then instead of running Alpha0 on regular x64 one could also run SF on custom hardware where all the evaluation is handled by fully custom-implemented FPGAs (like DeepBlue did), and then one would see how much weaker Alpha0 really is when the comparison is not apples and oranges.
Last edited by Milos on Wed Dec 06, 2017 4:36 pm, edited 1 time in total.
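
For scale, the Elo gaps mentioned in the post above (100, 77 and 30) can be converted to expected scores with the standard logistic Elo formula. This is a small Python sketch; the function name is my own, and it only translates rating differences into expected scores. It does not test any of the claims about time control, books or hardware.

```python
def expected_score(elo_diff):
    """Expected score of the higher-rated side under the logistic Elo model."""
    return 1 / (1 + 10 ** (-elo_diff / 400))


for diff in (100, 77, 30):
    print(f"{diff:>3} Elo -> expected score {expected_score(diff):.3f}")
```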