Historic Milestone: AlphaZero

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 3:18 pm

Amidst all the hullabaloo, no one is noticing the first moves both engines have been playing.

Alpha chooses only 1.d4 and 1.Nf3, while Stockfish goes for 1.e4

Judging from this, I can say that Alpha is much weaker than SF in terms of software, and the only reason for the win is the very big hardware advantage.

I observed nothing special in the way Alpha plays in these 10 games; it goes for the same open positions as most engines and simply outcalculates Stockfish consistently.
This could be due only to substantial hardware advantage.

MikeGL · Post by **MikeGL** » Wed Dec 06, 2017 3:20 pm

peter wrote:Many thanks to you too, Gary!
gladius wrote:First example game, Stockfish-Alphazero 0-1
[pgn]
1. e4 e5 2. Nf3 Nc6 3. Bb5 Nf6 4. d3 Bc5 5. Bxc6 dxc6 6. O-O Nd7 7. Nbd2 O-O 8. Qe1 f6 9. Nc4 Rf7
10. a4 Bf8 11. Kh1 Nc5 12. a5 Ne6 13. Ncxe5 fxe5 14. Nxe5 Rf6 15. Ng4 Rf7 16. Ne5 Re7 17. a6 c5
18. f4 Qe8 19. axb7 Bxb7 20. Qa5 Nd4 21. Qc3 Re6 22. Be3 Rb6 23. Nc4 Rb4 24. b3 a5 25. Rxa5
Rxa5 26. Nxa5 Ba6 27. Bxd4 Rxd4 28. Nc4 Rd8 29. g3 h6 30. Qa5 Bc8 31. Qxc7 Bh3 32. Rg1 Rd7
33. Qe5 Qxe5 34. Nxe5 Ra7 35. Nc4 g5 36. Rc1 Bg7 37. Ne5 Ra8 38. Nf3 Bb2 39. Rb1 Bc3 40. Ng1
Bd7 41. Ne2 Bd2 42. Rd1 Be3 43. Kg2 Bg4 44. Re1 Bd2 45. Rf1 Ra2 46. h3 Bxe2 47. Rf2 Bxf4 48.
Rxe2 Be5 49. Rf2 Kg7 50. g4 Bd4 51. Re2 Kf6 52. e5+ Bxe5 53. Kf3 Ra1 54. Rf2 Re1 55. Kg2+ Bf4
56. c3 Rc1 57. d4 Rxc3 58. dxc5 Rxc5 59. b4 Rc3 60. h4 Ke5 61. hxg5 hxg5 62. Re2+ Kf6 63. Kf2
Be5 64. Ra2 Rc4 65. Ra6+ Ke7 66. Ra5 Ke6 67. Ra6+ Bd6 0-1
[/pgn]
As far as I read the paper, I didn't see a book for Stockfish mentioned, so I guess the match was bookless.

Considering that, the opening isn't that bad (as described in the document, AlphaZero has anyhow "reinvented" most opening moves already often played by humans).
But 8.Qe1 from SF seems a little exotic anyway, doesn't it?

Judging from the games AlphaZero won with Black, it seems that 3...a6 in the Ruy Lopez is not as strong as
direct piece development with 3...Nf6.

Dan Cooper · Post by **Dan Cooper** » Wed Dec 06, 2017 3:34 pm

Jouni wrote:Sounds like an April fool, but it isn't! I just read that they are not interested in chess? But I read somewhere: So basically they played a 400-core engine vs. a 60-core engine. Hmm..

Here come the excuses.

MikeGL · Post by **MikeGL** » Wed Dec 06, 2017 3:36 pm

Lyudmil Tsvetkov wrote:Amidst all the hullabaloo, no one is noticing the first moves both engines have been playing.

Alpha chooses only 1.d4 and 1.Nf3, while Stockfish goes for 1.e4

Judging from this, I can say that Alpha is much weaker than SF in terms of software, and the only reason for the win is the very big hardware advantage.

I observed nothing special in the way Alpha plays in these 10 games; it goes for the same open positions as most engines and simply outcalculates Stockfish consistently.
This could be due only to substantial hardware advantage.

Or maybe, since AlphaZero doesn't use alpha-beta search, its Monte-Carlo Tree Search (MCTS) algorithm could really be superior. But your point is valid: both should be tested on equal hardware.

See the game where SF8 was forced to trap its queen on h8; it looks like a zugzwang puzzle that is beyond SF8's prowess.
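
For readers unfamiliar with the contrast being drawn with alpha-beta, here is a minimal sketch of the kind of selection step an MCTS with a neural-network prior might use. It assumes a PUCT-style rule (mean value plus a prior-weighted exploration bonus); the function name, the dictionary layout, the constant `c_puct=1.5` and the sample numbers are all made up for illustration and are not taken from the thread or from any released AlphaZero code.

```python
import math

def puct_select(children, c_puct=1.5):
    """Return the child with the highest Q + U score.

    Q is the mean value observed for the child so far; U is an exploration
    bonus that grows with the network prior and shrinks as the child is
    visited more often. c_puct is an assumed, illustrative constant.
    """
    total_visits = sum(ch["visits"] for ch in children)

    def score(ch):
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        u = c_puct * ch["prior"] * math.sqrt(total_visits) / (1 + ch["visits"])
        return q + u

    return max(children, key=score)

# Hypothetical statistics for three root moves (not real engine data).
children = [
    {"move": "e4", "prior": 0.5, "visits": 10, "value_sum": 4.0},
    {"move": "d4", "prior": 0.3, "visits": 2, "value_sum": 1.2},
    {"move": "Nf3", "prior": 0.2, "visits": 0, "value_sum": 0.0},
]
best = puct_select(children)
print(best["move"])
```

Unlike alpha-beta, which visits every line that cannot be proven irrelevant, a rule like this spends most simulations on the few moves the network and the running statistics favour, which is one plausible reason the two approaches are hard to compare on node counts alone.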

Milos · Post by **Milos** » Wed Dec 06, 2017 3:50 pm

smatovic wrote:
Since no one of us has access to TPU it's only fair to count in terms of what is available (for example if we had Alpha0 x64 binary compile and wanted to run it at home).
1TPU ~ 30xE5-2699v3 (18 cores machine).
4TPUs ~ 2000 Haswell cores
Apples and Bananas,

Stockfish is not able to make use of these TPUs,
and AlphaZero depends probably heavily on floating point operations (maybe half precision) to query the neural network.

So the question might be whether a stripped-down x86-64 version of AlphaZero, with only some hundreds or thousands of nps, would still be able to beat Stockfish.....dunno.

--
Srdja

No, it is the original paper that is comparing apples and bananas. SF is running on general-purpose hardware. TPUs are not commercially available, so running Alpha0 on TPUs gives it a huge unfair advantage.
It would be like running SF on special hardware where search happens on a conventional CPU and all evaluation is handled by hundreds if not thousands of FPGAs, something like Deep Blue. Then we could say that the comparison is fair.
Even in this setup, if it were the most recent version of Brainfish (so with an opening book) and a normal TC like 40/40 rather than 1 move/min, Alpha0 would probably lose.
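
The core-equivalence estimate quoted in this post can be checked with simple arithmetic. The sketch below takes the thread's own figures at face value (1 TPU ~ 30 E5-2699v3 CPUs of 18 cores each, 4 TPUs for AlphaZero, 64 threads for Stockfish); the constant and function names are hypothetical, and the 30x ratio is the poster's estimate, not a measured value.

```python
# Figures as stated in the thread; the 30x ratio is an estimate, not a benchmark.
CPUS_PER_TPU = 30      # "1TPU ~ 30xE5-2699v3"
CORES_PER_CPU = 18     # "18 cores machine"
TPUS = 4               # AlphaZero's hardware in the match
STOCKFISH_THREADS = 64 # Stockfish's thread count in the match

def haswell_core_equivalent(tpus, cpus_per_tpu=CPUS_PER_TPU, cores_per_cpu=CORES_PER_CPU):
    """Convert a TPU count into 'equivalent' Haswell cores under the thread's assumption."""
    return tpus * cpus_per_tpu * cores_per_cpu

cores = haswell_core_equivalent(TPUS)
print(cores)                      # 2160, i.e. the "~2000 Haswell cores" quoted above
print(cores / STOCKFISH_THREADS)  # 33.75, the implied hardware ratio under this assumption
```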

tralala · Post by **tralala** » Wed Dec 06, 2017 3:54 pm

Wow! I was wondering about an AlphaZero-Chess version and here it is.
It looks like their MCTS-NN combo is really powerful for a lot of games.

As others have mentioned the test conditions are far from ideal for CPU-Chess-Programs:

- the fixed 1 minute/move probably takes away 20-30 Elo of time-management benefits
- 1GB of Hash for 64 Threads is (very) suboptimal!
- 64 Threads could be detrimental to playing strength if not tuned carefully
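
To put the hash complaint in numbers, here is a small sketch that computes the hash-to-thread ratio for the reported setup and builds the standard UCI `setoption` lines for the `Hash` and `Threads` options. The helper names are hypothetical. Note that Stockfish's transposition table is shared between threads, so "MB per thread" is only a rough ratio, not an actual per-thread allocation.

```python
def hash_per_thread_mb(hash_mb, threads):
    """Rough ratio of hash memory to search threads, in MB per thread."""
    return hash_mb / threads

def uci_setup(hash_mb, threads):
    """Build the UCI commands that would configure an engine this way."""
    return [
        f"setoption name Threads value {threads}",
        f"setoption name Hash value {hash_mb}",
    ]

# Match settings as reported in the thread: 1 GB hash, 64 threads.
print(hash_per_thread_mb(1024, 64))  # 16.0
for line in uci_setup(1024, 64):
    print(line)
```

At one minute per move, 64 threads would fill a table of this size quickly, which is presumably what the "(very) suboptimal" remark refers to.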

Other things are a bit unclear:

- which hardware was used for Stockfish (optimized binary?)
- which opening book was used

but still - impressive!

I would love to see an open challenge for the Stockfish team (or Houdini perhaps?) where they play with a classical time control and the CPU team can prepare their hardware, book choice and settings independently.

jdart · Post by **jdart** » Wed Dec 06, 2017 4:00 pm

There is a chart indicating that the relative advantage of AlphaGo actually goes up with thinking time, but it does need to be extended to long time controls.

Note also that, although Stockfish hardware was not optimal, AlphaGo was playing "4 TPUs," whatever that is, but I don't think that is big hardware.

So, not the best test, but still, neural network trained programs and programs trained with reinforcement learning have not so far been near Stockfish level.

The other thing notable here though is that training was an extremely compute-intensive task.

--Jon

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 4:05 pm

Steve Maughan wrote:I'm gobsmacked.

There's a bit of me that's skeptical, but since they've cracked Go I guess it's legitimate.

When will the UCI version be available? I need to play with it.

It is much weaker than SF.
And Go is much, much simpler than chess as far as evaluation is concerned. Maybe around 1000 times simpler. Go was perfect for tuning just a handful of terms; chess is quite a different story.

Chess will not be that easy to crack.

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Wed Dec 06, 2017 4:08 pm

jdart wrote:There is a chart indicating that the relative advantage of AlphaGo actually goes up with thinking time, but it does need to be extended to long time controls.

Note also that, although Stockfish hardware was not optimal, AlphaGo was playing "4 TPUs," whatever that is, but I don't think that is big hardware.

So, not the best test, but still, neural network trained programs and programs trained with reinforcement learning have not so far been near Stockfish level.

The other thing notable here though is that training was an extremely compute-intensive task.

--Jon

So no one actually knows how many standard cores 1/4 TPUs are?

jdart · Post by **jdart** » Wed Dec 06, 2017 4:16 pm

I don't think the exact equivalency is known, but apparently (from the thread in the Programming section), each TPU is a hefty custom processing unit. So AlphaGo probably did not have a hardware handicap here.

--Jon
