Historic Milestone: AlphaZero

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Thu Dec 07, 2017 2:08 pm

IanO wrote:Wonderful result! Have they publicized the shogi results or published those match games anywhere? Unlike the chess and go results, that appeared to be a clear advance over the state-of-the-art.

I see there is lots of bickering about the fairness of the Stockfish match. I look forward to AlphaZero's participation in the World Computer Chess Championship, where all participants can use as much preparation and beefy hardware as they want! That appears to be the appropriate forum for users of custom hardware to strut their stuff. Heck, DeepMind would be superstars at the attached computer games conference!

Who cares, when it is just 2850 on single core?

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Thu Dec 07, 2017 2:10 pm

corres wrote:
maac wrote:
Take note that SF was at 64 cores!

Some other VERY IMPORTANT notes:
1. SF used only a 1 GB (!!) hash table.
2. Alpha Zero did not start from zero knowledge about chess,
because it was fed a lot of human games at startup.
This is the explanation for why Alpha Zero plays openings in such a human-like way.
I think it would be fairer if Stockfish got a 64 GB hash
and a good human opening book like Fritz Power Book.

Indeed, 80% of the games were decided in the early opening.
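
For reference, the hash size and core count mentioned in the quote are ordinary UCI engine options. A minimal sketch, assuming the standard UCI `setoption` syntax with `Hash` given in megabytes and Stockfish's `Threads` option; the helper name `uci_options` is made up for illustration:

```python
def uci_options(hash_gb: int, threads: int) -> list[str]:
    """Build UCI setoption commands (hypothetical helper, not part of any engine)."""
    hash_mb = hash_gb * 1024  # UCI expresses the Hash option in megabytes
    return [
        f"setoption name Hash value {hash_mb}",
        f"setoption name Threads value {threads}",
    ]

match_settings = uci_options(hash_gb=1, threads=64)      # as reported in the quote
proposed_settings = uci_options(hash_gb=64, threads=64)  # as suggested in the quote
for line in proposed_settings:
    print(line)
```

Whether a 64x larger table would have changed the result is exactly what is being argued here; the snippet only shows what the two configurations look like.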

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Thu Dec 07, 2017 2:12 pm

duncan wrote:
maac wrote:https://arxiv.org/pdf/1712.01815.pdf

To say that I'm impressed is an understatement. Shocking.
Take note that SF was at 64 cores!
25 wins for White and only 3 wins for Black. Is such a big difference what you would expect?

When I claimed that the stronger the engines get, the more they score with White, everyone laughed.
At some point, there will be nothing but 1-0s.
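
The colour split can be put into numbers. A rough sketch: the 25 and 3 wins are quoted in the thread, while the draw counts (25 as White, 47 as Black, no losses) are taken from the match table in the linked paper and are treated here as an assumption to be checked against it. The Elo conversion is the usual logistic formula.

```python
import math

def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Points scored divided by games played (a draw counts as half a point)."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games

def elo_difference(score: float) -> float:
    """Rating gap implied by an expected score under the logistic Elo model."""
    return 400 * math.log10(score / (1 - score))

# Wins are quoted in the thread; draws and losses are assumed from the paper.
as_white = score_fraction(wins=25, draws=25, losses=0)
as_black = score_fraction(wins=3, draws=47, losses=0)
overall = score_fraction(wins=28, draws=72, losses=0)

print(f"as White: {as_white:.1%}, as Black: {as_black:.1%}")
print(f"overall: {overall:.1%}, about {elo_difference(overall):.0f} Elo")
```

Under those assumed counts, AlphaZero scores 75% with White against 53% with Black, which is the asymmetry being discussed.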

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Thu Dec 07, 2017 2:17 pm

MikeGL wrote:
Lyudmil Tsvetkov wrote:
MikeGL wrote:
kranium wrote:
Lyudmil Tsvetkov wrote:It is not at all clear to me where books were used and where not.
I'm sure opening books were not used...
In the early self-play games things like 1.a3, 1.a4, etc. were probably tried by AlphaZero...
eventually it learned that 1. e4 or 1. d4 had the highest success rates.
Books or no books, I think AlphaZero would still demolish SF8.
Just look at game 9: it was a decent French Defence by SF8, but it was dismantled with
amazing tactical and strategic shots by AlphaZero, which seem to be beyond the reach of alpha-beta engines.

[pgn]
[Event "?"]
[Site "?"]
[Date "2017.12.06"]
[Round "9"]
[White "AlphaZero"]
[Black "Stockfish"]
[Result "1-0"]
[TimeControl "40/1260:300"]
[Termination "normal"]
[PlyCount "103"]
[WhiteType "human"]
[BlackType "human"]

1. d4 e6 2. e4 d5 3. Nc3 Nf6 4. e5 Nfd7 5. f4 c5 6. Nf3 cxd4 7. Nb5 Bb4+ 8.
Bd2 Bc5 9. b4 Be7 10. Nbxd4 Nc6 11. c3 a5 12. b5 Nxd4 13. cxd4 Nb6 14. a4
Nc4 15. Bd3 Nxd2 16. Kxd2 Bd7 17. Ke3 b6 18. g4 h5 19. Qg1 hxg4 20. Qxg4
Bf8 21. h4 Qe7 22. Rhc1 g6 23. Rc2 Kd8 24. Rac1 Qe8 25. Rc7 Rc8 26. Rxc8+
Bxc8 27. Rc6 Bb7 28. Rc2 Kd7 29. Ng5 Be7 30. Bxg6 Bxg5 31. Qxg5 fxg6 32. f5
Rg8 33. Qh6 Qf7 34. f6 Kd8 35. Kd2 Kd7 36. Rc1 Kd8 37. Qe3 Qf8 38. Qc3 Qb4
39. Qxb4 axb4 40. Rg1 b3 41. Kc3 Bc8 42. Kxb3 Bd7 43. Kb4 Be8 44. Ra1 Kc7
45. a5 Bd7 46. axb6+ Kxb6 47. Ra6+ Kb7 48. Kc5 Rd8 49. Ra2 Rc8+ 50. Kd6 Be8
51. Ke7 g5 52. hxg5 1-0
[/pgn]

Not sure whether 18.g4!, 30.Bxg6! and others would be found by current engines.

[d]r2qk2r/3bbppp/1p2p3/pP1pP3/P2P1P2/3BKN2/6PP/R2Q3R w kq - 0 18
After Black's 17...b6, can any engine consider 18.g4! in this position?

xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[d]4q2r/1b1kbp2/1p2p1p1/pP1pP1N1/P2P1PQP/3BK3/2R5/8 w - - 6 30
After 29...Be7, can current engines consider 30.Bxg6! here?

Would be nice if we can try to feed some difficult epd positions into AlphaZero,
to estimate its ELO strength.
I was considering g4 and Bxg6; g4 in particular was my first choice in under a second.
People don't believe that advanced long chains are frequently stronger than a piece, but see what happens here...

I guess the main fault is that they are testing at 1 minute. Long chains have failed in SF at least 5 times or so, and still they are an extremely valid concept.

Books are important. As I have been correctly claiming for a very long time, the French is dangerous or even lost, but people are hard of hearing.
The game was lost much earlier, already in the opening.
LOL, there go your ridiculous claims again. You haven't claimed the French is lost. Let's be
honest please: you only claimed the French WINAWER is lost, nothing else.
You claimed all other variations of the French are playable, but not the WINAWER line.

Winawer discussion was a thread on its own, I remember clearly.

Come on, we had to choose a more precise variation, in order to test it in a game.
The structure of the Advance and the Winawer/McCutcheon/Classical, etc., is the same: White has a central e5 pawn, which e7-e6 has allowed, and a lot more space.

Anyway, those engines are just too weak to assess anything. Perfect play is at 5000-6000 Elo at least, and that Alpha will never progress much.

MikeGL · Post by **MikeGL** » Thu Dec 07, 2017 2:18 pm

pferd wrote:
MikeGL wrote: [d]4q2r/1b1kbp2/1p2p1p1/pP1pP1N1/P2P1PQP/3BK3/2R5/8 w - - 6 30
After 29...Be7, can current engines consider 30.Bxg6! here?

Would be nice if we can try to feed some difficult epd positions into AlphaZero,
to estimate its ELO strength.
A slightly modified version of Stockfish gets the second position right, but it needed more than 13 mins (depth 47) with 6 threads on my machine to bring this move to the top.

The final output:
Code: Select all
info depth 49 seldepth 82 multipv 1 score cp 121 nodes 15423323813 nps 9884116 hashfull 999 tbhits 8278452 time 1560415 pv d3g6 e7g5 8 g2g6 f8h8 h4h5 d8e8 c1d2 c8d7 h5h6 h8h7 g6g8 e8f7 g8b8 h7h6 b8b6 h6h2 d2d3 h2h3 d3e2 d7e8 b6a6 h3h2 e2f3 h2h3 f3f2 h3h2 f2g3 h2d2 a
bestmove d3g6 ponder e7g5
I also tried the first position but Stockfish did not come up with g4 after 1/2 hour.

g2g6 f8h8 in your above line looks incorrect, an illegal move.
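
A copy/paste truncation like this can often be spotted mechanically. A minimal sketch, assuming only the UCI long algebraic move format (from-square, to-square, optional promotion letter); the function name `malformed_tokens` is made up for illustration:

```python
import re

# A UCI move is from-square, to-square, and an optional promotion piece.
UCI_MOVE = re.compile(r"^[a-h][1-8][a-h][1-8][qrbn]?$")

def malformed_tokens(pv: str) -> list[tuple[int, str]]:
    """Return (index, token) for every PV token that is not a well-formed UCI move."""
    return [(i, tok) for i, tok in enumerate(pv.split()) if not UCI_MOVE.match(tok)]

# Start and end of the pasted PV from the quoted output.
pasted_start = "d3g6 e7g5 8 g2g6 f8h8 h4h5"
pasted_end = "h3h2 f2g3 h2d2 a"

print(malformed_tokens(pasted_start))  # → [(2, '8')]
print(malformed_tokens(pasted_end))    # → [(3, 'a')]
```

This is only a syntax check: g2g6 is a well-formed token, so showing that it is illegal in this position still needs a real move generator.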

Lyudmil Tsvetkov · Post by **Lyudmil Tsvetkov** » Thu Dec 07, 2017 2:19 pm

pferd wrote:
MikeGL wrote: [d]4q2r/1b1kbp2/1p2p1p1/pP1pP1N1/P2P1PQP/3BK3/2R5/8 w - - 6 30
After 29...Be7, can current engines consider 30.Bxg6! here?

Would be nice if we can try to feed some difficult epd positions into AlphaZero,
to estimate its ELO strength.
A slightly modified version of Stockfish gets the second position right, but it needed more than 13 mins (depth 47) with 6 threads on my machine to bring this move to the top.

The final output:
Code: Select all
info depth 49 seldepth 82 multipv 1 score cp 121 nodes 15423323813 nps 9884116 hashfull 999 tbhits 8278452 time 1560415 pv d3g6 e7g5 8 g2g6 f8h8 h4h5 d8e8 c1d2 c8d7 h5h6 h8h7 g6g8 e8f7 g8b8 h7h6 b8b6 h6h2 d2d3 h2h3 d3e2 d7e8 b6a6 h3h2 e2f3 h2h3 f3f2 h3h2 f2g3 h2d2 a
bestmove d3g6 ponder e7g5
I also tried the first position but Stockfish did not come up with g4 after 1/2 hour.

Because the middlegame is much more complex.

MikeGL · Post by **MikeGL** » Thu Dec 07, 2017 2:28 pm

Lyudmil Tsvetkov wrote:
KWRegan wrote:
Lyudmil Tsvetkov wrote:Why don't they disclose what their evaluation is: that will be a big step towards knowing the truth.
They can't. The evaluation is a sequence of numbers specifying myriad weights on umpteen-dozen layers of a neural network. This aspect (of the original AlphaGo) in contrast to Stockfish is addressed in my Feb. 2016 article https://rjlipton.wordpress.com/2016/02/07/magic-to-do/ That this is endemic to "deep learning" has energized a counter-push toward "Explainable AI."

What I wish to know better, incidentally, is the memory footprint of their trained network and how portable it is.
They are still tuning at the level of a 2850 single core engine, so things will just get significantly more difficult in the future, when the quality of the terms will have much higher impact.

As mentioned in the paper, the eval is non-linear, unlike current engines, which
use linear eval functions. They are not tuning the eval; the AI itself is tuning the eval
autonomously, without human input.
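
To make the linear versus non-linear distinction concrete, here is a toy sketch. Every weight and feature below is invented for illustration; this is neither Stockfish's evaluation nor AlphaZero's network (which the paper describes as a deep neural network), just the smallest example where the two kinds of function behave differently.

```python
import math

# All weights below are invented for illustration; neither engine uses them.
LINEAR_WEIGHTS = [1.0, 3.0, 0.5]

def linear_eval(features: list[float]) -> float:
    """Weighted sum of hand-picked features: each term contributes independently."""
    return sum(w * x for w, x in zip(LINEAR_WEIGHTS, features))

HIDDEN_WEIGHTS = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]
HIDDEN_BIAS = [0.1, -0.1]
OUTPUT_WEIGHTS = [0.8, -0.6]

def nonlinear_eval(features: list[float]) -> float:
    """One hidden ReLU layer followed by tanh: features interact via hidden units."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(HIDDEN_WEIGHTS, HIDDEN_BIAS)
    ]
    return math.tanh(sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)))

one = [1.0, 0.0, 0.0]
two = [2.0, 0.0, 0.0]
print(linear_eval(two) == 2 * linear_eval(one))      # doubling the input doubles the score
print(nonlinear_eval(two), 2 * nonlinear_eval(one))  # no longer proportional
```

In the linear case a tuner adjusts one weight per human-chosen term; in the network case training adjusts all the weights jointly, and no single number corresponds to a named chess concept.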

pferd · Post by **pferd** » Thu Dec 07, 2017 2:32 pm

MikeGL wrote:
pferd wrote:
MikeGL wrote: [d]4q2r/1b1kbp2/1p2p1p1/pP1pP1N1/P2P1PQP/3BK3/2R5/8 w - - 6 30
After 29...Be7, can current engines consider 30.Bxg6! here?

Would be nice if we can try to feed some difficult epd positions into AlphaZero,
to estimate its ELO strength.
A slightly modified version of Stockfish gets the second position right, but it needed more than 13 mins (depth 47) with 6 threads on my machine to bring this move to the top.

The final output:
Code: Select all
info depth 49 seldepth 82 multipv 1 score cp 121 nodes 15423323813 nps 9884116 hashfull 999 tbhits 8278452 time 1560415 pv d3g6 e7g5 8 g2g6 f8h8 h4h5 d8e8 c1d2 c8d7 h5h6 h8h7 g6g8 e8f7 g8b8 h7h6 b8b6 h6h2 d2d3 h2h3 d3e2 d7e8 b6a6 h3h2 e2f3 h2h3 f3f2 h3h2 f2g3 h2d2 a
bestmove d3g6 ponder e7g5
I also tried the first position but Stockfish did not come up with g4 after 1/2 hour.
g2g6 f8h8 in your above line looks incorrect, an illegal move.

Nice catch! Something went terribly wrong when copying the output from the console window.

This is the correct output:

Code: Select all

info depth 49 seldepth 82 multipv 1 score cp 121 nodes 15423323813 nps 9884116 hashfull 999 tbhits 8278452 time 1560415 pv d3g6 e7g5 g4g5 f7g6 f4f5 h8g8 g5h6 e8f7 f5f6 d7d8 e3d2 b7c8 h6e3 c8b7 d2c1 f7h7 e3a3 h7h6 c1d1 h6f8 a3c3 f8f7 d1c1 g8h8 c3a3 f7f8 a3f8 h8f8 c2g2 b7c8 g2g6 f8h8 h4h5 d8e8 c1d2 c8d7 h5h6 h8h7 g6g8 e8f7 g8b8 h7h6 b8b6 h6h2 d2d3 h2h3 d3e2 d7e8 b6a6 h3h2 e2f3 h2h3 f3f2 h3h2 f2g3 h2d2 a6a5 d2d4 a5a7 f7f8 b5b6 d4b4 b6b7 d5d4 g3f4 b4b2 f6f7 e8a4 a7a8 f8f7 b7b8q b2b8 a8b8 a4c6 b8b4 d4d3 f4e3 f7e7 e3d3
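
For anyone who wants to read such output programmatically, a minimal sketch of splitting a UCI `info` line into fields. It assumes the simple layout seen here (integer fields, a `score cp N` pair, and `pv` last) and does not handle score bounds or string fields; `parse_info` is a made-up helper name.

```python
def parse_info(line: str) -> dict:
    """Split a UCI 'info' line into integer fields plus the PV move list."""
    tokens = line.split()
    fields = {}
    i = 1  # skip the leading 'info'
    while i < len(tokens):
        key = tokens[i]
        if key == "pv":
            fields["pv"] = tokens[i + 1:]  # everything after 'pv' is the line
            break
        if key == "score":
            fields["score"] = (tokens[i + 1], int(tokens[i + 2]))  # e.g. ('cp', 121)
            i += 3
        else:
            fields[key] = int(tokens[i + 1])
            i += 2
    return fields

# The posted line, with the PV shortened to its first three moves.
line = ("info depth 49 seldepth 82 multipv 1 score cp 121 nodes 15423323813 "
        "nps 9884116 hashfull 999 tbhits 8278452 time 1560415 pv d3g6 e7g5 g4g5")
info = parse_info(line)
minutes = info["time"] / 60000  # UCI reports time in milliseconds
print(info["depth"], info["score"], info["pv"][0])
print(f"{minutes:.1f} minutes, {info['nodes'] * 1000 // info['time']} nodes/s")
```

The reported `time` works out to about 26 minutes for depth 49, and nodes divided by time reproduces the reported `nps`, so the numeric part of the line is internally consistent.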

MikeGL · Post by **MikeGL** » Thu Dec 07, 2017 2:35 pm

pferd wrote:
MikeGL wrote:
pferd wrote:
MikeGL wrote: [d]4q2r/1b1kbp2/1p2p1p1/pP1pP1N1/P2P1PQP/3BK3/2R5/8 w - - 6 30
After 29...Be7, can current engines consider 30.Bxg6! here?

Would be nice if we can try to feed some difficult epd positions into AlphaZero,
to estimate its ELO strength.
A slightly modified version of Stockfish gets the second position right, but it needed more than 13 mins (depth 47) with 6 threads on my machine to bring this move to the top.

The final output:
Code: Select all
info depth 49 seldepth 82 multipv 1 score cp 121 nodes 15423323813 nps 9884116 hashfull 999 tbhits 8278452 time 1560415 pv d3g6 e7g5 8 g2g6 f8h8 h4h5 d8e8 c1d2 c8d7 h5h6 h8h7 g6g8 e8f7 g8b8 h7h6 b8b6 h6h2 d2d3 h2h3 d3e2 d7e8 b6a6 h3h2 e2f3 h2h3 f3f2 h3h2 f2g3 h2d2 a
bestmove d3g6 ponder e7g5
I also tried the first position but Stockfish did not come up with g4 after 1/2 hour.
g2g6 f8h8 in your above line looks incorrect, an illegal move.
Nice catch! Something went terribly wrong when copying the output from the console window.

This is the correct output:
Code: Select all
info depth 49 seldepth 82 multipv 1 score cp 121 nodes 15423323813 nps 9884116 hashfull 999 tbhits 8278452 time 1560415 pv d3g6 e7g5 g4g5 f7g6 f4f5 h8g8 g5h6 e8f7 f5f6 d7d8 e3d2 b7c8 h6e3 c8b7 d2c1 f7h7 e3a3 h7h6 c1d1 h6f8 a3c3 f8f7 d1c1 g8h8 c3a3 f7f8 a3f8 h8f8 c2g2 b7c8 g2g6 f8h8 h4h5 d8e8 c1d2 c8d7 h5h6 h8h7 g6g8 e8f7 g8b8 h7h6 b8b6 h6h2 d2d3 h2h3 d3e2 d7e8 b6a6 h3h2 e2f3 h2h3 f3f2 h3h2 f2g3 h2d2 a6a5 d2d4 a5a7 f7f8 b5b6 d4b4 b6b7 d5d4 g3f4 b4b2 f6f7 e8a4 a7a8 f8f7 b7b8q b2b8 a8b8 a4c6 b8b4 d4d3 f4e3 f7e7 e3d3

Thanks for this line. Looks like your SF modded version finds the winning line played by AlphaZero correctly.

pferd · Post by **pferd** » Thu Dec 07, 2017 2:45 pm

MikeGL wrote:
Thanks for this line. Looks like your SF modded version finds the winning line played by AlphaZero correctly.

There is nothing special about this version. It contains just 2 small changes:
1) I removed the lazy eval.
2) I applied pull request #1289

I will give stockfish-master a try and see how things work out...
