This is one of the best summaries of the AGZ paper, assuming the same DCNN is used for chess. However, there is no indication that the DCNN for chess is organized in the same way as for Go, since the paper does not mention this. I guess they left it for the next Nature publication.

Rein Halbersma wrote:
The deep neural network connects the pieces on different squares to each other. They use 3x3 convolutions. This means that each cell of the next 8x8 layer is connected to a 3x3 region (called the "receptive field") in the previous layer, and to a 5x5 region in the layer before that, etc. After only 4 layers, each cell is connected to every other cell in the original input layer. For AlphaGo Zero they used no less than 80 layers. They also have many "feature maps" in parallel, so that they can learn different concepts related to piece-square combinations. Finally, they use the last 8 positions as input as well, so the network also has a sense of ongoing maneuvers. All this is then trained on the game result and the best move from the MC tree search.
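To make the receptive-field claim concrete, here is a minimal sketch (mine, not from the paper) of how stacked 3x3 convolutions grow the field by 2 squares per layer:

[code]
# Receptive-field growth of stacked 3x3 convolutions (stride 1).
# Illustrative only; the layer counts match the quote above.
def receptive_field(num_layers, kernel=3):
    rf = 1
    for _ in range(num_layers):
        rf += kernel - 1  # each 3x3 layer adds one square on every side
    return rf

for n in range(1, 5):
    print(f"{n} layers -> {receptive_field(n)}x{receptive_field(n)}")
# 4 layers -> 9x9, which already covers the whole 8x8 board
[/code]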
Although the amount of resources required to train the millions of weights of these networks is enormous, it is conceptually not surprising that pawn structure, king safety, mobility and even deep tactics can be detected from the last 8 positions.
We know how the input features are organized and we know the policies, but that really doesn't tell us much about the actual network implementation, especially since both the inputs and the policies are totally different and much more complex for chess than for Go.
The only thing we can guess from the paper is the total number/size of the NN's weights.
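For reference, here is a minimal sketch of the chess input and policy shapes the preprint does spell out (the network body in between is the unknown part); the plane counts are from the AlphaZero preprint, the variable names are mine:

[code]
import numpy as np

# AlphaZero chess I/O shapes as described in the preprint:
T, M, L = 8, 14, 7             # 8 history steps, 14 piece/repetition planes, 7 constant planes
input_planes = M * T + L       # = 119 planes of 8x8
policy_planes = 73             # move-type planes per from-square
policy_size = 8 * 8 * policy_planes  # = 4672 encodable moves

x = np.zeros((input_planes, 8, 8), dtype=np.float32)   # one network input
p = np.zeros((policy_planes, 8, 8), dtype=np.float32)  # one policy output
print(input_planes, policy_size)                       # 119 4672
[/code]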
We have 80k searches per second (not just evaluations but complete MCTS iterations, each performing one leaf evaluation on the 4 TPUs, with leaves taken from an 8-deep evaluation queue). There is also no mention of the hardware running the actual MCTS, but that is most probably a general-purpose CPU no weaker than the one that was running SF.
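A minimal sketch of how such an 8-deep leaf-evaluation queue could be wired up (all names and structure are my illustration, not from the paper):

[code]
import queue
from concurrent.futures import Future

BATCH = 8                  # leaf evaluations batched per accelerator call
pending = queue.Queue()

def nn_forward(positions):
    # stand-in for one batched TPU forward pass: (value, policy) per leaf
    return [(0.0, {}) for _ in positions]

def submit_leaf(position):
    """Called by an MCTS worker at a leaf; returns a Future for (value, policy)."""
    fut = Future()
    pending.put((position, fut))
    return fut

def evaluator_loop():
    while True:
        jobs = [pending.get() for _ in range(BATCH)]  # block until 8 leaves queue up
        for (_, fut), res in zip(jobs, nn_forward([pos for pos, _ in jobs])):
            fut.set_result(res)
[/code]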
Since the TPUs' eval speed is the same as in training, they were most probably the same, i.e. first-generation ones with a peak of 92 T int8 ops per second.
So, assuming roughly one int8 op per weight per evaluation, 4*92T/80k = 4.6G weights, i.e. about 4.6 GB of weights at one byte per int8 weight.
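The same back-of-the-envelope, spelled out (the one-op-per-weight-per-eval assumption and full peak utilization are of course optimistic, so this is an upper bound):

[code]
tpu_ops = 92e12            # first-gen TPU peak, int8 ops per second
num_tpus = 4
searches_per_sec = 80e3    # one leaf evaluation per search

ops_per_eval = num_tpus * tpu_ops / searches_per_sec
print(f"{ops_per_eval:.2e}")  # 4.60e+09 -> ~4.6G weights, ~4.6 GB at 1 byte/weight
[/code]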