AlphaZero is not like other chess programs

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: AlphaZero is not like other chess programs

Post by Rein Halbersma »

Milos wrote:
Rein Halbersma wrote:The deep neural network connects the pieces on different squares to each other. They use 3x3 convolutions. This means that the next 8x8 layer's cells are connected to a 3x3 region (called "receptive field") in the previous region, and to a 5x5 region in the layer before etc. After only 4 layers, each cell is connected to every other cell in the original input layer. For AlphaGoZero they used no less than 80 layers. Then they also have many "feature maps" in parallel, so that they can learn different concepts related to piece-square combinations. Finally, they use the last 8 positions as input as well, so they also have a sense of ongoing maneuvers. All this is then being trained on the game result and the best move from the MC tree search.

Although the amount of resources required to train the millions of weights related to these neural networks is enormous, conceptually it is not surprising that pawn structure, king safety, mobility and even deep tactics can be detected from the last 8 positions.
This is one of the best summaries of the AGZ paper, assuming the same DCNN is used for chess. However, there is no indication that the DCNN for chess is organized the same way as for Go, since the paper does not mention this. I guess they left it for the next Nature publication.
We know how the input features are organized, and we know the policies, but that really doesn't tell us much about the actual network implementation, especially since both inputs and policies are totally different and much more complex for chess than for Go.
The only thing we can guess from the paper is the total number/size of the NN's weights.
My post was meant to explain how a deep NN could in principle see deep tactics. I didn't make any claims regarding the A0 network, only citing the AG0 network.

However, awaiting the full paper with all the details, my guess would be that A0-chess = AG0 - rotations - test games for selecting best network + game-dependent NN with more input planes, smaller spatial dimensions and larger policy vector output. Since they didn't mention any other big changes for A0-Go compared to AG0, I think it's safe to assume they still use 3x3 convolutions and ResNet blocks as their main infrastructure.
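The receptive-field growth I mentioned can be sketched numerically. This is a minimal illustration, assuming stacked 3x3 kernels with stride 1 (the AG0 setup); the function name is my own, not from the paper:

```python
def receptive_field(layers: int, kernel: int = 3, stride: int = 1) -> int:
    """Receptive field, in input cells, of a stack of identical conv layers."""
    rf = 1     # a single cell before any convolution
    jump = 1   # distance between adjacent output cells, in input cells
    for _ in range(layers):
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# With 3x3 stride-1 convolutions the field grows by 2 cells per layer:
for n in range(1, 5):
    print(n, receptive_field(n))  # 1→3, 2→5, 3→7, 4→9
```

After four layers a cell's 9x9 receptive field, centered on a board square, already spans the full 8x8 board, which is the sense in which every cell becomes connected to every input cell.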

Anyway, hopefully more details soon!
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: AlphaZero is not like other chess programs

Post by corres »

I think that, just as Stockfish and the other chess engines in use cannot make any plan, AlphaZero cannot make one either.
This is only an illusion, created by the neural network.
The neural network works as a kind of huge memory. It is like a huge and very deep "opening book" or "learning file" that AlphaZero builds during training and while it plays games.
If Stockfish made and used a learning file, Stockfish would gain Elo and would also play like a plan-making program.
peter
Posts: 3186
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: AlphaZero is not like other chess programs

Post by peter »

MikeGL wrote:AlphaZero plays excellent chess, but I agree with you, the paper has more than 4 errors (maybe there are more which escaped my proofreading and
were not typographical errors nor punctuation marks).
A0 plays exciting chess, but for me the paper has one major error, which I would rather call a bias.

The testing method, against a bookless SF at a fixed TC of 1'/move, favors a learning machine far too much for any meaningful Elo comparison, and the games are probably mostly doublets, at least in a wider, chess-related sense of the word.

We can only suppose this, having seen just 10 games, but of these the White wins are all identical for the first 6 moves:

1.e4 e5 2.Nf3 Nc6 3.Bb5 Nf6 4.d3 Bc5 5.Bxc6 dxc6 6.O-O

and the Black wins are identical for the first 4 moves:

1.Nf3 Nf6 2.c4 b6 3.d4 e6 4.g3 Ba6 or ...Bb7
Last edited by peter on Sun Dec 10, 2017 11:00 am, edited 2 times in total.
Peter.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: AlphaZero is not like other chess programs

Post by corres »

shrapnel wrote:
mclane wrote:
Stockfish can be seen as a materialistic program, eating a sacced piece to get a higher material value.
AZ, meanwhile, seems to play not to make the best moves leading to material wins, but to create and execute a plan.
For the plan it sacs pieces.
This plan-making chess with speculative sacs is IMO very similar to human chess.

Yes, that's what one of the Commentators said.
.....

Anil,
Please read my answer to Mr. Thorsten
Robert
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: AlphaZero is not like other chess programs

Post by corres »

peter wrote:
Hi!

Milos wrote:
They don't say it in a correct way because it is not a scientific paper they published but essentially an advertising leaflet.

....
So I'd simply call the whole thing bad scientific practice, if it was already meant as a scientific paper.
No scientific journal with any impact factor would print that at all.

AlphaGo and AlphaZero are only demonstrations proving the power and effectiveness of the DeepMind system.
Moreover, they are cheap(?) advertising for Google.
User avatar
mclane
Posts: 18755
Joined: Thu Mar 09, 2006 6:40 pm
Location: US of Europe, germany
Full name: Thorsten Czub

Re: AlphaZero is not like other chess programs

Post by mclane »

shrapnel wrote:At the same time, the sacrifices it made were not like a human player would make, or Stockfish would have seen through them.
The sacrifices Alphazero made were so deep, that even a powerful program like Stockfish could not see through them and was forced to accept them.
That in itself should give pause for thought to the most vociferous defenders of Stockfish.
https://www.youtube.com/watch?v=lb3_eRNoH_w
Stockfish, Komodo, and Houdini are very strong chess programs.
But they are conventional chess programs. They add a malus or bonus to a score and have very clever pruning methods to decide which branch to follow. They generate lots of NPS and build a very deep search tree. And they see tactics within this horizon.
But they rarely sac unsoundly.

If they sac, the sac is winning within the horizon, so it is sound.

But AZ sacs pieces like a human being: humans cannot calculate 100 percent whether a sac works or not, so they sac for a chance to win.

A kind of gambling.

And IMO AZ is doing something similar.

Stockfish has no method to find out about the sac.
It can only eat the piece and die.
Or hope the opponent fails to follow through on the plan, and make a draw.
But it cannot defend against a playing style that creates plans and gives up pieces to execute them.

This is type B or type C chess programs beating type A.
What seems like a fairy tale today may be reality tomorrow.
Here we have a fairy tale of the day after tomorrow....
pilgrimdan
Posts: 405
Joined: Sat Jul 02, 2011 10:49 pm

Re: AlphaZero is not like other chess programs

Post by pilgrimdan »

tmokonen wrote:speaking of PDFs...

http://incompleteideas.net/book/bookdraft2017nov5.pdf

Reinforcement Learning: An Introduction, 2nd edition complete draft, almost the final version, except no index. Get it while it's free.
thanks...

here are some notes I've taken from the pdf...

The core idea of MCTS is to successively focus multiple
simulations starting at the current state by extending the
initial portions of trajectories that have received high
evaluations from earlier simulations...

When both the rollout policy and the model do not require a lot
of computation, many simulated trajectories can be generated in a
short period of time...

Monte Carlo value estimates are maintained only for the subset of
state–action pairs that are most likely to be reached in a few
steps, which form a tree rooted at the current state...

MCTS incrementally extends the tree by adding nodes representing
states that look promising based on the results of the simulated
trajectories...

Any simulated trajectory will pass through the tree and then exit
it at some leaf node...

Outside the tree and at the leaf nodes the rollout policy is used
for action selection...

the states inside the tree have value estimates for some of the
actions, which balance exploration and exploitation...

each iteration of a basic version of MCTS consists of the
following four steps...

Selection - at the root node, a tree policy traverses the tree to
select a leaf node...

Expansion - on some iterations, the tree is expanded from the
selected leaf node...

Simulation - from the selected node, simulation of a complete
episode is run...

this results in a Monte Carlo trial with actions selected first by
the tree policy and beyond the tree by the rollout policy...

Backup - the return generated by the simulated episode is backed
up to update the action values attached to the edges of the tree...

after the environment transitions to a new state, MCTS is run
again...

MCTS was used in the AlphaGo program that combines the Monte Carlo evaluations of MCTS with action values learned by a deep NN via self-play reinforcement learning...

MCTS is a decision-time planning algorithm based on Monte Carlo control applied to simulations that start from the root state...

that is, it is a kind of rollout algorithm...

It therefore benefits from online, incremental, sample-based value estimation and policy improvement...

it saves action-value estimates attached to the tree edges and updates them using reinforcement learning’s sample updates...

This has the effect of focusing the Monte Carlo trials on trajectories whose initial segments are common to high-return trajectories previously simulated...

by incrementally expanding the tree, MCTS effectively grows a lookup table to store a partial action-value function...

memory is allocated to the estimated values of state–action pairs visited in the initial segments of high-yielding sample trajectories...

MCTS avoids the problem of globally approximating an action-value function while it retains the benefit of using past experience to guide exploration...

The striking success of decision-time planning by MCTS has deeply influenced artificial intelligence...
MikeGL
Posts: 1010
Joined: Thu Sep 01, 2011 2:49 pm

Re: AlphaZero is not like other chess programs

Post by MikeGL »

peter wrote:
MikeGL wrote:AlphaZero plays excellent chess, but I agree with you, the paper has more than 4 errors (maybe there are more which escaped my proofreading and
were not typographical errors nor punctuation marks).
...a bookless SF at fixed TC of 1'/move favors a learning machine
Actually, if you think about it, you don't need a learning machine using MCTS-NN to beat SF8 at 1 min per move.
You can just book up any weaker engine and expect SF8 to play the same moves over
and over (assuming you include SMP luck in your book-up). Booking up is not a new idea; I remember one
programmer complaining about a cheating technique at WCCC (between 2002 and 2004) wherein a weaker engine would
book up (i.e., play thousands of games against a specific stronger opponent), then create a learn file using
huge hardware to book up against the stronger engine and beat it at WCCC.
This is effective engine-engine preparation even if the time control is 40/40 or another FIDE tournament time control.

I am not saying DeepMind used this dirty book-up technique, because just judging from its Go
accomplishment, A0 looks legit and real. But the "1 min per move" issue is one of the toughest
arguments about this historic match. The next match should be live online and with a tournament time
control, not a fixed time per move, to convince the whole planet.
Last edited by MikeGL on Sun Dec 10, 2017 1:19 pm, edited 1 time in total.
shrapnel
Posts: 1339
Joined: Fri Nov 02, 2012 9:43 am
Location: New Delhi, India

Re: AlphaZero is not like other chess programs

Post by shrapnel »

https://www.youtube.com/watch?v=UcAfg9v_dDM
Another very nice, well-explained video.
i7 5960X @ 4.1 Ghz, 64 GB G.Skill RipJaws RAM, Twin Asus ROG Strix OC 11 GB Geforce 2080 Tis
User avatar
mclane
Posts: 18755
Joined: Thu Mar 09, 2006 6:40 pm
Location: US of Europe, germany
Full name: Thorsten Czub

Re: AlphaZero is not like other chess programs

Post by mclane »

corres wrote:I think that, just as Stockfish and the other chess engines in use cannot make any plan, AlphaZero cannot make one either.
This is only an illusion, created by the neural network.
The neural network works as a kind of huge memory. It is like a huge and very deep "opening book" or "learning file" that AlphaZero builds during training and while it plays games.
If Stockfish made and used a learning file, Stockfish would gain Elo and would also play like a plan-making program.
So why was Re1 played?!
What seems like a fairy tale today may be reality tomorrow.
Here we have a fairy tale of the day after tomorrow....