recent article on alphazero ... 12/11/2017 ...

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

peter
Posts: 3186
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: recent article on alphazero ... 12/11/2017 ...

Post by peter »

syzygy wrote:
peter wrote:
hgm wrote:No adjustment of the NN was done during the match.
...
It is all clearly described in the paper.
Can you or Ronald please show the lines of the paper in which it is said that the NN wasn't adjusted any more during the games?

I have found only this till now:
We evaluated the fully trained instances of AlphaZero against Stockfish, Elmo and the previous version of AlphaGo Zero (trained for 3 days) in chess, shogi and Go respectively, ...
"fully trained". Adjusting the NN is called "training".

There are many other clear statements:
Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.
Recently, the AlphaGo Zero algorithm achieved superhuman performance in the game of Go, by representing Go knowledge using deep convolutional neural networks (22, 28), trained solely by reinforcement learning from games of self-play (29). In this paper, we apply a similar but fully generic algorithm, which we call AlphaZero, ...
AlphaZero learns these move probabilities and value estimates entirely from self-play; these are then used to guide its search.
The parameters θ of the deep neural network in AlphaZero are trained by self-play reinforcement learning, starting from randomly initialised parameters θ.
We trained a separate instance of AlphaZero for each game. Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters, using 5,000 first-generation TPUs (15) to generate self-play games and 64 second-generation TPUs to train the neural networks.
The games against SF were played by the (fully trained) AlphaZero on a machine with 4 TPUs, a completely different setup from the one used for adjusting the neural network parameters.
During training, each MCTS used 800 simulations.
When playing SF8 for 100 games, the MCTS performed 80,000 simulations per second, with 1 minute of thinking time per move.
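The figures quoted above imply a large gap between training-time and match-time search effort. A quick back-of-envelope check, using only the numbers from the paper excerpts (nothing here is measured):

```python
# Back-of-envelope comparison of training vs. match search effort,
# using only figures quoted from the paper in this thread.
train_sims_per_move = 800        # MCTS simulations per move during training
match_sims_per_sec = 80_000      # reported simulation rate in the SF match
match_secs_per_move = 60         # one minute of thinking time per move

match_sims_per_move = match_sims_per_sec * match_secs_per_move
print(match_sims_per_move)                            # 4800000
print(match_sims_per_move / train_sims_per_move)      # 6000.0
```

So each match move got roughly 6,000 times the search of a training move, which is consistent with the match setup being purely playout, not training.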
Sorry, Ronald, none of your quotes says clearly to me that there was no adjustment of the NN during the match anymore.

Maybe I'm too biased myself by now, but I'd still like to see the 100 games and to have a clear statement about whether A0 was "learning" by playing against SF or not.

If you're right, which of course I could well imagine too, I would still like to see a rematch under less biased conditions.
:)
Peter.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: recent article on alphazero ... 12/11/2017 ...

Post by syzygy »

peter wrote:Sorry, Ronald, none of your quotes says clearly to me that there was no adjustment of the NN during the match anymore.
Of course not...

And even if they had said "special message to Peter: we did not adjust any weights during the 100 games against Stockfish", you would have said there is no proof that they did not.
Maybe I'm too biased myself by now, but I'd still like to see the 100 games and to have a clear statement about whether A0 was "learning" by playing against SF or not.
Yes, you are so biased that nothing can save you.

I will remember it is a waste of time to discuss this any further with you.
Dariusz Orzechowski
Posts: 44
Joined: Thu May 02, 2013 5:23 pm

Re: recent article on alphazero ... 12/11/2017 ...

Post by Dariusz Orzechowski »

hgm wrote:MCTS does not automatically learn. To learn, the NN would have to be altered, and this was done in the training phase by the 64 second-generation TPUs, after they received the training games, played with the previous parameters, from the 5,000 first-generation TPUs.

No adjustment of the NN was done during the match. Just playing on 4 gen-2 TPUs.
Small nitpick: looks like the match was played on gen-1 TPUs.

Other than that, I agree: there is no reason to assume learning during the match; it's too far-fetched and doesn't make sense. Openings were repeating because AZ has almost no randomness. It's clearly visible in Go, where in the 20 published games between AlphaGo Zero and AlphaGo Master they played only 2 openings (about 16-18 moves) with exactly the same moves over and over again.
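The repeated openings are consistent with how MCTS selects moves once the exploration temperature is dropped to zero: the engine always plays the child with the most visits, so identical networks starting from the same position produce identical games. A minimal sketch of this mechanism (the moves and visit counts are invented for illustration):

```python
import random

# Hypothetical visit counts at the root after a search from the start position.
visit_counts = {"e2e4": 412_000, "d2d4": 388_000, "c2c4": 120_000}

def select_move(counts, temperature=0.0):
    """Pick a move from MCTS root visit counts."""
    if temperature == 0.0:
        # Deterministic: always the most-visited move, so repeated games
        # from the same position repeat the same moves.
        return max(counts, key=counts.get)
    # With temperature > 0, sample proportionally to N^(1/T). AlphaZero-style
    # systems use this early in self-play to create diversity, not in matches.
    weights = [n ** (1.0 / temperature) for n in counts.values()]
    return random.choices(list(counts), weights=weights)[0]

print(select_move(visit_counts))  # e2e4, every single time
```

With temperature 0 the only remaining variation comes from whatever perturbs the visit counts themselves, such as parallel-search timing noise.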
User avatar
Eelco de Groot
Posts: 4563
Joined: Sun Mar 12, 2006 2:40 am

Re: recent article on alphazero ... 12/11/2017 ...

Post by Eelco de Groot »

peter wrote:
Sorry, Ronald, none of your quotes to me says clearly there wasn't any adjustment of the NN during the match anymore.
I have not read the article but from some comments from HGM elsewhere to Ed, I understand that the final NN is probably very different from the training NN. For instance it does not have to do floating point operations. This is just my impression: Those TPU are a bit like Crays on a chip, doing massive vector based FLOPS. But for the final NN, that is, in principle, as far as I can see much simpler. In biology, neurons work a bit like filter circuits -I am getting a bit out of my depth here already- but I think you could simulate that final NN even with HGMs experiments here; The Gigatron Project You just have to have the schematics that are evolved out of the training. In principle, this could be very fast (it does not have to do any branching instructions for instance, not as far as I can see) Doing the final stuff on the TPU seems massive overkill in terms of hardware to me. But they were on a tight schedule and building another hardware just for chess, would have cost time and money. The TPUs as far as I know will and can be used for any learning task and are probably reprogrammed many times since. That is why I don't think we will see any rematches under 'fairer' conditions... Alpha Zero has moved on...

I know precious little about NNs; during and after my biology study this field was just in its infancy. This was the bible back then, back in the eighties (first edition 1986) :). But it is not very up to date with respect to AlphaZero, I'm afraid. I do have it, but I did not do much with it at the time, although it is very interesting. The PDP Research Group did popularise backpropagation in neural networks:

Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: recent article on alphazero ... 12/11/2017 ...

Post by Milos »

hgm wrote:As was already remarked, AlphaZero probably suffered more from the fixed TC than Stockfish, because training for superior time management is orders of magnitude simpler than training for good chess, and would have certainly been within the capabilities of the NN. Likewise, if books had been used, AlphaZero would probably have had a much better book than Stockfish.
This is just a bunch of totally unfounded ramblings.
Please, Mr. Harm, explain to us how this "training for superior time management that is orders of magnitude simpler than training for good chess" is actually done. Let's hear a bit more of your genius thoughts. :lol: :lol:
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: recent article on alphazero ... 12/11/2017 ...

Post by Milos »

hgm wrote:How you came to think that this could mean that tournaments like TCEC could be played entirely without book is, well, let's call that 'strange'. It is common knowledge that hardly any engine nowadays intentionally randomizes its moves, so that you typically get very similar, and certainly not independent, games when you let them play their own (searched) moves from the same starting position. One would expect you to know that. But perhaps your innate meanness got the better of you, and compelled you to nevertheless make such a silly assumption to provide you an opportunity to display it.

Of course when dealing with engines that randomize, opening books are not needed for creating game diversity.
You seem to be totally unaware of the fact that the MCTS that A0 is using has no randomness at all besides the SMP randomness of 8 parallel threads waiting in the TPU eval queue. So effectively A0 introduces much less randomness into games than SF on 64 threads, as is quite nicely demonstrated by those 10 games having many identical moves in the opening.
Then you claim it is "strange" for conventional engines to play exclusively from the chess starting position, but at the same time you find it totally OK that those 100 games that Google used for advertising A0 were all played from the chess starting position.
Maybe you should look in the dictionary under the terms "double standard" or "hypocrisy", because you don't seem to be aware of them.
peter
Posts: 3186
Joined: Sat Feb 16, 2008 7:38 am
Full name: Peter Martan

Re: recent article on alphazero ... 12/11/2017 ...

Post by peter »

syzygy wrote:Yes, you are so biased that nothing can save you.

I will remember it is a waste of time to discuss this any further with you.
Sorry for that, Ronald. So OK, I give in on that.

You'll be right, no learning while playing, I'll be wrong.

So just to repeat once again what my point was in the first thread and still is:

SF's chess in that match, as far as I saw, was simply utterly weak.

A0 had 5 beautiful moves in the 10 games shown, but those are no miracles or conjuring tricks, neither for engines nor for humans.
I want to see more of such moves, and I also want to see the level I'm used to seeing from the engine as an opponent.

Maybe SF had to play so badly because it was being fully outperformed all of the time; of course, playing at a disadvantage is always much more difficult than playing with an advantage.
But to understand at least a little of what happened on the board, I'd have to see more of the 100 games, and I'd have to see how well A0 would play against reasonable opening theory too.

The main point: I don't think we'll see any more of that DeepMind chess if we don't ask for it and give good reasons for our demand.
I guess Eelco is simply right too: nothing more will come from Google's side on their own. They got the best advertising effect they could expect, or they didn't, and I still think it's a matter of the audience that we here, and in science, give them.

Come on, folks, don't be so humble; we are chess players, aren't we? So before giving in so easily, let's at least have some more rumble.

Ceterum censeo Googlem pro vindictam esse postulandam.
:)
Peter.