I forgot to mention that using a 'CEGT' book will override Romi's book until the CEGT book is done; then Romi will start using her book and learning after that.
Michael Sherwin wrote:
Hi Werner,
Werner wrote:
Hi Mike,
so if I include all CEGT games into the folder, merge them, and repeat a match played earlier, will I get a much better result?
- original games were played without learning
- now learning is switched on and the match repeated
I will try it, but I do not think the engine uses this learn file.
Settings here
learn_on
book_off
quit
I also do not know how to use this command from the help.txt inside Romi:
douselearn ?
best wishes
Werner
To have the learning fully enabled, type learn_on, then book_on, then quit. Learning is part of the book structure. If learning was on all this time then the learn.dat file should be quite large by now. Just loading pgn files will give Romi a better result, but for the best result Romi needs many games to gain direct experience with the lines and start selecting the lines that are better for Romi.
AlphaGo Zero And AlphaZero, RomiChess done better
Moderators: hgm, Rebel, chrisw
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: AlphaGo Zero And AlphaZero, RomiChess done better
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
-
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
Re: AlphaGo Zero And AlphaZero, RomiChess done better
[quote="Michael Sherwin"]
I am most skeptical about interview-style reporting. Reporters will often take a few thin facts and weave a whole story around them, with lots of inaccuracies and often outright made-up garbage.
I am less skeptical about papers written directly by the authors. Still, I'm a bit skeptical because they do not always tell all.
The reason I started this thread is that I looked at the games, and the moves by Alpha0 had the same learned feel to them that Romi's moves have when Romi would win against a superior opponent.
[/quote]
This is always the situation when the press and money meet.
Based on the similarity between the moves of Romi and the moves of AlphaZero,
one might think that AlphaZero produced a kind of "learning file" that was then used by a normal chess engine - maybe a derivative of Stockfish...
Or is this a very absurd supposition?
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: AlphaGo Zero And AlphaZero, RomiChess done better
I don't know how this thread evolved into a serious one, because I thought it was meant as a tongue-in-cheek comparison.
Are you really equating Romi's learning (as far as I know it is just standard book learning) with what AlphaZero is doing?
To be precise, AlphaZero uses a unique search (MCTS) which is very selective; it orders moves with a deep NN and evaluates positions also with a deep NN. This deep NN is a generic network that can be used in any kind of position, unlike book-learning schemes that avoid particular moves. So I don't see how this unique approach is the same as Romi's.
If you are talking about learning in general, yes, it has been used in chess before AlphaZero, the most prominent and general example being TD-lambda.
Daniel
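For readers who have not met TD-lambda: below is a minimal, tabular sketch of the TD(λ) value update with accumulating eligibility traces. All function names, parameter values, and the toy trajectory are invented for illustration; this is not code from any engine mentioned in the thread.

```python
# Tabular TD(lambda) value update with eligibility traces (illustrative only).
def td_lambda_update(values, traces, trajectory, rewards,
                     alpha=0.1, gamma=1.0, lam=0.7):
    """Update state values in place after observing a trajectory of states
    and the reward received on each transition."""
    for t in range(len(trajectory) - 1):
        s, s_next = trajectory[t], trajectory[t + 1]
        # TD error: reward plus discounted next value, minus current value.
        delta = rewards[t] + gamma * values.get(s_next, 0.0) - values.get(s, 0.0)
        traces[s] = traces.get(s, 0.0) + 1.0          # accumulate trace for s
        for state in list(traces):
            # Every recently visited state shares in the error, scaled by
            # its trace, which then decays by gamma * lambda.
            values[state] = values.get(state, 0.0) + alpha * delta * traces[state]
            traces[state] *= gamma * lam
    return values

# Toy run: two transitions, reward 1.0 arriving only on the second.
values = td_lambda_update({}, {}, ["a", "b", "c"], [0.0, 1.0])
```

The point of the traces is that the late reward is credited not only to "b" (the state just before it) but also, with decayed weight, to "a" earlier in the line, which is what makes TD(λ) relevant to learning whole opening lines from game results.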
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: AlphaGo Zero And AlphaZero, RomiChess done better
I am not addressing the Alpha0 playing algorithm. I understand that it is massively parallel MCTS. That alone makes it far different from Stockfish. I'm not skeptical enough about the reporting to believe that someone is lying about the underlying algorithm.
corres wrote:
This is always the situation when the press and money meet.
Michael Sherwin wrote:
I am most skeptical about interview-style reporting. Reporters will often take a few thin facts and weave a whole story around them, with lots of inaccuracies and often outright made-up garbage.
I am less skeptical about papers written directly by the authors. Still, I'm a bit skeptical because they do not always tell all.
The reason I started this thread is that I looked at the games, and the moves by Alpha0 had the same learned feel to them that Romi's moves have when Romi would win against a superior opponent.
Based on the similarity between the moves of Romi and the moves of AlphaZero,
one might think that AlphaZero produced a kind of "learning file" that was then used by a normal chess engine - maybe a derivative of Stockfish...
Or is this a very absurd supposition?
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: AlphaGo Zero And AlphaZero, RomiChess done better
This is an old argument about Romi's learning being "standard book learning". It is not. The reinforcement learning in RomiChess is stored in a tree structure that doubles as an opening book. That much is true. However, the subtree for the current position is loaded into the hash table before each search, along with the learned rewards and penalties earned from previous results. These learned values affect which move the search decides is best. That has nothing to do with an opening book!
Daniel Shawul wrote:
I don't know how this thread evolved into a serious one, because I thought it was meant as a tongue-in-cheek comparison.
Are you really equating Romi's learning (as far as I know it is just standard book learning) with what AlphaZero is doing?
To be precise, AlphaZero uses a unique search (MCTS) which is very selective; it orders moves with a deep NN and evaluates positions also with a deep NN. This deep NN is a generic network that can be used in any kind of position, unlike book-learning schemes that avoid particular moves. So I don't see how this unique approach is the same as Romi's.
If you are talking about learning in general, yes, it has been used in chess before AlphaZero, the most prominent and general example being TD-lambda.
Daniel
I don't know if tongue in cheek is the correct terminology for what I intended, but I was not seriously claiming that Alpha0 was very similar to RomiChess at all. I was just making the point that, after looking at A0's moves, given my experience, the learned moves had the same look and feel that Romi's learning produced. That led me to the conclusion that A0 pretrained for the match against SF, or at a minimum loaded and learned against SF games. Some posts above seem to verify that observation. I did not read any white papers on A0. I only read some reports by journalists. All I was trying to do was demystify somewhat the phenomenon that is A0.
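As a rough, hypothetical sketch of the scheme Sherwin describes (not RomiChess source code, and the reward numbers are invented): game results propagate rewards and penalties back along the moves actually played, and before each search the stored values under the current line are handed to the engine as score adjustments, the way Romi preloads its learned subtree into the hash table.

```python
# Illustrative sketch of result-driven move learning (not RomiChess code).
from collections import defaultdict

class LearnTree:
    def __init__(self):
        # Map a line (tuple of moves from the root) to an accumulated score.
        self.reward = defaultdict(int)

    def record_game(self, moves, result, win=4, loss=-4):
        """Propagate a reward or penalty along every move of a finished game.
        Draws are omitted here for brevity."""
        bonus = win if result == "1-0" else loss
        for ply in range(1, len(moves) + 1):
            # Credit White's moves and debit Black's when White wins,
            # and vice versa when Black wins.
            sign = 1 if ply % 2 == 1 else -1
            self.reward[tuple(moves[:ply])] += sign * bonus

    def subtree_bias(self, played):
        """Score adjustments for the moves that follow the current line,
        as a search might preload them into its hash table."""
        prefix = tuple(played)
        return {path[-1]: score for path, score in self.reward.items()
                if path[:-1] == prefix}

tree = LearnTree()
tree.record_game(["e4", "e5", "Qh5", "Nc6", "Bc4", "g6"], "1-0")
bias = tree.subtree_bias(["e4"])
```

The key difference from plain book learning is visible in `subtree_bias`: the learned scores are not used to veto book moves, they bias the search's own evaluation of the successor positions, so the effect persists beyond the book.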
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: AlphaGo Zero And AlphaZero, RomiChess done better
And I would like to add that if SF had the same type of reinforcement learning that Romi has, then a trained SF could be 400 (maybe 1000) Elo or more higher than in its untrained state. That is, after a million games had been played by SF. It would take a cooperative effort, and a way to merge learn files, if it were to be done in a reasonable amount of time. SF would have to include WB protocol, or UCI would have to add a result command if it has not done that already in the last decade. And of course I suspect that the SF team could improve on my base algorithm without difficulty.
Michael Sherwin wrote:
This is an old argument about Romi's learning being "standard book learning". It is not. The reinforcement learning in RomiChess is stored in a tree structure that doubles as an opening book. That much is true. However, the subtree for the current position is loaded into the hash table before each search, along with the learned rewards and penalties earned from previous results. These learned values affect which move the search decides is best. That has nothing to do with an opening book!
Daniel Shawul wrote:
I don't know how this thread evolved into a serious one, because I thought it was meant as a tongue-in-cheek comparison.
Are you really equating Romi's learning (as far as I know it is just standard book learning) with what AlphaZero is doing?
To be precise, AlphaZero uses a unique search (MCTS) which is very selective; it orders moves with a deep NN and evaluates positions also with a deep NN. This deep NN is a generic network that can be used in any kind of position, unlike book-learning schemes that avoid particular moves. So I don't see how this unique approach is the same as Romi's.
If you are talking about learning in general, yes, it has been used in chess before AlphaZero, the most prominent and general example being TD-lambda.
Daniel
I don't know if tongue in cheek is the correct terminology for what I intended, but I was not seriously claiming that Alpha0 was very similar to RomiChess at all. I was just making the point that, after looking at A0's moves, given my experience, the learned moves had the same look and feel that Romi's learning produced. That led me to the conclusion that A0 pretrained for the match against SF, or at a minimum loaded and learned against SF games. Some posts above seem to verify that observation. I did not read any white papers on A0. I only read some reports by journalists. All I was trying to do was demystify somewhat the phenomenon that is A0.
However, it does not have to be SF. It could be anyone. So I am really baffled that nobody has done that in the last 11+ years!
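The cooperative "merge learn files" idea above could, in the simplest case, amount to summing the accumulated scores per line across contributors. A hypothetical sketch follows; the in-memory table format (move path to score) is invented for illustration, not Romi's actual file layout.

```python
# Hypothetical merge of two learn tables, each mapping a move path (a tuple
# of moves from the start position) to an accumulated reward/penalty score.
def merge_learn(a, b):
    """Combine two learn tables by summing scores for shared lines and
    keeping unique lines from each side."""
    merged = dict(a)
    for path, score in b.items():
        merged[path] = merged.get(path, 0) + score
    return merged

# Two contributors' tables: one shared line, two unique lines.
file_a = {("e4", "e5"): 12, ("d4",): -3}
file_b = {("e4", "e5"): -5, ("c4",): 7}
merged = merge_learn(file_a, file_b)
```

Summing works because the learned values are additive tallies of results; a real merge tool would also have to reconcile whatever per-node metadata the engine's on-disk format carries.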
-
- Posts: 349
- Joined: Sat Aug 06, 2016 8:31 pm
- Location: United States
Re: AlphaGo Zero And AlphaZero, RomiChess done better
My understanding is that its training was only via self-play, starting from a blank slate, i.e., knowing only the rules.
Michael Sherwin wrote:
That led me to the conclusion that A0 pretrained for the match against SF, or at a minimum loaded and learned against SF games. Some posts above seem to verify that observation. I did not read any white papers on A0. I only read some reports by journalists.
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: AlphaGo Zero And AlphaZero, RomiChess done better
A quote from one of Milos' posts.
zenpawn wrote:
My understanding is that its training was only via self-play, starting from a blank slate, i.e., knowing only the rules.
Michael Sherwin wrote:
That led me to the conclusion that A0 pretrained for the match against SF, or at a minimum loaded and learned against SF games. Some posts above seem to verify that observation. I did not read any white papers on A0. I only read some reports by journalists.
"When starting from each human opening,
AlphaZero convincingly defeated Stockfish, suggesting that it has indeed mastered a wide spectrum of chess play."
This is evidence of pre-match training against SF. How many human opening positions were trained against? Here is more of the quote.
"Finally, we analysed the chess knowledge discovered by AlphaZero. Table 2 analyses the
most common human openings (those played more than 100,000 times in an online database of human chess games)."
So we not only have pre-match training against SF, but they also used the most common human-played positions to conduct that training.
So my original observation, based on my experience with reinforcement learning, that they must have used a human database and pre-match training against SF appears to be quite accurate.
-
- Posts: 349
- Joined: Sat Aug 06, 2016 8:31 pm
- Location: United States
Re: AlphaGo Zero And AlphaZero, RomiChess done better
I took those to be games played after the self-play training, or at least not used for learning. The thing is called Zero for the very reason that it doesn't start with a database of games.
Michael Sherwin wrote:
A quote from one of Milos' posts.
zenpawn wrote:
My understanding is that its training was only via self-play, starting from a blank slate, i.e., knowing only the rules.
Michael Sherwin wrote:
That led me to the conclusion that A0 pretrained for the match against SF, or at a minimum loaded and learned against SF games. Some posts above seem to verify that observation. I did not read any white papers on A0. I only read some reports by journalists.
"When starting from each human opening,
AlphaZero convincingly defeated Stockfish, suggesting that it has indeed mastered a wide spectrum of chess play."
This is evidence of pre-match training against SF. How many human opening positions were trained against? Here is more of the quote.
"Finally, we analysed the chess knowledge discovered by AlphaZero. Table 2 analyses the
most common human openings (those played more than 100,000 times in an online database of human chess games)."
So we not only have pre-match training against SF, but they also used the most common human-played positions to conduct that training.
So my original observation, based on my experience with reinforcement learning, that they must have used a human database and pre-match training against SF appears to be quite accurate.
-
- Posts: 3657
- Joined: Wed Nov 18, 2015 11:41 am
- Location: hungary
Re: AlphaGo Zero And AlphaZero, RomiChess done better
[quote="Michael Sherwin"]
I am not addressing the Alpha0 playing algorithm. I understand that it is massively parallel MCTS. That alone makes it far different from Stockfish. I'm not skeptical enough about the reporting to believe that someone is lying about the underlying algorithm.
[/quote]
Sorry, but no one spoke about lying.
As you also stated, the AlphaZero team gave the public very little information about the details.
From the texts we know that massively parallel MCTS was used during the learning process. But for playing against Stockfish, it is very doubtful that MCTS was used, I think.