
Re: AlphaGo Zero And AlphaZero, RomiChess done better

Posted: Sat Dec 09, 2017 10:35 am
by Michael Sherwin
zenpawn wrote:
Michael Sherwin wrote:
zenpawn wrote:
Michael Sherwin wrote: That led me to the conclusion that A0 pre-trained for the match against SF or, at a minimum, loaded and learned against SF games. Some posts above seem to verify that observation. I did not read any white papers on A0. I only read some reports by journalists.
My understanding is its training was only via self-play starting from a blank slate, i.e., knowing only the rules.
A quote from one of Milos's posts:

"When starting from each human opening,
AlphaZero convincingly defeated Stockfish, suggesting that it has indeed mastered a wide spectrum of chess play."

This is evidence of pre-match training against SF. How many human opening positions were trained against? Here is more of the quote:

"Finally, we analysed the chess knowledge discovered by AlphaZero. Table 2 analyses the
most common human openings (those played more than 100,000 times in an online database of human chess games"

So we not only have pre-match training against SF, but they used the most commonly played human positions to conduct that training.

So my original observation, based on my experience with reinforcement learning, that they must have used a human database and pre-training against SF appears to be quite accurate.
I took those to be games played after the self-play training, or at least not used for learning. The thing is called Zero for the very reason that it doesn't start with a database of games.
Then the first quote is a poorly constructed sentence, as it clearly says that A0 defeated SF from EACH human opening. The second quote defines what is meant by each human opening: every position that occurred at least 100,000 times in an online database of human games. So my question is: were all those human opening positions covered in the 100-game match? Not even close! So unless the first quote is just poor sentence construction, A0 played training matches from all opening positions that appeared in an online database 100,000 times or more.

But if you understand that sentence to mean something different, then just go with that! For me, though, it does not change what the sentence actually says. And that is not my fault. They should clarify the issue.
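To make concrete the kind of regime being hypothesized here, below is a minimal sketch of "reinforcement learning against a fixed opponent from a set of human openings". It is purely hypothetical: the two FEN strings, the `stockfish` binary path, and the simple score table are illustrative stand-ins, and nothing in the A0 paper (or in RomiChess) is claimed to work this way. It assumes python-chess is installed and a Stockfish binary is on the PATH.

Code:

import random

import chess
import chess.engine

# Two popular openings standing in for "positions played more than
# 100,000 times in an online database" (illustrative FENs, not Table 2).
HUMAN_OPENINGS = [
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1",  # after 1.e4
    "rnbqkbnr/pppppppp/8/8/3P4/8/PPP1PPPP/RNBQKBNR b KQkq - 0 1",  # after 1.d4
]

score = {}  # (piece placement, UCI move) -> running reward from game results

def learner_move(board):
    """Pick the legal move with the best learned score; the random
    tie-breaker doubles as exploration for untried moves."""
    fen = board.board_fen()  # piece placement only; coarse but enough here
    return max(board.legal_moves,
               key=lambda m: (score.get((fen, m.uci()), 0.0), random.random()))

with chess.engine.SimpleEngine.popen_uci("stockfish") as sf:
    for opening in HUMAN_OPENINGS:
        for _ in range(10):                    # far more games in practice
            board = chess.Board(opening)
            learner_side = board.turn          # learner plays the side to move
            ours = []                          # (placement, move) pairs we chose
            while not board.is_game_over():
                if board.turn == learner_side:
                    mv = learner_move(board)
                    ours.append((board.board_fen(), mv.uci()))
                else:
                    mv = sf.play(board, chess.engine.Limit(time=0.01)).move
                board.push(mv)
            result = board.result()            # "1-0", "0-1" or "1/2-1/2"
            if result == "1/2-1/2":
                reward = 0.0
            else:
                reward = 1.0 if (result == "1-0") == (learner_side == chess.WHITE) else -1.0
            for key in ours:                   # credit our moves with the outcome
                score[key] = score.get(key, 0.0) + reward

The only point of the sketch is that such a loop needs exactly two ingredients, a list of starting positions and a fixed opponent, which is what the quoted sentences are being read as implying.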

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Posted: Sat Dec 09, 2017 11:07 am
by zenpawn
Michael Sherwin wrote: But if you understand that sentence to mean something different, then just go with that! For me, though, it does not change what the sentence actually says. And that is not my fault. They should clarify the issue.
Agreed, perhaps some room for interpretation. We'll have to wait and see how the final paper looks after review.

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Posted: Sat Dec 09, 2017 12:15 pm
by Tobber
Michael Sherwin wrote: ...So my question is: were all those human opening positions covered in the 100-game match? Not even close! So unless the first quote is just poor sentence construction, A0 played training matches from all opening positions that appeared in an online database 100,000 times or more.

But if you understand that sentence to mean something different, then just go with that! For me, though, it does not change what the sentence actually says. And that is not my fault. They should clarify the issue.
How is it possible that you can't read the published paper? It clearly says that A0 defeated Stockfish in 100 games under certain conditions, i.e. 1 minute per move and so on. They also matched A0 against Stockfish with 100 games from each of the so-called human openings. Time per move for these games is unknown but likely much shorter. It's obvious from Table 2 that it's 100 games per opening. Why this should be considered pre-training against Stockfish is certainly not obvious, especially if it took place after the 100-game main match. In the paper it's mentioned after the main match, but I guess it's more interesting with some conspiracy theory.

/John

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Posted: Sat Dec 09, 2017 12:39 pm
by corres
[quote="Michael Sherwin"]

I did not read any white papers on A0. I only read some reports by journalist. All I was trying to do was demystify somewhat the phenomenon that is A0.

[/quote]

Nobody else read the white papers either, so you did it well!
Thanks for it.

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Posted: Sat Dec 09, 2017 12:46 pm
by Tobber
corres wrote:
Michael Sherwin wrote:
I did not read any white papers on A0. I only read some reports by journalists. All I was trying to do was demystify somewhat the phenomenon that is A0.
Nobody else read the white papers either, so you did it well!
Thanks for it.
Sorry, I didn't mean anything against you; my opinion is that Mr Sherwin is talking BS.

/John

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Posted: Sat Dec 09, 2017 12:49 pm
by corres
[quote="Tobber"]

....I guess it's more interesting with some conspiracy theory.
/John

[/quote]


Conspiracy PRACTICE always was, is and will be.
Particularly if a huge amount of money depend on it.

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Posted: Sat Dec 09, 2017 1:02 pm
by corres
If you have exact knowledge about AlphaZero - more than a journalist has - please share it with us.
Mr. Sherwin has a view based on his learning and his practice, and I thank him for sharing it with us.

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Posted: Sat Dec 09, 2017 1:14 pm
by Tobber
corres wrote: If you have exact knowledge about AlphaZero - more than a journalist has - please share it with us. Mr. Sherwin has a view based on his learning and his practice, and I thank him for sharing it with us.
I can read the published paper; why don't you do the same?

/John

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Posted: Sat Dec 09, 2017 1:59 pm
by corres
It is a pity, but the "white papers" are not given to the public, even a scientific public. Know-how, patents, trade secrets, details of how the system works, etc. are not the subject of any public papers.

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Posted: Sat Dec 09, 2017 4:38 pm
by zenpawn
From the paper, "Starting from random play, and given no domain knowledge except the game rules,...". And: "The AlphaZero algorithm is a more generic version of the AlphaGo Zero algorithm that was first introduced in the context of Go. It replaces the handcrafted knowledge and domain-specific augmentations used in traditional game-playing programs with deep neural networks and a tabula rasa reinforcement learning algorithm."
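For anyone wondering what "tabula rasa" self-play reinforcement learning looks like in practice, here is a toy sketch. To be clear about what is assumed: this is not DeepMind's algorithm or code. A tabular value function on 3-pile Nim stands in for the deep network, epsilon-greedy move selection stands in for MCTS, and the constants are made up. What it does share with the paper's description is the key property: the only inputs are the rules and the games the program plays against itself.

Code:

import random
from collections import defaultdict

PILES = (3, 4, 5)        # starting position of 3-pile Nim (the "rules")
EPS, ALPHA = 0.2, 0.1    # exploration rate and learning rate (made up)

def moves(state):
    """All legal moves: take k >= 1 stones from pile i."""
    return [(i, k) for i, p in enumerate(state) for k in range(1, p + 1)]

def play(state, move):
    i, k = move
    s = list(state)
    s[i] -= k
    return tuple(s)

V = defaultdict(float)   # value of a state for the side to move; starts blank

def pick(state):
    """Epsilon-greedy: usually the move leading to the worst state for
    the opponent, sometimes a random one to keep exploring."""
    ms = moves(state)
    if random.random() < EPS:
        return random.choice(ms)
    return min(ms, key=lambda m: V[play(state, m)])

def self_play_game():
    """Play one game against ourselves; the player who takes the last
    stone wins (normal Nim). Returns the visited states and the winner."""
    state, side, history = PILES, 0, []
    while moves(state):
        history.append((state, side))
        state = play(state, pick(state))
        side ^= 1
    return history, side ^ 1  # the side left with no move has lost

for _ in range(20000):       # the whole "training pipeline"
    history, winner = self_play_game()
    for state, side in history:
        z = 1.0 if side == winner else -1.0   # game outcome as the only signal
        V[state] += ALPHA * (z - V[state])    # pull the value toward the outcome

Note that no game database appears anywhere in the loop, which is the sense in which the "Zero" systems are claimed to start from scratch; whether the separate matches from human openings were used for training or only for evaluation is the question under dispute in this thread.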