AlphaGo Zero And AlphaZero, RomiChess done better

Tobber · Post by **Tobber** » Sat Dec 09, 2017 1:14 pm

corres wrote:If you have exact knowledge about AlphaZero, more than a journalist has, please share it with us.
Mr. Sherwin has a view based on his study and his practice, and I thank him for sharing it with us.

I can read the published paper, why don't you do the same?

/John

corres · Post by **corres** » Sat Dec 09, 2017 1:59 pm

It is a pity, but "white papers" do not give everything to the public, even a scientific public. Know-how, patents, trade secrets, details of how the system works, etc. are not the subject of any public paper.

zenpawn · Post by **zenpawn** » Sat Dec 09, 2017 4:38 pm

From the paper, "Starting from random play, and given no domain knowledge except the game rules,...". And: "The AlphaZero algorithm is a more generic version of the AlphaGo Zero algorithm that was first introduced in the context of Go. It replaces the handcrafted knowledge and domain-specific augmentations used in traditional game-playing programs with deep neural networks and a tabula rasa reinforcement learning algorithm."

IanO · Post by **IanO** » Sat Dec 09, 2017 5:03 pm

corres wrote:
Michael Sherwin wrote:
I am not addressing the Alpha0 playing algorithm. I understand that it is massively parallel MCTS. That alone makes it far different than Stockfish. I'm not so skeptical about the reporting as to believe that someone is lying about the underlying algorithm.

Sorry, but no one is speaking about lying.
As you also stated, the AlphaZero team gave the public very little information about the details.
From the texts we know that massively parallel MCTS was used during the learning process. But I think it is very doubtful that MCTS was used when playing against Stockfish.

The paper explicitly states that AlphaZero used MCTS throughout. From page 5:

We also analysed the relative performance of AlphaZero’s MCTS search compared to the state-of-the-art alpha-beta search engines used by Stockfish and Elmo. AlphaZero searches just 80 thousand positions per second in chess and 40 thousand in shogi, compared to 70 million for Stockfish and 35 million for Elmo. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations – arguably a more “human-like” approach to search, as originally proposed by Shannon (27). Figure 2 shows the scalability of each player with respect to thinking time, measured on an Elo scale, relative to Stockfish or Elmo with 40ms thinking time. AlphaZero’s MCTS scaled more effectively with thinking time than either Stockfish or Elmo, calling into question the widely held belief (4, 11) that alpha-beta search is inherently superior in these domains.

Honestly, I find this to be a more startling advance than the deep learning result. And by the way, this is just as startling for the computer shogi world, which also relies on alpha-beta for the main search (with different techniques than chess for the tips of the tree, since there is no quiescence in shogi).
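
For a sense of scale, the search rates quoted above can be compared directly. This is only an illustrative calculation using the figures from the quoted passage; the names `RATES` and `speed_ratio` are made up for the example and nothing here comes from the paper's own code.

```python
# Positions searched per second, as quoted from the paper above.
RATES = {
    "chess": {"AlphaZero": 80_000, "alpha_beta": 70_000_000},  # alpha-beta engine: Stockfish
    "shogi": {"AlphaZero": 40_000, "alpha_beta": 35_000_000},  # alpha-beta engine: Elmo
}


def speed_ratio(game):
    """Return how many times more positions per second the alpha-beta engine searches."""
    rates = RATES[game]
    return rates["alpha_beta"] / rates["AlphaZero"]


for game in RATES:
    print(f"{game}: alpha-beta searches {speed_ratio(game):.0f}x more positions per second")
```

In both games the alpha-beta engine examines 875 times as many positions per second, which is the gap the network-guided selectivity has to make up.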

Michael Sherwin · Post by **Michael Sherwin** » Sat Dec 09, 2017 6:44 pm

Tobber wrote:
Michael Sherwin wrote:
zenpawn wrote:
Michael Sherwin wrote:
zenpawn wrote:
Michael Sherwin wrote:That led me to the conclusion that A0 pretrained for the match against SF or at a minimum loaded and learned against SF games. Some posts above seem to verify that observation. I did not read any white papers on A0. I only read some reports by journalists.
My understanding is its training was only via self-play starting from a blank slate, i.e., knowing only the rules.
A quote from one of Milos's posts.

"When starting from each human opening, AlphaZero convincingly defeated Stockfish, suggesting that it has indeed mastered a wide spectrum of chess play."

This is evidence of pre-match training against SF. How many human opening positions were trained against? Here is more of the quote.

"Finally, we analysed the chess knowledge discovered by AlphaZero. Table 2 analyses the most common human openings (those played more than 100,000 times in an online database of human chess games)."

So we not only have pre match training against SF but they used the most common human played positions to conduct that training.

So my original observation based on my experience with reinforcement learning that they must've used a human database and pre training against SF appears to be quite accurate.
I took those to be games played after the self-play training or at least not used to learn. The thing is called Zero for the very reason that it doesn't start with a database of games.
Then the first quote is a poorly constructed sentence as it clearly says that A0 defeated SF from EACH human opening. The second quote defines what is meant by each human opening. It is every position that occurred at least 100,000 times in a human online database. So my question is, were all those human opening positions covered in the 100 game match? Not even close! So unless the first quote is just poor sentence construction then A0 played training matches from all opening positions that were in an online database 100,000 times or more.

But if you understand that sentence to mean something different then just go with that! But for me it does not change what the sentence actually says. And that is not my fault. They should clarify the issue.
How is it possible you can't read the published paper? It clearly says that A0 defeated Stockfish in 100 games under certain conditions, i.e., 1 minute per move and so on. They also matched A0 against Stockfish with 100 games on each of the so-called human openings. The time per move for these games is unknown but likely much shorter. It's obvious from table 2 that it's 100 games per opening. Why this should be considered pre-training against Stockfish is certainly not obvious, especially if it took place after the 100 games main match. In the paper it's mentioned after the main match but I guess it's more interesting with some conspiracy theory.

/John

I was going to let this go but after reading below, "my opinion is that Mr Sherwin is talking BS", I just had to point out your hypocrisy. Without any knowledge of the 'paper' I said for good reason that it looks like they carried out training matches against SF. And lo and behold there is good evidence of that in the 'paper'. I also said that they probably used a human games database. And they did. But to your fancy and nothing else of substance you prefer to believe the "100" game matches on each "human opening" were after the main match. What purpose would those matches serve if they were after the main match? I can give a purpose for those matches if they were before the main match. It would be for training. That is unless you fancy that they turned off the learning for those matches. So when applying a BS meter to what we both have said your BS reading is much higher. And that is your hypocrisy. Say what you want but I am done.

Michael Sherwin · Post by **Michael Sherwin** » Sat Dec 09, 2017 6:49 pm

corres wrote:
Michael Sherwin wrote:
I did not read any white papers on A0. I only read some reports by journalists. All I was trying to do was demystify somewhat the phenomenon that is A0.

Nobody else reads white papers, so you are doing well!
Thanks for it.

Thanks Robert, I appreciate the understanding!

zenpawn · Post by **zenpawn** » Sat Dec 09, 2017 6:52 pm

They only used the database after the fact to determine which openings were most commonly played by humans and to test the training results therein:

"Finally, we analysed the chess knowledge discovered by AlphaZero. Table 2 analyses the most common human openings (those played more than 100,000 times in an online database of human chess games). Each of these openings is independently discovered and played frequently by AlphaZero during self-play training. When starting from each human opening, AlphaZero convincingly defeated Stockfish, suggesting that it has indeed mastered a wide spectrum of chess play."

Michael Sherwin · Post by **Michael Sherwin** » Sat Dec 09, 2017 7:36 pm

zenpawn wrote:They only used the database after the fact to determine which openings were most commonly played by humans and to test the training results therein:

"Finally, we analysed the chess knowledge discovered by AlphaZero. Table 2 analyses the most common human openings (those played more than 100,000 times in an online database of human chess games). Each of these openings is independently discovered and played frequently by AlphaZero during self-play training. When starting from each human opening, AlphaZero convincingly defeated Stockfish, suggesting that it has indeed mastered a wide spectrum of chess play."

I'm sorry Erin but that does not pass the smell test. This must have been before the main match or it lacks continuity of purpose. If it was before the main match do you believe they disabled learning for those pre matches? And why test against every position in the database that met the 100,000 number condition? Testing against a subset of those positions would tell them if they were successful or not. The most logical reason would be to be fully prepared for the main match with SF. What good would MC training do if A0 ended up in a line not stumbled upon by MC in self play from "zero"? The answer is they covered that problem with the most common moves in a human database. And they played those positions against SF in 100 game matches. So I repeat my question, do you think that they turned learning off for those pre matches? You can choose to be logical in this analysis or you can choose illogic. I simply said in my original post that it looks like they did certain things and the evidence uncovered backs that up. I'm not declaring it as an absolute. I am however sticking to my original declaration that it LOOKS LIKE they did certain things. And with that I'm done with this topic. Thanks to both sides for participating.

zenpawn · Post by **zenpawn** » Sat Dec 09, 2017 7:40 pm

Michael Sherwin wrote:You can choose to be logical in this analysis or you can choose illogic.

This jab was neither necessary nor appreciated.

Michael Sherwin wrote:I simply said in my original post that it looks like they did certain things and the evidence uncovered backs that up.

I don't think it does. Sorry.

Tobber · Post by **Tobber** » Sat Dec 09, 2017 9:05 pm

zenpawn wrote:
Michael Sherwin wrote:You can choose to be logical in this analysis or you can choose illogic.
This jab was neither necessary nor appreciated.

Michael Sherwin wrote:I simply said in my original post that it looks like they did certain things and the evidence uncovered backs that up.
I don't think it does. Sorry.

The published paper very clearly states that the method of learning was self-play and self-play only. I can see no logic at all in assuming they learned from the 1200 games with "human openings". I can't see that Google is really interested in chess at all; they are most likely interested in a proof of concept. They have made a similar test with Go, where they started the learning process from human games but continued with self-play when AlphaGo became too strong compared to the human games.
They must sooner or later publish more details, and giving out some information first and later admitting they cheated is of course not in their interest. It is the method used that's interesting; this Go/Chess thing is just a step towards more profitable software.
Defeating an amateur-made chess program with hardware no chess player can afford is not important enough to make it worth cheating; they could easily have done that by putting their resources into developing traditional chess software. It's the method that's important, and the proof that it's working.

/John
