AlphaGo Zero And AlphaZero, RomiChess done better


IanO
Posts: 496
Joined: Wed Mar 08, 2006 9:45 pm
Location: Portland, OR

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by IanO »

corres wrote:
Michael Sherwin wrote:
I am not addressing the Alpha0 playing algorithm. I understand that it is massively parallel MCTS. That alone makes it far different from Stockfish. I'm not so skeptical about the reporting as to believe that someone is lying about the underlying algorithm.
Sorry, but no one is talking about lying.
As you also stated, the AlphaZero team gave the public very little information about the details.
From the texts we know that massively parallel MCTS was used during the learning process. But I think it is very doubtful that MCTS was used when playing against Stockfish.
The paper explicitly states that AlphaZero used MCTS throughout. From page 5:
We also analysed the relative performance of AlphaZero’s MCTS search compared to the state-of-the-art alpha-beta search engines used by Stockfish and Elmo. AlphaZero searches just 80 thousand positions per second in chess and 40 thousand in shogi, compared to 70 million for Stockfish and 35 million for Elmo. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations – arguably a more “human-like” approach to search, as originally proposed by Shannon (27). Figure 2 shows the scalability of each player with respect to thinking time, measured on an Elo scale, relative to Stockfish or Elmo with 40ms thinking time. AlphaZero’s MCTS scaled more effectively with thinking time than either Stockfish or Elmo, calling into question the widely held belief (4, 11) that alpha-beta search is inherently superior in these domains.
Honestly, I find this to be a more startling advance than the deep learning result. And by the way, it is just as startling for the computer shogi world, which also relies on alpha-beta for the main search (with different techniques from chess at the tips of the tree, since there is no quiescence in shogi).
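For anyone who has not seen how a policy-guided MCTS differs from alpha-beta, here is a minimal sketch in Python of the PUCT selection rule the paper describes. The class layout, field names and exploration constant are my own illustrative assumptions, not DeepMind's code:

Code:

import math

class Node:
    # One search-tree node; fields follow the paper's N, W, P notation.
    def __init__(self, prior):
        self.prior = prior        # P(s,a): move probability from the policy net
        self.visits = 0           # N(s,a): how often this move was explored
        self.value_sum = 0.0      # W(s,a): total value backed up through it
        self.children = {}        # move -> Node

    def q(self):
        # Mean action value Q(s,a)
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    # Descend to the child maximizing Q + U. The U term is large for moves
    # the policy net likes but which have few visits, so the search spends
    # its 80 thousand positions per second on a handful of promising lines
    # instead of brute-forcing everything the way alpha-beta does.
    total = sum(child.visits for child in node.children.values())
    def puct(child):
        u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=lambda kv: puct(kv[1]))

At each playout the tree is walked with select_child down to a leaf, the network evaluates the leaf, and the value is backed up along the path; the selectivity comes entirely from the prior.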
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Michael Sherwin »

Tobber wrote:
Michael Sherwin wrote:
zenpawn wrote:
Michael Sherwin wrote:
zenpawn wrote:
Michael Sherwin wrote:That led me to the conclusion that A0 pretrained for the match against SF, or at a minimum loaded and learned from SF games. Some posts above seem to verify that observation. I did not read any white papers on A0. I only read some reports by journalists.
My understanding is its training was only via self-play starting from a blank slate, i.e., knowing only the rules.
A quote from one of Milos's posts.

"When starting from each human opening,
AlphaZero convincingly defeated Stockfish, suggesting that it has indeed mastered a wide spectrum of chess play."

This is evidence of pre-match training against SF. How many human opening positions were trained against? Here is more of the quote.

"Finally, we analysed the chess knowledge discovered by AlphaZero. Table 2 analyses the
most common human openings (those played more than 100,000 times in an online database of human chess games"

So not only do we have pre-match training against SF, but they used the most common human-played positions to conduct that training.

So my original observation, based on my experience with reinforcement learning, that they must have used a human database and pre-match training against SF appears to be quite accurate.
I took those to be games played after the self-play training, or at least not used for learning. The thing is called Zero for the very reason that it doesn't start with a database of games.
Then the first quote is a poorly constructed sentence, as it clearly says that A0 defeated SF from EACH human opening. The second quote defines what is meant by each human opening: it is every position that occurred at least 100,000 times in a human online database. So my question is, were all those human opening positions covered in the 100-game match? Not even close! So unless the first quote is just poor sentence construction, A0 played training matches from all opening positions that were in an online database 100,000 times or more.

But if you understand that sentence to mean something different, then just go with that! But for me it does not change what the sentence actually says. And that is not my fault. They should clarify the issue.
How is it possible you can't read the published paper? It clearly says that A0 defeated Stockfish in 100 games under certain conditions, i.e. 1 minute per move and so on. They also matched A0 against Stockfish for 100 games on each of the so-called human openings. The time per move for these games is unknown but likely much shorter. It's obvious from Table 2 that it's 100 games per opening. Why this should be considered pre-training against Stockfish is certainly not obvious, especially if it took place after the 100-game main match. In the paper it's mentioned after the main match, but I guess it's more interesting with some conspiracy theory.

/John
I was going to let this go, but after reading below, "my opinion is that Mr Sherwin is talking BS", I just had to point out your hypocrisy. Without any knowledge of the 'paper' I said, for good reason, that it looks like they carried out training matches against SF. And lo and behold, there is good evidence of that in the 'paper'. I also said that they probably used a human games database. And they did. But to your fancy and nothing else of substance, you prefer to believe the "100" game matches on each "human opening" were after the main match. What purpose would those matches serve if they were after the main match? I can give a purpose for those matches if they were before the main match: it would be for training. That is unless you fancy that they turned off the learning for those matches. So when applying a BS meter to what we both have said, your BS reading is much higher. And that is your hypocrisy. Say what you want, but I am done.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Michael Sherwin »

corres wrote:
Michael Sherwin wrote:
I did not read any white papers on A0. I only read some reports by journalists. All I was trying to do was somewhat demystify the phenomenon that is A0.
Not everyone reads white papers, so you did well!
Thanks for it.
Thanks Robert, I appreciate the understanding! :D
zenpawn
Posts: 349
Joined: Sat Aug 06, 2016 8:31 pm
Location: United States

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by zenpawn »

They only used the database after the fact to determine which openings were most commonly played by humans and to test the training results therein:

"Finally, we analysed the chess knowledge discovered by AlphaZero. Table 2 analyses the most common human openings (those played more than 100,000 times in an online database of human chess games). Each of these openings is independently discovered and played frequently by AlphaZero during self-play training. When starting from each human opening, AlphaZero convincingly defeated Stockfish, suggesting that it has indeed mastered a wide spectrum of chess play."
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Michael Sherwin »

zenpawn wrote:They only used the database after the fact to determine which openings were most commonly played by humans and to test the training results therein:

"Finally, we analysed the chess knowledge discovered by AlphaZero. Table 2 analyses the most common human openings (those played more than 100,000 times in an online database of human chess games). Each of these openings is independently discovered and played frequently by AlphaZero during self-play training. When starting from each human opening, AlphaZero convincingly defeated Stockfish, suggesting that it has indeed mastered a wide spectrum of chess play."
I'm sorry, Erin, but that does not pass the smell test. This must have been before the main match or it lacks continuity of purpose. If it was before the main match, do you believe they disabled learning for those pre-matches? And why test against every position in the database that met the 100,000 occurrence condition? Testing against a subset of those positions would tell them whether they were successful or not. The most logical reason would be to be fully prepared for the main match with SF. What good would MC training do if A0 ended up in a line not stumbled upon by MC in self-play from "zero"? The answer is they covered that problem with the most common moves in a human database. And they played those positions against SF in 100-game matches. So I repeat my question: do you think that they turned learning off for those pre-matches? You can choose to be logical in this analysis or you can choose illogic. I simply said in my original post that it looks like they did certain things and the evidence uncovered backs that up. I'm not declaring it as an absolute. I am however sticking to my original declaration that it LOOKS LIKE they did certain things. And with that I'm done with this topic. Thanks to both sides for participating. 8-)
zenpawn
Posts: 349
Joined: Sat Aug 06, 2016 8:31 pm
Location: United States

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by zenpawn »

Michael Sherwin wrote:You can choose to be logical in this analysis or you can choose illogic.
This jab was neither necessary nor appreciated.
Michael Sherwin wrote:I simply said in my original post that it looks like they did certain things and the evidence uncovered backs that up.
I don't think it does. Sorry.
Tobber
Posts: 379
Joined: Fri Sep 28, 2012 5:53 pm
Location: Sweden

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Tobber »

zenpawn wrote:
Michael Sherwin wrote:You can choose to be logical in this analysis or you can choose illogic.
This jab was neither necessary nor appreciated.
Michael Sherwin wrote:I simply said in my original post that it looks like they did certain things and the evidence uncovered backs that up.
I don't think it does. Sorry.
The published paper very clearly states that the method of learning was self-play and self-play only. I can see no logic at all in assuming they learned from the 1200 games with "human openings". I can't see that Google is really interested in chess at all; they are most likely interested in a proof of concept. They made a similar test with Go, where they started the learning process from human games but continued with self-play when AlphaGo became too strong compared to the human games.
They must sooner or later publish more details, and first giving some information and later admitting they cheated is of course not in their interest. It is the method used that's interesting; this Go/Chess thing is just a step towards more profitable software.
Defeating an amateur-made chess program with hardware no chess player can afford is not important enough to make it worth cheating; they could easily have done that by putting their resources into developing traditional chess software. It's the method that's important, and the proof that it's working.

/John
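For what it's worth, the tabula-rasa scheme described above boils down to a loop like the following sketch. Both helpers are hypothetical placeholders (the paper gives no code, and the real system runs these stages asynchronously on thousands of TPUs):

Code:

def play_selfplay_game(net):
    # Hypothetical placeholder: play one game of MCTS-vs-itself guided by
    # the current network, starting only from the initial position.
    # Returns the positions visited, the MCTS visit distributions, and
    # the final result (+1/0/-1).
    raise NotImplementedError("engine specific")

def train_step(net, replay_buffer):
    # Hypothetical placeholder: fit the policy head to the stored MCTS
    # visit distributions and the value head to the game outcomes.
    raise NotImplementedError("engine specific")

def training_loop(net, iterations, games_per_iteration):
    replay_buffer = []
    for _ in range(iterations):
        for _ in range(games_per_iteration):
            positions, policies, result = play_selfplay_game(net)
            # Label every position of the game with its eventual result.
            replay_buffer.extend(
                (pos, pol, result) for pos, pol in zip(positions, policies))
        net = train_step(net, replay_buffer)  # improved net replaces the old one
    return net

Note that no human games and no Stockfish games enter the loop anywhere; on this reading, the human openings of Table 2 only appear afterwards, when the finished network is tested.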
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by CheckersGuy »

What one shouldn't forget is that the AlphaZero version isn't just for chess but also for Go and shogi (Japanese chess).
In the paper they mention that they could (probably) implement some domain-specific knowledge to make the engine even stronger. For Go that has been symmetries in the previous versions, and one can probably come up with some domain-specific ideas for chess that either speed up the training process or make the engine play stronger.
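As an illustration of the kind of domain-specific trick meant here: earlier AlphaGo versions exploited the 8-fold symmetry of the Go board to multiply their training data. A rough sketch in Python, where the plane layout is an assumption of mine rather than DeepMind's actual encoding:

Code:

import numpy as np

def dihedral_augment(planes):
    # planes: tensor of shape (channels, N, N) encoding a Go position.
    # A Go position is equivalent under 4 rotations x 2 reflections,
    # so one self-play position yields eight training samples.
    samples = []
    for k in range(4):
        rotated = np.rot90(planes, k, axes=(1, 2))  # rotate by k*90 degrees
        samples.append(rotated)
        samples.append(np.flip(rotated, axis=2))    # mirrored copy
    return samples

Chess and shogi have no such symmetries (pawns only move forward, castling differs by wing), which is why the generalized AlphaZero dropped the trick even for Go.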
Rodolfo Leoni
Posts: 545
Joined: Tue Jun 06, 2017 4:49 pm
Location: Italy

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Rodolfo Leoni »

Michael Sherwin wrote:
Rodolfo Leoni wrote:
Michael Sherwin wrote:In January of 2006, IIRC (not exactly sure), I released RomiChess ver P2a. The new version had learning. It had two types of learning: monkey-see-monkey-do, and learning adapted from Pavlov's dog experiments. I did not know it at the time, but the second type of learning was called reinforcement learning. I just found out very recently that reinforcement learning was invented for robotics control in 1957, the year that I was born. Strange. Anyway, as far as I know I reinvented it and was the first to put reinforcement learning into a chess program.

The reason I'm apparently patting myself on the back is rather to let people know that I recognise certain aspects of this AlphaZero phenom. For example, using Glaurung 2.x as a test opponent, Romi played 20 matches against Glaurung using the ten Nunn positions. On pass one Romi scored 5% against Glaurung. On the 20th pass Romi scored 95%. That is how powerful the learning is! The moves that Romi learned to beat Glaurung were very distinctive looking. They are learned moves, so they are not determined by a natural chess-playing evaluation but rather by an evaluation tweaked by learned rewards and penalties. Looking at the games between AlphaZero and Stockfish I see the same kind of learned moves. In RomiChess one can start with a new learn.dat file, put millionbase.pgn in the same directory as Romi, and type merge millionbase.pgn, and Romi will learn from all those games.

When reading about AlphaZero there is mostly made-up reporting. That is what reporters do. They take one or two known facts and make up a many-page article that is mostly bunk. The AlphaZero team has released very little actual info. They released that it uses reinforcement learning and that a database of games was loaded in. Beyond that not much is known. But looking at the games against Stockfish it looks as though AlphaZero either trained against Stockfish before the recorded match or entered a pgn of Stockfish games. Stockfish does have some type of randomness to its moves, so it can't be totally dominated like Romi dominated Glaurung, which had no randomness. So basically, take an engine about as strong as Stockfish and give it reinforcement learning, and the result is exactly as expected!
Hi Mike,

It's always a pleasure to see you . ;)

Don't forget the matches Romi-Rybka on a theme variation and Romi-Crafty on full standard games... Romi won all of them in 100-game matches, with an empty learning file.
Hi Rodolfo! Yes I remember those experiments. Starting from a new learn file, Romi was able to win 100-game matches against both Rybka and Crafty when starting from a specific position. Thanks for reminding me! :D
But against Crafty that specific position was... the start position! :D
F.S.I. Chess Teacher
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Michael Sherwin »

Rodolfo Leoni wrote:
Michael Sherwin wrote:
Rodolfo Leoni wrote:
Michael Sherwin wrote:In January of 2006, IIRC (not exactly sure), I released RomiChess ver P2a. The new version had learning. It had two types of learning: monkey-see-monkey-do, and learning adapted from Pavlov's dog experiments. I did not know it at the time, but the second type of learning was called reinforcement learning. I just found out very recently that reinforcement learning was invented for robotics control in 1957, the year that I was born. Strange. Anyway, as far as I know I reinvented it and was the first to put reinforcement learning into a chess program.

The reason I'm apparently patting myself on the back is rather to let people know that I recognise certain aspects of this AlphaZero phenom. For example, using Glaurung 2.x as a test opponent, Romi played 20 matches against Glaurung using the ten Nunn positions. On pass one Romi scored 5% against Glaurung. On the 20th pass Romi scored 95%. That is how powerful the learning is! The moves that Romi learned to beat Glaurung were very distinctive looking. They are learned moves, so they are not determined by a natural chess-playing evaluation but rather by an evaluation tweaked by learned rewards and penalties. Looking at the games between AlphaZero and Stockfish I see the same kind of learned moves. In RomiChess one can start with a new learn.dat file, put millionbase.pgn in the same directory as Romi, and type merge millionbase.pgn, and Romi will learn from all those games.

When reading about AlphaZero there is mostly made-up reporting. That is what reporters do. They take one or two known facts and make up a many-page article that is mostly bunk. The AlphaZero team has released very little actual info. They released that it uses reinforcement learning and that a database of games was loaded in. Beyond that not much is known. But looking at the games against Stockfish it looks as though AlphaZero either trained against Stockfish before the recorded match or entered a pgn of Stockfish games. Stockfish does have some type of randomness to its moves, so it can't be totally dominated like Romi dominated Glaurung, which had no randomness. So basically, take an engine about as strong as Stockfish and give it reinforcement learning, and the result is exactly as expected!
Hi Mike,

It's always a pleasure to see you . ;)

Don't forget the matches Romi-Rybka on a theme variation and Romi-Crafty on full standard games... Romi won all of them in 100-game matches, with an empty learning file.
Hi Rodolfo! Yes I remember those experiments. Starting from a new learn file, Romi was able to win 100-game matches against both Rybka and Crafty when starting from a specific position. Thanks for reminding me! :D
But against Crafty that specific position was... the start position! :D
So if Romi had trained against Crafty for 100 games in every position that appeared in a human database 10,000 times or more, how do you think Romi would have done against Crafty in a follow-up match if Crafty used its tournament book? :D
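For readers who never tried Romi's learning, here is a minimal sketch of the Pavlov-style half as described earlier in the thread. The table layout, key scheme and reward sizes are illustrative guesses on my part, not Romi's actual learn.dat format:

Code:

learn_table = {}  # (position_key, move) -> accumulated evaluation bonus

def update_after_game(own_moves, result, reward=4, penalty=-4):
    # own_moves: (position_key, move) pairs the engine played this game.
    # result: +1 for a win, 0 for a draw, -1 for a loss.
    delta = reward if result > 0 else penalty if result < 0 else 0
    for key in own_moves:
        learn_table[key] = learn_table.get(key, 0) + delta

def learned_bias(position_key, move):
    # Added to the search score, so previously rewarded moves are
    # preferred and punished ones are avoided next time.
    return learn_table.get((position_key, move), 0)

Against a deterministic opponent, repeating a match lets the table steer the engine out of lost lines and back into won ones, which is the 5% to 95% swing against Glaurung described earlier.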