So if Romi would have trained against Crafty, 100 games in every position that occurred in a human database 10,000 times or more, how do you think Romi would have done against Crafty in a follow-up match if Crafty used its tournament book?

Rodolfo Leoni wrote: But against Crafty that specific position was.... the start position!

Michael Sherwin wrote: Hi Rodolfo! Yes, I remember those experiments. Starting from a new learn file, Romi was able to win 100-game matches against both Rybka and Crafty when starting from a specific position. Thanks for reminding me!

Rodolfo Leoni wrote: Hi Mike,

Michael Sherwin wrote: In January of 2006, IIRC (not exactly sure), I released RomiChess ver P2a. The new version had learning. It had two types of learning: monkey see, monkey do, and learning adapted from Pavlov's dog experiments. I did not know it at the time, but the second type of learning is called reinforcement learning. I found out only very recently that reinforcement learning was invented for robotics control in 1957, the year that I was born. Strange. Anyway, as far as I know I reinvented it and was the first to put reinforcement learning into a chess program. The reason I'm apparently patting myself on the back is rather to let people know that I recognise certain aspects of this AlphaZero phenomenon. For example, using Glaurung 2.x as a test opponent, Romi played 20 matches against Glaurung using the ten Nunn positions. On pass one Romi scored 5% against Glaurung. On the 20th pass Romi scored 95%. That is how powerful the learning is! The moves that Romi learned to beat Glaurung were very distinctive looking. They are learned moves, so they are not determined by a natural chess-playing evaluation but rather by an evaluation tweaked by learned rewards and penalties. Looking at the games between AlphaZero and Stockfish, I see the same kind of learned moves.
In RomiChess one can start with a new learn.dat file, put millionbase.pgn in the same directory as Romi, type merge millionbase.pgn, and Romi will learn from all those games. Most of what has been written about AlphaZero is made-up reporting. That is what reporters do: they take one or two known facts and make up a many-page article that is mostly bunk. The AlphaZero team has released very little actual information. They disclosed that it uses reinforcement learning and that a database of games was loaded in. Beyond that, not much is known. But looking at the games against Stockfish, it looks as though AlphaZero either trained against Stockfish before the recorded match or was fed a pgn of Stockfish games. Stockfish does have some randomness in its move selection, so it can't be totally dominated the way Romi dominated Glaurung, which had no randomness. So basically, take an engine about as strong as Stockfish, give it reinforcement learning, and the result is exactly as expected!
It's always a pleasure to see you.
Don't forget the matches Romi-Rybka on a theme variation and Romi-Crafty on full standard games... Romi won all of them in 100-game matches, with an empty learning file.
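The reward/penalty learning described in the post above can be sketched in a few lines: after each game, every position on the played line gets a bonus or penalty, which then biases the search evaluation in later games. This is a toy illustration of the idea, not RomiChess's actual code; the reward values, position keys, and function names are all assumptions.

```python
# Toy sketch of Pavlov-style reinforcement as described above: reward or
# penalise every position along a played line, then let the search see the
# accumulated bias. All names and constants here are illustrative
# assumptions, not RomiChess's real implementation.

WIN_REWARD = 4      # assumed centipawn-style bonus
LOSS_PENALTY = -4

learn_file = {}     # position key -> accumulated learning bonus

def update_line(position_keys, result):
    """After a game, adjust every position on the played line."""
    reward = WIN_REWARD if result == "win" else LOSS_PENALTY
    for key in position_keys:
        learn_file[key] = learn_file.get(key, 0) + reward

def adjusted_eval(static_eval, key):
    """The search uses the static eval plus the learned bias."""
    return static_eval + learn_file.get(key, 0)

# Example: a win along three positions, then a loss along two.
update_line(["p1", "p2", "p3"], "win")
update_line(["p1", "p4"], "loss")
print(learn_file["p1"])   # rewards cancel: 0
print(learn_file["p2"])   # 4
```

Merging a PGN database ("monkey see, monkey do") would amount to running `update_line` over every game in the file, so previously successful lines start out with a positive bias.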
AlphaGo Zero And AlphaZero, RomiChess done better
Moderators: hgm, Rebel, chrisw
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: AlphaGo Zero And AlphaZero, RomiChess done better
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
-
- Posts: 545
- Joined: Tue Jun 06, 2017 4:49 pm
- Location: Italy
Re: AlphaGo Zero And AlphaZero, RomiChess done better
Easy: +100 =0 -0.

Michael Sherwin wrote: So if Romi would have trained against Crafty, 100 games in every position that occurred in a human database 10,000 times or more, how do you think Romi would have done against Crafty in a follow-up match if Crafty used its tournament book? ...
We shouldn't forget those were different times for computer chess. On single CPUs (deterministic chess) it's easier to find an opponent's weaknesses. With multicore engines it becomes a bit harder, because engines often change their PVs. So I guess Romi would win, but it would suffer some losses.
About AlphaZ, I think that's a hardware revolution, and engine strength (or learning) has nothing to do with the result. It's a different way to build software, a different pattern of evaluation, and a kind of learning which is much more similar to KnightCap's than to any other. With a difference: in those days, KnightCap's learning could never work.
A match AlphaZ vs. Stockfish 9 (when released) would have been far more interesting, if you gave SF9 some learning features. Romi style or Critter style, it doesn't matter. We'd have learning vs. learning in a match between engines of similar level. Or maybe SF9 would have been strong enough to win the match anyway...
We'll never know, because that was mere marketing, so they needed to win... That doesn't mean the product is bad. It's probably great, but if you want to sell a great (and expensive) product you need to do a lot of advertising about an unbelievable performance. So you spend a lot of money because you want to earn a lot more.
Just two, max three cents.
F.S.I. Chess Teacher
-
- Posts: 493
- Joined: Wed Mar 15, 2006 6:13 am
- Location: Curitiba - PR - BRAZIL
Re: AlphaGo Zero And AlphaZero, RomiChess done better
This is not a novelty.
R.J. Fischer did that a long time ago.
He learned and memorized all of Spassky's games. The result everybody knows.
A. Ponti
AMD Ryzen 1800x, Windows 10.
FIDE current ratings: standard 1913, rapid 1931
-
- Posts: 5
- Joined: Tue Jun 20, 2017 12:23 pm
Re: AlphaGo Zero And AlphaZero, RomiChess done better
If you really went from 5% to 95%, that's obviously overfitting.
Anyone can achieve 100% against a deterministic adversary
-
- Posts: 545
- Joined: Tue Jun 06, 2017 4:49 pm
- Location: Italy
Re: AlphaGo Zero And AlphaZero, RomiChess done better
That's why Critter (2010) and Stockfish PA GTB (2014) have been the last engines with a structured learning system. The non-deterministic behavior of multi-CPU engines made position learning almost useless. Almost. I recently posted the result of a match SF PA GTB vs. Asmfish (and I presume there was a huge ELO difference, more than 200 IMO). In a 100-game match, SF PA GTB won 52-48 from a theme position, and I wrote that a one-million-game learning file would suffice to win a match from the standard start position too. A deep and strongly "pruned" opening book would be needed in that case. But there is an inconvenience: if you change the match conditions (e.g. learning was performed at 1 sec/move and the match is at 10 secs/move, or the opponent changes its opening book), then all that position learning becomes useless. That's why the RomiChess learning system is more effective for matches: in fact, it's not classic position learning. I'd define it a "book deep learning".

Akababa wrote: Anyone can achieve 100% against a deterministic adversary
As I said, it's not a matter of software. AlphaZ's learning couldn't compete with Romi's learning on traditional hardware. We can discuss the engines' strength difference, though. If AlphaZ (with an empty learning file) is as strong as SF8, Romi couldn't compete with it because of the ELO difference. But if somebody gave SF8 the Romi learning system and matched it against a rewrite of AlphaZ for Windows (trying to keep its learning system), then the result could be quite embarrassing for the Google team...
This is NOT to criticize AlphaZ and their work. It's really a great result, and there'll be a new frontier of computer chess science. A pity that frontier will be for few people until hardware prices become reasonable.
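The classic position learning contrasted above (the Critter / SF PA GTB style) is essentially a hash-keyed table of previously searched scores, which is only trustworthy under conditions at least as strong as the ones that produced it. A minimal sketch, assuming a simple score-plus-depth entry; the structure and field names are illustrative assumptions, not either engine's real format:

```python
# Minimal sketch of classic position learning: a hash-keyed table of
# previously searched scores. An entry is only reused when its recorded
# depth is at least the depth we would search now, which is why a file
# built at 1 sec/move says little at 10 secs/move, as noted above.
# The structure and field names are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class LearnEntry:
    score: int   # centipawns from the side to move
    depth: int   # search depth that produced the score

table = {}

def store(pos_hash, score, depth):
    """Keep only the deepest result seen for each position."""
    old = table.get(pos_hash)
    if old is None or depth >= old.depth:
        table[pos_hash] = LearnEntry(score, depth)

def probe(pos_hash, needed_depth):
    """Return a learned score only if it came from a deep-enough search."""
    entry = table.get(pos_hash)
    if entry is not None and entry.depth >= needed_depth:
        return entry.score
    return None

store(0xABCD, 35, 18)
print(probe(0xABCD, 12))   # 35: the stored search was deep enough
print(probe(0xABCD, 24))   # None: match conditions exceed the learning run
```

The second probe illustrates the inconvenience described above: once the match searches deeper than the learning run did, the stored entries stop being usable.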
F.S.I. Chess Teacher
-
- Posts: 2283
- Joined: Sat Jun 02, 2012 2:13 am
Re: AlphaGo Zero And AlphaZero, RomiChess done better
I suggested introducing learning functionality into Komodo several times over the last few years, and all I got was a "we'll consider it" type of answer. I specifically gave Critter and SF_PA_GTB as examples of how it could be done, and that was freeware. They're even on good terms with Jesse and R. Vida, so they could reach out for help on that one, if necessary.

Rodolfo Leoni wrote: That's why Critter (2010) and Stockfish PA GTB (2014) have been the last engines with a structured learning system. ... That's why the RomiChess learning system is more effective for matches: in fact, it's not classic position learning. I'd define it a "book deep learning".
It looks like learning has been sorely neglected by both the programming and testing communities for many years (since it would distort the other ratings). However, one could test two instances of the same program, one with and one without learning. One could even remove the learning games from the rating list afterwards, so as to avoid distortions.
It probably could be done without tying up too many resources, since the engines that have learning features are quite few: Critter, Baron, Phalanx, RomiChess, of course, and maybe a few others.
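The proposed test (same engine with and without learning, with the learning-phase games excluded from the rating sample) could be organised roughly like this. Everything here is a hypothetical stand-in: the simulated game function, the warm-up length, and the improvement rate are placeholders, not real engine data.

```python
# Hypothetical harness for the test proposed above: play N games between a
# learning and a non-learning instance of the same engine, but rate only
# the games played after a warm-up phase, so the construction of the
# learning file does not distort the sample. All names are placeholders.

import random

random.seed(1)

def play_game(learn_bonus):
    """Stand-in for a real engine game; learning shifts the win odds."""
    return 1 if random.random() < 0.5 + learn_bonus else 0

def run_match(total_games, warmup_games):
    rated = []
    bonus = 0.0
    for game in range(total_games):
        result = play_game(bonus)
        bonus = min(0.3, bonus + 0.005)   # learner slowly improves
        if game >= warmup_games:          # drop warm-up games from rating
            rated.append(result)
    return sum(rated), len(rated)

wins, games = run_match(200, 100)
print(f"rated sample: {wins}/{games} for the learning instance")
```

The point of the warm-up cutoff is exactly the one made above: the games that build the learning file are discarded, and only the steady-state strength difference enters the rating.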
Thanks for bringing it up again, Rodolfo! Learning in chess is always relevant.
Regards,
CL
-
- Posts: 3196
- Joined: Fri May 26, 2006 3:00 am
- Location: WY, USA
- Full name: Michael Sherwin
Re: AlphaGo Zero And AlphaZero, RomiChess done better
Hi Rodolfo, I have never once openly disagreed with anything that you have said, so please do not get upset with me, but I disagree with one point. Romi's learning is a bit more than position learning. When Romi learns, nodes higher in the tree are affected the most and change sooner. However, as more results come in, the moves at the root get better defined. So, for example, Romi will choose between 1.e4 and 1.d4, whichever gives Romi a better result. That is true from any node in the tree. That is a permanent gain. It may not help win matches against god engines, but it will help Romi gain several classes in strength against her contemporaries, as demonstrated in Leo's class tournaments, where Romi gained two classes and was about to gain a third. And that was based on just 100 to 200 played games!

Rodolfo Leoni wrote: That's why Critter (2010) and Stockfish PA GTB (2014) have been the last engines with a structured learning system. ... That's why the RomiChess learning system is more effective for matches: in fact, it's not classic position learning. I'd define it a "book deep learning".
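The effect described above (nodes deep in the tree reacting quickly, root moves settling only as results accumulate) falls out naturally if every prefix of a played line accumulates the game result. A toy model, with the keys and update rule as assumptions rather than Romi's actual scheme:

```python
# Toy model of the effect described above: every node on a played line
# accumulates the game result, so a node visited only once flips
# immediately, while the root move's average only drifts as many results
# come in. Keys and the update rule are illustrative assumptions.

stats = {}   # move-path key -> (wins, games)

def backpropagate(path, won):
    """Credit the result to every prefix of the played move sequence."""
    for i in range(1, len(path) + 1):
        key = tuple(path[:i])
        w, g = stats.get(key, (0, 0))
        stats[key] = (w + (1 if won else 0), g + 1)

def score(key):
    w, g = stats[key]
    return w / g

# Two games starting 1.e4, one win and one loss, diverging at move two.
backpropagate(["e4", "e5"], won=True)
backpropagate(["e4", "c5"], won=False)
print(score(("e4",)))         # root move: 0.5, needs more games to settle
print(score(("e4", "e5")))    # deeper node: 1.0 after a single game
```

At the root, choosing between 1.e4 and 1.d4 by the better learned score is then just an argmax over the stored move statistics.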
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
-
- Posts: 545
- Joined: Tue Jun 06, 2017 4:49 pm
- Location: Italy
Re: AlphaGo Zero And AlphaZero, RomiChess done better
Hi Mike, that's not a disagreement at all. That's what I tried to say.

Michael Sherwin wrote: Hi Rodolfo, I have never once openly disagreed with anything that you have said, so please do not get upset with me, but I disagree with one point. Romi's learning is a bit more than position learning. ...

Rodolfo Leoni wrote:
............................ That's why Romichess learning system is more effective for matches: in fact, it's not a classic position learning. I'd define it a "book deep learning".
..................................................
F.S.I. Chess Teacher
-
- Posts: 545
- Joined: Tue Jun 06, 2017 4:49 pm
- Location: Italy
Re: AlphaGo Zero And AlphaZero, RomiChess done better
I think several things are going to happen. Google hardware (if cheap enough) will force some kind of learning encoding on every programmer who wants to stay on top. If there's to be a new StockfishZ, KomodoZ, or HoudiniZ, then it can't be avoided. But until the new hardware is available at reasonable cost, I guess nothing will change. I see no logic in introducing such a system while still working with the "old" traditional hardware. Programmers would need to rewrite it (and the whole engine, IMO) to adapt it to the new hardware.

carldaman wrote: I suggested introducing learning functionality into Komodo several times over the last few years, and all I got was a "we'll consider it" type of answer. ...
I think it'll be a matter of YEARS. And there's another possible scenario too: this hardware, maybe, will never be available.
One thing is 100% sure: when Google hardware becomes available, correspondence chess will die. What's the point in playing games if the learning feature is advanced to the point where it only plays perfect games?
F.S.I. Chess Teacher
-
- Posts: 5566
- Joined: Tue Feb 28, 2012 11:56 pm
Re: AlphaGo Zero And AlphaZero, RomiChess done better
There is no reason to think that AlphaZero plays only perfect games. It might not even know about the wrong corner in KBNK. (Does that come up often enough in 44 million games for AlphaZero to figure it out? Maybe not, maybe it does, but even if it does there will be other patterns it won't have seen often enough.)

Rodolfo Leoni wrote: One thing is 100% sure: when Google hardware becomes available, correspondence chess will die. What's the point in playing games if the learning feature is advanced to the point where it only plays perfect games?
Show it one of those positions that are easy for humans, that fool all top engines and that never come up in real games, and I guarantee you AlphaZero will not have a clue either.