AlphaGo Zero And AlphaZero, RomiChess done better

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Michael Sherwin »

Steve Maughan wrote:I remember the experiments at the time. Could you briefly explain what you did? From memory I recall you did the following:

At the end of the game you parsed the list of moves and adjusted the score up or down a certain number of centipawns based on the outcome. You then hashed each position and stored it in a learning file. I assume this is then loaded into the hash table at the start of each game. Is this broadly correct?

Thanks,

Steve
Yes, that is very accurate except the moves were not hashed in the learn file. The learn file was just a giant tree data structure connected with sibling and descendant pointers. Before the search the entire subtree (if there was one) was loaded into the game hash.
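A minimal sketch of that scheme may help. All names here, the one-centipawn nudge, and the dict standing in for the game hash are my own illustrative assumptions, not Romi's actual code: each learn-file node holds a move, a learned score adjustment, and sibling/descendant pointers; after a game every move played gets its stored adjustment nudged toward the result, and before a search the subtree below the current position is walked and its adjustments are fed to the hash table.

```python
# Hedged sketch of a RomiChess-style learn tree as described above.
# Node layout, the +/-1 centipawn nudge per game, and the plain dict
# used as the "game hash" are illustrative assumptions.

class LearnNode:
    def __init__(self, move):
        self.move = move          # move leading to this node
        self.adjust = 0           # learned score adjustment, centipawns
        self.child = None         # first reply (descendant pointer)
        self.sibling = None       # next alternative move at this ply

    def find_or_add(self, move):
        # Scan the sibling list for the move; create a node if absent.
        node = self.child
        while node:
            if node.move == move:
                return node
            node = node.sibling
        node = LearnNode(move)
        node.sibling = self.child
        self.child = node
        return node

def record_game(root, moves, result):
    """Walk the game's moves down the tree, rewarding or penalizing.
    result: +1 win, 0 draw, -1 loss (from the learning side's view)."""
    node = root
    for move in moves:
        node = node.find_or_add(move)
        node.adjust += result     # reward wins, penalize losses

def load_subtree(node, hash_table, path=()):
    """Before a search: dump the subtree for the current line into the
    'game hash' so the normal search sees the adjusted scores."""
    child = node.child
    while child:
        line = path + (child.move,)
        hash_table[line] = child.adjust
        load_subtree(child, hash_table, line)
        child = child.sibling
```

The effect is that a line Romi keeps losing accumulates penalties on every move along it, so the search drifts toward alternatives, while winning lines get reinforced the same way.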
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Michael Sherwin »

jdart wrote:It is a neural network based system, and quite a bit has been written about the Go program that preceded it. I do not think it is a big mystery what they did.

Re reinforcement learning, Andrew Tridgell applied this to chess in the late 90's:

https://chessprogramming.wikispaces.com/KnightCap

https://www.cs.princeton.edu/courses/ar ... ess-RL.pdf

He got good learning progress but not great results in terms of final program strength.

--Jon
So I was not the first. Like Bob said, there is nothing new under the Sun. However, Romi did achieve superior results in Leo Dicksman's class tournaments, gaining two classes and about to gain a third before his hard drive crashed and he lost Romi's learn file.

I googled reinforcement learning and found no connection to Pavlov's dog experiments, in which he rewarded correct behavior and punished wrong behavior, except where Romi is mentioned.

My goal was to create computer chess learning that mimicked how humans learn. So I took two examples of that and adapted them for computer chess. Humans copy moves; that is monkey-see-monkey-do learning, which gets its name from monkeys watching humans wash potatoes in a stream and then doing it themselves. The second one (reinforcement learning) is just staying with what is working, or trying something else if it is not. I think somewhere in what I did there was some originality?
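That second mechanism, stay with what is working or try something else if it is not, can be sketched in a few lines. This is purely my illustration, with made-up numbers, not Romi's selection code: the learned adjustment is simply added to the engine's own evaluation, so a move that keeps losing eventually scores below an untried alternative.

```python
# Illustrative "win-stay, lose-shift" move choice: the learned
# adjustment is added on top of the static evaluation. All numbers
# in the examples below are invented for illustration.
def pick_move(candidates):
    """candidates: list of (move, static_eval_cp, learned_adjust_cp).
    Returns the move with the best combined score."""
    return max(candidates, key=lambda c: c[1] + c[2])[0]
```

With no learned data the engine's natural preference wins; once a line has been punished enough, the choice shifts on its own.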
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by corres »

I have some questions for you:
How many gigabytes was Romi's learning file, and what Elo did it have at that tournament?
Thanks
Robert
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Michael Sherwin »

corres wrote:I have some questions for you:
How many gigabytes was Romi's learning file, and what Elo did it have at that tournament?
Thanks
Robert
A million games produced, IIRC, an 800 megabyte file. Romi P3k started off in Leo's tournaments at about 2200 and finished higher, but that was too long ago for me to remember exactly.

If you need accurate numbers for the learn file usage you can download RomiChess here.

http://kirill-kryukov.com/chess/discuss ... p?id=40457

Unzip it, put one or more PGN files into the same directory, start RomiChess, and then type "merge 'name'.pgn". As far as Elo gain goes, a test showed that against a humongous opening book Romi gained about 50 Elo per 5,000 games.
Werner
Posts: 2871
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Werner »

Hi Mike,
so if I put all CEGT games into the folder, merge them, and repeat a match played earlier, will I get a much better result?

- the original games were played without learning
- now I turn learning on and repeat the match

I will try it, but I do not think the engine uses the learn file.
Settings here:
learn_on
book_off
quit

I do not know how to use the command in the help.txt inside Romi:
douselearn ?


best wishes

Werner
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by corres »

Thanks for the answers.
As I see it, you are a little skeptical about the publicized results and methods of AlphaZero.
Am I right in thinking so?
Bests.
Robert
Rodolfo Leoni
Posts: 545
Joined: Tue Jun 06, 2017 4:49 pm
Location: Italy

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Rodolfo Leoni »

Michael Sherwin wrote:In January of 2006 IIRC (not exactly sure) I released RomiChess ver P2a. The new version had learning. It had two types of learning: monkey see monkey do, and learning adapted from Pavlov's dog experiments. I did not know it at the time, but the second type of learning was called reinforcement learning. I just found out very recently that reinforcement learning was invented for robotics control in 1957, the year that I was born. Strange. Anyway, as far as I know I reinvented it and was the first to put reinforcement learning into a chess program.

The reason I'm apparently patting myself on the back is rather to let people know that I recognise certain aspects of this AlphaZero phenom. For example, using Glaurung 2.x as a test opponent, Romi played 20 matches against Glaurung using the ten Nunn positions. On pass one Romi scored 5% against Glaurung. On the 20th pass Romi scored 95%. That is how powerful the learning is! The moves that Romi learned to beat Glaurung were very distinctive looking. They are learned moves, so they are not determined by a natural chess-playing evaluation but rather by an evaluation tweaked by learned rewards and penalties. Looking at the games between AlphaZero and Stockfish I see the same kind of learned moves.

In RomiChess one can start with a new learn.dat file, put millionbase.pgn in the same directory as Romi, type merge millionbase.pgn, and Romi will learn from all those games.

When reading about AlphaZero there is mostly made-up reporting. That is what reporters do. They take one or two known facts and make up a many-page article that is mostly bunk. The AlphaZero team has released very little actual info. They released that it uses reinforcement learning and that a database of games was loaded in. Beyond that not much is known. But looking at the games against Stockfish, it looks as though AlphaZero either trained against Stockfish before the recorded match or entered a PGN of Stockfish games. Stockfish does have some type of randomness to its moves, so it can't be totally dominated the way Romi dominated Glaurung, which had no randomness. So basically, take an engine about as strong as Stockfish, give it reinforcement learning, and the result is exactly as expected!
Hi Mike,

It's always a pleasure to see you . ;)

Don't forget the matches Romi-Rybka on a theme variation and Romi-Crafty on full standard games... Romi won all of them in 100-game matches, starting with an empty learning file.
F.S.I. Chess Teacher
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Michael Sherwin »

Werner wrote:Hi Mike,
so if I put all CEGT games into the folder, merge them, and repeat a match played earlier, will I get a much better result?

- the original games were played without learning
- now I turn learning on and repeat the match

I will try it, but I do not think the engine uses the learn file.
Settings here:
learn_on
book_off
quit

I do not know how to use the command in the help.txt inside Romi:
douselearn ?


best wishes

Werner
Hi Werner,

To have learning fully enabled, type learn_on, then book_on, then quit. Learning is part of the book structure. If learning was on all this time then the learn.dat file should be quite large by now. Just loading PGN files will give Romi a better result, but for the best result Romi needs to play many games itself, so it gains experience with the lines and starts to select lines that are better for Romi.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Michael Sherwin »

Rodolfo Leoni wrote:
Michael Sherwin wrote:In January of 2006 IIRC (not exactly sure) I released RomiChess ver P2a. The new version had learning. It had two types of learning: monkey see monkey do, and learning adapted from Pavlov's dog experiments. I did not know it at the time, but the second type of learning was called reinforcement learning. I just found out very recently that reinforcement learning was invented for robotics control in 1957, the year that I was born. Strange. Anyway, as far as I know I reinvented it and was the first to put reinforcement learning into a chess program.

The reason I'm apparently patting myself on the back is rather to let people know that I recognise certain aspects of this AlphaZero phenom. For example, using Glaurung 2.x as a test opponent, Romi played 20 matches against Glaurung using the ten Nunn positions. On pass one Romi scored 5% against Glaurung. On the 20th pass Romi scored 95%. That is how powerful the learning is! The moves that Romi learned to beat Glaurung were very distinctive looking. They are learned moves, so they are not determined by a natural chess-playing evaluation but rather by an evaluation tweaked by learned rewards and penalties. Looking at the games between AlphaZero and Stockfish I see the same kind of learned moves.

In RomiChess one can start with a new learn.dat file, put millionbase.pgn in the same directory as Romi, type merge millionbase.pgn, and Romi will learn from all those games.

When reading about AlphaZero there is mostly made-up reporting. That is what reporters do. They take one or two known facts and make up a many-page article that is mostly bunk. The AlphaZero team has released very little actual info. They released that it uses reinforcement learning and that a database of games was loaded in. Beyond that not much is known. But looking at the games against Stockfish, it looks as though AlphaZero either trained against Stockfish before the recorded match or entered a PGN of Stockfish games. Stockfish does have some type of randomness to its moves, so it can't be totally dominated the way Romi dominated Glaurung, which had no randomness. So basically, take an engine about as strong as Stockfish, give it reinforcement learning, and the result is exactly as expected!
Hi Mike,

It's always a pleasure to see you . ;)

Don't forget the matches Romi-Rybka on a theme variation and Romi-Crafty on full standard games... Romi won all of them in 100-game matches, starting with an empty learning file.
Hi Rodolfo! Yes, I remember those experiments. Starting from a new learn file, Romi was able to win 100-game matches against both Rybka and Crafty when starting from a specific position. Thanks for reminding me! :D
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Michael Sherwin »

corres wrote:Thanks for the answers.
As I see it, you are a little skeptical about the publicized results and methods of AlphaZero.
Am I right in thinking so?
Bests.
Robert
I am most skeptical about interview-style reporting. Reporters will often take a few thin facts and weave a whole story around them that has lots of inaccuracies and often just outright made-up garbage.

I am less skeptical about papers written directly by the authors. Still I'm a bit skeptical because they do not always tell all.

The reason I started this thread is that I looked at the games, and the moves by AlphaZero had the same learned feel to them that Romi's moves have when Romi would win against a superior opponent.