AlphaGo Zero And AlphaZero, RomiChess done better

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Michael Sherwin »

Steve Maughan wrote:I remember the experiments at the time. Could you briefly explain what you did? From memory I recall you did the following:

At the end of the game you parsed the list of moves and adjusted the score up or down a certain number of centipawns based on the outcome. You then hashed each position and stored it in a learning file. I assume this is then loaded into the hash table at the start of each game. Is this broadly correct?

Thanks,

Steve
Yes, that is very accurate except the moves were not hashed in the learn file. The learn file was just a giant tree data structure connected with sibling and descendant pointers. Before the search the entire subtree (if there was one) was loaded into the game hash.
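A minimal sketch of that scheme may help. All names here, the one-centipawn nudge, and the dict standing in for the game hash are my own illustrative assumptions, not Romi's actual code: each learn-file node holds a move, a learned score adjustment, and sibling/descendant pointers; after a game every move played gets its stored adjustment nudged toward the result, and before a search the subtree below the current position is walked and its adjustments are fed to the hash table.

```python
# Hedged sketch of a RomiChess-style learn tree as described above.
# Node layout, the +/-1 centipawn nudge per game, and the plain dict
# used as the "game hash" are illustrative assumptions.

class LearnNode:
    def __init__(self, move):
        self.move = move          # move leading to this node
        self.adjust = 0           # learned score adjustment, centipawns
        self.child = None         # first reply (descendant pointer)
        self.sibling = None       # next alternative move at this ply

    def find_or_add(self, move):
        # Scan the sibling list for the move; create a node if absent.
        node = self.child
        while node:
            if node.move == move:
                return node
            node = node.sibling
        node = LearnNode(move)
        node.sibling = self.child
        self.child = node
        return node

def record_game(root, moves, result):
    """Walk the game's moves down the tree, rewarding or penalizing.
    result: +1 win, 0 draw, -1 loss (from the learning side's view)."""
    node = root
    for move in moves:
        node = node.find_or_add(move)
        node.adjust += result     # reward wins, penalize losses

def load_subtree(node, hash_table, path=()):
    """Before a search: dump the subtree for the current line into the
    'game hash' so the normal search sees the adjusted scores."""
    child = node.child
    while child:
        line = path + (child.move,)
        hash_table[line] = child.adjust
        load_subtree(child, hash_table, line)
        child = child.sibling
```

The effect is that a line Romi keeps losing accumulates penalties on every move along it, so the search drifts toward alternatives, while winning lines get reinforced the same way.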
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Michael Sherwin »

jdart wrote:It is a neural network based system, and quite a bit has been written about the Go program that preceded it. I do not think it is a big mystery what they did.

Re reinforcement learning, Andrew Tridgell applied this to chess in the late 90's:

https://chessprogramming.wikispaces.com/KnightCap

https://www.cs.princeton.edu/courses/ar ... ess-RL.pdf

He got good learning progress but not great results in terms of final program strength.

--Jon
So I was not the first. Like Bob said, there is nothing new under the Sun. However, Romi did achieve superior results in Leo Dicksman's class tournaments, gaining two classes and about to gain a third before his hard drive crashed and he lost Romi's learn file.

I googled reinforcement learning and found no connection to Pavlov's dog experiments, in which he rewarded correct behavior and punished wrong behavior, except where Romi is mentioned.

My goal was to create computer chess learning that mimicked how humans learn. So I took two examples of that and adapted them for computer chess. Humans copy moves; that is monkey-see-monkey-do learning, which gets its name from monkeys watching humans wash potatoes in a stream and then doing it themselves. The second one (reinforcement learning) is just staying with what is working, or trying something else if it is not. I think somewhere in what I did there was some originality?
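That second mechanism, stay with what is working or try something else if it is not, can be sketched in a few lines. This is purely my illustration, with made-up numbers, not Romi's selection code: the learned adjustment is simply added to the engine's own evaluation, so a move that keeps losing eventually scores below an untried alternative.

```python
# Illustrative "win-stay, lose-shift" move choice: the learned
# adjustment is added on top of the static evaluation. All numbers
# in the examples below are invented for illustration.
def pick_move(candidates):
    """candidates: list of (move, static_eval_cp, learned_adjust_cp).
    Returns the move with the best combined score."""
    return max(candidates, key=lambda c: c[1] + c[2])[0]
```

With no learned data the engine's natural preference wins; once a line has been punished enough, the choice shifts on its own.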
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by corres »

I have some questions for you:
How many gigabytes was Romi's learning file, and what Elo did it have at that tournament?
Thanks
Robert
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Michael Sherwin »

corres wrote:I have some questions for you:
How many gigabytes was Romi's learning file, and what Elo did it have at that tournament?
Thanks
Robert
A million games produced, IIRC, an 800 megabyte file. Romi P3k started off in Leo's tournaments at about 2200 and finished higher, but that was too long ago for me to remember exactly.

If you need accurate numbers for the learn file usage you can download RomiChess here.

http://kirill-kryukov.com/chess/discuss ... p?id=40457

Unzip it, put one or more PGN files into the same directory, start RomiChess, and then type "merge 'name'.pgn". As far as Elo gain goes, a test showed that against a humongous opening book Romi gained about 50 Elo per 5,000 games.
Werner
Posts: 2871
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Werner »

Hi Mike,
so if I put all CEGT games into the folder, merge them, and repeat a match played earlier, will I get a much better result?

- the original games were played without learning
- now I turn learning on and repeat the match

I will try it, but I do not think the engine uses the learn file.
Settings here:
learn_on
book_off
quit

I do not know how to use the command in the help.txt inside Romi:
douselearn ?


best wishes

Werner
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by corres »

Thanks for the answers.
As I see it, you are a little skeptical about the publicized results and methods of AlphaZero.
Am I right in thinking so?
Bests.
Robert
Rodolfo Leoni
Posts: 545
Joined: Tue Jun 06, 2017 4:49 pm
Location: Italy

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Rodolfo Leoni »

Michael Sherwin wrote:In January of 2006 IIRC (not exactly sure) I released RomiChess ver P2a. The new version had learning. It had two types of learning: monkey see monkey do, and learning adapted from Pavlov's dog experiments. I did not know it at the time, but the second type of learning was called reinforcement learning. I just found out very recently that reinforcement learning was invented for robotics control in 1957, the year that I was born. Strange. Anyway, as far as I know I reinvented it and was the first to put reinforcement learning into a chess program.

The reason I'm apparently patting myself on the back is rather to let people know that I recognise certain aspects of this AlphaZero phenom. For example, using Glaurung 2.x as a test opponent, Romi played 20 matches against Glaurung using the ten Nunn positions. On pass one Romi scored 5% against Glaurung. On the 20th pass Romi scored 95%. That is how powerful the learning is! The moves that Romi learned to beat Glaurung were very distinctive looking. They are learned moves, so they are not determined by a natural chess-playing evaluation but rather by an evaluation tweaked by learned rewards and penalties. Looking at the games between AlphaZero and Stockfish I see the same kind of learned moves.

In RomiChess one can start with a new learn.dat file, put millionbase.pgn in the same directory as Romi, type merge millionbase.pgn, and Romi will learn from all those games.

When reading about AlphaZero there is mostly made-up reporting. That is what reporters do. They take one or two known facts and make up a many-page article that is mostly bunk. The AlphaZero team has released very little actual info. They released that it uses reinforcement learning and that a database of games was loaded in. Beyond that not much is known. But looking at the games against Stockfish, it looks as though AlphaZero either trained against Stockfish before the recorded match or entered a PGN of Stockfish games. Stockfish does have some type of randomness to its moves, so it can't be totally dominated the way Romi dominated Glaurung, which had no randomness. So basically, take an engine about as strong as Stockfish, give it reinforcement learning, and the result is exactly as expected!
Hi Mike,

It's always a pleasure to see you . ;)

Don't forget the matches Romi-Rybka on a theme variation and Romi-Crafty on full standard games... Romi won all of them in 100-game matches, starting with an empty learning file.
F.S.I. Chess Teacher
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Michael Sherwin »

Werner wrote:Hi Mike,
so if I put all CEGT games into the folder, merge them, and repeat a match played earlier, will I get a much better result?

- the original games were played without learning
- now I turn learning on and repeat the match

I will try it, but I do not think the engine uses the learn file.
Settings here:
learn_on
book_off
quit

I do not know how to use the command in the help.txt inside Romi:
douselearn ?


best wishes

Werner
Hi Werner,

To have learning fully enabled, type learn_on, then book_on, then quit. Learning is part of the book structure. If learning was on all this time then the learn.dat file should be quite large by now. Just loading PGN files will give Romi a better result, but for the best result Romi needs to play many games itself, so it gains experience with the lines and starts to select lines that are better for Romi.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Michael Sherwin »

Rodolfo Leoni wrote:
Michael Sherwin wrote:In January of 2006 IIRC (not exactly sure) I released RomiChess ver P2a. The new version had learning. It had two types of learning: monkey see monkey do, and learning adapted from Pavlov's dog experiments. I did not know it at the time, but the second type of learning was called reinforcement learning. I just found out very recently that reinforcement learning was invented for robotics control in 1957, the year that I was born. Strange. Anyway, as far as I know I reinvented it and was the first to put reinforcement learning into a chess program.

The reason I'm apparently patting myself on the back is rather to let people know that I recognise certain aspects of this AlphaZero phenom. For example, using Glaurung 2.x as a test opponent, Romi played 20 matches against Glaurung using the ten Nunn positions. On pass one Romi scored 5% against Glaurung. On the 20th pass Romi scored 95%. That is how powerful the learning is! The moves that Romi learned to beat Glaurung were very distinctive looking. They are learned moves, so they are not determined by a natural chess-playing evaluation but rather by an evaluation tweaked by learned rewards and penalties. Looking at the games between AlphaZero and Stockfish I see the same kind of learned moves.

In RomiChess one can start with a new learn.dat file, put millionbase.pgn in the same directory as Romi, type merge millionbase.pgn, and Romi will learn from all those games.

When reading about AlphaZero there is mostly made-up reporting. That is what reporters do. They take one or two known facts and make up a many-page article that is mostly bunk. The AlphaZero team has released very little actual info. They released that it uses reinforcement learning and that a database of games was loaded in. Beyond that not much is known. But looking at the games against Stockfish, it looks as though AlphaZero either trained against Stockfish before the recorded match or entered a PGN of Stockfish games. Stockfish does have some type of randomness to its moves, so it can't be totally dominated the way Romi dominated Glaurung, which had no randomness. So basically, take an engine about as strong as Stockfish, give it reinforcement learning, and the result is exactly as expected!
Hi Mike,

It's always a pleasure to see you . ;)

Don't forget the matches Romi-Rybka on a theme variation and Romi-Crafty on full standard games... Romi won all of them in 100-game matches, starting with an empty learning file.
Hi Rodolfo! Yes, I remember those experiments. Starting from a new learn file, Romi was able to win 100-game matches against both Rybka and Crafty when starting from a specific position. Thanks for reminding me! :D
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: AlphaGo Zero And AlphaZero, RomiChess done better

Post by Michael Sherwin »

corres wrote:Thanks for the answers.
As I see it, you are a little skeptical about the publicized results and methods of AlphaZero.
Am I right in thinking so?
Bests.
Robert
I am most skeptical about interview-style reporting. Reporters will often take a few thin facts and weave a whole story around them that has lots of inaccuracies and often just outright made-up garbage.

I am less skeptical about papers written directly by the authors. Still I'm a bit skeptical because they do not always tell all.

The reason I started this thread is that I looked at the games, and the moves by AlphaZero had the same learned feel to them that Romi's moves have when Romi would win against a superior opponent.