I can't believe that so many people don't get it!

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: I can't believe that so many people don't get it!

Post by hgm »

I don't think there is anyone that doesn't believe that learning Stockfish's weaknesses will improve your results against Stockfish. The more interesting question is whether, after training against Stockfish, Romi would also perform better against other engines (with no further training). Or whether playing games only against itself would improve its results against Stockfish.

Remember AlphaZero beat Stockfish with zero training games against it.
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

hgm wrote:I don't think there is anyone that doesn't believe that learning Stockfish's weaknesses will improve your results against Stockfish. The more interesting question is whether, after training against Stockfish, Romi would also perform better against other engines (with no further training). Or whether playing games only against itself would improve its results against Stockfish.

Remember AlphaZero beat Stockfish with zero training games against it.
Yes, but 44 million training games versus only 157 for RomiChess, and in C66, the Closed Berlin, there are many sub-variations. RomiChess is playing better moves in all those variations to get draws. Those better moves would work against any engine that enters those lines. Even if the original result against a new engine was not so good, Romi would learn the additional lines. Give Romi 44 million games using a very wide but shallow book with only the best lines, and RomiChess, playing the top 20 engines, will surpass them all. RomiChess might not then do as well against engines 21 - 40 if they vary early from Romi's learned lines, but that will not be the rule. But the point is not how well Romi can ultimately perform with 44 million games. Romi is just an example. It always was only an example. And an amazing example, given that it is only a 2400-Elo engine. The point is that everything Romi can demonstrate, Stockfish can demonstrate far better. And on top of that, there is plenty of improvement to Romi's learning algorithm that can still be done. Really, H.G., can you not budge a single inch on this issue despite all the evidence I submit? I don't get it!
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: I can't believe that so many people don't get it!

Post by Pio »

Hi Michael!

I do not think hgm is saying that what you have done is not great.

The problem is that it will be very hard to train a network to work primarily as an opening database. To do that, it would have to learn not to trust the leaf probabilities, but instead to override them with its internal node probability. For that to work, it would have to use lots of nodes in the network as position memory, basically remembering and changing most of the leaf positions' probabilities from the root just to change the root move. That would generally be bad, since it would take a lot of capacity from the network, hinder generalisation, and basically say: hey, let's trust a couple of games' scores much more than the search.

I agree that if you could make a proof of the outcome of chess out of 44 million games the idea could work as well as if you train against a highly deterministic opponent.

BR
Pio
User avatar
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: I can't believe that so many people don't get it!

Post by Rebel »

hgm wrote:Remember AlphaZero beat Stockfish with zero training games against it.
Not so sure.

From the paper:

We trained a separate instance of AlphaZero for each game. Training proceeded for 700,000 steps (mini-batches of size 4,096) starting from randomly initialised parameters,

...

In chess, AlphaZero outperformed Stockfish after just 4 hours (300k steps);

Seems to indicate there were multiple matches.
User avatar
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: I can't believe that so many people don't get it!

Post by Rebel »

Michael Sherwin wrote:
hgm wrote:I don't think there is anyone that doesn't believe that learning Stockfish's weaknesses will improve your results against Stockfish. The more interesting question is whether, after training against Stockfish, Romi would also perform better against other engines (with no further training). Or whether playing games only against itself would improve its results against Stockfish.

Remember AlphaZero beat Stockfish with zero training games against it.
Yes, but 44 million training games versus only 157 for RomiChess, and in C66, the Closed Berlin, there are many sub-variations. RomiChess is playing better moves in all those variations to get draws. Those better moves would work against any engine that enters those lines. Even if the original result against a new engine was not so good, Romi would learn the additional lines. Give Romi 44 million games using a very wide but shallow book with only the best lines, and RomiChess, playing the top 20 engines, will surpass them all. RomiChess might not then do as well against engines 21 - 40 if they vary early from Romi's learned lines, but that will not be the rule. But the point is not how well Romi can ultimately perform with 44 million games. Romi is just an example. It always was only an example. And an amazing example, given that it is only a 2400-Elo engine. The point is that everything Romi can demonstrate, Stockfish can demonstrate far better. And on top of that, there is plenty of improvement to Romi's learning algorithm that can still be done. Really, H.G., can you not budge a single inch on this issue despite all the evidence I submit? I don't get it!
Of course he gets it, but that would mean they cheated. And you can't know for sure.
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: I can't believe that so many people don't get it!

Post by hgm »

Rebel wrote:Seems to indicate there were multiple matches.
Yes, there were. They periodically took the Alpha Zero that was training, and let it play a match to measure its strength. But the games from these evaluation matches were not used for training.

Alpha Zero does not learn automatically by playing. Updating the NN parameters is a separate procedure, which needs different hardware (the gen-2 TPUs).
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

Pio wrote:Hi Michael!

I do not think hgm is saying that what you have done is not great.

The problem is that it will be very hard to train a network to work primarily as an opening database. To do that, it would have to learn not to trust the leaf probabilities, but instead to override them with its internal node probability. For that to work, it would have to use lots of nodes in the network as position memory, basically remembering and changing most of the leaf positions' probabilities from the root just to change the root move. That would generally be bad, since it would take a lot of capacity from the network, hinder generalisation, and basically say: hey, let's trust a couple of games' scores much more than the search.

I agree that if you could make a proof of the outcome of chess out of 44 million games the idea could work as well as if you train against a highly deterministic opponent.

BR
Pio
Hi Pio, I'm not sure I follow all the logic in that. Once Romi is out of her trained book (which is not really a book at all, since it just plays the learned best move), the second half of Romi's learning takes control: the reinforcement learning. Those are nodes with very small rewards or penalties attached to them. A couple of games is not really enough to affect the search much. However, after a while, as lines are visited more often, the reinforcement values get larger and start to affect the outcome of the search more strongly. So we are not just talking about a couple of games.

People that criticize Romi's learning never understand the nuances that allow it to work so well. They just imagine a weakness and decide it does not work. But it does work, and I've shown so many different tests. The same version of Romi climbed two and three-quarter classes in the WBEC class tournaments (Leo was a great guy) just on her learning, despite playing different engines in each class and only a few hundred games. And playing in three different classes against all those different engines is not very deterministic. Romi never floundered, but continued to climb. Romi was one tournament away from promoting to class B (assuming the pattern held) when Leo lost his hard drive and Romi's learn file.

I have a friend (more like a frenemy) in real life who cannot give me credit for anything. He pooh-poohs everything that I do well. He can't help himself. But when he does something well, he is the biggest braggart, demanding recognition. I'm not seeking recognition here. I'm just trying to get people to understand that the miracle of AZ is in the 44 million training games, and that any standard alpha-beta engine can benefit from the miracle of reinforcement learning. Sorry for the ramble, but there are engine fans out there that are asking for this type of learning, and they are being ignored.

Don't they at least understand that people will pay money to have that learning? They are too proud of their algorithms to dirty them with learning. That is why they refuse to acknowledge the accomplishment that is RomiChess's learning.
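The two-stage scheme described above (play the learned best move while still in the learned lines, then let accumulated per-node rewards and penalties bias the search) can be sketched roughly as follows. This is a hypothetical illustration of the idea only, not RomiChess's actual code; the table layout, reward constants, and function names are all assumptions.

```python
# Hypothetical sketch of per-node reinforcement learning as described in
# the post: after each game, every position on the game's path receives a
# small reward or penalty, and the accumulated value later biases the
# engine's search score. Constants and names are illustrative assumptions.

REWARD = 4   # assumed centipawn-scale bonus per win
PENALTY = 4  # assumed centipawn-scale penalty per loss

class LearnNode:
    def __init__(self):
        self.bonus = 0   # accumulated reinforcement value
        self.visits = 0  # how often this position was reached in training

learn_table = {}  # position key (e.g. Zobrist hash) -> LearnNode

def update_after_game(position_keys, result):
    """result: +1 win, 0 draw, -1 loss from the learner's point of view."""
    delta = REWARD if result > 0 else (-PENALTY if result < 0 else 0)
    for key in position_keys:
        node = learn_table.setdefault(key, LearnNode())
        node.visits += 1
        node.bonus += delta  # grows as a line is visited repeatedly

def search_adjustment(key):
    """Added to the search score: tiny after a couple of games, but
    increasingly influential as the same lines recur, as the post argues."""
    node = learn_table.get(key)
    return node.bonus if node else 0
```

The key property, matching the post's argument, is that a handful of games barely moves the needle, while repeated visits to the same lines accumulate a bias large enough to steer the root move.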
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: I can't believe that so many people don't get it!

Post by hgm »

Michael Sherwin wrote:Yes, but 44 million training games versus only 157 for RomiChess, and in C66, the Closed Berlin, there are many sub-variations.
But RomiChess did not start as a random mover. Isn't it reasonable to assume that a random mover needs many more training games before it can beat Stockfish than a 2400-Elo engine does?
RomiChess is playing better moves in all those variations to get draws. Those better moves would work against any engine that enters those lines.
Well, that was the question. How often do other engines enter these nodes?
But the point is not how well Romi can ultimately perform with 44 million games.
That is your point. My point is: how well would it eventually perform against, say, Fruit, when you had trained it a few thousand games against Stockfish?
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

hgm wrote:
Rebel wrote:Seems to indicate there were multiple matches.
Yes, there were. They periodically took the Alpha Zero that was training, and let it play a match to measure its strength. But the games from these evaluation matches were not used for training.

Alpha Zero does not learn automatically by playing. Updating the NN parameters is a separate procedure. Which needs different hardware (the gen-2 TPUs).
Off thread topic. H.G., I have an idea. Joker is a WinBoard engine, is it not? Anyway, if it is, it would only take a couple of hours of your time to duplicate Romi's learning in a special version of Joker, and probably improve on it greatly. Then we could have Joker and Romi play thousands of games against each other to train, and then play them in some RR with stronger engines (not too strong) and see how they do. Then you could post a report on the project and give it an honest appraisal. I would trust you to do that once you actually work with the idea. :?:
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

Michael Sherwin wrote: After 42 games in the first 100 games, Romi had only 1 draw. After 42 games of the second 100, Romi has gotten 10 draws. SF 8 on 1 thread is rated 3422 on the 40/4 index. RomiChess is rated a whopping 2423 Elo. So after 142 games of training on the original position, Romi, over the last 45 games with 11 draws, has performed at 3422 - 330 = 3092 Elo. Here are the openings visited.

c15, c10, c49, c68, c66, a55, b58, c66, c48, c66, c66, c66, c66, c66, c66, c68, c65, c61, c65, c66, c66, a56, c66, c65, c66, c66, b51, c66, c66, b51, c66, c66, c65, c65, c65, c68, c65, c68, c66, c65, b51, c66, c66, c66, c65, b09, c41, c41, c66, c66, c66, c66, c63, c66, c66, c02, c66, c41, c46, e87, c66, c45, c26, c66, c65, c41, c66, c65, c65, c66, c84, c65, c41, c65, c65, c84, c41, c66, c66, c66, c41, c41, c66, a56, b51, c66, c41, c66, c66, c41, c66, c65, c65, c66, c66, c65, c66, c66, c66, c66, c66, c66, c15, c65, b51, c65, c66, c68, c66, c65, c66, b52, c65, c65, c65, b31, c84, c66, c65, c66, c66, c66, b51, c66, b51, c66, a43, c68, b44, c66, c66, b36, c65, c66, c65, c66, c66, c66, c66, c66, c65, c41, c66, c66, c61, c66, a44

(Draws ...)
c66 = Closed Berlin . . . . . . . . . .
c65 = Berlin, Anderson
c15 = Winawer, Alekhine .
c10 = French, Rubinstein
c49 = Four knights, Nimzovitch
c68 = Spanish exchange
a55 = Old Indian
b58 = Sicilian, Boleslavsky
c48 = Spanish, Classical
c61 = Spanish, Birds
a56 = Benoni, Czech
b51 = Sicilian, Bb5+ Nc6
b09 = Pirc, Austrian
c41 = Philidor, Berger Variation .
c63 = Spanish, Schliemann
c02 = French, Advance
c46 = Three Knights, Schlechter Variation
e87 = King's Indian, Samisch
c45 = Scotch, Tartakower
c26 = Vienna
c84 = Spanish, Closed Center Attack .
b52 = Sicilian, Bb5+ Bd7
b31 = Sicilian, Rossolimo 3 ... g6
a43 = Old Benoni, Schmidt
b44 = Sicilian, Taimanov
b36 = Maroczy Bind
a44 = Old Benoni, Czech

Now 157 games and 16 draws for 3422 - 315 = 3107 elo performance.

Is anyone who did not believe changing their mind yet?

Can the detractors see what 44 million games of training would do for RomiChess?
The second 100-game match is finished. Romi got 31 draws and no wins. But with only 200 games of training, that is to be expected, as SF is very hard to beat. I will add the additional ECO codes at the bottom of the original list.

(Draws ...)
c66 = Closed Berlin . . . . . . . . . . . . . . . . . . . . . . . . . .
c65 = Berlin, Anderson . . .
c15 = Winawer, Alekhine .
c10 = French, Rubinstein
c49 = Four knights, Nimzovitch
c68 = Spanish exchange
a55 = Old Indian
b58 = Sicilian, Boleslavsky
c48 = Spanish, Classical
c61 = Spanish, Birds
a56 = Benoni, Czech
b51 = Sicilian, Bb5+ Nc6
b09 = Pirc, Austrian
c41 = Philidor, Berger Variation .
c63 = Spanish, Schliemann
c02 = French, Advance
c46 = Three Knights, Schlechter Variation
e87 = King's Indian, Samisch
c45 = Scotch, Tartakower
c26 = Vienna
c84 = Spanish, Closed Center Attack .
b52 = Sicilian, Bb5+ Bd7
b31 = Sicilian, Rossolimo 3 ... g6
a43 = Old Benoni, Schmidt
b44 = Sicilian, Taimanov
b36 = Maroczy Bind
a44 = Old Benoni, Czech

c00 = French, KIA reversed
a48 = Neo Kings Indian, London System
c60 = Spanish Cozio
a46 = Indian, London
a47 = Neo King's Indian
a41 = Neo Old Indian

Three draws in the first 100 games.
Twenty eight draws in the second 100 games.

Elo performance in the second 100 games = 3422 - 295 = 3127 Elo. So far there have been 33 different ECO codes and numerous sub-variations within the most played ECO codes. Romi has now started drawing in the c65 ECO code.

From 3 draws per 100 games to 28 draws per 100 games is huge against SF, which varies its play and therefore is not deterministic. Since 33 ECO codes were played, and there was variance by both SF and Romi in the most played lines, it can be understood that Romi is benefiting by playing stronger moves and reaching better positions even though SF varies its play!

Next 100 games starting now.
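The performance figures in this post convert a score fraction against a rated opponent into an Elo performance. A minimal sketch of the standard logistic conversion follows; the exact numbers quoted in the thread may come from a lookup table, so they can differ slightly from this formula.

```python
import math

def performance_elo(opponent_rating, score_fraction):
    """Standard logistic performance rating: the Elo difference is
    400 * log10(p / (1 - p)), where p is the score fraction achieved."""
    p = score_fraction
    if not 0.0 < p < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return opponent_rating + 400.0 * math.log10(p / (1.0 - p))

# 28 draws and no wins in 100 games = a 14% score. Against a 3422-rated
# opponent this formula puts the performance a bit over 300 Elo lower.
print(round(performance_elo(3422, 0.14)))
```

A 50% score reproduces the opponent's rating exactly, which is a quick sanity check on the formula.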