Rodolfo Leoni wrote:
As this conceptual experience got somehow backward propagated up to the start position, the system is the same as RomiChess's.

syzygy wrote:
No, it's not. Outbooking a deterministic opponent by repeating openings until a winning line is found is an obvious and ancient technique that has nothing to do with AlphaZero. OliThink used it on FICS more than 20 years ago. Many human players preceded OliThink.

Michael Sherwin wrote:
Except that is not what Romi's reinforcement learning is doing. How many times do I have to explain this to people who won't hear? The subtree with the reinforcement values is loaded into the hash table, so sorry, no book. The reinforcement values just guide the search better on average. The more experience Romi has with a position, the more info is loaded into the hash and the better Romi plays.

And that's different from what AlphaZero is doing. So what's your point? AlphaZero does not do book learning but rather learns from experience and can generalize.
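The "outbooking" loop described above is easy to sketch. Here is a toy Python illustration (the game model, move names, and win condition are all invented for illustration, not OliThink's actual method): against a fully deterministic opponent, the result of a game depends only on our own move sequence, so systematically varying our line until one wins yields a line that wins every replay.

```python
from itertools import product

MOVES = ["e4", "d4", "c4"]   # our candidate moves each turn (toy set)
DEPTH = 3                    # the toy game lasts three of our moves

def play_game(our_line):
    """Toy stand-in for a game against a deterministic opponent:
    its replies never vary, so the result is a pure function of
    our own move sequence."""
    return "win" if our_line == ("d4", "c4", "e4") else "loss"

def outbook():
    """Replay the opening, deviating systematically, until a winning
    line is found; a deterministic opponent then loses it forever."""
    for line in product(MOVES, repeat=DEPTH):
        if play_game(line) == "win":
            return line  # remember this line and repeat it every game
    return None

book_line = outbook()
```

With 3 candidate moves over 3 turns this enumerates at most 27 games; the point being argued in the thread is how quickly real chess outgrows this kind of enumeration.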
I can't believe that so many people don't get it!
(Posts: 273, Joined: Wed Aug 24, 2016 9:49 pm)
Re: I can't believe that so many people don't get it!
Michael Sherwin (Posts: 3196, Joined: Fri May 26, 2006 3:00 am, Location: WY, USA)
Re: I can't believe that so many people don't get it!
syzygy wrote:
And all of that has nothing to do with AlphaZero. So excuse me for not even reading.

No problem, I'd prefer that you did not read!
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin (Posts: 3196, Joined: Fri May 26, 2006 3:00 am, Location: WY, USA)
Re: I can't believe that so many people don't get it!
CheckersGuy wrote:
And that's different from what AlphaZero is doing. So what's your point? AlphaZero does not do book learning but rather learns from experience and can generalize.

OMG, read it again. I just said that Romi's reinforcement learning is not book learning. IT IS NOT BOOK LEARNING. Why won't you people hear? WHY?
(Posts: 6991, Joined: Thu Aug 18, 2011 12:04 pm)
Re: I can't believe that so many people don't get it!
hgm wrote:
One question: was Romi Chess in any way learning while it was playing its opponent, or was it purely learning from self-play and only using the thus-learned knowledge against Rybka?
And a second question: the learn file was said to contain WDL statistics of positions. How would Romi Chess benefit from that information once it got to a position that was not in the file?

Michael Sherwin wrote:
I took a short nap and my eyes are working better.
1. Asked and answered several times, but okay, once more. Part a) Yes, in a way, because before the search all prior knowledge is loaded into the hash table, and the search then learns from that data and selects a move. Part b) No self-learning was employed against Rybka. However, that is immaterial, because Romi's learning opposes Romi's natural evaluation function and causes it to return a different result when Romi is losing.
2. WDL learning follows the best line only while the stats are good. When that ends and there is absolutely no subtree left to load into the hash table, Romi has at least played a line up to that point on which it performed better in the past, so even then it is better off than without the learning.

Maybe you remember Mchess 5.0 and how it mated Rebel, Hiarcs and Genius straight from the opening book. Nice handcrafted work by Sandro Necci. No problem to do the same with reinforcement learning, fully automatically.
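The "load the learned subtree into the hash table before the search" step described above can be sketched as follows. This is a minimal assumed illustration, not RomiChess's actual code or data layout: positions are keyed by hypothetical move strings, and a learned bonus is copied into the transposition table so the subsequent search sees it as cached information.

```python
# Learn-file contents: position key -> (wins, draws, losses, bonus).
# All keys and values here are invented for illustration.
learn_tree = {
    "start":       (10, 5, 3, 15),
    "start e4":    ( 7, 2, 1, 20),
    "start e4 c5": ( 6, 1, 0, 30),
}

hash_table = {}  # the transposition table the search will probe

def preload(root_key):
    """Before searching, copy the learned subtree below the current
    position into the hash table; the search then treats the bonuses
    like cached scores and is biased toward well-scoring moves."""
    for key, (w, d, l, bonus) in learn_tree.items():
        if key.startswith(root_key):
            hash_table[key] = bonus

preload("start e4")  # we are in the position after 1.e4
```

Note that only entries under the current position are loaded; everything outside the learned subtree is untouched, which is exactly why the effect vanishes once play leaves known territory.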
H G Muller (Posts: 27796, Joined: Fri Mar 10, 2006 10:06 am, Location: Amsterdam)
Re: I can't believe that so many people don't get it!
Michael Sherwin wrote:
1. Asked and answered several times. Part a) Yes, in a way, because before the search all prior knowledge is loaded into the hash table, and the search then learns from the data and selects a move.

Perhaps it is because I simply don't understand your answer. Just storing data in another form and reading it back for use is not learning. So if that is what you are doing, it would be a "no", and not a "yes, in a way".

Michael Sherwin wrote:
Part b) No self-learning was employed against Rybka. However, that is immaterial because Romi's learning opposes Romi's natural evaluation function and causes it to return a different result if Romi is losing.

"No self-learning was employed." So what learning was employed, then? How did you fill the learn file? Who played the games from which the WDL statistics were taken? I don't understand the second sentence at all, but I don't think it would answer any question I have anyway.

Michael Sherwin wrote:
2. WDL learning follows the best line only while the stats are good, and when that ends there is absolutely no subtree to load into the hash table.

This sounds like it is just an opening book, and when you are out of book, you are out. You say it is not a book, but everything that works only up to a point, and then not at all, is by definition an opening book. That you first store the book in the hash table doesn't make it any less a book. That is totally different from AlphaZero's learning, which learned to play good moves in any position; if it had not learned that, it would have reverted to a random mover very early in the game.

I don't understand how you can get very deep into the game this way, even if you play millions of games to learn. Perft(6) (3 moves into the game) is already 119M. OK, not every move is acceptable, but even with just 5 playable moves out of 25 you would only get to move 6 with 100 million games. You can record deeper lines in the book, of course, but there seems to be no chance you would ever play the same line as Rybka for very long unless you were close to Rybka in strength (and even then...). Otherwise you would not need the help of the learn file, since you would play all the Rybka moves yourself.
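The branching-factor arithmetic above can be checked with a quick back-of-the-envelope calculation (numbers taken from the post; the 5-of-25 "playable moves" figure is hgm's assumption):

```python
import math

def lines(branching, plies):
    """Number of distinct lines with a fixed branching factor per ply."""
    return branching ** plies

def deepest_full_ply(branching, games):
    """Deepest ply whose complete tree still fits within `games` games."""
    return int(math.log(games, branching))

# Even at only 5 playable moves per ply, covering every line through
# 12 plies (move 6 for both sides) takes 5**12 = 244 million games,
# so 100 million games exhaust the full tree only to about ply 11.
full_move_6 = lines(5, 12)
covered = deepest_full_ply(5, 100_000_000)
```

This matches the estimate in the post: roughly 100 million games reach, at best, around move 6 if you insist on covering every playable continuation.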
Last edited by hgm on Mon Dec 18, 2017 10:06 pm, edited 3 times in total.
Rasmus Althoff (Posts: 2487, Joined: Tue Aug 30, 2016 8:19 pm)
Re: I can't believe that so many people don't get it!
Michael Sherwin wrote:
OMG, read it again. I just said that Romi's reinforcement learning is not book learning. IT IS NOT BOOK LEARNING. Why won't you people hear? WHY?

So, does the learning process of Romi help with positions that Romi has NOT yet encountered, but that show similar patterns?
(Posts: 5563, Joined: Tue Feb 28, 2012 11:56 pm)
Re: I can't believe that so many people don't get it!
hgm wrote:
"No self-learning was employed." So what learning was employed, then? How did you fill the learn file? Who played the games from which the WDL statistics were taken?

Clearly his program simply played thousands of games against a relatively deterministic Rybka until it eventually stumbled upon winning lines.

But I guess you are trying to make him say this...
Michael Sherwin (Posts: 3196, Joined: Fri May 26, 2006 3:00 am, Location: WY, USA)
Re: I can't believe that so many people don't get it!
Ras wrote:
So, does the learning process of Romi help with positions that Romi has NOT yet encountered, but that show similar patterns?

Objection, your honor: leading question. Sustained. Asked and answered so many fricken times! Here it is again. Romi stores all its games in a tree structure. Each game, when finished, is overlaid onto the tree structure, and WDL is updated for each move affected. WDL is used in MSMDL, but we are discussing Romi's reinforcement learning (RL). If some engine plays 1.a3 against Romi and Romi has never seen that before, then no, there is no help for Romi from the learn file in that game. However, if Romi has seen 1.e4 numerous times and does better with 1...c5 than with the 1...e5 that the evaluation-only search would return, then the learned reinforcement values will guide the search to choose 1...c5 instead of 1...e5. So the reinforcement values cause Romi to pick moves that suit it better and give better results. So yes, it can help in positions Romi has never seen before. Are you guys teaming up on me to wear me out by asking the same questions over and over again? Why don't you just read up above? Why don't you hear?
Last edited by Michael Sherwin on Mon Dec 18, 2017 10:33 pm, edited 1 time in total.
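The game-tree overlay described above (store every game in one tree, bump a win/draw/loss counter on each move of a finished game) can be sketched like this. The data layout is assumed for illustration, not Romi's actual structure:

```python
from collections import defaultdict

def new_node():
    """A tree node: W/D/L counts plus children keyed by move."""
    return {"wdl": [0, 0, 0], "children": defaultdict(new_node)}

root = new_node()

def overlay(game_moves, result):
    """Overlay one finished game onto the tree: walk the played line,
    creating nodes as needed, and update W/D/L at every move."""
    idx = {"win": 0, "draw": 1, "loss": 2}[result]
    node = root
    for move in game_moves:
        node = node["children"][move]
        node["wdl"][idx] += 1

overlay(["e4", "c5", "Nf3"], "win")   # a toy game Romi won
overlay(["e4", "e5", "Nf3"], "loss")  # a toy game Romi lost
```

After enough games, the counts below "e4" for "c5" versus "e5" are exactly the kind of statistics that would bias a search toward 1...c5 in the scenario described.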
Michael Sherwin (Posts: 3196, Joined: Fri May 26, 2006 3:00 am, Location: WY, USA)
Re: I can't believe that so many people don't get it!
syzygy wrote:
Clearly his program simply played thousands of games against a relatively deterministic Rybka until it eventually stumbled upon winning lines. But I guess you are trying to make him say this...

No, Romi beat deterministic Rybka in 100 games or fewer. Quit making things up.
(Posts: 5563, Joined: Tue Feb 28, 2012 11:56 pm)
Re: I can't believe that so many people don't get it!
Michael Sherwin wrote:
No, Romi beat deterministic Rybka in 100 games or fewer. Quit making things up.

Read the opening post and ask yourself who is making things up here.