Rodolfo Leoni wrote:
As this conceptual experience got somehow backward propagated up to the start position, the system is the same as RomiChess's.

syzygy wrote:
No, it's not. Outbooking a deterministic opponent by repeating openings until a winning line is found is an obvious and ancient technique that has nothing to do with AlphaZero. OliThink used it on FICS more than 20 years ago. Many human players preceded OliThink.

Michael Sherwin wrote:
Except that is not what Romi's reinforcement learning is doing. How many times do I have to explain this to people who won't hear? The subtree with the reinforcement values is loaded into the hash table, so sorry, no book. The reinforcement values just guide the search better on average. The more experience Romi has with a position, the more info is loaded into the hash and the better Romi plays.

And that's different from what AlphaZero is doing. So what's your point? AlphaZero does not do book learning but rather learns from experience and can generalize.
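The "outbooking" loop described above is easy to sketch. Here is a toy Python illustration (the game model, move names, and win condition are all invented for illustration, not OliThink's actual method): against a fully deterministic opponent, the result of a game depends only on our own move sequence, so systematically varying our line until one wins yields a line that wins every replay.

```python
from itertools import product

MOVES = ["e4", "d4", "c4"]   # our candidate moves each turn (toy set)
DEPTH = 3                    # the toy game lasts three of our moves

def play_game(our_line):
    """Toy stand-in for a game against a deterministic opponent:
    its replies never vary, so the result is a pure function of
    our own move sequence."""
    return "win" if our_line == ("d4", "c4", "e4") else "loss"

def outbook():
    """Replay the opening, deviating systematically, until a winning
    line is found; a deterministic opponent then loses it forever."""
    for line in product(MOVES, repeat=DEPTH):
        if play_game(line) == "win":
            return line  # remember this line and repeat it every game
    return None

book_line = outbook()
```

With 3 candidate moves over 3 turns this enumerates at most 27 games; the point being argued in the thread is how quickly real chess outgrows this kind of enumeration.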
I can't believe that so many people don't get it!
(Posts: 273, Joined: Wed Aug 24, 2016 9:49 pm)
Re: I can't believe that so many people don't get it!
Michael Sherwin (Posts: 3196, Joined: Fri May 26, 2006 3:00 am, Location: WY, USA)
Re: I can't believe that so many people don't get it!
syzygy wrote:
And all of that has nothing to do with AlphaZero. So excuse me for not even reading.

No problem, I'd prefer that you did not read!
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin (Posts: 3196, Joined: Fri May 26, 2006 3:00 am, Location: WY, USA)
Re: I can't believe that so many people don't get it!
CheckersGuy wrote:
And that's different from what AlphaZero is doing. So what's your point? AlphaZero does not do book learning but rather learns from experience and can generalize.

OMG, read it again. I just said that Romi's reinforcement learning is not book learning. IT IS NOT BOOK LEARNING. Why won't you people hear? WHY?
(Posts: 6991, Joined: Thu Aug 18, 2011 12:04 pm)
Re: I can't believe that so many people don't get it!
hgm wrote:
One question: was Romi Chess in any way learning while it was playing its opponent, or was it purely learning from self-play and only using the thus-learned knowledge against Rybka?
And a second question: the learn file was said to contain WDL statistics of positions. How would Romi Chess benefit from that information once it got to a position that was not in the file?

Michael Sherwin wrote:
I took a short nap and my eyes are working better.
1. Asked and answered several times, but okay, once more. Part a) Yes, in a way, because before the search all prior knowledge is loaded into the hash table, and the search then learns from that data and selects a move. Part b) No self-learning was employed against Rybka. However, that is immaterial, because Romi's learning opposes Romi's natural evaluation function and causes it to return a different result when Romi is losing.
2. WDL learning follows the best line only while the stats are good. When that ends and there is absolutely no subtree left to load into the hash table, Romi has at least played a line up to that point on which it performed better in the past, so even then it is better off than without the learning.

Maybe you remember Mchess 5.0 and how it mated Rebel, Hiarcs and Genius straight from the opening book. Nice handcrafted work by Sandro Necci. No problem to do the same with reinforcement learning, fully automatically.
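The "load the learned subtree into the hash table before the search" step described above can be sketched as follows. This is a minimal assumed illustration, not RomiChess's actual code or data layout: positions are keyed by hypothetical move strings, and a learned bonus is copied into the transposition table so the subsequent search sees it as cached information.

```python
# Learn-file contents: position key -> (wins, draws, losses, bonus).
# All keys and values here are invented for illustration.
learn_tree = {
    "start":       (10, 5, 3, 15),
    "start e4":    ( 7, 2, 1, 20),
    "start e4 c5": ( 6, 1, 0, 30),
}

hash_table = {}  # the transposition table the search will probe

def preload(root_key):
    """Before searching, copy the learned subtree below the current
    position into the hash table; the search then treats the bonuses
    like cached scores and is biased toward well-scoring moves."""
    for key, (w, d, l, bonus) in learn_tree.items():
        if key.startswith(root_key):
            hash_table[key] = bonus

preload("start e4")  # we are in the position after 1.e4
```

Note that only entries under the current position are loaded; everything outside the learned subtree is untouched, which is exactly why the effect vanishes once play leaves known territory.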
H G Muller (Posts: 27796, Joined: Fri Mar 10, 2006 10:06 am, Location: Amsterdam)
Re: I can't believe that so many people don't get it!
Michael Sherwin wrote:
1. Asked and answered several times. Part a) Yes, in a way, because before the search all prior knowledge is loaded into the hash table, and the search then learns from the data and selects a move.

Perhaps it is because I simply don't understand your answer. Just storing data in another form and reading it back for use is not learning. So if that is what you are doing, it would be a "no", and not a "yes, in a way".

Michael Sherwin wrote:
Part b) No self-learning was employed against Rybka. However, that is immaterial because Romi's learning opposes Romi's natural evaluation function and causes it to return a different result if Romi is losing.

"No self-learning was employed." So what learning was employed, then? How did you fill the learn file? Who played the games from which the WDL statistics were taken? I don't understand the second sentence at all, but I don't think it would answer any question I have anyway.

Michael Sherwin wrote:
2. WDL learning follows the best line only while the stats are good, and when that ends there is absolutely no subtree to load into the hash table.

This sounds like it is just an opening book, and when you are out of book, you are out. You say it is not a book, but everything that works only up to a point, and then not at all, is by definition an opening book. That you first store the book in the hash table doesn't make it any less a book. That is totally different from AlphaZero's learning, which learned to play good moves in any position; if it had not learned that, it would have reverted to a random mover very early in the game.

I don't understand how you can get very deep into the game this way, even if you play millions of games to learn. Perft(6) (3 moves into the game) is already 119M. OK, not every move is acceptable, but even with just 5 playable moves out of 25 you would only get to move 6 with 100 million games. You can record deeper lines in the book, of course, but there seems to be no chance you would ever play the same line as Rybka for very long unless you were close to Rybka in strength (and even then...). Otherwise you would not need the help of the learn file, since you would play all the Rybka moves yourself.
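The branching-factor arithmetic above can be checked with a quick back-of-the-envelope calculation (numbers taken from the post; the 5-of-25 "playable moves" figure is hgm's assumption):

```python
import math

def lines(branching, plies):
    """Number of distinct lines with a fixed branching factor per ply."""
    return branching ** plies

def deepest_full_ply(branching, games):
    """Deepest ply whose complete tree still fits within `games` games."""
    return int(math.log(games, branching))

# Even at only 5 playable moves per ply, covering every line through
# 12 plies (move 6 for both sides) takes 5**12 = 244 million games,
# so 100 million games exhaust the full tree only to about ply 11.
full_move_6 = lines(5, 12)
covered = deepest_full_ply(5, 100_000_000)
```

This matches the estimate in the post: roughly 100 million games reach, at best, around move 6 if you insist on covering every playable continuation.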
Last edited by hgm on Mon Dec 18, 2017 10:06 pm, edited 3 times in total.
Rasmus Althoff (Posts: 2487, Joined: Tue Aug 30, 2016 8:19 pm)
Re: I can't believe that so many people don't get it!
Michael Sherwin wrote:
OMG, read it again. I just said that Romi's reinforcement learning is not book learning. IT IS NOT BOOK LEARNING. Why won't you people hear? WHY?

So, does the learning process of Romi help with positions that Romi has NOT yet encountered, but that show similar patterns?
(Posts: 5563, Joined: Tue Feb 28, 2012 11:56 pm)
Re: I can't believe that so many people don't get it!
hgm wrote:
"No self-learning was employed." So what learning was employed, then? How did you fill the learn file? Who played the games from which the WDL statistics were taken?

Clearly his program simply played thousands of games against a relatively deterministic Rybka until it eventually stumbled upon winning lines.

But I guess you are trying to make him say this...
Michael Sherwin (Posts: 3196, Joined: Fri May 26, 2006 3:00 am, Location: WY, USA)
Re: I can't believe that so many people don't get it!
Ras wrote:
So, does the learning process of Romi help with positions that Romi has NOT yet encountered, but that show similar patterns?

Objection, your honor: leading question. Sustained. Asked and answered so many fricken times! Here it is again. Romi stores all its games in a tree structure. Each game, when finished, is overlaid onto the tree structure, and WDL is updated for each move affected. WDL is used in MSMDL, but we are discussing Romi's reinforcement learning (RL). If some engine plays 1.a3 against Romi and Romi has never seen that before, then no, there is no help for Romi from the learn file in that game. However, if Romi has seen 1.e4 numerous times and does better with 1...c5 than with the 1...e5 that the evaluation-only search would return, then the learned reinforcement values will guide the search to choose 1...c5 instead of 1...e5. So the reinforcement values cause Romi to pick moves that suit it better and give better results. So yes, it can help in positions Romi has never seen before. Are you guys teaming up on me to wear me out by asking the same questions over and over again? Why don't you just read up above? Why don't you hear?
Last edited by Michael Sherwin on Mon Dec 18, 2017 10:33 pm, edited 1 time in total.
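The game-tree overlay described above (store every game in one tree, bump a win/draw/loss counter on each move of a finished game) can be sketched like this. The data layout is assumed for illustration, not Romi's actual structure:

```python
from collections import defaultdict

def new_node():
    """A tree node: W/D/L counts plus children keyed by move."""
    return {"wdl": [0, 0, 0], "children": defaultdict(new_node)}

root = new_node()

def overlay(game_moves, result):
    """Overlay one finished game onto the tree: walk the played line,
    creating nodes as needed, and update W/D/L at every move."""
    idx = {"win": 0, "draw": 1, "loss": 2}[result]
    node = root
    for move in game_moves:
        node = node["children"][move]
        node["wdl"][idx] += 1

overlay(["e4", "c5", "Nf3"], "win")   # a toy game Romi won
overlay(["e4", "e5", "Nf3"], "loss")  # a toy game Romi lost
```

After enough games, the counts below "e4" for "c5" versus "e5" are exactly the kind of statistics that would bias a search toward 1...c5 in the scenario described.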
Michael Sherwin (Posts: 3196, Joined: Fri May 26, 2006 3:00 am, Location: WY, USA)
Re: I can't believe that so many people don't get it!
syzygy wrote:
Clearly his program simply played thousands of games against a relatively deterministic Rybka until it eventually stumbled upon winning lines. But I guess you are trying to make him say this...

No, Romi beat deterministic Rybka in 100 games or fewer. Quit making things up.
(Posts: 5563, Joined: Tue Feb 28, 2012 11:56 pm)
Re: I can't believe that so many people don't get it!
Michael Sherwin wrote:
No, Romi beat deterministic Rybka in 100 games or fewer. Quit making things up.

Read the opening post and ask yourself who is making things up here.