I can't believe that so many people don't get it!

CheckersGuy · Post by **CheckersGuy** » Tue Dec 19, 2017 12:17 am

I think he's trolling. This thread is just complete bogus

Ras · Post by **Ras** » Tue Dec 19, 2017 12:25 am

Michael Sherwin wrote:The NN guides the search using these values stored in the tree. Nowhere does it state that the tree structure is ever deleted.

Are you even remotely clear about how NNs operate, and what the difference between training phase and application phase is, or why there even are two distinct phases to begin with?

hgm · Post by **hgm** » Tue Dec 19, 2017 12:25 am

Michael Sherwin wrote:Nowhere does it state that the tree structure is ever deleted.

Yes it is. On page 8 of the AlphaGo Zero Nature paper, in the right column under 'Play':

The search tree is reused at subsequent time-steps: the child node corresponding to the played action becomes the new root node; the subtree below this child is retained along with all its statistics, while the remainder of the tree is discarded.

And in the AlphaZero paper it says:

Unless otherwise specified, the training and search algorithm and parameters are identical to AlphaGo Zero(29).

Michael Sherwin wrote:The WDL values are not directly used to compute the RL value. A bonus is applied to the winning side's moves and a penalty is applied to the losing side's moves. The bonus/penalty is larger towards the leaves and very small at the root. Therefore not directly linked to WDL. The idea is to gently nudge the search into better lines. Wins/losses far from the root position affect the search far less than wins/losses close to the root move. It is just simply a lot more dynamic than WDL.

Well, it definitely is a very clever way to use a book. But it is a book nevertheless. It only offers help in a finite number of positions (and compared to the entire game tree of Chess a very limited number of positions), and no help at all outside that set.

Is a persistent hash a book? Romi's RL is like that but much smarter.

I would consider persistent hash indeed a book. It doesn't have to be an opening book, of course. Fairy-Max has a sort of persistent hash, where you can save individual positions. And I use it for storing some key positions in some checkmates against bare King for Makruk, which it otherwise cannot find at fast TC.

Michael Sherwin · Post by **Michael Sherwin** » Tue Dec 19, 2017 12:33 am

Michael Sherwin wrote:
Ras wrote:
Michael Sherwin wrote:Objection your honor, leading question.
I'd say the question hit the nail.

If some engine plays 1.a3 against Romi and Romi never saw that before then no there is no help for Romi from the learn file that game.
This is not learning at the level of Alpha0, then. Not even remotely.

However, if Romi has seen 1.e4 numerous times and Romi does better with 1. ... c5 instead of 1. ... e5 that the evaluation only search would return then the learned reinforcement values will guide the search to chose 1. ... c5 instead of 1. ... e5.
This is a fancy sort of book learning. Obviously, also after the "actual" book, but it's still a sort of book learning. This answer is also obvious because you do it via the hash tables which by definition only apply for exact position matches and not for patterns. That's how books work, however.
"MCTS may be viewed as a self-play algorithm that, given neural network parameters θ and
a root position s, computes a vector of search probabilities recommending moves to play, π =
αθ(s), proportional to the exponentiated visit count for each move, πa ∝ N(s, a)
1/τ , where τ is a
temperature parameter."

In other words proportional to the exponentiated visit count for each move means a tree structure in which is stored the probability value. The NN guides the search using these values stored in the tree. Nowhere does it state that the tree structure is ever deleted. The temperature value has probably to do with distance to the leaves. They hide the details in careful scientific-ease so the average Joe has no chance of understanding it. The paper is truly a sculptured work.

"Our results comprehensively demonstrate that a pure reinforcement learning approach is fully feasible,
even in the most challenging of domains: it is possible to train to superhuman level, without
human examples or guidance, given no knowledge of the domain beyond basic rules."

"a pure reinforcement learning approach"

That was done on 1.6 billion moves all in a tree structure and calculated to be the most likely moves by the NN. It can play an entire game from the reinforcement learning tree structure. They are hiding the details in the scientific-ease. They are trying not to reveal anything especially the truth. It's obvious.

Okay, this paper was on AlphaGo Zero but AG0 is more like A0 than AG. Google paid half a billion for the company--do you really think that they would reveal their secrets. Get real!

Michael Sherwin · Post by **Michael Sherwin** » Tue Dec 19, 2017 12:37 am

hgm wrote:
Michael Sherwin wrote:Nowhere does it state that the tree structure is ever deleted.
Yes it is. On page 8 of the AlphaGo Zero Nature paper, in the right column under 'Play':

The search tree is reused at subsequent time-steps: the child node corresponding to the played action becomes the new root node; the subtree below this child is retained along with all its statistics, while the remainder of the tree is discarded.
And in the AlphaZero paper it says:

Unless otherwise specified, the training and search algorithm and parameters are identical to AlphaGo Zero(29).

That is not crystal clear. Romi only loads the subtree discarding the rest. That does not mean that the rest is trashed. That just means that the hash only loads the subtree. In AG0 they may just be omitting that the whole tree is stored elsewhere.

syzygy · Post by **syzygy** » Tue Dec 19, 2017 12:37 am

Michael Sherwin wrote:That was done on 1.6 billion moves all in a tree structure and calculated to be the most likely moves by the NN. It can play an entire game from the reinforcement learning tree structure. They are hiding the details in the scientific-ease. They are trying not to reveal anything especially the truth. It's obvious.

Okay, this paper was on AlphaGo Zero but AG0 is more like A0 than AG. Google paid half a billion for the company--do you really think that they would reveal their secrets. Get real!

You mean to say that you are the one that does not get it?

Science may scare you, but for many people the papers are very understandable.

The reason you are getting severely criticised here is that you chose to open a topic with a provocative title and outlandish claims. When you do that, you can expect some scrutiny. And then it turns out you don't even understand the papers to begin with...

Michael Sherwin · Post by **Michael Sherwin** » Tue Dec 19, 2017 12:47 am

Michael Sherwin wrote:
hgm wrote:
Michael Sherwin wrote:Nowhere does it state that the tree structure is ever deleted.
Yes it is. On page 8 of the AlphaGo Zero Nature paper, in the right column under 'Play':

The search tree is reused at subsequent time-steps: the child node corresponding to the played action becomes the new root node; the subtree below this child is retained along with all its statistics, while the remainder of the tree is discarded.
And in the AlphaZero paper it says:

Unless otherwise specified, the training and search algorithm and parameters are identical to AlphaGo Zero(29).
That is not crystal clear. Romi only loads the subtree discarding the rest. That does not mean that the rest is trashed. That just means that the hash only loads the subtree. In AG0 they may just be omitting that the whole tree is stored elsewhere.

"The search tree is reused at subsequent time-steps:

here's the proof in your own example. The tree is reused i.e. in its complete form at each time step.

Michael Sherwin · Post by **Michael Sherwin** » Tue Dec 19, 2017 12:49 am

syzygy wrote:
Michael Sherwin wrote:That was done on 1.6 billion moves all in a tree structure and calculated to be the most likely moves by the NN. It can play an entire game from the reinforcement learning tree structure. They are hiding the details in the scientific-ease. They are trying not to reveal anything especially the truth. It's obvious.

Okay, this paper was on AlphaGo Zero but AG0 is more like A0 than AG. Google paid half a billion for the company--do you really think that they would reveal their secrets. Get real!
You mean to say that you are the one that does not get it?

Science may scare you, but for many people the papers are very understandable.

The reason you are getting severely criticised here is that you chose to open a topic with a provocative title and outlandish claims. When you do that, you can expect some scrutiny. And then it turns out you don't even understand the papers to begin with...

You don't argue intelligently. You just attack with blabber. I thought that you were not going to read anymore?

syzygy · Post by **syzygy** » Tue Dec 19, 2017 12:53 am

Michael Sherwin wrote:You don't argue intelligently. You just attack with blabber.

I don't think you are in a position to make such statements. Really.

Michael Sherwin · Post by **Michael Sherwin** » Tue Dec 19, 2017 1:01 am

syzygy wrote:
Michael Sherwin wrote:You don't argue intelligently. You just attack with blabber.
I don't think you are in a position to make such statements. Really.

And you keep using the same approach against me and you expect a different result? That qualifies as insanity, lol! And is just more blabber.

I can't believe that so many people don't get it!

Re: I can't believe that so many people don't get it!

Re: I can't believe that so many people don't get it!

Re: I can't believe that so many people don't get it!

Re: I can't believe that so many people don't get it!

Re: I can't believe that so many people don't get it!

Re: I can't believe that so many people don't get it!

Re: I can't believe that so many people don't get it!

Re: I can't believe that so many people don't get it!

Re: I can't believe that so many people don't get it!

Re: I can't believe that so many people don't get it!