I can't believe that so many people don't get it!

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: I can't believe that so many people don't get it!

Post by Ras »

Michael Sherwin wrote:It can't store all that data in the NN. It has to be storing W/L/D/P data somewhere, either in memory or on a hard drive. And to say an NN does not work that way is ridiculous.
https://en.wikipedia.org/wiki/Artificial_neural_network would be a starting point for finding out how NNs work. They don't work by memorising - that is completely beside the point. In fact, a memorising NN would be unsuited for real tasks because it runs into the problem of overtraining, which typically happens if the NN is too big for the task at hand.

Oh, and there is an even older piece of software that does pretty much the same as AlphaZero, only in a much less complex game: GNU Backgammon, which has been using that approach - statistical rollouts plus a neural net - since at least 2004.
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: I can't believe that so many people don't get it!

Post by hgm »

Michael Sherwin wrote:"Monte Carlo search does not use a tradition eval as we know it, so mobility, king safety etc. are irrelevant.

It uses a struct to hold info like wins, losses, draws, win %, etc.,
then simply references accumulated data for the current position to select the move with the highest probability of winning."

The current position info has to be stored in memory somewhere!
OK, but that is during one search. You play a number of random games, and when you get nodes with enough visits close to the root, you start using the WDL statistics to concentrate on the more promising ones. The training games simulated this with only 800 playouts per move, but leaning heavily on the advice of the NN to select the moves of those playouts. (Which initially must have been no better than random.) After the 800 playouts, a root move with good statistics was played.

Most of the tree was then deleted from memory, and only the part of the tree behind the played move and the opponent's reply was kept as the starting point for the next move. This is similar to keeping your hash from one move to the next, and then overwriting all 'aged' entries in the next search. At the end of a training game, the entire MCTS tree was deleted, and the next training game started from scratch.
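To make that concrete, the per-node bookkeeping and the NN-guided move selection amount to something like this (a minimal sketch; the struct layout and the exploration constant are only illustrative, not A0's actual code):

Code: Select all

#include <math.h>
#include <stddef.h>

typedef struct Node {
    struct Node *children;    /* one entry per legal move in this position        */
    int          num_children;
    double       prior;       /* NN policy probability for the move leading here  */
    double       value_sum;   /* sum of values backed up through this node        */
    int          visits;      /* playouts that went through this node             */
} Node;

/* PUCT-style score: exploit the accumulated statistics (Q), but let the
   NN prior (P) pull exploration toward moves the net likes while their
   visit counts are still low. */
static double puct_score(const Node *parent, const Node *child, double c_puct)
{
    double q = child->visits ? child->value_sum / child->visits : 0.0;
    double u = c_puct * child->prior
               * sqrt((double)parent->visits) / (1.0 + child->visits);
    return q + u;
}

static Node *select_child(Node *parent, double c_puct)
{
    Node  *best = NULL;
    double best_score = -1e30;

    for (int i = 0; i < parent->num_children; i++) {
        double s = puct_score(parent, &parent->children[i], c_puct);
        if (s > best_score) { best_score = s; best = &parent->children[i]; }
    }
    return best;    /* after the ~800 playouts, the root child with the best
                       statistics becomes the move actually played */
}

The value_sum/visits part is the WDL-style statistics; the prior term is where the advice of the NN comes in.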
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

Michael Sherwin wrote:
vvarkey wrote:
There is evidence that they keep a learn file with wins, losses, draws and a percentage chance of winning.
where is this evidence?
My eyes are shot. This is all I could find.

"Monte Carlo search does not use a tradition eval as we know it, so mobility, king safety etc. are irrelevant.

It uses a struct to hold info like wins, losses, draws, win %, etc.,
then simply references accumulated data for the current position to select the move with the highest probability of winning."

The current position info has to be stored in memory somewhere!
This is for all the detractors above. In A0 there is MCTS and an NN, so it is a hybrid system. It is not just NN and not just MCTS; it is NN and MCTS melded together. Pure MCTS is random, but each random simulation is overlaid onto a tree structure. This accumulated info is then used to produce a move, I guess from some probability calculation (reinforcement learning), and by itself it sucks, lol. So meld an NN with that to direct the MCTS search in a more intelligent manner and it becomes something that does not suck. But then it is not pure MCTS anymore, is it?

Daniel above just does not get it. It is more than position learning, because RomiChess back-propagates values down toward the root, reinforcing whatever moves Romi does better at. That is beneficial for Romi's results no matter whom Romi plays next. Daniel refuses to hear that Romi gained 50 Elo every 5,000 games even though Romi was playing several top engines that were using a truly humongous book. Daniel refuses to understand that Romi gained two classes at WBEC and was about to gain a third. Romi not only played against different openings but against mostly different engines in each class, and yet Romi continued to climb. Daniel says he had that learning before RomiChess but does not have it anymore. Well, Daniel, prove it.

The detractors are saying all kinds of things from limited info and declaring I'm wrong. I'm just following where the evidence leads, and the more evidence that surfaces, the more correct I appear. So Daniel, when you really understand how RomiChess learning works, why don't you get back to me. Until then all I can tell you is that you are clueless. And yes, I independently invented it, whether it existed in some form prior or not. The inspiration for it was Pavlov's dog experiments, because what he was doing with them was reinforcement learning, and I adapted it directly to computer chess without any other knowledge than that. And Romi's MSMD learning was taken from monkeys on an island near Japan learning to wash their potatoes by watching the researchers do the same. So Daniel, I don't care what you think because you are simply wrong. And you are rude about it!
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: I can't believe that so many people don't get it!

Post by Daniel Shawul »

Michael Sherwin wrote:
Michael Sherwin wrote:
vvarkey wrote:
There is evidence that they keep a learn file with wins, losses, draws and a percentage chance of winning.
where is this evidence?
My eyes are shot. This is all I could find.

"Monte Carlo search does not use a tradition eval as we know it, so mobility, king safety etc. are irrelevant.

It uses a struct to hold info like wins, losses, draws, win %, etc.,
then simply references accumulated data for the current position to select the move with the highest probability of winning."

The current position info has to be stored in memory somewhere!
This is for all the detractors above. In A0 there is MCTS and an NN, so it is a hybrid system. It is not just NN and not just MCTS; it is NN and MCTS melded together. Pure MCTS is random, but each random simulation is overlaid onto a tree structure. This accumulated info is then used to produce a move, I guess from some probability calculation (reinforcement learning), and by itself it sucks, lol. So meld an NN with that to direct the MCTS search in a more intelligent manner and it becomes something that does not suck. But then it is not pure MCTS anymore, is it?

Daniel above just does not get it. It is more than position learning, because RomiChess back-propagates values down toward the root, reinforcing whatever moves Romi does better at. That is beneficial for Romi's results no matter whom Romi plays next. Daniel refuses to hear that Romi gained 50 Elo every 5,000 games even though Romi was playing several top engines that were using a truly humongous book. Daniel refuses to understand that Romi gained two classes at WBEC and was about to gain a third. Romi not only played against different openings but against mostly different engines in each class, and yet Romi continued to climb. Daniel says he had that learning before RomiChess but does not have it anymore. Well, Daniel, prove it.

The detractors are saying all kinds of things from limited info and declaring I'm wrong. I'm just following where the evidence leads, and the more evidence that surfaces, the more correct I appear. So Daniel, when you really understand how RomiChess learning works, why don't you get back to me. Until then all I can tell you is that you are clueless. And yes, I independently invented it, whether it existed in some form prior or not. The inspiration for it was Pavlov's dog experiments, because what he was doing with them was reinforcement learning, and I adapted it directly to computer chess without any other knowledge than that. And Romi's MSMD learning was taken from monkeys on an island near Japan learning to wash their potatoes by watching the researchers do the same. So Daniel, I don't care what you think because you are simply wrong. And you are rude about it!
I count, what, 20 Daniels in your trolling post.

Let's dissect what you have achieved, according to CPW.

=========================================================
RomiChess is famous for its learning approach [2]
Monkey see, monkey do. Romi remembers winning lines, regardless of which side played the moves, incorporates them into the opening book, and can play them back instantly up to 180 ply if the stats for that line remain good.
I guess the fact that you replay them for 180 plies must be the real invention. I never thought of that.
Pavlov's dog experiments adapted to computer chess. Each side's moves are given a slight bonus if that side has won, and the other side's moves are given a slight penalty. So good moves can get a slight penalty and bad moves can get a slight bonus; however, over time those are corrected. These bonuses/penalties are loaded into the hash table before each move by the computer. If Romi is losing game after game, this will cause Romi to 'fish' for better moves to play until Romi starts to win.
Ok, so we are now loading the data into the hash table, not the book. Clever!
==========================================
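Taken literally, the first mechanism amounts to something like this (a sketch with invented names and thresholds, not Romi's actual code):

Code: Select all

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    uint64_t key;             /* position hash                 */
    uint16_t move;            /* move stored for this position */
    uint32_t wins, losses, draws;
} LearnNode;

/* stand-in for the learn data loaded from disk into memory */
static LearnNode learn_table[1 << 16];
static size_t    learn_count;

static LearnNode *learn_lookup(uint64_t position_key)
{
    for (size_t i = 0; i < learn_count; i++)
        if (learn_table[i].key == position_key)
            return &learn_table[i];
    return NULL;
}

/* Returns the stored move if the line's statistics still look good
   (so it can be played back instantly, up to 180 ply deep), or 0,
   meaning: fall back to a normal search.  The 50%-score threshold is
   purely illustrative. */
static uint16_t replay_move(uint64_t position_key)
{
    LearnNode *n = learn_lookup(position_key);
    if (!n)
        return 0;

    uint32_t games = n->wins + n->losses + n->draws;
    if (games == 0)
        return 0;

    bool still_good = 2 * n->wins + n->draws > games;
    return still_good ? n->move : 0;
}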

Daniel
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

hgm wrote:
Michael Sherwin wrote:"Monte Carlo search does not use a tradition eval as we know it, so mobility, king safety etc. are irrelevant.

It uses a struct to hold info like wins, losses, draws, win %, etc.,
then simply references accumulated data for the current position to select the move with the highest probability of winning."

The current position info has to be stored in memory somewhere!
OK, but that is during one search. You play a number of random games, and when you get nodes with enough visits close to the root, you start using the WDL statistics to concentrate on the more promising ones. The training games simulated this with only 800 playouts per move, but leaning heavily on the advice of the NN to select the moves of those playouts. (Which initially must have been no better than random.) After the 800 playouts, a root move with good statistics was played.

Most of the tree was then deleted from memory, and only the part of the tree behind the played move and the opponent's reply was kept as the starting point for the next move. This is similar to keeping your hash from one move to the next, and then overwriting all 'aged' entries in the next search. At the end of a training game, the entire MCTS tree was deleted, and the next training game started from scratch.
" At the end of a training game, the entire MCTS tree was deleted, and the next training game started from scratch."

How do you know that? And I can tell you that it makes no sense. It is far superior to maintain the entire tree and use it over and over. The NN would become saturated and start losing valuable data. Also, the reinforcement learning in the tree would be thin if it were deleted every game. There is no way that A0 could have that result against SF without a lot of deep retained learning. And that can't all be held in the NN. It would have to retain all the data from all the training games to do what it did. Besides, if every game's tree was discarded and every game changed the NN, then the NN would end up based on the very last few games, and all the first games would lose all effect in the NN and would have been worthless. I'm sorry, but no way is your explanation correct, IMHO.
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

Daniel Shawul wrote:
Michael Sherwin wrote:
Michael Sherwin wrote:
vvarkey wrote:
There is evidence that they keep a learn file with wins, losses, draws and a percentage chance of winning.
where is this evidence?
My eyes are shot. This is all I could find.

"Monte Carlo search does not use a tradition eval as we know it, so mobility, king safety etc. are irrelevant.

It uses a struct to hold info like wins, losses, draws, win %, etc.,
then simply references accumulated data for the current position to select the move with the highest probability of winning."

The current position info has to be stored in memory somewhere!
This is for all the detractors above. In A0 there is MCTS and an NN, so it is a hybrid system. It is not just NN and not just MCTS; it is NN and MCTS melded together. Pure MCTS is random, but each random simulation is overlaid onto a tree structure. This accumulated info is then used to produce a move, I guess from some probability calculation (reinforcement learning), and by itself it sucks, lol. So meld an NN with that to direct the MCTS search in a more intelligent manner and it becomes something that does not suck. But then it is not pure MCTS anymore, is it?

Daniel above just does not get it. It is more than position learning, because RomiChess back-propagates values down toward the root, reinforcing whatever moves Romi does better at. That is beneficial for Romi's results no matter whom Romi plays next. Daniel refuses to hear that Romi gained 50 Elo every 5,000 games even though Romi was playing several top engines that were using a truly humongous book. Daniel refuses to understand that Romi gained two classes at WBEC and was about to gain a third. Romi not only played against different openings but against mostly different engines in each class, and yet Romi continued to climb. Daniel says he had that learning before RomiChess but does not have it anymore. Well, Daniel, prove it.

The detractors are saying all kinds of things from limited info and declaring I'm wrong. I'm just following where the evidence leads, and the more evidence that surfaces, the more correct I appear. So Daniel, when you really understand how RomiChess learning works, why don't you get back to me. Until then all I can tell you is that you are clueless. And yes, I independently invented it, whether it existed in some form prior or not. The inspiration for it was Pavlov's dog experiments, because what he was doing with them was reinforcement learning, and I adapted it directly to computer chess without any other knowledge than that. And Romi's MSMD learning was taken from monkeys on an island near Japan learning to wash their potatoes by watching the researchers do the same. So Daniel, I don't care what you think because you are simply wrong. And you are rude about it!
I count, what, 20 Daniels in your trolling post.

Let's dissect what you have achieved, according to CPW.

=========================================================
RomiChess is famous for its learning approach [2]
Monkey see, monkey do. Romi remembers winning lines, regardless of which side played the moves, incorporates them into the opening book, and can play them back instantly up to 180 ply if the stats for that line remain good.
I guess the fact that you replay them for 180 plies must be the real invention. I never thought of that.
Pavlov's dog experiments adapted to computer chess. Each side's moves are given a slight bonus if that side has won, and the other side's moves are given a slight penalty. So good moves can get a slight penalty and bad moves can get a slight bonus; however, over time those are corrected. These bonuses/penalties are loaded into the hash table before each move by the computer. If Romi is losing game after game, this will cause Romi to 'fish' for better moves to play until Romi starts to win.
Ok, so we are now loading the data into the hash table, not the book. Clever!
==========================================

Daniel
Not trolling, just frustration, because I replied to you before but you did not hear. And while the quotes above are accurate, they are not complete; there is more to it than what is in those quotes. So thanks for trying to understand better. My eyes are totally shot right now. Let's talk more later. :D
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: I can't believe that so many people don't get it!

Post by hgm »

Michael Sherwin wrote:How do you know that?
It is what the paper says: only keep the sub-tree after the moves played in the game. If you start a new game, the tree from the last move of the previous game is entirely useless, as the initial position is not in there. And there were no moves before the beginning from which you could take a sub-tree.
And I can tell you that it makes no sense. It is far superior to maintain the entire tree and use it over and over.
It makes perfect sense to me, because the purpose of the training games is to measure the quality of the NN response, and how you have to tweak it afterwards so that it starts to prefer moves that are good for winning. Not to play good games despite a sucking NN, based on statistics, because that would mask the failures of the NN to a large extent, so that you wouldn't know what to tweak to make it better.
The NN would become saturated and start losing valuable data. Also, the reinforcement learning in the tree would be thin if it were deleted every game. There is no way that A0 could have that result against SF without a lot of deep retained learning.
You cannot possibly know that. They say it is possible and that they did it. Who do you think I should believe?
And that can't all be held in the NN. It would have to retain all the data from all the training games to do what it did.
I have no idea why you think that. All that is needed to beat Stockfish is to play better Chess. Stockfish at a longer TC would be able to convincingly beat Stockfish at a faster TC (time odds), without the need for a learn file.
Besides, if every game's tree was discarded and every game changed the NN, then the NN would end up based on the very last few games, and all the first games would lose all effect in the NN and would have been worthless.
Not at all. It depends on the learning-rate parameter, and with so many games to learn from this was definitely set quite low. So effectively each game changes the NN only very little, and it takes very long (i.e. many games) to completely erase the effects of earlier games. And of course it is very good that it completely forgets the games played when its Elo was still far below what it eventually reached. The WDL statistics of those are very unreliable, because the quality of play sucks. They were only good for discovering the coarsest concepts, like that it is better to have a Queen than a Knight, but the large number of blunders that made it possible to discover this by frequently losing a Queen for a Knight would mask the more subtle evaluation terms with noise. Once the coarse terms are learned, they will not be forgotten; in the absence of evidence to the contrary they would just stay at their optimal value.
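In update terms it is nothing more exotic than this (a toy sketch; the weight count, gradient and learning rate are all made up, and the real training of course ran on a far bigger scale):

Code: Select all

#include <stdio.h>

#define NUM_WEIGHTS 4

static double weights[NUM_WEIGHTS] = { 0.31, -0.12, 0.07, 0.54 };

/* One plain gradient-descent step: every weight is nudged by only a tiny
   fraction of the error gradient computed from one batch of positions. */
static void sgd_update(const double *grad, double learning_rate)
{
    for (int i = 0; i < NUM_WEIGHTS; i++)
        weights[i] -= learning_rate * grad[i];
}

int main(void)
{
    /* pretend gradient produced by the positions of one training game */
    double grad[NUM_WEIGHTS] = { 1.0, -0.5, 0.25, 2.0 };

    sgd_update(grad, 0.0002);    /* low learning rate: barely any change */

    for (int i = 0; i < NUM_WEIGHTS; i++)
        printf("w[%d] = %.5f\n", i, weights[i]);
    return 0;
}

With a step that small, it takes an enormous number of later games before the contribution of any earlier game is effectively overwritten.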
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: I can't believe that so many people don't get it!

Post by syzygy »

Michael Sherwin wrote:AlphaZ beat SF by the use of a 'simple trick' called a learn file with reinforcement learning. RomiChess demonstrated the same 'simple trick' 11 years ago against the world's strongest chess engine at the time, beating Rybka.
I can't believe anyone would believe what you write here.
mhull
Posts: 13447
Joined: Wed Mar 08, 2006 9:02 pm
Location: Dallas, Texas
Full name: Matthew Hull

Re: I can't believe that so many people don't get it!

Post by mhull »

Michael Sherwin wrote:
hgm wrote:
Michael Sherwin wrote:"Monte Carlo search does not use a tradition eval as we know it, so mobility, king safety etc. are irrelevant.

It uses a struct to hold info like wins, losses, draws, win %, etc.,
then simply references accumulated data for the current position to select the move with the highest probability of winning."

The current position info has to be stored in memory somewhere!
OK, but that is during one search. You play a number of random games, and when you get nodes with enough visits close to the root, you start using the WDL statistics to concentrate on the more promising ones. The training games simulated this with only 800 playouts per move, but leaning heavily on the advice of the NN to select the moves of those playouts. (Which initially must have been no better than random.) After the 800 playouts, a root move with good statistics was played.

Most of the tree was then deleted from memory, and only the part of the tree behind the played move and the opponent's reply was kept as the starting point for the next move. This is similar to keeping your hash from one move to the next, and then overwriting all 'aged' entries in the next search. At the end of a training game, the entire MCTS tree was deleted, and the next training game started from scratch.
" At the end of a training game, the entire MCTS tree was deleted, and the next training game started from scratch."

How do you know that? And I can tell you that it makes no sense. It is far superior to maintain the entire tree and use it over and over. The NN would become saturated and start losing valuable data. Also, the reinforcement learning in the tree would be thin if it were deleted every game. There is no way that A0 could have that result against SF without a lot of deep retained learning. And that can't all be held in the NN. It would have to retain all the data from all the training games to do what it did. Besides, if every game's tree was discarded and every game changed the NN, then the NN would end up based on the very last few games, and all the first games would lose all effect in the NN and would have been worthless. I'm sorry, but no way is your explanation correct, IMHO.
There is no issue of saturation here. Your idea about it reflects a lack of basic knowledge of NNs. They don't store inputs or keep track of anything. All they are is a matrix of weightings that can be modified by inputs (trained), but they don't store raw data. The matrix can be queried with inputs and will return outputs which reflect its "impression" of the input data. In this case, the input is a representation of a chess position (or perhaps a sequence of positions). All of the decision-making is done by the "consulting" process (MCTS in this case). If any process is storing and processing raw data, it's not the NN array that's doing it. An NN might get "saturated" during training but not during a game. NN "saturation" isn't really saturation but an undersizing that results in its never converging on the training data. By game time, the NN has been sized to its training data already.
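In toy form, querying such a matrix of weightings is nothing more than this (the sizes and layout are invented for illustration):

Code: Select all

#include <math.h>

#define NUM_INPUTS  8    /* toy "position features" */
#define NUM_HIDDEN  4

static double w1[NUM_HIDDEN][NUM_INPUTS];   /* input  -> hidden weightings */
static double w2[NUM_HIDDEN];               /* hidden -> output weightings */

/* Querying the matrix of weightings: multiply the input features through
   the fixed-size weight arrays and return the net's "impression" of the
   position.  No position, game or statistic is stored anywhere. */
static double evaluate(const double features[NUM_INPUTS])
{
    double out = 0.0;
    for (int h = 0; h < NUM_HIDDEN; h++) {
        double sum = 0.0;
        for (int i = 0; i < NUM_INPUTS; i++)
            sum += w1[h][i] * features[i];
        out += w2[h] * tanh(sum);
    }
    return tanh(out);
}

The arrays have a fixed size: training adjusts the numbers in them, but no game or position is ever stored, which is why the "retain all the data from all the training games" picture does not apply.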
Matthew Hull
Rodolfo Leoni
Posts: 545
Joined: Tue Jun 06, 2017 4:49 pm
Location: Italy

Re: I can't believe that so many people don't get it!

Post by Rodolfo Leoni »

syzygy wrote:
Michael Sherwin wrote:AlphaZ beat SF by the use of a 'simple trick' called a learn file with reinforcement learning. RomiChess demonstrated the same 'simple trick' 11 years ago against the world's strongest chess engine at the time, beating Rybka.
I can't believe anyone would believe what you write here.
I ran these tests vs. Rybka 11 years ago, and I have long experience with many learning systems. Conceptually, Mike is right. Different hardware, of course different software, but the same philosophy. Within 4 hours AlphaZ learned by self-playing 76 million games, which is impossible with conventional hardware. That learning was not about scores, trees, or similar; it was about gaining more and more experience to use later. As this conceptual experience somehow got propagated backward up to the start position, the system is the same as RomiChess's. The propagation was just about different things.

About TD-Lambda: KnightCap never worked as reinforcement learning. I once tried a Romi-KnightCap match, but I stopped it when it was about 50-0.

@ Mike: I really think you had an excellent idea about building a dedicated GUI. If you encode Romi learning into the GUI, you can have a reinforcement learning Stockfish or similar.
F.S.I. Chess Teacher