mjlef wrote: shrapnel wrote:
Jesse Gersenson wrote:If people want learning in Komodo, let your voice be heard; Mark and Larry are open to feature requests, especially those requested by a lot of people.
So Reinforcement Learning will be introduced by Komodo team (if they are capable of it) only on REQUEST.
Looks like the thrashing stockfish received still hasn't convinced them.
Of course we listen to requests. But I do not think the Romi learning is anything like the learning Google/DeepMind did. They used 5000 special TensorFlow Processing units (TPUs), each costing thousands of dollars. Right now, this is way beyond our resources. Romi learning is most likely simply saving important positions in a persistant hash. In future games, these are reloaded into the main hash table, so the new game, if ir encouters one of these positions, alread has deep search information ofr them. The thing is, this helps not at all if a different lien is played. My old program NOW had this feature, but I did it not so much to make it stronger, but instead to avoid losing lines during tournaments. You can think of it as a self-correcting book, which benefits if the same line it tried on Komodo.
Larry and I often discuss Monte Carlo Tree Search, and are interested in trying this. We have also discussed uses for neural networks. Small nns could be useful in present PCs, but the massive nn used in AlphaGo Zero is currently beyond what we, and most chess engine users can afford.
We listen, and try to add what we think people want. But we do not have endless resources. We can afford to buy roughly 1 new server each year. It is not a matter of being "convinced". We just cannot afford it currently. Hopefully, GPUs in graphics cards will get faster and perhaps nn hardware will get added to future PCs or at least be much less expensive.
Mark Lefler was kind enough to ask me in a pm to error check the above!
But I do not think the Romi learning is anything like the learning Google/DeepMind did.
There is circumstantial evidence that it is similar in nature. Reinforcement learning involves accumulated rewards (and penalties) for every position in a database. It could be a persistent hash like suggested.
Romi learning is most likely simply saving important positions in a persistant hash.
Or the way Romi actually does it in a tree of all played games. This is superior to a persistent hash as only the subtree from the current position is loaded into the game hash. This has the advantage that only useful information ends up in the game hash.
The thing is, this helps not at all if a different lien is played.
Not true. Rewards and penalties are greater near the leaves and over time are back propagated to the root. Since every node is a root to a subtree, every move benefits from backpropagation. This results in a meaningful differentiation resulting in a gradual determination of which move gives better results. So for example at the actual root Romi will settle on say 1.e4 or 1.d4 etc. as being best and will always play that opening move. Just like AlphaZ always played 1.d4.
You can think of it as a self-correcting book, which benefits if the same line it tried on Komodo.
Philosophically I would argue that what RomiChess does has nothing to do with a book. In a book an engine usually can choose randomly between acceptable moves. In RomiChess only the learned best move is played in its Monkey See Monkey Do "book". That is separate from Romi's reinforcement learning. It just so happens that the stored tree of all Romi's games can handle both a "book" and the rewards/penalties for reinforcement learning.
Thanks Mark for inviting me to share some details!