Could AlphaGO's concepts be beneficial to chess?

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

reflectionofpower
Posts: 1610
Joined: Fri Mar 01, 2013 5:28 pm
Location: USA

Could AlphaGO's concepts be beneficial to chess?

Post by reflectionofpower »

It would be interesting to see the approach AlphaGo used to learn from master Go games applied to chess. I am not a programmer, so I am sure it is either not that easy or it has already been thought of.

Imagine applying it to the games of wild attacking players, or whatever style you want, then having it play and learn from all those games, and seeing what speculative patterns emerge.
"Without change, something sleeps inside us, and seldom awakens. The sleeper must awaken." (Dune - 1984)

Lonnie
reflectionofpower
Posts: 1610
Joined: Fri Mar 01, 2013 5:28 pm
Location: USA

Re: Could AlphaGO's concepts be beneficial to chess?

Post by reflectionofpower »

"Mastering the game of Go with deep neural networks and tree search"

https://vk.com/doc-44016343_437229031?d ... 25d42fbc72

I am going to read it later after my bike ride.
"Without change, something sleeps inside us, and seldom awakens. The sleeper must awaken." (Dune - 1984)

Lonnie
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Could AlphaGO's concepts be beneficial to chess?

Post by jdart »

I have done some experimentation with MMTO (https://www.jair.org/media/4217/live-4217-7792-jair.pdf), which has been used for learning evaluation functions in Shogi. I have an implementation of it here:

https://github.com/jdart1/arasan-chess/ ... /tuner.cpp.

But I have not found this to be as effective for chess as the Texel method. I think the problem is that in Shogi, programs have been below the strength of the top players, so learning from high-level human games has been effective. In chess, programs passed the GM level some time ago, although they still have some blind spots.
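
For reference, here is a minimal sketch of the Texel-method objective: minimize the squared difference between the game result and a sigmoid of the static evaluation over a large set of positions taken from games. This is illustrative only, not my actual tuner; evaluate() and the position list are placeholders.

Code: Select all

K = 1.2  # scaling constant; normally fitted to the data before tuning

def expected_result(cp):
    # Map a centipawn score (White's point of view) to an expected result in [0, 1].
    return 1.0 / (1.0 + 10.0 ** (-K * cp / 400.0))

def texel_error(params, positions):
    # positions is a list of (pos, result) pairs, result being 1.0 / 0.5 / 0.0
    # from the game each position was taken from; evaluate() is a placeholder
    # for the engine's static evaluation with the given parameters.
    err = 0.0
    for pos, result in positions:
        err += (result - expected_result(evaluate(pos, params))) ** 2
    return err / len(positions)

def tune(params, positions, step=1):
    # Basic local search: nudge each parameter by +/- step and keep the change
    # if it lowers the error; repeat until a full pass makes no improvement.
    best = texel_error(params, positions)
    improved = True
    while improved:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                e = texel_error(trial, positions)
                if e < best:
                    params, best, improved = trial, e, True
    return params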

--Jon
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Could AlphaGO's concepts be beneficial to chess?

Post by matthewlai »

jdart wrote:I have done some experimentation with MMTO (https://www.jair.org/media/4217/live-4217-7792-jair.pdf), which has been used for learning evaluation functions in Shogi. I have an implementation of it here:

https://github.com/jdart1/arasan-chess/ ... /tuner.cpp.

But I have not found this to be as effective for chess as the Texel method. I think the problem is that in Shogi, programs have been below the strength of the top players, so learning from high-level human games has been effective. In chess, programs passed the GM level some time ago, although they still have some blind spots.

--Jon
Not sure how that's related to AlphaGo, but most of the strength of AG came from deep neural networks trained through self-play.
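
Very roughly, the self-play loop looks like this. It is an illustrative skeleton only, nothing to do with the real AlphaGo code; self_play_game() and train_step() are placeholders.

Code: Select all

def self_play_training(network, num_iterations, games_per_iteration):
    # Illustrative skeleton of self-play reinforcement learning.
    for _ in range(num_iterations):
        examples = []
        for _ in range(games_per_iteration):
            # self_play_game() (placeholder) plays the network against itself,
            # using a search guided by the network, and returns a list of
            # (position, search policy, final game outcome) training tuples.
            examples.extend(self_play_game(network))
        # train_step() (placeholder) fits the policy output to the search
        # policies and the value output to the game outcomes.
        train_step(network, examples)
    return network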
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Could AlphaGO's concepts be beneficial to chess?

Post by matthewlai »

reflectionofpower wrote:It would be interesting to see the approach AlphaGo used to learn from master Go games applied to chess. I am not a programmer, so I am sure it is either not that easy or it has already been thought of.

Imagine applying it to the games of wild attacking players, or whatever style you want, then having it play and learn from all those games, and seeing what speculative patterns emerge.
I was thinking about doing something like that, but the problem is that you need a lot of data to train those neural networks, and there's just not enough high-level human data for that, especially if you want to narrow it down to a certain playing style. There are ways around that (e.g. training on all the data, then fine-tuning on the subset), and I think it would be interesting to try.
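
As a rough sketch of that pretrain-then-finetune idea (PyTorch-style; PolicyNet and the data loaders are placeholders): pretrain the move-prediction network on all games, then fine-tune on the small style-specific subset with a lower learning rate.

Code: Select all

import torch
import torch.nn.functional as F

def train(model, loader, optimizer, epochs):
    # Supervised move prediction: positions as inputs, the moves actually
    # played as class targets.
    for _ in range(epochs):
        for positions, moves in loader:
            loss = F.cross_entropy(model(positions), moves)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

model = PolicyNet()  # placeholder network class
# 1) Pretrain on all available master games.
train(model, all_games_loader,                      # placeholder DataLoader
      torch.optim.Adam(model.parameters(), lr=1e-3), epochs=10)
# 2) Fine-tune on the small subset of games in the desired style, with a
#    lower learning rate so the pretrained features aren't wiped out.
train(model, style_subset_loader,                   # placeholder DataLoader
      torch.optim.Adam(model.parameters(), lr=1e-4), epochs=3)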
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Could AlphaGO's concepts be beneficial to chess?

Post by jdart »

According to the paper, the first stage was supervised learning from server games.

--Jon
Rochester
Posts: 55
Joined: Sat Feb 20, 2016 6:11 am

Re: Could AlphaGO's concepts be beneficial to chess?

Post by Rochester »

Could we teach a deep learning system to write better engine code? For training data we could take the patches from the Stockfish git commits; Fishtest tells us how good each commit is, so we have the labels. Then train on that, and the network could write good new patches itself.
Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Re: Could AlphaGO's concepts be beneficial to chess?

Post by Tony P. »

matthewlai wrote:
reflectionofpower wrote:It would be interesting to see the approach AlphaGo used to learn from master Go games applied to chess. I am not a programmer, so I am sure it is either not that easy or it has already been thought of.

Imagine applying it to the games of wild attacking players, or whatever style you want, then having it play and learn from all those games, and seeing what speculative patterns emerge.
I was thinking about doing something like that, but the problem is that you need a lot of data to train those neural networks, and there's just not enough high-level human data for that, especially if you want to narrow it down to a certain playing style. There are ways around that (e.g. training on all the data, then fine-tuning on the subset), and I think it would be interesting to try.
Just a thought (not about imitating a style, but about the use of human games in general; I have no experience at machine learning but I have to master it for my non-chess projects so I've studied it a bit): while human moves are generally weak, the 'terminal states', i.e. humans' (and strong engines') resignations, draw agreements and adjudications in classical time control games (where time losses are very rare) are quite reliable indicators of the positions being totally lost or hopelessly equal. There's no big need to draw a distinction between those positions that are evaluated as +5 by a conventional engine and those evaluated as +9 - all the resigned positions can be given the same value.

As there are millions of games available, I hope that the resigned / drawn / adjudicated positions alone, combined with a ton of EGTB positions, can be enough to bootstrap a reward prediction network.* This approach is a big gamble, though.

Because humans often resign a few moves before a mate or a material loss would actually happen on the board, it could be promising to run low-depth searches (with a fast conventional engine) on the end positions of the sample human games during training, to detect the reasons for the resignations / draws, which are usually obvious even at low depth, and to add the positions from the PVs of those searches to the training set, since they are all obviously resignable / drawish too. This may make the engine better at distinguishing violent positions that require further calculation from quiet ones on its search horizon.
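
A rough sketch of that data extraction with the python-chess library (the PGN file and the "stockfish" binary path are just example assumptions): take the final position of each decided or agreed game, label it with the result, then extend the same label onto the PV of a shallow search from that position.

Code: Select all

import chess
import chess.engine
import chess.pgn

def terminal_positions(pgn_path, engine_path="stockfish", depth=8):
    # Yield (FEN, result) pairs: the final position of each game, plus the
    # positions along a shallow engine PV from it, all labelled with the same
    # game result.
    result_map = {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    try:
        with open(pgn_path) as f:
            while True:
                game = chess.pgn.read_game(f)
                if game is None:
                    break
                result = result_map.get(game.headers.get("Result", "*"))
                if result is None:
                    continue  # unfinished / unknown result
                board = game.end().board()
                yield board.fen(), result
                if board.is_game_over():
                    continue  # mate/stalemate on the board, nothing to search
                # The reason for the resignation / draw is usually visible in a
                # low-depth PV, so those positions get the same label too.
                info = engine.analyse(board, chess.engine.Limit(depth=depth))
                pv_board = board.copy()
                for move in info.get("pv", []):
                    pv_board.push(move)
                    yield pv_board.fen(), result
    finally:
        engine.quit()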

* This neural network type was introduced in the UNREAL algorithm (the one with 'auxiliary tasks') by Mnih et al. that I've mentioned previously. This network isn't used for move ('policy') evaluation directly; it only shapes the features of the main move evaluation network. In chess, it would be analogous to human 'tactical pattern recognition', predicting (with no calculation of variations) one's likelihood of making the opponent resign shortly (within N moves) as opposed to just gaining a small advantage. Humans still seem better than computers at judging at a glance how likely a given position is to have a tactic available.

In UNREAL, the composition of the initial training set for the reward prediction network is highly biased toward terminal positions. This set is then expanded with the 'experience replay buffer', i.e. positions arising from the engine's own play, and again, most of the intermediate positions of those games aren't fed to the reward network, but all their terminal positions are. This bias doesn't influence the main evaluation network much because the training set for the main network is different: it mostly consists of non-terminal positions, as a search tree normally would.
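
For illustration, the skewed sampling for the reward prediction task could look something like this (placeholder buffers; in UNREAL the split is between rewarding and non-rewarding frames, here between terminal and ordinary positions):

Code: Select all

import random

def reward_prediction_batch(terminal_buffer, ordinary_buffer,
                            batch_size=32, terminal_fraction=0.5):
    # Oversample the rare terminal (resigned / drawn / adjudicated) positions
    # so the reward prediction network sees them far more often than it would
    # under uniform sampling from the replay buffer.
    n_terminal = min(int(batch_size * terminal_fraction), len(terminal_buffer))
    batch = random.sample(terminal_buffer, n_terminal)
    batch += random.sample(ordinary_buffer, batch_size - n_terminal)
    random.shuffle(batch)
    return batch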