Could AlphaGO's concepts be beneficial to chess?

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

reflectionofpower
Posts: 1610
Joined: Fri Mar 01, 2013 5:28 pm
Location: USA

Could AlphaGO's concepts be beneficial to chess?

Post by reflectionofpower »

It would be interesting to see the approach AlphaGo used to learn from master Go games applied to chess. I am not a programmer, so I am sure it is either not that easy or it has already been thought of.

Imagine applying it to the games of wild attacking players, or whatever style you want, then having it play and learn from all those games, and seeing what speculative patterns emerge.
"Without change, something sleeps inside us, and seldom awakens. The sleeper must awaken." (Dune - 1984)

Lonnie
reflectionofpower
Posts: 1610
Joined: Fri Mar 01, 2013 5:28 pm
Location: USA

Re: Could AlphaGO's concepts be beneficial to chess?

Post by reflectionofpower »

"Mastering the game of Go with deep neural networks and tree search"

https://vk.com/doc-44016343_437229031?d ... 25d42fbc72

I am going to read it later after my bike ride.
"Without change, something sleeps inside us, and seldom awakens. The sleeper must awaken." (Dune - 1984)

Lonnie
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Could AlphaGO's concepts be beneficial to chess?

Post by jdart »

I have done some experimentation with MMTO (https://www.jair.org/media/4217/live-4217-7792-jair.pdf), which has been used for learning evaluation functions in Shogi. I have an implementation of it here:

https://github.com/jdart1/arasan-chess/ ... /tuner.cpp.

But I have not found this to be as effective for chess as the Texel method. I think the problem is that in Shogi, programs have been below the strength of the top players, so learning from high-level human games has been effective. In chess, programs passed the GM level some time ago, although they still have some blind spots.
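
For reference, here is a minimal sketch of the Texel-method objective: minimize the squared difference between the game result and a sigmoid of the static evaluation over a large set of positions taken from games. This is illustrative only, not my actual tuner; evaluate() and the position list are placeholders.

Code: Select all

K = 1.2  # scaling constant; normally fitted to the data before tuning

def expected_result(cp):
    # Map a centipawn score (White's point of view) to an expected result in [0, 1].
    return 1.0 / (1.0 + 10.0 ** (-K * cp / 400.0))

def texel_error(params, positions):
    # positions is a list of (pos, result) pairs, result being 1.0 / 0.5 / 0.0
    # from the game each position was taken from; evaluate() is a placeholder
    # for the engine's static evaluation with the given parameters.
    err = 0.0
    for pos, result in positions:
        err += (result - expected_result(evaluate(pos, params))) ** 2
    return err / len(positions)

def tune(params, positions, step=1):
    # Basic local search: nudge each parameter by +/- step and keep the change
    # if it lowers the error; repeat until a full pass makes no improvement.
    best = texel_error(params, positions)
    improved = True
    while improved:
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                e = texel_error(trial, positions)
                if e < best:
                    params, best, improved = trial, e, True
    return params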

--Jon
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Could AlphaGO's concepts be beneficial to chess?

Post by matthewlai »

jdart wrote:I have done some experimentation with MMTO (https://www.jair.org/media/4217/live-4217-7792-jair.pdf), which has been used for learning evaluation functions in Shogi. I have an implementation of it here:

https://github.com/jdart1/arasan-chess/ ... /tuner.cpp.

But I have not found this to be as effective for chess as the Texel method. I think the problem is that in Shogi, programs have been below the strength of the top players, so learning from high-level human games has been effective. In chess, programs passed the GM level some time ago, although they still have some blind spots.

--Jon
Not sure how that's related to AlphaGo, but most of the strength of AG came from deep neural networks trained through self-play.
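
Very roughly, the self-play loop looks like this. It is an illustrative skeleton only, nothing to do with the real AlphaGo code; self_play_game() and train_step() are placeholders.

Code: Select all

def self_play_training(network, num_iterations, games_per_iteration):
    # Illustrative skeleton of self-play reinforcement learning.
    for _ in range(num_iterations):
        examples = []
        for _ in range(games_per_iteration):
            # self_play_game() (placeholder) plays the network against itself,
            # using a search guided by the network, and returns a list of
            # (position, search policy, final game outcome) training tuples.
            examples.extend(self_play_game(network))
        # train_step() (placeholder) fits the policy output to the search
        # policies and the value output to the game outcomes.
        train_step(network, examples)
    return network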
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Could AlphaGO's concepts be beneficial to chess?

Post by matthewlai »

reflectionofpower wrote:It would be interesting to see the approach AlphaGo used to learn from master Go games applied to chess. I am not a programmer, so I am sure it is either not that easy or it has already been thought of.

Imagine applying it to the games of wild attacking players, or whatever style you want, then having it play and learn from all those games, and seeing what speculative patterns emerge.
I was thinking about doing something like that, but the problem is that you need a lot of data to train those neural networks, and there's just not enough high-level human data for that, especially if you want to narrow it down to a certain playing style. There are ways around that (e.g. training on all the data, then fine-tuning on the subset), and I think it would be interesting to try.
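
As a rough sketch of that pretrain-then-finetune idea (PyTorch-style; PolicyNet and the data loaders are placeholders): pretrain the move-prediction network on all games, then fine-tune on the small style-specific subset with a lower learning rate.

Code: Select all

import torch
import torch.nn.functional as F

def train(model, loader, optimizer, epochs):
    # Supervised move prediction: positions as inputs, the moves actually
    # played as class targets.
    for _ in range(epochs):
        for positions, moves in loader:
            loss = F.cross_entropy(model(positions), moves)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

model = PolicyNet()  # placeholder network class
# 1) Pretrain on all available master games.
train(model, all_games_loader,                      # placeholder DataLoader
      torch.optim.Adam(model.parameters(), lr=1e-3), epochs=10)
# 2) Fine-tune on the small subset of games in the desired style, with a
#    lower learning rate so the pretrained features aren't wiped out.
train(model, style_subset_loader,                   # placeholder DataLoader
      torch.optim.Adam(model.parameters(), lr=1e-4), epochs=3)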
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Could AlphaGO's concepts be beneficial to chess?

Post by jdart »

According to the paper, the first stage was supervised learning from server games.

--Jon
Rochester
Posts: 55
Joined: Sat Feb 20, 2016 6:11 am

Re: Could AlphaGO's concepts be beneficial to chess?

Post by Rochester »

Could we teach a deep learning system to write better engine code? For training data we could take the patches from the Stockfish git commits; Fishtest tells us how good each commit is, so we have the labels. Then train on that, and the network could write good new patches itself.
Tony P.
Posts: 216
Joined: Sun Jan 22, 2017 8:30 pm
Location: Russia

Re: Could AlphaGO's concepts be beneficial to chess?

Post by Tony P. »

matthewlai wrote:
reflectionofpower wrote:It would be interesting to see the approach AlphaGo used to learn from master Go games applied to chess. I am not a programmer, so I am sure it is either not that easy or it has already been thought of.

Imagine applying it to the games of wild attacking players, or whatever style you want, then having it play and learn from all those games, and seeing what speculative patterns emerge.
I was thinking about doing something like that, but the problem is that you need a lot of data to train those neural networks, and there's just not enough high-level human data for that, especially if you want to narrow it down to a certain playing style. There are ways around that (e.g. training on all the data, then fine-tuning on the subset), and I think it would be interesting to try.
Just a thought (not about imitating a style, but about the use of human games in general; I have no experience at machine learning but I have to master it for my non-chess projects so I've studied it a bit): while human moves are generally weak, the 'terminal states', i.e. humans' (and strong engines') resignations, draw agreements and adjudications in classical time control games (where time losses are very rare) are quite reliable indicators of the positions being totally lost or hopelessly equal. There's no big need to draw a distinction between those positions that are evaluated as +5 by a conventional engine and those evaluated as +9 - all the resigned positions can be given the same value.

As there are millions of games available, I hope that the resigned / drawn / adjudicated positions alone, combined with a ton of EGTB positions, can be enough to bootstrap a reward prediction network.* This approach is a big gamble, though.

Because humans often resign a few moves before a mate or a material loss would actually happen on the board, it could be promising to run low-depth searches (with a fast conventional engine) on the end positions of the sample human games during training, to detect the reasons for the resignations / draws, which are usually obvious even at low depth, and to add the positions from the PVs of those searches to the training set, since they are all obviously resignable / drawish too. This may make the engine better at distinguishing violent positions that require further calculation from quiet ones on its search horizon.
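
A rough sketch of that data extraction with the python-chess library (the PGN file and the "stockfish" binary path are just example assumptions): take the final position of each decided or agreed game, label it with the result, then extend the same label onto the PV of a shallow search from that position.

Code: Select all

import chess
import chess.engine
import chess.pgn

def terminal_positions(pgn_path, engine_path="stockfish", depth=8):
    # Yield (FEN, result) pairs: the final position of each game, plus the
    # positions along a shallow engine PV from it, all labelled with the same
    # game result.
    result_map = {"1-0": 1.0, "0-1": 0.0, "1/2-1/2": 0.5}
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    try:
        with open(pgn_path) as f:
            while True:
                game = chess.pgn.read_game(f)
                if game is None:
                    break
                result = result_map.get(game.headers.get("Result", "*"))
                if result is None:
                    continue  # unfinished / unknown result
                board = game.end().board()
                yield board.fen(), result
                if board.is_game_over():
                    continue  # mate/stalemate on the board, nothing to search
                # The reason for the resignation / draw is usually visible in a
                # low-depth PV, so those positions get the same label too.
                info = engine.analyse(board, chess.engine.Limit(depth=depth))
                pv_board = board.copy()
                for move in info.get("pv", []):
                    pv_board.push(move)
                    yield pv_board.fen(), result
    finally:
        engine.quit()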

* This neural network type was introduced in the UNREAL algorithm (the one with 'auxiliary tasks') by Mnih et al. that I've mentioned previously. This network isn't used for move ('policy') evaluation directly; it only shapes the features of the main move evaluation network. In chess, it would be analogous to human 'tactical pattern recognition', predicting (with no calculation of variations) one's likelihood of making the opponent resign shortly (within N moves) as opposed to just gaining a small advantage. Humans still seem better than computers at judging at a glance how likely a given position is to have a tactic available.

In UNREAL, the composition of the initial training set for the reward prediction network is highly biased toward terminal positions. This set is then expanded with the 'experience replay buffer', i.e. positions arising from the engine's own play, and again, most of the intermediate positions of those games aren't fed to the reward network, but all their terminal positions are. This bias doesn't influence the main evaluation network much because the training set for the main network is different: it mostly consists of non-terminal positions, as a search tree normally would.
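
For illustration, the skewed sampling for the reward prediction task could look something like this (placeholder buffers; in UNREAL the split is between rewarding and non-rewarding frames, here between terminal and ordinary positions):

Code: Select all

import random

def reward_prediction_batch(terminal_buffer, ordinary_buffer,
                            batch_size=32, terminal_fraction=0.5):
    # Oversample the rare terminal (resigned / drawn / adjudicated) positions
    # so the reward prediction network sees them far more often than it would
    # under uniform sampling from the replay buffer.
    n_terminal = min(int(batch_size * terminal_fraction), len(terminal_buffer))
    batch = random.sample(terminal_buffer, n_terminal)
    batch += random.sample(ordinary_buffer, batch_size - n_terminal)
    random.shuffle(batch)
    return batch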