That is a start for sure -- proving a NN evaluation can be competitive with, or even much better than, a hand-crafted evaluation function. The latency of evaluating the NN can be countered with a combination of hardware (GPU/TPU) and software (async evaluations), which is what Google did for AlphaGo. Giraffe used only three layers of NN with chess-specific inputs such as attack maps, while AlphaZero used many more layers of CNN with just the rules of the game as input. Texel actually replaced its evaluation function with Giraffe's NN and showed that the eval is indeed better, but it would need time odds to be competitive on the same hardware.
I expected that they would use alpha-beta, but the fact that they did it with MCTS speaks volumes about the power of their NN (maybe I am wrong here and MCTS could be better than AB, as they seem to claim). The fact that AZ started tuning itself from scratch doesn't surprise me one bit, because all that does is slow down the convergence. I think they also claimed it could start by learning the rules of the game by itself and become a grandmaster as well -- but it would take even more training time. I am also able to train "scorpioZero"'s evaluation by starting from all parameters set to 0 -- not the same thing as AlphaZero's, but still...
Once Remi started asking why no one here is trying an MCTS + CNN combo for chess, it raised my suspicion, and sure enough a week later AlphaZero dropped the bomb.
Daniel
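The async-evaluation idea mentioned above can be sketched in a few lines: search threads submit positions to a worker that evaluates them in batches, amortizing the per-call latency of a slow (GPU) network. This is a minimal illustration, not AlphaGo's actual machinery; the names and the stand-in "network" (a plain material sum) are hypothetical.

```python
import queue
import threading

# Hypothetical stand-in for a slow NN: here just a sum over the position vector.
def nn_evaluate_batch(positions):
    return [sum(p) for p in positions]  # one score per position

class AsyncEvaluator:
    """Search threads submit positions and keep working; a background
    worker evaluates queued positions in batches."""
    def __init__(self, batch_size=4):
        self.requests = queue.Queue()
        self.batch_size = batch_size
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, position):
        slot = {"done": threading.Event()}
        self.requests.put((position, slot))
        return slot  # caller checks the slot later instead of blocking

    def _run(self):
        while True:
            batch = [self.requests.get()]  # block for at least one request
            while len(batch) < self.batch_size and not self.requests.empty():
                batch.append(self.requests.get())
            scores = nn_evaluate_batch([p for p, _ in batch])
            for (_, slot), s in zip(batch, scores):
                slot["score"] = s
                slot["done"].set()

ev = AsyncEvaluator()
slots = [ev.submit([1, 2, 3]), ev.submit([4, 5])]
for s in slots:
    s["done"].wait()
print([s["score"] for s in slots])  # [6, 9]
```

In a real engine the batch would go to the GPU in one call, and the search would expand other nodes while waiting, which is how the per-evaluation latency gets hidden.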
Google's AlphaGo team has been working on chess
Moderators: hgm, Rebel, chrisw
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
- Posts: 273
- Joined: Wed Aug 24, 2016 9:49 pm
Re: Google's AlphaGo team has been working on chess
Daniel Shawul wrote:
That is a start for sure -- proving a NN evaluation can be competitive with, or even much better than, a hand-crafted evaluation function. [...]

I think why they used MCTS is pretty clear. They wanted a system that can play almost any (board) game by just knowing the rules, and MCTS already has the property that you only need to know the rules of the game. That's why I wonder whether alpha-beta search + DeepMind's neural network might be better.
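The "only the rules" property can be made concrete with a tiny example. The sketch below is flat Monte Carlo (pure random playouts, no tree or UCT), so it is weaker than real MCTS, but it shows the point: nothing game-specific is needed beyond move generation and a terminal result. The game (Nim: take 1-3 stones, taking the last stone wins) and all names are chosen just for illustration.

```python
import random

# Rules of Nim: take 1, 2, or 3 stones; taking the last stone wins.
def legal_moves(pile):
    return [m for m in (1, 2, 3) if m <= pile]

def playout(pile, side_to_move):
    """Play random moves to the end; return the winning side (0 or 1)."""
    side = side_to_move
    while pile > 0:
        pile -= random.choice(legal_moves(pile))
        if pile == 0:
            return side          # this side took the last stone
        side = 1 - side
    return 1 - side_to_move      # pile already empty: previous mover won

def best_move(pile, to_move, sims=2000):
    """Pick the move with the highest random-playout win rate."""
    scores = {m: sum(playout(pile - m, 1 - to_move) == to_move
                     for _ in range(sims))
              for m in legal_moves(pile)}
    return max(scores, key=scores.get)

random.seed(1)
print(best_move(3, 0))  # 3 -- taking all three stones wins immediately
```

Swapping random playouts for a policy/value network, as AlphaZero does, changes the quality of the statistics but not this rules-only interface.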
- Posts: 31
- Joined: Fri Nov 25, 2016 10:14 am
- Location: Singapore
Re: Google's AlphaGo team has been working on chess
In his NIPS 2017 presentation, David Silver mentioned three academic works; the most recent is
Thinking Fast and Slow with Deep Learning and Tree Search
Thomas Anthony, Zheng Tian, David Barber
https://arxiv.org/abs/1705.08439
Worth a read to further understand AlphaZero.
Brahim HAMADICHAREF
Singapore
- Posts: 7000
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: Google's AlphaGo team has been working on chess
Daniel Shawul wrote:
That is a start for sure -- proving a NN evaluation can be competitive with, or even much better than, a hand-crafted evaluation function. [...] Texel actually replaced its evaluation function with Giraffe's NN and showed that the eval is indeed better, but it would need time odds to be competitive on the same hardware.

Statements like these could make me a believer.
- Posts: 13447
- Joined: Wed Mar 08, 2006 9:02 pm
- Location: Dallas, Texas
- Full name: Matthew Hull
Re: Google's AlphaGo team has been working on chess
Daniel Shawul wrote:
[...] The latency of evaluating the NN can be countered with a combination of hardware (GPU/TPU) and software (async evaluations), which is what Google did for AlphaGo. [...]

Rebel wrote:
Statements like these could make me a believer.

As for mitigating latency, there used to be significant optimizations available on MMX-capable chips in the very old BrainMaker NN application (I still have several copies of it):
https://calsci.com/MMX.html
It seems this software is still for sale. At the time (25 years ago) BrainMaker also sold an EISA card ($4,150) that could speed up training and processing even more.
Matthew Hull
- Posts: 7000
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: Google's AlphaGo team has been working on chess
It's still sold.
BrainMaker Professional for Windows $795
- Posts: 690
- Joined: Mon Apr 19, 2010 7:07 pm
- Location: Sweden
- Full name: Peter Osterlund
Re: Google's AlphaGo team has been working on chess
Daniel Shawul wrote:
[...] Texel actually replaced its evaluation function with Giraffe's NN and showed that the eval is indeed better, but it would need time odds to be competitive on the same hardware.

Rebel wrote:
Statements like these could make me a believer.

The post describing this test is here.
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: Google's AlphaGo team has been working on chess
Just in case this isn't widely known, Marcel van Kervinck did a similar experiment grafting SF's eval onto Crafty's search (by actually making Crafty call the SF binary).
http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=30107
http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=30107
In order to purely measure the quality of the eval, Marcel used a clever trick to offset the difference in evaluation speed: he made both engines evaluate both evaluation functions and then pick their own, so for both engines evaluation took exactly the same time. This trick could also be used in the Texel-Giraffe experiment, so that it would not be necessary to translate an NPS difference into an Elo difference (which always involves some guessing).
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
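Marcel's trick is simple enough to sketch. Both engines pay the cost of both evaluation functions at every node and each keeps only its own result, so evaluation speed cancels out of the comparison and only eval quality differs. The eval functions and names below are hypothetical stand-ins (the sleep just mimics an expensive NN call).

```python
import time

# Two hypothetical evaluation functions of very different cost.
def fast_eval(pos):
    return sum(pos)              # cheap hand-crafted eval

def slow_eval(pos):
    time.sleep(0.001)            # stands in for an expensive NN evaluation
    return sum(pos) + 1

def make_evaluator(use_slow):
    """Both engines call BOTH evals and keep only their own result,
    so a node costs the same wall-clock time in either engine."""
    def evaluate(pos):
        a = fast_eval(pos)
        b = slow_eval(pos)
        return b if use_slow else a
    return evaluate

engine_a_eval = make_evaluator(use_slow=False)
engine_b_eval = make_evaluator(use_slow=True)
print(engine_a_eval([1, 2]), engine_b_eval([1, 2]))  # 3 4
```

The resulting match measures only which eval makes better decisions, at the price of slowing both engines down to the speed of the slower eval.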
- Posts: 7000
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: Google's AlphaGo team has been working on chess
I am trying to train Giraffe exactly as described here.
But it crashes; see the screen dump.
Anyone with experience? I am using Win7.
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: Google's AlphaGo team has been working on chess
Michel wrote:
Just in case this isn't widely known, Marcel van Kervinck did a similar experiment grafting SF's eval onto Crafty's search (by actually making Crafty call the SF binary). [...]

Other ways of offsetting the difference in evaluation speed are of course fixed-depth or fixed-nodes searches.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.