Is AlphaGo approach unsuitable to chess?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

noobpwnftw
Posts: 560
Joined: Sun Nov 08, 2015 11:10 pm

Re: Is AlphaGo approach unsuitable to chess?

Post by noobpwnftw »

If that approach were anywhere near comparable to top-tier chess engines, they would probably have done it already, given the number of quality games available for training and all that. Eventually what they want is people migrating to TPUs, not beating pro Go players, I assume.
brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: Is AlphaGo approach unsuitable to chess?

Post by brianr »

I looked again: the move-sorting NN was removed around Oct '16, although the eval NN was improved.

See:
http://www.talkchess.com/forum/viewtopic.php?t=61808
petero2
Posts: 684
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Is AlphaGo approach unsuitable to chess?

Post by petero2 »

lucasart wrote:
melajara wrote:AFAIK, after Giraffe and the fact that the author is now a Deepmind employee, nobody followed this approach in chess, why so?
AFAIK Giraffe is an alpha/beta negascout search engine, like any other. It uses all the same techniques, such as null move, search reductions, quiescence search, etc. That's where almost all of Giraffe's elo is.
I am not so sure about that. The extension/reduction logic seems to be non-standard, as described in the "Time assignment to children" post.
lucasart wrote:The only difference is that the author used neural networks for the evaluation. That is something entirely different from replacing the search with NN (which is totally hopeless for chess).

I'd say that the NN has a negative elo contribution to Giraffe. Replace it with a normal eval, properly tuned, and Giraffe would likely be much stronger.
I created a modified version of Giraffe that uses the texel evaluation function. The source code is here.

I then played some test games using the following programs:

Code: Select all

texel101   : Texel version 1.01 (rated 2788 on CCRL 40/40)
giraffe    : Giraffe latest version from bitbucket.org (earlier version rated 2457 on CCRL 40/40)
giraffe_te : Same Giraffe version but using evaluation from latest texel development version
The result was:

Code: Select all

prog1          tc1  prog2       tc2 elodiff draws  depth1  depth2 nGames
texel101    6+0.06  giraffe 48+0.48      60   10%    11.7   15.4    1488
giraffe    12+0.12  giraffe  6+0.06     146   18%    13.1   11.7    5118
giraffe    24+0.24  giraffe 12+0.12     115   21%    14.4   13.1    3562
giraffe    48+0.48  giraffe 24+0.24     117   22%    15.7   14.3    1718
giraffe_te 24+0.24  giraffe 24+0.24      -3   13%    17.1   14.1   16172
The time control is expressed in base time in seconds + increment per move in seconds.

The following observations can be made:

* Using the texel evaluation function in giraffe has a very small effect on the playing strength, even though it makes giraffe search about 3 ply deeper.

* The draw rate is extremely low. Manual inspection of some games suggests that quite a few games are decided by tactical blunders. This may make it harder to test the quality of the evaluation function.

* Self-play with the original giraffe version, using successive time control doublings, shows that searching longer makes giraffe significantly stronger. So even though there are some tactical blunders, the search must do some things right.

* I did not modify the texel evaluation function in any way, except by transforming it with the following formula to convert to giraffe's score range, where -10000 means 0% expected score and +10000 means 100% expected score (see the C++ sketch after this list):

Code: Select all

giraffe_eval = 20000/(1+exp(-texel_eval*.00650480288770818))-10000
* Before each call to the texel evaluation function the giraffe board representation is converted to the texel representation. According to profiling data this conversion takes a significant fraction of the total evaluation time. Nevertheless, giraffe_te is roughly 4 times faster than giraffe in terms of NPS.

* It would probably be interesting to insert the giraffe evaluation function in texel to be able to compare the evaluation functions in an engine that has a more conventional search function.
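
As an aside on the conversion formula in the list above: it is simply a logistic (sigmoid) mapping from texel's centipawn scale to giraffe's [-10000, +10000] expected-score scale. A minimal self-contained C++ sketch (the function and constant names are mine, not from either engine):

Code: Select all

#include <cmath>

// Scale factor from texel's calibration: 1.13 * ln(10) / 400
const double K = 0.00650480288770818;

// Map a texel score (centipawns) to giraffe's scale, where
// -10000 means 0% expected score and +10000 means 100%.
double texelToGiraffe(double texelEval) {
    return 20000.0 / (1.0 + std::exp(-texelEval * K)) - 10000.0;
}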
petero2
Posts: 684
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Is AlphaGo approach unsuitable to chess?

Post by petero2 »

petero2 wrote:I created a modified version of Giraffe that uses the texel evaluation function. The source code is here.

I then played some test games using the following programs:

Code: Select all

giraffe    : Giraffe latest version from bitbucket.org (earlier version rated 2457 on CCRL 40/40)
giraffe_te : Same Giraffe version but using evaluation from latest texel development version
The result was:

Code: Select all

prog1          tc1  prog2       tc2 elodiff draws  depth1  depth2 nGames
giraffe_te 24+0.24  giraffe 24+0.24      -3   13%    17.1   14.1   16172
The time control is expressed in base time in seconds + increment per move in seconds.

The following observations can be made:

* Using the texel evaluation function in giraffe has a very small effect on the playing strength, even though it makes giraffe search about 3 ply deeper.

* It would probably be interesting to insert the giraffe evaluation function in texel to be able to compare the evaluation functions in an engine that has a more conventional search function.
I have now created a (hacky) version of texel that uses the giraffe evaluation function, available here.

The giraffe evaluation function makes the NPS drop by roughly a factor of 10. By playing games between the original texel version and the modified version, I got the following results:

Code: Select all

prog1        tc1  prog2         tc2 elodiff draws  depth1  depth2 nGames
texel     6+0.06  texel_gi   6+0.06     358    9%    12.0     8.8    438
texel     60+0.6  texel_gi   60+0.6     255   19%    16.1    12.5    406

texel     6+0.06  texel_gi   60+0.6    -118   21%    12.4    12.9   1396
texel      500kN  texel_gi    500kN    -104   23%    14.2    14.4   2066
The last match used a fixed 500000 nodes/move limit.

From the results it can be seen that the giraffe evaluation function makes texel around 250-350 elo weaker, depending on time control. This is caused by the giraffe evaluation function being very slow. If it were somehow possible to make the giraffe evaluation function run as fast as the texel evaluation function, the giraffe eval version would actually be around 100-120 elo stronger than the texel eval version.

Whether future hardware and software improvements will make it possible to run an ANN evaluator as quickly as a traditional evaluator remains to be seen.
petero2
Posts: 684
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Is AlphaGo approach unsuitable to chess?

Post by petero2 »

petero2 wrote:
petero2 wrote:I created a modified version of Giraffe that uses the texel evaluation function. The source code is here.
I have now created a (hacky) version of texel that uses the giraffe evaluation function, available here.
Unfortunately those links don't seem to work for anyone except me. Here are the correct links:

giraffe_te

texelGi

Alternatively you can start from my public dropbox chess folder and browse from there.
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Is AlphaGo approach unsuitable to chess?

Post by lucasart »

petero2 wrote:
petero2 wrote:I created a modified version of Giraffe that uses the texel evaluation function. The source code is here.

I then played some test games using the following programs:

Code: Select all

giraffe    : Giraffe latest version from bitbucket.org (earlier version rated 2457 on CCRL 40/40)
giraffe_te : Same Giraffe version but using evaluation from latest texel development version
The result was:

Code: Select all

prog1          tc1  prog2       tc2 elodiff draws  depth1  depth2 nGames
giraffe_te 24+0.24  giraffe 24+0.24      -3   13%    17.1   14.1   16172
The time control is expressed in base time in seconds + increment per move in seconds.

The following observations can be made:

* Using the texel evaluation function in giraffe has a very small effect on the playing strength, even though it makes giraffe search about 3 ply deeper.

* It would probably be interesting to insert the giraffe evaluation function in texel to be able to compare the evaluation functions in an engine that has a more conventional search function.
I have now created a (hacky) version of texel that uses the giraffe evaluation function, available here.

The giraffe evaluation function makes the NPS drop by roughly a factor of 10. By playing games between the original texel version and the modified version, I got the following results:

Code: Select all

prog1        tc1  prog2         tc2 elodiff draws  depth1  depth2 nGames
texel     6+0.06  texel_gi   6+0.06     358    9%    12.0     8.8    438
texel     60+0.6  texel_gi   60+0.6     255   19%    16.1    12.5    406

texel     6+0.06  texel_gi   60+0.6    -118   21%    12.4    12.9   1396
texel      500kN  texel_gi    500kN    -104   23%    14.2    14.4   2066
The last match used a fixed 500000 nodes/move limit.

From the results it can be seen that the giraffe evaluation function makes texel around 250-350 elo weaker, depending on time control. This is caused by the giraffe evaluation function being very slow. If it were somehow possible to make the giraffe evaluation function run as fast as the texel evaluation function, the giraffe eval version would actually be around 100-120 elo stronger than the texel eval version.

Whether future hardware and software improvements will make it possible to run an ANN evaluator as quickly as a traditional evaluator remains to be seen.
Interesting results.

Perhaps the place where a DNN would be useful is the pawn evaluation, rather than the entire eval. That's because you have the pawn hash table to reduce the slowdown by an order of magnitude. Intuitively, the idea of using a DNN for analysing pawn patterns seems logical. If they are so good at image recognition, they should be good at evaluating pawn structures.
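
To sketch why the pawn hash helps (hypothetical code, not taken from any engine discussed here; evalPawnsNN stands in for a slow network forward pass): pawn structures repeat across huge numbers of positions in a search tree, so the expensive NN call would only be paid on a cache miss.

Code: Select all

#include <bitset>
#include <cstdint>
#include <vector>

// Placeholder for a slow NN pawn-structure evaluator (stub body).
static int evalPawnsNN(uint64_t wPawns, uint64_t bPawns) {
    return (int)std::bitset<64>(wPawns).count()
         - (int)std::bitset<64>(bPawns).count();
}

struct PawnHashEntry {
    uint64_t key = 0;  // Zobrist key of the pawn structure
    int score = 0;     // cached pawn-structure score
};

// Power-of-two size so indexing is a cheap mask.
static std::vector<PawnHashEntry> pawnHash(1 << 16);

static int pawnEval(uint64_t pawnKey, uint64_t wPawns, uint64_t bPawns) {
    PawnHashEntry& e = pawnHash[pawnKey & (pawnHash.size() - 1)];
    if (e.key != pawnKey) {  // miss: run the slow NN once
        e.key = pawnKey;
        e.score = evalPawnsNN(wPawns, bPawns);
    }
    return e.score;          // hit: essentially free
}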
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Is AlphaGo approach unsuitable to chess?

Post by cdani »

lucasart wrote: Perhaps the place where a DNN would be useful is the pawn evaluation, rather than the entire eval. That's because you have the pawn hash table to reduce the slowdown by an order of magnitude. Intuitively, the idea of using a DNN for analysing pawn patterns seems logical. If they are so good at image recognition, they should be good at evaluating pawn structures.
It would be nice, and surely there is something to be gained. But in Andscacs I have several pawn-structure evaluation improvements that depend on the other pieces, so in the end, to have a good pawn eval you need more information than the pawns alone.
jorose
Posts: 358
Joined: Thu Jan 22, 2015 3:21 pm
Location: Zurich, Switzerland
Full name: Jonathan Rosenthal

Re: Is AlphaGo approach unsuitable to chess?

Post by jorose »

This is really interesting to me. I find it striking that this factor-10 speed handicap seems to result in an Elo delta of 450 points :shock: How did you deal with things like futility and razoring margins?

I wonder whether it is possible to classify positions on whether one evaluation function or the other is better, based on positional features or on the margins that need to be reached in the search, e.g. using the Giraffe eval in PV nodes or in positions with fewer than 8 pieces, and the Texel eval otherwise (see the sketch below). Mixing eval functions opens up a whole new can of worms though, since the eval function values need to be comparable.
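
Something like this toy dispatch, for illustration only (the evaluators are stubs and the scores are assumed to be on a common scale already):

Code: Select all

#include <bitset>
#include <cstdint>

// Stub evaluators (hypothetical), assumed already converted to a
// common scale so their values are comparable.
static int texelEvalStub(uint64_t occ)   { (void)occ; return 0; }
static int giraffeEvalStub(uint64_t occ) { (void)occ; return 0; }

// Dispatch on a simple feature: total piece count, echoing the
// "fewer than 8 pieces" idea above.
static int mixedEval(uint64_t occupied) {
    int pieces = (int)std::bitset<64>(occupied).count();
    return pieces < 8 ? giraffeEvalStub(occupied)
                      : texelEvalStub(occupied);
}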

It would be interesting if we could discover patterns of where Giraffe performs well; I would imagine positions with complex tactics are tough for the slow eval function.
petero2
Posts: 684
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Is AlphaGo approach unsuitable to chess?

Post by petero2 »

lucasart wrote:
petero2 wrote:From the results it can be seen that the giraffe evaluation function makes texel around 250-350 elo weaker, depending on time control. This is caused by the giraffe evaluation function being very slow. If it were somehow possible to make the giraffe evaluation function run as fast as the texel evaluation function, the giraffe eval version would actually be around 100-120 elo stronger than the texel eval version.

Whether future hardware and software improvements will make it possible to run an ANN evaluator as quickly as a traditional evaluator remains to be seen.
Interesting results.

Perhaps the place where a DNN would be useful is the pawn evaluation, rather than the entire eval. That's because you have the pawn hash table to reduce the slowdown by an order of magnitude. Intuitively, the idea of using a DNN for analysing pawn patterns seems logical. If they are so good at image recognition, they should be good at evaluating pawn structures.
Possibly. It is worth noting, however, that texel is 250 elo behind stockfish 8 on CCRL 40/40. If we assume that half of that gap (about 125 elo) is caused by inferior evaluation, then, since the speed-equalized giraffe eval measured about 100-120 elo stronger above, an estimate would be that a 10x faster giraffe eval function would be about the same strength as the stockfish evaluation function.
petero2
Posts: 684
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Is AlphaGo approach unsuitable to chess?

Post by petero2 »

jorose wrote:This is really interesting to me. I find it striking that this factor-10 speed handicap seems to result in an Elo delta of 450 points :shock:
Note that these were hyper-bullet games though. The time control was 6s+0.06s/move, which corresponds to about 0.14 seconds/move on average (for a typical game length of around 75 moves per side, (6 + 75*0.06)/75 = 0.14).

The elo delta for a 10x time increase would be significantly smaller if the time control were longer.
jorose wrote:How did you deal with things like futility and razoring margins?
I did not change any search-related things at all. The two evaluation functions may have been more compatible, though, than two evaluation functions chosen from random chess programs. This is because both the texel and the giraffe evaluation functions have been calibrated against an "estimated score" scale. For texel the formula is:

Code: Select all

expected score = 1/(1+10^(-1.13 * texel_score / 400))
For giraffe the formula is:

Code: Select all

expected score = (10000 + giraffe_score) / 20000
Equating the above and rearranging gives:

Code: Select all

texel_score = -log(20000 / (10000 + giraffe_score) - 1) / log(10) / 1.13 * 400
           ~= -log(20000 / (10000 + giraffe_score) - 1) / .0065048
I use this conversion formula in src/ann/tevaluate.cpp.
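
For reference, a minimal C++ sketch of that conversion (the actual code in src/ann/tevaluate.cpp may look different):

Code: Select all

#include <cmath>

// 1.13 * ln(10) / 400 ~= 0.0065048
const double K = 0.00650480288770818;

// Convert a giraffe score in (-10000, 10000), i.e. on the expected
// score scale, back to a texel score in centipawns.
double giraffeToTexel(double giraffeScore) {
    return -std::log(20000.0 / (10000.0 + giraffeScore) - 1.0) / K;
}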
jorose wrote:It would be interesting if we could discover patterns of where Giraffe performs well; I would imagine positions with complex tactics are tough for the slow eval function.
Possibly, but it is also possible that the giraffe eval has learned to recognize some simple tactical patterns. I really have no idea what it does.