New Giraffe (Aug 28)

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: New Giraffe (Aug 28)

Post by Laskos »

matthewlai wrote:
Laskos wrote: It shows an English symmetrical four knights with 4.e4 and a slightly negative eval (I suppose this means less than 50% performance). My database shows slightly below 50% performance on this too.
Yeah +100.00 is the max positive non-mate score, and -100.00 is the max negative non-mate score.

I find that as training progresses, it usually starts by preferring 1. d4, then eventually switches to 1. e4, then 1. c4 (not counting very beginning where it just plays randomly).

With a7 pawn missing it's about +7.00.
b7 => +12.00
c7 => +13.00
d7 => +13.00
e7 => +13.00
f7 => +20.00
g7 => +18.00
h7 => +11.50

Not sure why it thinks f7 and g7 pawns are so important. Probably because of easier king-side attacks.
The more I look at Giraffe, the more amazing it seems to me. The pawn handicaps you show here look in line with what the best engines like Komodo and Stockfish show after several minutes of search. We ran some tests (play-outs) to determine the ELO value of handicaps (at reasonably long time controls), and they showed that the a2+c2 handicap is actually larger ELO-wise than the f7 handicap: 560 ELO points versus 510. Most strong engines, Fruit, Komodo, and Stockfish included, exhibit the opposite eval, with f7 as the larger handicap.
Here is the link to our recent discussion with Larry:

http://www.talkchess.com/forum/viewtopi ... 5&start=51
Larry Kaufman wrote:Based on a one minute search on 4 cores of the handicap starting positions, f7 actually shows as more of a handicap than even a2 + c2, 1.32 vs 1.14. Probably this means something isn't quite right about the material vs. positional scoring in Komodo, although it is well-tuned. Something we need to investigate. f7 is quite a large handicap because Black's development must be very modest due to tactical problems, but still a2 + c2 seems larger (and scores as higher in your tests). I think that Komodo's eval tricked me into underestimating the two pawn handicaps.
Giraffe shows the correct eval:
f7: +20.00
a2+c2: -22.00

I then tested several opening positions against large databases of human games, and Giraffe's eval seems more faithful to the database outcome statistics than those of top engines like Komodo or Stockfish.


I am also curious about its scaling (ELO strength over time). To a first approximation, conventional engines exhibit logarithmic scaling: the ELO increase is proportional to the number of plies, which is basically log(nodes). Then, taking "diminishing returns" into account, it's actually very close to

ELO ~ log[log(nodes)]

I understand that Giraffe, when learning, first evaluates a position for some time and then plays against itself for several more moves (incomplete play-outs). It then adjusts the model so that the eval of the original position gets closer to the evaluations of the leaf positions. It also uses iterative deepening. If so, I suspect that its ELO scaling should be fairly similar to other engines': log(nodes) at first glance, and maybe the same log[log(nodes)] once "diminishing returns" are taken into account. I am not sure about the second, nor about the proportionality factors; at first glance Giraffe seems to be getting comparatively stronger at longer time controls.
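The learning scheme described in that paragraph is essentially TD-Leaf(λ). A minimal sketch of the target computation, assuming a λ-discounted blend of leaf evals along an incomplete self-play line (the function and its default λ are illustrative, not Giraffe's actual code):

```python
def td_leaf_targets(leaf_evals, lam=0.7):
    """TD-Leaf(lambda)-style training targets.

    leaf_evals[t] is the search eval of the leaf reached from the
    position after t self-play moves. Each target blends the current
    leaf eval with lambda-discounted future temporal differences, so
    the eval of the original position is pulled toward the evals of
    later leaf positions.
    """
    n = len(leaf_evals)
    targets = []
    for t in range(n - 1):
        target = leaf_evals[t]
        for k in range(t, n - 1):
            delta = leaf_evals[k + 1] - leaf_evals[k]
            target += (lam ** (k - t)) * delta
        targets.append(target)
    return targets
```

With λ = 1 each target telescopes to the final leaf eval; with λ = 0 each position is trained toward the leaf eval of its immediate successor.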

I hope you will continue to work on this amazing engine. Even if it is not the strongest engine overall, it will be very useful in areas like building opening books and other positional tasks where top conventional engines struggle.
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: New Giraffe (Aug 28)

Post by brtzsnr »

Zurichess played against Giraffe a few times and I was impressed by your engine's level of play. Over the last few months I tried to add more evaluation features, but keeping the engine simple while gaining ELO was not a simple task.

Finding out which evaluation features are important is very tedious and time-consuming. For example, in the last month I had about 30 failed attempts to tune the material values. I don't think carefully selecting and tuning the evaluation is the way to go, and Giraffe is showing that there is another way.

I have some questions:
1) Do you do online learning? If so, how do you handle stale transposition table entries?
2) Do you manually extract features (e.g. passed pawns, connected rooks, checks), or do you use only the raw position as input to your NN?
melajara
Posts: 213
Joined: Thu Dec 16, 2010 4:39 pm

Re: New Giraffe (Aug 28)

Post by melajara »

Laskos wrote:Giraffe shows the correct eval:
f7: +20.00
a2+c2: -22.00

I then tested several opening positions against large databases of human games, and Giraffe's eval seems more faithful to the database outcome statistics than those of top engines like Komodo or Stockfish.
Those are very interesting findings.

Matthew, thank you so much for having successfully applied deep learning to computer chess. Learning in chess was stalling; AFAIK it didn't go beyond storing the evals of search trees in giant files (permanent hash) as a shortcut for future searches, and there was not much potential for abstraction and compression in that approach to learning.

With deep learning, the opposite is true. As Kai pointed out, even at its current stage Giraffe has integrated deep abstractions consistent with the accumulated experience of the best human players and today's top programs. This is already a fantastic achievement.

The future of deep learning is to somehow combine its fantastic pattern-matching potential with the iterative (deepening) nature of logical reasoning. Current deep-learning NNs are shallow reasoners; I sincerely hope that Giraffe, in the abstract model world of chess, will progressively delegate search to some form of new (reflective?) and deliberative NN. If I understand correctly, your recent use of a second NN as a helper to the first is a step in this direction.

You are doing an amazing job, I'm eager to read your thesis once available :-)

Also, thank you for making your project open source once the thesis is defended. It will enable a flurry of chess teaching tools (e.g. an intelligent coach for aspiring players) and will be a landmark for the collective advancement of the field, whether used to optimize eval terms in current top programs or by the general public to help validate or refute opening theory in supervised opening book construction.

But the definitive breakthrough, IMHO, will come when you, or a successor contributor to the project, maps search onto a form of NN (reflective) reasoning or deliberation, as I mentioned earlier.

With your work, chess is regaining its status as the Drosophila of artificial intelligence, no mean feat :D
Last edited by melajara on Sun Aug 30, 2015 10:14 am, edited 2 times in total.
Per ardua ad astra
Henk
Posts: 7251
Joined: Mon May 27, 2013 10:31 am

Re: New Giraffe (Aug 28)

Post by Henk »

A slow evaluation is worthless if the engine can't see that it almost immediately loses a piece one step beyond the horizon. Only if the search is already deep enough may that not be a problem.
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: New Giraffe (Aug 28)

Post by Michel »

It might be interesting to apply this procedure http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=30107 to get an idea of the quality of Giraffe's eval, independently of search.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: New Giraffe (Aug 28)

Post by matthewlai »

Laskos wrote:
matthewlai wrote:
Laskos wrote: It shows an English symmetrical four knights with 4.e4 and a slightly negative eval (I suppose this means less than 50% performance). My database shows slightly below 50% performance on this too.
Yeah +100.00 is the max positive non-mate score, and -100.00 is the max negative non-mate score.

I find that as training progresses, it usually starts by preferring 1. d4, then eventually switches to 1. e4, then 1. c4 (not counting very beginning where it just plays randomly).

With a7 pawn missing it's about +7.00.
b7 => +12.00
c7 => +13.00
d7 => +13.00
e7 => +13.00
f7 => +20.00
g7 => +18.00
h7 => +11.50

Not sure why it thinks f7 and g7 pawns are so important. Probably because of easier king-side attacks.
The more I look at Giraffe, the more amazing it seems to me. The pawn handicaps you show here look in line with what the best engines like Komodo and Stockfish show after several minutes of search. We ran some tests (play-outs) to determine the ELO value of handicaps (at reasonably long time controls), and they showed that the a2+c2 handicap is actually larger ELO-wise than the f7 handicap: 560 ELO points versus 510. Most strong engines, Fruit, Komodo, and Stockfish included, exhibit the opposite eval, with f7 as the larger handicap.
Here is the link to our recent discussion with Larry:

http://www.talkchess.com/forum/viewtopi ... 5&start=51
Larry Kaufman wrote:Based on a one minute search on 4 cores of the handicap starting positions, f7 actually shows as more of a handicap than even a2 + c2, 1.32 vs 1.14. Probably this means something isn't quite right about the material vs. positional scoring in Komodo, although it is well-tuned. Something we need to investigate. f7 is quite a large handicap because Black's development must be very modest due to tactical problems, but still a2 + c2 seems larger (and scores as higher in your tests). I think that Komodo's eval tricked me into underestimating the two pawn handicaps.
Giraffe shows the correct eval:
f7: +20.00
a2+c2: -22.00

I then tested several opening positions against large databases of human games, and Giraffe's eval seems more faithful to the database outcome statistics than those of top engines like Komodo or Stockfish.
Thanks for playing with it and sharing the results! Always good to have someone who knows what he is doing play with it :D.
I am also curious about its scaling (ELO strength over time). To a first approximation, conventional engines exhibit logarithmic scaling: the ELO increase is proportional to the number of plies, which is basically log(nodes). Then, taking "diminishing returns" into account, it's actually very close to

ELO ~ log[log(nodes)]

I understand that Giraffe, when learning, first evaluates a position for some time and then plays against itself for several more moves (incomplete play-outs). It then adjusts the model so that the eval of the original position gets closer to the evaluations of the leaf positions. It also uses iterative deepening. If so, I suspect that its ELO scaling should be fairly similar to other engines': log(nodes) at first glance, and maybe the same log[log(nodes)] once "diminishing returns" are taken into account. I am not sure about the second, nor about the proportionality factors; at first glance Giraffe seems to be getting comparatively stronger at longer time controls.
I am curious about that as well. I would also expect it to follow the same pattern (log[log(nodes)]). It's possible that it currently benefits more than other engines from longer time controls simply because, given how slow it is, it's much further away from the point of diminishing returns.

That is exactly how the training works. The current iterative deepening implementation simply multiplies the budget by 4 every iteration. I tried a few multipliers, and 4 seems to be the best. I also tried changing the multiplier based on how many legal moves there are, but that didn't make much of a difference.
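For illustration, the time allocation described here might look like the sketch below, which keeps deepening while the projected cost of the next iteration (previous cost × multiplier) still fits in the budget; `search_to_depth` is a hypothetical stand-in for the engine's actual search, not Giraffe's code:

```python
import time

def iterative_deepening(position, total_time, search_to_depth,
                        multiplier=4.0):
    """Deepen while the next iteration is projected to fit the budget.

    Assumes each iteration costs roughly `multiplier` times the
    previous one, matching the x4 budget growth described above.
    """
    start = time.monotonic()
    depth, best_move = 1, None
    while True:
        t0 = time.monotonic()
        best_move = search_to_depth(position, depth)
        last_cost = time.monotonic() - t0
        elapsed = time.monotonic() - start
        # Stop if the next (multiplier-times-longer) iteration would
        # overshoot the remaining time.
        if elapsed + multiplier * last_cost > total_time:
            return best_move, depth
        depth += 1
```

The stopping rule is the point: once an iteration's projected successor no longer fits, searching it partially would mostly be wasted effort.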
I hope you will continue to work on this amazing engine. Even if it is not the strongest engine overall, it will be very useful in areas like building opening books and other positional tasks where top conventional engines struggle.
Thanks! The plan is definitely to continue working on it. I am re-addicted to computer chess now :).
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: New Giraffe (Aug 28)

Post by matthewlai »

melajara wrote:
Laskos wrote:Giraffe shows the correct eval:
f7: +20.00
a2+c2: -22.00

I then tested several opening positions against large databases of human games, and Giraffe's eval seems more faithful to the database outcome statistics than those of top engines like Komodo or Stockfish.
Those are very interesting findings.

Matthew, thank you so much for having successfully applied deep learning to computer chess. Learning in chess was stalling; AFAIK it didn't go beyond storing the evals of search trees in giant files (permanent hash) as a shortcut for future searches, and there was not much potential for abstraction and compression in that approach to learning.

With deep learning, the opposite is true. As Kai pointed out, even at its current stage Giraffe has integrated deep abstractions consistent with the accumulated experience of the best human players and today's top programs. This is already a fantastic achievement.

The future of deep learning is to somehow combine its fantastic pattern-matching potential with the iterative (deepening) nature of logical reasoning. Current deep-learning NNs are shallow reasoners; I sincerely hope that Giraffe, in the abstract model world of chess, will progressively delegate search to some form of new (reflective?) and deliberative NN. If I understand correctly, your recent use of a second NN as a helper to the first is a step in this direction.

You are doing an amazing job, I'm eager to read your thesis once available :-)

Also, thank you for making your project open source once the thesis is defended. It will enable a flurry of chess teaching tools (e.g. an intelligent coach for aspiring players) and will be a landmark for the collective advancement of the field, whether used to optimize eval terms in current top programs or by the general public to help validate or refute opening theory in supervised opening book construction.

But the definitive breakthrough, IMHO, will come when you, or a successor contributor to the project, maps search onto a form of NN (reflective) reasoning or deliberation, as I mentioned earlier.

With your work, chess is regaining its status as the Drosophila of artificial intelligence, no mean feat :D
Thanks for your kind words.

Indeed, using an NN to guide search is probably going to be the next breakthrough, and it's what I have mostly been working on over the past few weeks. It's much more difficult than eval because the problem isn't as well defined, but I have been making good progress on it too. The effect is just not as dramatic as NN-based eval. So far it seems to achieve about the same effect as LMR (without actually using the move count), but it also makes Giraffe quite a bit slower, so the net gain isn't as big (though it's still positive).

Another approach I have been thinking about is similarity pruning: humans prune a lot of moves because we know that in certain situations, certain moves won't make a significant difference. If an engine can learn to do the same with an NN's help, it could significantly reduce the branching factor. However, this problem is even less well defined, and I don't have a clear idea of how to do it at the moment. I have been brainstorming something like using NNs to generate "position signatures" consisting of maybe 8-16 numbers; clustering (using k-NN, for example) could then allow fuzzy, transposition-table-like lookups that don't require an exact position/signature match. This is very difficult, but it's probably the most exciting idea I have right now.
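A toy sketch of the fuzzy-lookup half of that idea, assuming positions have already been mapped to short signature vectors (everything here, including the brute-force nearest-neighbour search and the radius, is a hypothetical illustration, not Giraffe's implementation):

```python
import math

class FuzzySignatureTable:
    """Transposition-table-like store keyed by short signatures.

    probe() returns the value stored under the nearest signature
    within `radius` (Euclidean distance), or None if nothing is close
    enough. Brute force for clarity; a real engine would need a
    spatial index (k-d tree, locality-sensitive hashing, ...) to make
    lookups fast enough for search.
    """

    def __init__(self, radius=0.5):
        self.radius = radius
        self.entries = []  # list of (signature, value) pairs

    def store(self, signature, value):
        self.entries.append((tuple(signature), value))

    def probe(self, signature):
        best_dist, best_value = self.radius, None
        for sig, value in self.entries:
            dist = math.dist(sig, signature)
            if dist <= best_dist:
                best_dist, best_value = dist, value
        return best_value
```

The interesting design question is entirely in the signatures themselves: the NN must map "similar enough to share a conclusion" positions to nearby vectors, which is what makes the problem hard.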
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: New Giraffe (Aug 28)

Post by matthewlai »

brtzsnr wrote:Zurichess played against Giraffe a few times and I was impressed by your engine's level of play. Over the last few months I tried to add more evaluation features, but keeping the engine simple while gaining ELO was not a simple task.

Finding out which evaluation features are important is very tedious and time-consuming. For example, in the last month I had about 30 failed attempts to tune the material values. I don't think carefully selecting and tuning the evaluation is the way to go, and Giraffe is showing that there is another way.

I have some questions:
1) Do you do online learning? If so, how do you handle stale transposition table entries?
2) Do you manually extract features (e.g. passed pawns, connected rooks, checks), or do you use only the raw position as input to your NN?
Thanks!

1)
There is no online learning. The problem with using a model with a very high degree of freedom is that it requires a lot of training examples; that's the price we pay for the flexibility. Unfortunately, it also means the few thousand games it plays on chess servers or against humans probably won't have any significant effect on training anyway.

So all my training is done on a cluster (though I am only using a single node for this). Currently it requires looking at about 100 million positions before it converges (more or less). On a 20-core node, that takes about 72 hours.

The KnightCap people used online play, and it worked for them because they were using a hand-written eval with far fewer degrees of freedom. They also saw it as a necessity due to the repetition problem in self-play. I solved that problem, so I can run self-play all day and it will virtually never repeat positions.

2)
No, all those features are extracted automatically by the neural net. A few very low-level features are given (e.g. how far each sliding piece can move in each direction), but almost everything is learned. Basically, I only hard-code features that are mostly direct results of the rules of the game. That way it won't have to waste time and neurons learning the rules of the game (as opposed to how to play it).
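As an illustration of the kind of low-level, rules-derived feature mentioned (how far each sliding piece can move in each direction), here is a hypothetical sketch on a bare-bones board representation; Giraffe's actual feature extraction differs:

```python
# Hypothetical sketch of a sliding-mobility feature: how many squares
# a sliding piece can travel in each direction before being blocked.
# The board is a dict mapping (file, rank) -> piece colour ('w'/'b');
# this is an illustration, not Giraffe's actual feature extractor.

ROOK_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
BISHOP_DIRS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def sliding_distances(board, square, colour, directions):
    file, rank = square
    distances = []
    for df, dr in directions:
        steps, f, r = 0, file + df, rank + dr
        while 0 <= f < 8 and 0 <= r < 8:
            if (f, r) in board:
                if board[(f, r)] != colour:
                    steps += 1  # the capture square itself is reachable
                break
            steps += 1
            f, r = f + df, r + dr
        distances.append(steps)
    return distances
```

Features like this follow directly from the movement rules, so feeding them in spares the net from rediscovering the rules themselves.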
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: New Giraffe (Aug 28)

Post by matthewlai »

Michel wrote:It might be interesting to apply this procedure http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=30107 to get an idea of the quality of Giraffe's eval, independently of search.
I did look into this, actually. It would be a bit more difficult than for Crafty and Stockfish because of the probabilistic eval: all the score margins in search (futility, razoring, etc.) would have to be changed.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Michel
Posts: 2292
Joined: Mon Sep 29, 2008 1:50 am

Re: New Giraffe (Aug 28)

Post by Michel »

matthewlai wrote:
Michel wrote:It might be interesting to apply this procedure http://rybkaforum.net/cgi-bin/rybkaforu ... ?tid=30107 to get an idea of the quality of Giraffe's eval, independently of search.
I did look into this, actually. It would be a bit more difficult than for Crafty and Stockfish because of the probabilistic eval: all the score margins in search (futility, razoring, etc.) would have to be changed.
I am sure you considered this, but what would be the issue with using the simple conversion formula

winning probability--->pawn advantage

suggested here

https://chessprogramming.wikispaces.com ... e,+and+ELO ?
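For reference, the conversion on that page is a logistic curve; a sketch with the commonly quoted scale constant (treat the exact constant as an assumption here):

```python
import math

# Logistic conversion between pawn advantage and expected score, as
# on the chessprogramming wiki page linked above. The scale constant
# of 4 pawns per decade is the commonly quoted value; treat it as an
# assumption.

def win_probability(pawns, scale=4.0):
    """Expected score from a pawn advantage: W = 1 / (1 + 10^(-P/s))."""
    return 1.0 / (1.0 + 10.0 ** (-pawns / scale))

def pawn_advantage(win_prob, scale=4.0):
    """Inverse mapping: P = s * log10(W / (1 - W))."""
    return scale * math.log10(win_prob / (1.0 - win_prob))
```

With something like this, a probabilistic eval could be mapped back to a pawn scale before the usual score margins (futility, razoring) are applied.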
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.