*First release* Giraffe, a new engine based on deep learning

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

kinderchocolate
Posts: 454
Joined: Mon Nov 01, 2010 6:55 am
Full name: Ted Wong

Re: *First release* Giraffe, a new engine based on deep lear

Post by kinderchocolate »

I agree. There have been exciting new deep-learning R packages coming out recently. Everybody at conferences is talking about it, and it's got me very excited. Now you've applied it to a chess engine, which is absolutely incredible. I'm not that interested in playing with the engine, so I didn't download it. I'm much more interested in your algorithm and source code.

Once your paper / source code is published, please post it here. I'll surely study it.

It's not very important to tune your engine to make it stronger, because you'll never be able to beat alpha-beta. It's your idea, and how it can be applied to automatic self-learning, that's important. I'm particularly interested in how you calibrate the parameters, such as how many data points you have in your sample.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: *First release* Giraffe, a new engine based on deep lear

Post by matthewlai »

kinderchocolate wrote:I agree. There have been exciting new deep-learning R packages coming out recently. Everybody at conferences is talking about it, and it's got me very excited. Now you've applied it to a chess engine, which is absolutely incredible. I'm not that interested in playing with the engine, so I didn't download it. I'm much more interested in your algorithm and source code.

Once your paper / source code is published, please post it here. I'll surely study it.

It's not very important to tune your engine to make it stronger, because you'll never be able to beat alpha-beta. It's your idea, and how it can be applied to automatic self-learning, that's important. I'm particularly interested in how you calibrate the parameters, such as how many data points you have in your sample.
Ah, I am still using alpha-beta. So far only the evaluation function has been replaced by a DNN.

In the near future I want to also use DNNs to guide alpha-beta for more selective search, but it would still be alpha-beta.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
op12no2
Posts: 490
Joined: Tue Feb 04, 2014 12:25 pm
Full name: Colin Jenkins

Re: *First release* Giraffe, a new engine based on deep lear

Post by op12no2 »

Hi Matthew. I removed Giraffe after 40/50 games, as it had not won one - the PGN is here (renamed .txt): http://op12no2.me/stuff/20150709.txt which may or may not be useful. I didn't see much of the games, but in the last one I noticed it was reporting M160 while the other engine was reporting M7, so possibly something is up with the mate scores? Good luck with your project and thesis; can't wait to read it... :) NB: it was playing this JavaScript engine at 5'5": http://op12no2.me/toys/lozza/lozza.js

Ah, the mate values are in the PGN:

Code: Select all

{(a4b5 a6b5 f1e2 c5c4 b3c4
b5c4 e2f1 c4c3 f1e2 c3c2 e2d2 f2f1q d2c2) -M288/12 3} axb5 {(a6xb5 Kf1-g2
c5-c4 b3xc4 b5xc4 Kg2-f1 c4-c3 Kf1-e2 c3-c2 Ke2-d2 f2-f1Q Kd2xc2 g3-g2
Kc2-c3 g2-g1Q h5-h6 Qg1-e3+ Kc3-c2 Qf1-e2+ Kc2-b1 Qe3xh6 Kb1-a1 Qe2-d3)
+21.37/17 6} 41. Ke2 {(f1e2 c5c4 b3b4 c4c3 h5h6 d8c7 e2f3 f2f1q f3g3 c3c2
g3g4 c7b6 g4g5 c2c1q g5h5) -M232/13 5} c4 {(c5-c4 b3xc4 b5xc4 Ke2-f1 h7-h6
Kf1-e2 c4-c3 Ke2-d3 f2-f1Q+ Kd3xc3 g3-g2 Kc3-d4 g2-g1Q+ Kd4-e4 Qf1-c4+
Ke4-e5 Qg1-c5+ Ke5-f6 Qc5-g5+) +M9/15 3} 42. bxc4 {(b3c4 b5c4 e2f1 c4c3
f1e2 h7h6 e2d3 g3g2 d3c3 g2g1q c3b4 g1g4 b4a3 f2f1q) -M215/12 3} bxc4
{(b5xc4) +M8/1 0} 43. Kf1 {(e2f1 c4c3 f1e2 h7h6 e2d3 f2f1q d3c3 g3g2 c3b4
g2g1q b4a4 g1g4 a4a5 g4h5 a5b6) -M160/12 3} h6 {(h7-h6 Kf1-e2 c4-c3 Ke2-d3
f2-f1Q+ Kd3xc3 g3-g2 Kc3-d4 g2-g1Q+ Kd4-e4 Qf1-c4+ Ke4-e5 Qg1-c5+ Ke5-f6
Qc5-g5+) +M7/12 0  Arena Adjudication} 0-1
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: *First release* Giraffe, a new engine based on deep lear

Post by matthewlai »

op12no2 wrote:Hi Matthew. I removed Giraffe after 40/50 games, as it had not won one - the PGN is here (renamed .txt): http://op12no2.me/stuff/20150709.txt which may or may not be useful. I didn't see much of the games, but in the last one I noticed it was reporting M160 while the other engine was reporting M7, so possibly something is up with the mate scores? Good luck with your project and thesis; can't wait to read it... :)
Thanks! Yeah I wasn't expecting it to win games. It's incredibly shallow right now :).

The mate scores are just misinterpretations by the GUI. It seems that many GUIs interpret very high scores as mate scores, since most engines don't give evaluation scores that high. It shouldn't affect gameplay (unless the GUI is adjudicating matches based on engine scores).

Maybe I'll scale the score output in the next version so that won't happen. In any case, though, the scores won't be comparable to other engines' scores, since they're probabilistic. For example, if I scale the advantage of one pawn at the start of the game to 100, the advantage of one queen would only be 400 or so, and the advantage of two queens probably 500 or so. This is because the chance of winning is not linear in material: being 2 queens up and being 3 queens up give you roughly the same chance of winning (close to 100%).
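To make that concrete, here is a minimal hypothetical sketch in Python (not Giraffe's actual code; the function name and scale constant are made up for illustration) of mapping a winning probability to a bounded, centipawn-like score:

Code: Select all

# Hypothetical sketch: mapping a win probability to a
# centipawn-like score. Not Giraffe's actual code.
import math

def prob_to_score(p, scale=100.0):
    """Logit mapping: p in (0, 1) -> score, saturating for large leads."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)  # clamp to avoid log(0)
    return scale * math.log(p / (1.0 - p))

print(round(prob_to_score(0.73)))   # ~ +99, one pawn's worth
print(round(prob_to_score(0.99)))   # ~ +460, two queens up
print(round(prob_to_score(0.995)))  # ~ +529, three queens up: barely more

Because realistic probabilities map to scores in the low hundreds, far below typical mate-score conventions, a GUI would no longer mistake a winning evaluation for a forced mate.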
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: *First release* Giraffe, a new engine based on deep lear

Post by Henk »

The problem with a neural network is that it is a black box, so you don't understand how it gets to its results. Also, there are a very large number of weights to be tuned, so you can only find a local optimum. It may also be that sloppy learning gives the best results, because it generalizes better. So if you have an implementation, what can you do to improve it except restart the tuner and hope the next solution found will be better?
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: *First release* Giraffe, a new engine based on deep lear

Post by matthewlai »

Henk wrote:The problem with a neural network is that it is a black box, so you don't understand how it gets to its results. Also, there are a very large number of weights to be tuned, so you can only find a local optimum. It may also be that sloppy learning gives the best results, because it generalizes better. So if you have an implementation, what can you do to improve it except restart the tuner and hope the next solution found will be better?
Yes, neural networks are black boxes.

The problem with local minimums is not unique to neural networks. We are way past the point where we can find global minimums, even with normal evaluation functions.

A typical evaluation function nowadays has a few hundred parameters, and is tuned to a local minimum as well (either manually or automatically).

The problem with tuning a normal evaluation function is that, for piece-square tables for example, each square needs to be tuned individually. If the tuner finds that knights on e5 should be given a bonus for centrality, it won't generalize that to e4, which has to be learned separately.

In a neural network model with a good feature representation, piece-square-table-type knowledge can be represented as a piecewise-linear function in 2D, which generalizes better and with fewer parameters.
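To illustrate the contrast, here is a toy Python sketch (hypothetical, not Giraffe's actual feature representation): a plain piece-square table needs one independent parameter per square, while a small function of the coordinates shares what it learns across squares:

Code: Select all

# Toy contrast between the two representations. Hypothetical
# sketch, not Giraffe's feature code.

# (a) Plain piece-square table: 64 independent parameters.
# A bonus learned for e5 says nothing about e4.
knight_pst = [0.0] * 64

def pst_bonus(square):  # square = 0..63
    return knight_pst[square]

# (b) Function of coordinates: 2 shared parameters.
# Learning "centrality is good" transfers from e5 to e4
# automatically, because both squares share the same features.
w_file, w_rank = 0.0, 0.0

def coord_bonus(square):
    f, r = square % 8, square // 8
    file_centrality = 3.5 - abs(f - 3.5)  # 0.0 at the edge, 3.0 in the middle
    rank_centrality = 3.5 - abs(r - 3.5)
    return w_file * file_centrality + w_rank * rank_centrality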

Empirically, neural nets have been found to be very good at finding good local minimums, compared to most other models. I usually get very similar results if I run the tuner multiple times with different random initializations.

"Sloppy learning" has a name. It's called learning with regularization :). I don't need to do that because I only present each training example once to the net, so there's little chance of overfitting. One of the properties of deep nets is that they try to automatically find low level features that are universally useful. That's why they usually generalize much better than shallower and wider nets.

Oh there are many things I can do to improve it besides restarting the tuner. Most of my time is spent tweaking the feature representation, which has huge effects on how the net generalizes. Different network architectures and connectivity schemes also have large effects. Lots of things to work on :).
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: *First release* Giraffe, a new engine based on deep lear

Post by Henk »

It may still be that evaluation using a neural network is too slow compared to traditional implementations.

It might also be that tuning a network and finding the right architecture takes a huge amount of (computing) time.

I remember I included the sizes of the layers plus the weights as the parameters to be tuned in my network tuner. Because I used a genetic algorithm for learning, I also had a version where I automatically tuned the parameters of the genetic algorithm, like population count and crossover and mutation probabilities.

But in the end I used simulated annealing, because I wanted global maxima.
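For concreteness, here is a toy version of that kind of tuner in Python (a hypothetical sketch, not the original code; fitness is a stand-in for training and evaluating a net with the given layer sizes):

Code: Select all

import random

# Toy genetic algorithm over layer sizes, in the spirit of the
# tuner described above. Hypothetical sketch, not the original code.

def fitness(layer_sizes):
    # Stand-in: in practice, train a net with this architecture
    # and return its playing strength or negative error.
    return -sum(layer_sizes)

def evolve(pop_size=20, generations=50, mut_prob=0.3):
    pop = [[random.randint(4, 64) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]       # selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(a))  # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mut_prob:     # mutation
                child[random.randrange(len(child))] = random.randint(4, 64)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

Population count and mutation probability show up here as the pop_size and mut_prob parameters, which is what makes it tempting to tune them automatically in turn.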
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: *First release* Giraffe, a new engine based on deep lear

Post by matthewlai »

Henk wrote:It may still be that evaluation using a neural network is too slow compared to traditional implementations.

It might also be that tuning a network and finding the right architecture takes a huge amount of (computing) time.

I remember I included the sizes of the layers plus the weights as the parameters to be tuned in my network tuner. Because I used a genetic algorithm for learning the weights, I also had a version where I tuned the parameters of the genetic algorithm, like population count and crossover and mutation probabilities.

But in the end I used simulated annealing, because I wanted global maxima.
Yes, it's slower. The advantage is that it may learn things beyond human imagination, and for me that's a cause worth pursuing.

Finding the right architecture certainly takes a lot of computing time. Luckily I have access to a supercomputer, so that's less of a problem.

Quite a few people have worked on using genetic algorithms to optimize network architectures. However, that's even more computationally intensive, by orders of magnitude. For most problems it's not practical, even when you have a supercomputer.

Genetic algorithms are usually used as a last resort, for models where we can't compute a gradient. For models where we can (like neural networks), gradient-based methods are often much, much faster, and usually give you minimums that are just as good, if not better. That's why almost no one uses GAs to train neural nets anymore.

Simulated annealing does not give you a global minimum. It just makes it harder for the model to get stuck in small local minimums; it will still get stuck in larger ones. Most of the best gradient-based methods nowadays (like stochastic gradient descent) do include some elements of simulated annealing.
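A minimal sketch of that annealing-like ingredient (hypothetical Python, not Giraffe's trainer): gradient descent with a step size that decays over time, so early steps can escape small minimums while late steps settle:

Code: Select all

# Minimal sketch: gradient descent with a decaying learning rate,
# the annealing-like element mentioned above. Hypothetical;
# grad_fn returns a (noisy) gradient for the current weights.

def sgd_anneal(w, grad_fn, lr0=0.1, decay=0.001, steps=10000):
    for t in range(steps):
        lr = lr0 / (1.0 + decay * t)  # step size shrinks over time
        g = grad_fn(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w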

It is just not practical to look for global minimums when you have more than a handful of parameters.

There is no way you can find a global minimum without examining every part of the parameter space, because it's always possible that the minimum is very steep, and is in a part of the parameter space you didn't examine.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: *First release* Giraffe, a new engine based on deep lear

Post by Henk »

There were constraints that created boundaries in the search space, so a gradient-descent search was easily directed to a boundary of the space, giving solutions that were not interesting. Using a penalty when a parameter got close to the boundary also gave problems, but I don't remember the details anymore.

So that's why I used GA and SA.
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: *First release* Giraffe, a new engine based on deep lear

Post by Henk »

Henk wrote:There were constraints that created boundaries in the search space, so a gradient-descent search was easily directed to a boundary of the space, giving solutions that were not interesting. Using a penalty when a parameter got close to the boundary also gave problems, but I don't remember the details anymore.

So that's why I used GA and SA.
Mistake: the main reason was that some parameters only had discrete values, so you could not compute a normal gradient.