Chess program with Artificial Neural Networks (ANN)?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bhlangonijr
Posts: 482
Joined: Thu Oct 16, 2008 4:23 am
Location: Milky Way

Re: Chess program with Artificial Neural Networks (ANN)?

Post by bhlangonijr »

Milos wrote: If chess were a game where time is irrelevant, it would probably be dominated by ANN implementations. However, since time is relevant in chess, having an engine with a real-time ANN implementation in some of its parts would make it ridiculously slow and at the same time useless.
This is not correct.

You don't need to train your network at game-playing time. If you are going to use it as a function approximator to evaluate the chess board, then you first train your network and persist the adapted weights of the neurons to some sort of database. At game-playing time you only load the adapted weights - maybe in your engine's startup routines - and later use them to activate the network in your evaluation function. It's only a matter of computing some floating-point matrix operations. With SSE it could be even faster.
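To illustrate: activating a small pre-trained network inside the evaluation is nothing more than a few matrix-vector products. A minimal sketch in C++ (the layer sizes and the weight-file format are made up; this is not taken from any real engine):

Code:
#include <cmath>
#include <cstdio>

// Hypothetical sizes: 64 inputs, 32 hidden nodes, 1 output.
const int NI = 64, NH = 32;

static float w1[NH][NI], b1[NH];   // input -> hidden weights, trained offline
static float w2[NH], b2;           // hidden -> output weights

// Load the adapted weights once in the engine's startup routines.
bool load_weights(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    bool ok = std::fread(w1, sizeof(w1), 1, f) == 1 &&
              std::fread(b1, sizeof(b1), 1, f) == 1 &&
              std::fread(w2, sizeof(w2), 1, f) == 1 &&
              std::fread(&b2, sizeof(b2), 1, f) == 1;
    std::fclose(f);
    return ok;
}

// Activate the network for one position; called from the evaluation function.
float nn_evaluate(const float input[NI]) {
    float out = b2;
    for (int h = 0; h < NH; ++h) {
        float sum = b1[h];
        for (int i = 0; i < NI; ++i)
            sum += w1[h][i] * input[i];            // multiply-accumulate
        out += w2[h] * std::tanh(sum);             // sigmoid-like activation
    }
    return out;                                    // scale to centipawns as needed
}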

IMO the main problem with ANNs and their application to classical chess algorithms is that NOBODY has had a good idea about how to do it - so far.

Tuning evaluation terms in chess is only one approach to using ANNs. There are many other promising areas for applying ANNs besides that one.

In addition, KnightCap [http://citeseer.ist.psu.edu/old/400808.html] used Temporal Difference learning (a reinforcement-learning method) to tune evaluation terms in chess. The author (Jonathan Baxter) claimed around +400 Elo after tuning KnightCap's evaluation terms with TD learning.
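The core of that kind of temporal-difference tuning is small. A hedged sketch of a plain TD(lambda) weight update (a generic version, not KnightCap's actual TDLeaf code; eval values are assumed to be squashed to [-1,1]):

Code:
#include <cstddef>
#include <vector>

// One training game: eval[t] is the engine's evaluation of position t,
// grad[t][k] is the derivative of eval[t] with respect to weight k.
void td_lambda_update(std::vector<double>& w,
                      const std::vector<double>& eval,
                      const std::vector<std::vector<double>>& grad,
                      double alpha, double lambda) {
    const int N = (int)eval.size();
    for (int t = 0; t + 1 < N; ++t) {
        // Discounted sum of future temporal differences d_j = eval[j+1] - eval[j].
        double target = 0.0, decay = 1.0;
        for (int j = t; j + 1 < N; ++j) {
            target += decay * (eval[j + 1] - eval[j]);
            decay *= lambda;
        }
        // Move each weight in the direction that shifts eval[t] toward that target.
        for (std::size_t k = 0; k < w.size(); ++k)
            w[k] += alpha * grad[t][k] * target;
    }
}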
User avatar
nthom
Posts: 112
Joined: Thu Mar 09, 2006 6:15 am
Location: Australia

Re: Chess program with Artificial Neural Networks (ANN)?

Post by nthom »

You may also want to check out "Bootstrapping from Game Tree Search" at http://jveness.info/publications/default.html, which seems to be an improvement on KnightCap's methods.
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Chess program with Artificial Neural Networks (ANN)?

Post by Gian-Carlo Pascutto »

Milos wrote: The question is not whether it is doable, but whether it makes sense. In my opinion it doesn't. You can spend an enormous amount of time and you will end up with something that is weaker in any measurable sense than an algorithmic solution of much smaller complexity executed on considerably weaker hardware.
On this I won't argue, in the sense that coding an algorithmic equivalent to the neural network would be expected to perform faster.

However, you make it sound like the difference is significant. I repeat what I said: it isn't, because the network need not be a time-critical part of the program, and doing multiply-accumulates on floating-point numbers is something CPUs are exceptionally good at and can do heavily in parallel.
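For example, the inner multiply-accumulate loop vectorises trivially. A small sketch with SSE intrinsics (assuming the vector length is a multiple of 4; a real version would handle the tail and alignment):

Code:
#include <xmmintrin.h>   // SSE intrinsics

// Dot product of two float vectors, four multiply-accumulates per iteration.
float dot_sse(const float* a, const float* b, int n) {
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    float tmp[4];
    _mm_storeu_ps(tmp, acc);
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}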

What you totally seem to forget is the reason for using a NN in the first place: it can be trained to approximate arbitrarily complex functions, needing no more than an oracle. Yes, an algorithmic equivalent is faster - if you know what it is to begin with!
Milos wrote: If you are arguing just for the sake of academic research, then it's a different story, but I don't think the original question was in that direction.
Stoofvlees wasn't academic research:
http://www.chess.co.uk/twic/twic654.html#17

The results were used in Deep Sjeng 3.0.
User avatar
beachknight
Posts: 3533
Joined: Tue Jan 09, 2007 8:33 pm
Location: Antalya, Turkey

Re: Chess program with Artificial Neural Networks (ANN)?

Post by beachknight »

Hmm, IIRC, Chessterfield also had a learning network file.

Best,
hi, merhaba, hallo HT
Volker Annuss
Posts: 180
Joined: Mon Sep 03, 2007 9:15 am

Re: Chess program with Artificial Neural Networks (ANN)?

Post by Volker Annuss »

Hi Stephan,

Hermann uses neural networks for material evaluation and for time allocation.

Material evaluation

A neural network with 11 input nodes, one hidden layer with 5 nodes and one output node. It calculates the average score you can expect for the material on the board. This average score is transformed into a normal centipawn evaluation.
The input nodes are the piece counts for each type and colour (kings excluded) and a flag for same/opposite coloured bishops.
A small hash table is sufficient to get a hit rate very close to 100%, so the hundreds of floating-point operations per evaluation are not a problem.
It took me a lot of work before it gained 20 or 30 Elo, so if I were to write another engine from scratch, I would not do it again.
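In code, that pattern might look roughly like this (a sketch under my own assumptions about encoding and sizes, not Hermann's source):

Code:
#include <cmath>
#include <cstdint>

// Material network: 11 inputs, 5 hidden nodes, 1 output, plus a small cache
// indexed by a material hash key. Weights are assumed to be trained offline.
const int NI = 11, NH = 5, CACHE_SIZE = 1 << 14;

static float w1[NH][NI], b1[NH], w2[NH], b2;       // trained weights

struct CacheEntry { std::uint64_t key; int score; };
static CacheEntry cache[CACHE_SIZE];

static int run_network(const float in[NI]) {
    float out = b2;
    for (int h = 0; h < NH; ++h) {
        float s = b1[h];
        for (int i = 0; i < NI; ++i) s += w1[h][i] * in[i];
        out += w2[h] * std::tanh(s);
    }
    return (int)(out * 100.0f);                    // expected score -> centipawns
}

// in[0..9]: piece counts per type and colour (kings excluded),
// in[10]: flag for same/opposite coloured bishops.
int material_eval(std::uint64_t material_key, const float in[NI]) {
    CacheEntry& e = cache[material_key & (CACHE_SIZE - 1)];
    if (e.key == material_key) return e.score;     // hit rate close to 100%
    e.key = material_key;
    e.score = run_network(in);
    return e.score;
}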


Time allocation

Another neural network in Hermann is used for time allocation. It has 22 input nodes, one hidden layer with 7 nodes and one output node. It estimates the probability that the best move changes when iterating one ply deeper.
Input nodes are
- Scores from the last 2 iterations
- Number of possible moves
- Changes of the best move in the last 2 iterations
- Search instabilities in the last 2 iterations
- Checks and captures in the first moves of the PV
- Differences and transpositions in the first moves of the last 2 PVs

I would like to add
- Time for searching the best move (in % of the total time)
- Times for the two non-best moves that took the most time
but this does not work with my primitive multi-processor search with a shared hash table, because the times are massively influenced by hash hits from moves that were searched by another thread.
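In the search driver, the network's output then just gates the next iteration. A simplified sketch (my own naming and constants, not Hermann's code):

Code:
// Decide after each iteration whether another one is worth starting, given the
// network's predicted probability that the best move will change.
// The 0.10 threshold and the 2x stretch factor are made-up numbers.
bool start_next_iteration(float p_change,      // output of the timing network
                          int elapsed_ms,      // time used so far
                          int allocated_ms) {  // nominal time budget for this move
    // If a best-move change is very unlikely and half the budget is gone,
    // stop here and save the time.
    if (p_change < 0.10f && 2 * elapsed_ms > allocated_ms)
        return false;
    // Otherwise allow stretching the budget in proportion to the probability
    // that another iteration changes the move.
    return elapsed_ms < (int)(allocated_ms * (1.0f + 2.0f * p_change));
}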

Volker
Stephan Vermeire (Brutus)
Posts: 34
Joined: Sun Oct 12, 2008 6:32 pm

Re: Chess program with Artificial Neural Networks (ANN)?

Post by Stephan Vermeire (Brutus) »

Jim Ablett wrote:Hi Stephan,

I remember reading that the old winboard engine 'Sinapse' used this >
Hi Jim,

Thank you for this useful hint! I had never heard of Sinapse. It seems it doesn't have a neural network yet, but I am sure that will come in the future.

Stephan
Stephan Vermeire (Brutus)
Posts: 34
Joined: Sun Oct 12, 2008 6:32 pm

Re: Chess program with Artificial Neural Networks (ANN)?

Post by Stephan Vermeire (Brutus) »

Volker Annuss wrote:Hi Stephan,

Hermann uses neural networks for material evaluation and for time allocation.
Hermann uses a NN? That is quite promising! You really chose an interesting approach: tiny networks leading to a single output node. This might be a very good way to recognise specific features on a chessboard.

Initially I was thinking of the following approach:

- First layer: 64x6 neurons (64 squares x 6 piece types). A compressed version might be 64x3 neurons (with the piece coded in 3 bits). A minimal encoding sketch follows this list.
- Intermediate layer: I have no idea of a good size for this. Testing in practice will tell how small it can be while still working.
- Third layer: gives the evaluation score as a centipawn score (using binominal coding).
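A minimal sketch of that first-layer encoding (my own board representation, just for illustration - one input per square/piece-type pair, signed by colour):

Code:
// board[sq] is assumed to hold 0 = empty, 1..6 = white P,N,B,R,Q,K,
// 7..12 = the same piece types for black (an arbitrary convention).
const int SQUARES = 64, PIECE_TYPES = 6;

void encode_board(const int board[SQUARES], float input[SQUARES * PIECE_TYPES]) {
    for (int i = 0; i < SQUARES * PIECE_TYPES; ++i) input[i] = 0.0f;
    for (int sq = 0; sq < SQUARES; ++sq) {
        int p = board[sq];
        if (p == 0) continue;                             // empty square
        int type = (p - 1) % PIECE_TYPES;                 // 0..5 = P,N,B,R,Q,K
        float sign = (p <= PIECE_TYPES) ? 1.0f : -1.0f;   // white = +1, black = -1
        input[sq * PIECE_TYPES + type] = sign;
    }
}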

The weak point in this approach is the output: I have serious doubts that the network is capable of translating the position into a scalable score. I believe the strong feature of ANNs is the ability to recognise patterns (so a true/false output). In Hermann you end up with a single node. I am sure that is a much better way to use ANNs than binominal encoding.

In my project I don't want to restrict the NN too much to specific chess-related aspects: I am looking for a way to process the full board with all its features, without defining in advance what to look for. The question is: how do you do that? (I don't expect a full answer here; suggestions are welcome of course :wink: ).


Stephan
Volker Annuss
Posts: 180
Joined: Mon Sep 03, 2007 9:15 am

Re: Chess program with Artificial Neural Networks (ANN)?

Post by Volker Annuss »

First, one question: what do you mean by binominal encoding? A quick Google search showed only a few pages with "binominal encoding" and "binomial encoding", and a quick look at them did not help me. Or did you mean binary encoding?

You are right that it is not easy to get a scalable score. For my timing network I have ~400000 training patterns. The result is always 1 or 0 - the best move changes or it does not. Many patterns are very similar but have different results. I had to remove some input nodes for moving piece type (6 nodes) and captured piece type (5 nodes) to get better results.

I am no expert on neural networks. My theoretical knowledge is not much more than backpropagation and the RPROP algorithm (see Martin Riedmiller, Heinrich Braun: "A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm"), so there may be better methods for finding features than the following ideas.
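For completeness, the RPROP update itself is only a few lines. A sketch of the simple RPROP- variant (the 1.2/0.5 factors are the standard values from the paper; no weight backtracking):

Code:
#include <algorithm>
#include <cstddef>
#include <vector>

// One RPROP- step per training epoch: each weight keeps its own step size,
// which grows while the gradient keeps its sign and shrinks when it flips.
void rprop_update(std::vector<double>& w,
                  const std::vector<double>& grad,        // dE/dw for this epoch
                  std::vector<double>& prev_grad,         // gradient of the previous epoch
                  std::vector<double>& step) {            // per-weight step sizes
    const double eta_plus = 1.2, eta_minus = 0.5;
    const double step_max = 50.0, step_min = 1e-6;
    for (std::size_t i = 0; i < w.size(); ++i) {
        double s = grad[i] * prev_grad[i];
        if (s > 0.0) {                                    // same sign: accelerate
            step[i] = std::min(step[i] * eta_plus, step_max);
            w[i] -= (grad[i] > 0.0 ? step[i] : -step[i]);
            prev_grad[i] = grad[i];
        } else if (s < 0.0) {                             // sign change: slow down
            step[i] = std::max(step[i] * eta_minus, step_min);
            prev_grad[i] = 0.0;                           // skip the update this time
        } else {                                          // first epoch or after a flip
            if (grad[i] > 0.0) w[i] -= step[i];
            else if (grad[i] < 0.0) w[i] += step[i];
            prev_grad[i] = grad[i];
        }
    }
}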

Build a neural network with two (or even more) hidden layers. Have it learn some simple properties like won, lost, drawn or undecided position. Then take a look at the hidden nodes. Maybe they represent some interesting features.

Or much more crazy: Build a neural network with some output nodes, each representing some feature to be found. You don't know what feature each output node represents. Start with a random network. Then have it calculate a big set of chess positions. Look at the outputs and do the training in a direction that the output nodes get closer to 0 or 1 and that they get more uncorrelated from each other. Go back to having it calculate the big set of chess positions again and so on until the network becomes stable and the output nodes represent some features. Try to find out what they are good for.
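As a very rough illustration of what "closer to 0 or 1 and more uncorrelated" could mean as a training objective (entirely my own formulation of the idea, assuming sigmoid outputs in [0,1]; not a tested recipe):

Code:
#include <cstddef>
#include <vector>

// outputs[p][k] = output node k of the network on position p (in [0,1]).
// The loss rewards outputs near 0 or 1 and penalises correlated output pairs;
// train to reduce it, recompute on the big position set, and repeat.
double feature_loss(const std::vector<std::vector<double>>& outputs) {
    const std::size_t P = outputs.size(), K = outputs[0].size();
    std::vector<double> mean(K, 0.0);
    double loss = 0.0;
    for (std::size_t p = 0; p < P; ++p)
        for (std::size_t k = 0; k < K; ++k) {
            loss += outputs[p][k] * (1.0 - outputs[p][k]);   // push toward 0 or 1
            mean[k] += outputs[p][k] / P;
        }
    for (std::size_t i = 0; i < K; ++i)                      // decorrelate pairs
        for (std::size_t j = i + 1; j < K; ++j) {
            double cov = 0.0;
            for (std::size_t p = 0; p < P; ++p)
                cov += (outputs[p][i] - mean[i]) * (outputs[p][j] - mean[j]) / P;
            loss += cov * cov;
        }
    return loss;
}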
Stephan Vermeire (Brutus)
Posts: 34
Joined: Sun Oct 12, 2008 6:32 pm

Re: Chess program with Artificial Neural Networks (ANN)?

Post by Stephan Vermeire (Brutus) »

Volker Annuss wrote: First, one question: what do you mean by binominal encoding? A quick Google search showed only a few pages with "binominal encoding" and "binomial encoding", and a quick look at them did not help me. Or did you mean binary encoding?
Yes, I meant binary encoding. Sorry for the confusion; I was teaching binomial statistics to one of my students today :)

Volker Annuss wrote: Or much more crazy: Build a neural network with some output nodes, each representing some feature to be found. You don't know what feature each output node represents. Start with a random network. Then have it calculate a big set of chess positions. Look at the outputs and do the training in a direction that the output nodes get closer to 0 or 1 and that they get more uncorrelated from each other. Go back to having it calculate the big set of chess positions again and so on until the network becomes stable and the output nodes represent some features. Try to find out what they are good for.
Yes, this is exactly the kind of wild experiment I am aiming at! I believe that neural networks have great potential and that they have really been underestimated so far. The reason for this: it is just hard to understand what is going on, and most programmers don't really like that.

The nice thing about your approach is that you need no chess knowledge at all to generate chess-related features. It would be great to process these features in another layer and obtain some kind of position score. Perhaps it is not possible to get an exact number. On the other hand, it might very well be possible to get a score like this:

- Neuron 5 signals: won position
- Neuron 4 signals: very big advantage
- Neuron 3 signals: big advantage
- Neuron 2 signals: intermediate advantage
- Neuron 1 signals: small advantage

E.g. from a binary point of view: look at the most significant bit only. Although the score will be less accurate, I believe a neural network should be able to generate such an outcome. The position is either won or not. The great advantage of getting some general positional score: you don't even need to know what the uncorrelated markers stand for! You just use them to generate the score. It also becomes much easier to tune the network, since you can compare the output of the ANN with the 'true score' of the given position.
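A tiny sketch of reading such a graded output back into a coarse score (the threshold and the bucket values are arbitrary placeholders):

Code:
// Map the five graded output neurons from the list above to a coarse centipawn
// score by taking the highest neuron that fires; out[0] = neuron 1 (small
// advantage) ... out[4] = neuron 5 (won position).
int coarse_score(const float out[5]) {
    static const int bucket[5] = { 25, 75, 150, 300, 10000 };
    for (int k = 4; k >= 0; --k)          // most significant "bit" first
        if (out[k] > 0.5f)
            return bucket[k];
    return 0;                             // no advantage signalled
}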

A more conservative approach would be to choose the markers you want the ANN to identify. For example: is the king really unsafe? Yes/no. It will, however, be harder to find a good reference to compare the ANN's output against.

Stephan
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Chess program with Artificial Neural Networks (ANN)?

Post by Gian-Carlo Pascutto »

Stephan Vermeire (Brutus) wrote: - First layer: 64x6 neurons (64 squares x 6 piece types). A compressed version might be 64x3 neurons (with the piece coded in 3 bits).
I think this is very ambitious and likely to fail. (That's another way of saying: I tried this)

The problem is that you are requiring the network to find many layers of abstraction for itself. It's also a bit of a waste, as we already know some abstractions to be valuable (e.g. passed pawns).

I think a successful (but necessarily less ambitious) attempt involves adding to the inputs the abstractions you already know about.
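Concretely, that could mean appending a handful of hand-crafted terms to the raw square inputs. A small sketch (the extra features chosen here are just examples):

Code:
// Feed the network the raw board encoding plus abstractions we already trust.
struct KnownFeatures {
    int passed_pawns_white, passed_pawns_black;
    int king_shelter_white, king_shelter_black;
    int mobility_white, mobility_black;
};

const int RAW_INPUTS = 64 * 6, EXTRA_INPUTS = 6;

void build_inputs(const float raw[RAW_INPUTS], const KnownFeatures& f,
                  float input[RAW_INPUTS + EXTRA_INPUTS]) {
    for (int i = 0; i < RAW_INPUTS; ++i) input[i] = raw[i];
    input[RAW_INPUTS + 0] = (float)f.passed_pawns_white;
    input[RAW_INPUTS + 1] = (float)f.passed_pawns_black;
    input[RAW_INPUTS + 2] = (float)f.king_shelter_white;
    input[RAW_INPUTS + 3] = (float)f.king_shelter_black;
    input[RAW_INPUTS + 4] = (float)f.mobility_white;
    input[RAW_INPUTS + 5] = (float)f.mobility_black;
}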