First success with neural nets

jonkr · Post by **jonkr** » Mon Nov 02, 2020 11:48 pm

So, another update on the results after some more training/experimenting, since I'm taking a break from it.

Slow 2.4 vs Stockfish 11 scored -35 elo (general endgame suite).

I didn't fully train all the nets. Based on the most-trained subsets, I expect the path to -10 elo or better against SF11, with more training and maybe minor adjustments, is straightforward.

I have 8 endgame nets. These are the ones I have active:

	EndgameNets.push_back(new OnePieceEndgameNet(ROOK));
	EndgameNets.push_back(new OnePieceEndgameNet(BISHOP));
	EndgameNets.push_back(new OnePieceEndgameNet(QUEEN));
	EndgameNets.push_back(new OnePieceEndgameNet(KNIGHT));
	EndgameNets.push_back(new OneDiffPieceEndgameNet(BISHOP,KNIGHT));
	EndgameNets.push_back(new OneDiffPieceEndgameNet(ROOK,BISHOP));
	EndgameNets.push_back(new OneDiffPieceEndgameNet(ROOK,KNIGHT));
	EndgameNets.push_back(new GeneralEndgameNet());

The index of which net to use is stored in the material hash. The general endgame net covers positions with <= 2 pieces and no queens. The others I think are obvious from the names. When exporting training data, the exporter checks which net is active for each position and writes a separate data file for each net from one big position list.
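A minimal sketch of how the net index could be derived from material and memoized, in the spirit of storing it in the material hash. Everything here is an assumption rather than SlowChess's actual code: the `Material` counts exclude kings and pawns, "one piece" is read as exactly one such piece per side, the general net is read as at most 2 pieces in total with no queens, a `std::unordered_map` stands in for the real material hash table, and `selectNet`, `materialKey`, `netForMaterial` and the `NetIndex` values are hypothetical names (indices follow the `push_back` order above).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical net indices, in the same order as the push_back calls above.
enum NetIndex : int {
    NET_NONE = -1,
    NET_ROOK = 0, NET_BISHOP, NET_QUEEN, NET_KNIGHT,
    NET_BISHOP_KNIGHT, NET_ROOK_BISHOP, NET_ROOK_KNIGHT,
    NET_GENERAL
};

// Assumed piece-type order for the counts (kings and pawns are not counted).
enum PieceType : int { KNIGHT_T = 0, BISHOP_T, ROOK_T, QUEEN_T };

struct Material {
    std::array<int, 4> white{};  // piece counts indexed by PieceType
    std::array<int, 4> black{};
};

// Decide which endgame net, if any, applies to this material.
int selectNet(const Material& m) {
    int wTotal = 0, bTotal = 0, wType = -1, bType = -1;
    for (int t = 0; t < 4; ++t) {
        wTotal += m.white[t];
        bTotal += m.black[t];
        if (m.white[t] > 0) wType = t;
        if (m.black[t] > 0) bType = t;
    }
    if (wTotal == 1 && bTotal == 1) {
        // Same piece type on both sides: the OnePieceEndgameNet variants.
        static const int sameType[4] = {NET_KNIGHT, NET_BISHOP, NET_ROOK, NET_QUEEN};
        if (wType == bType) return sameType[wType];
        // Mixed pairs: the OneDiffPieceEndgameNet variants.
        int lo = wType < bType ? wType : bType;
        int hi = wType < bType ? bType : wType;
        if (lo == KNIGHT_T && hi == BISHOP_T) return NET_BISHOP_KNIGHT;
        if (lo == KNIGHT_T && hi == ROOK_T) return NET_ROOK_KNIGHT;
        if (lo == BISHOP_T && hi == ROOK_T) return NET_ROOK_BISHOP;
    }
    // General net: at most 2 pieces in total and no queens.
    bool noQueens = m.white[QUEEN_T] == 0 && m.black[QUEEN_T] == 0;
    if (wTotal + bTotal <= 2 && noQueens) return NET_GENERAL;
    return NET_NONE;  // no net: fall back to the regular eval
}

// Pack the eight counts into one key, 4 bits each (counts assumed < 16).
std::uint32_t materialKey(const Material& m) {
    std::uint32_t key = 0;
    for (int t = 0; t < 4; ++t) {
        assert(m.white[t] < 16 && m.black[t] < 16);
        key |= static_cast<std::uint32_t>(m.white[t]) << (4 * t);
        key |= static_cast<std::uint32_t>(m.black[t]) << (4 * t + 16);
    }
    return key;
}

// Memoized lookup: selectNet runs once per material signature.
int netForMaterial(const Material& m) {
    static std::unordered_map<std::uint32_t, int> cache;
    const std::uint32_t key = materialKey(m);
    auto it = cache.find(key);
    if (it != cache.end()) return it->second;
    const int net = selectNet(m);
    cache.emplace(key, net);
    return net;
}
```

Because the choice depends only on material, it only has to be computed once per material signature; the eval then just reads the cached index.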

With more opening variety, in the last test I ran against SF11 the Rook net was even after 8000 games, so 10,000 opening positions did still lead to a minor amount of overfitting. (I made a pgn->epd tool that exports the first position in each game for which a given net is active, so I think I now have 50,000+ positions for each type.)
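The pgn->epd export described above can be sketched as follows, assuming each game has already been replayed into a list of positions tagged with the active net index (-1 for none). `TaggedPosition`, `Game` and `firstPositionsPerNet` are hypothetical names; PGN parsing and EPD writing are left out.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical: one game = its positions in move order, each tagged with
// the endgame net index active there (-1 = no net).
struct TaggedPosition {
    std::string fen;
    int net;
};
using Game = std::vector<TaggedPosition>;

// For each game, keep only the first position where each net is active,
// grouped by net index (one output list, i.e. one EPD file, per net).
std::map<int, std::vector<std::string>> firstPositionsPerNet(const std::vector<Game>& games) {
    std::map<int, std::vector<std::string>> out;
    for (const Game& g : games) {
        std::set<int> seen;  // nets already exported for this game
        for (const TaggedPosition& p : g) {
            if (p.net < 0 || seen.count(p.net)) continue;
            seen.insert(p.net);
            out[p.net].push_back(p.fen);
        }
    }
    return out;
}
```

Taking only the first qualifying position per game keeps the test positions spread across many different games instead of producing near-duplicates from the same one.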
I also improved the speed, mostly by converting the weights to int16, which also made the files smaller. I went back to having STM (side to move) as just another input, and I export the training data twice, once with the sides flipped. The layers of the onePiece nets are 192 x 32 x 32 x 1. I slightly increased the first layer so the number of weights would be a multiple of 16 for int16 SIMD, but it's not a big difference.
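A minimal sketch of the float-to-int16 weight conversion, assuming a single fixed-point scale per net with round-to-nearest and saturation. The post doesn't say which scale or rounding is used, so `kScale = 64` and the function names are hypothetical. The `static_assert` records the reason for the 192-wide first layer: 192 = 12 x 16, and 16 int16 values fill one 256-bit SIMD register.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// First hidden layer width from the post; 16 int16 lanes = one 256-bit register.
constexpr int kHidden1 = 192;
static_assert(kHidden1 % 16 == 0, "first-layer width should be a multiple of 16");

// Hypothetical fixed-point scale: stored value = round(weight * kScale).
constexpr float kScale = 64.0f;

// Quantize float weights to int16, saturating to the int16 range.
std::vector<std::int16_t> quantize(const std::vector<float>& w) {
    std::vector<std::int16_t> out;
    out.reserve(w.size());
    for (float x : w) {
        float v = std::round(x * kScale);
        v = std::max(-32768.0f, std::min(v, 32767.0f));
        out.push_back(static_cast<std::int16_t>(v));
    }
    return out;
}

// Inverse mapping, e.g. to measure the quantization error.
float dequantize(std::int16_t q) { return static_cast<float>(q) / kScale; }
```

Each weight then takes 2 bytes instead of the 4 bytes of a 32-bit float, consistent with the smaller files mentioned above.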

It would take me about 24 hours to train a one-piece endgame net that performs on par with Slow 2.3. It would be overfit, with some weird values, but would get enough of the important things right to be as good or better. After a couple more days it would be clearly better, but not yet even with SF11. The nets I did train long enough to try to match SF11 seemed likely to succeed.

In the beginning, trying out neural nets was very interesting, although once I had the process down, the time needed for training/testing was kinda long and my interest waned. (It would be interesting to experiment again if I had 10+ times the computing power, but as it is, most experiments don't make major differences, so the results just feel unclear.)

Angrim: I agree that this is not NNUE style, although I do try to be efficient. I'm still not doing incremental updates, but may do that later. Of course I only process active inputs and skip the 0s. In fact, since all inputs are 1 or 0, the first layer only needs an add of the weights for each active input, with no multiplications. And the actual net sizes aren't that big.
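Because every input is 0 or 1, the first layer's matrix-vector product reduces to the bias plus the weight rows of the active inputs. A minimal sketch, assuming a hypothetical layout with the 192 first-layer weights of each input stored contiguously, and a 32-bit accumulator to keep the example free of overflow concerns (`FirstLayer` and `computeFirstLayer` are illustrative names, not the engine's):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// First-layer width from the post (192 x 32 x 32 x 1).
constexpr int kHidden = 192;

struct FirstLayer {
    std::vector<std::int16_t> weights;  // numInputs * kHidden, input-major (assumed)
    std::array<std::int16_t, kHidden> bias{};
};

// With 0/1 inputs, W*x + b is the bias plus the weight rows of the active
// inputs: no multiplications, and the 0 inputs are never touched.
void computeFirstLayer(const FirstLayer& layer, const std::vector<int>& activeInputs,
                       std::int32_t* acc) {
    for (int j = 0; j < kHidden; ++j) acc[j] = layer.bias[j];
    for (int i : activeInputs) {
        const std::int16_t* row = layer.weights.data() + static_cast<std::size_t>(i) * kHidden;
        for (int j = 0; j < kHidden; ++j) acc[j] += row[j];  // plain adds, easy to vectorize
    }
}
```

In an engine, the inner loop would typically be written with SIMD adds on int16 lanes rather than left to the compiler.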

jonkr · Post by **jonkr** » Thu Nov 12, 2020 12:59 am

Although I'm not working on chess right now, I have still been testing out neural nets.
I started with something very similar: checkers.

15-20 years ago I worked a bit on a checkers program. Using the same neural net code and training method I had used for chess endgames, detailed in this thread, I was able to make my checkers program 100+ elo stronger in about a week. (Note that checkers is very drawish, so to make it less drawish in my testing I used fast time controls and didn't use any large endgame or opening databases, but because of the drawishness a 100 elo increase is still a lot in checkers.)

The details are described here: http://www.3dkingdoms.com/checkers.htm
There is even a GitHub link to the source; the NeuralNets directory is copied pretty much as-is from SlowChess.
The search and other parts of the program are less advanced. I spent some time cleaning up the code, but my ancient code was super messy, so it still needs work.

I haven't decided if I'll try to generalize more by using other board games, or go further afield to try some completely different learning.

Madeleine Birchfield · Thu Nov 12, 2020 6:46 am

jonkr wrote: ↑Thu Nov 12, 2020 12:59 am I haven't decided if I'll try to generalize more by using other board games, or go further afield to try some completely different learning.

Maybe try international draughts first. I'd like to see somebody break the dominance of Fabien Letouzey's Scan in international draughts, and as far as I know nobody has implemented a draughts engine with neural networks yet.

jonkr · Post by **jonkr** » Fri Nov 13, 2020 8:29 pm

While cleaning up the GuiNN checkers code some more, I managed to improve it by another 10 elo. So it definitely has some room to grow, but maybe I am done now, since checkers can mostly be covered by large databases instead, and not using them feels a bit artificial.

International draughts is an interesting suggestion. I didn't know the rules until I looked them up, but with 50 usable squares and still 4 input types (2 colors, 2 piece types), it seems like a good game: more complex than 8x8 checkers but less complex than chess, and I could still use the same bitmaps for the interface. I am curious how hard it would be to make a strong program.

Right now I'm still thinking of trying Othello next. Like checkers, it has the opening -> solved endgame issue for programs these days (although unlike checkers, the positions diverge as the board fills up rather than converging toward few-piece positions, so the endgame solving comes from being able to search until the game is over and the board is filled). So it would mostly be to see if I can quickly make a much stronger program than my old Othello program and improve its endgame solving speed.

Rein Halbersma · Post by **Rein Halbersma** » Sat Nov 14, 2020 3:39 pm

jonkr wrote: ↑Thu Nov 12, 2020 12:59 am The details are described here : http://www.3dkingdoms.com/checkers.htm

This is really amazing! I've managed to both compile your engine and run your Python training code, and it all works flawlessly. I have a GTX 1050 Ti with 4 GB of RAM as my GPU, which allows me to use a batch size of 1M (instead of your 15K). All the other parameters (learning schedule) I left untouched.

Training on the two PDN files with 224K positions on your website finished in 18 seconds. I found mean squared errors of 3.48%, 1.11%, 0.23% and 0.00066% for the 4 game phases.

Ed Gilbert gave me a huge PDN archive of 145 million positions (64M early, 40M mid, 30M end, 11M late end) that he generated with his Kingsrow engine for its eval tuning (with a completely different eval). I just finished training your 30K parameter networks on these positions. It took just 24 minutes. I found mean squared errors of 4.60%, 2.55%, 0.86% and 0.62%. It suggests that overfitting is a real problem on small samples.

I'll hand off the weights to Ed and ask him to run some engine matches with it and post the results.

Keras is a very nice library, BTW. I had been dabbling with TensorFlow around 2017 and gave up because of the byzantine API, but this is so much easier to get working. There are also some niceties such as model.summary(), which for your network gives output like this:

device (0): GeForce GTX 1050 Ti, Compute Capability 6.1
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 192)               23424
_________________________________________________________________
dense_1 (Dense)              (None, 32)                6176
_________________________________________________________________
dense_2 (Dense)              (None, 32)                1056
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 33
=================================================================
Total params: 30,689
Trainable params: 30,689
Non-trainable params: 0
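The parameter counts in this summary follow from each `Dense` layer having n_in x n_out weights plus n_out biases. Working backwards from the first layer, 23424 = 192 x 122 implies 121 inputs, which matches the 121 first-layer columns mentioned later in the thread:

```latex
\begin{aligned}
\text{params}(n_\text{in}, n_\text{out}) &= n_\text{in}\, n_\text{out} + n_\text{out} \\
121 \cdot 192 + 192 &= 23424 \\
192 \cdot 32 + 32 &= 6176 \\
32 \cdot 32 + 32 &= 1056 \\
32 \cdot 1 + 1 &= 33 \\
23424 + 6176 + 1056 + 33 &= 30689
\end{aligned}
```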

There are a few things I'm gonna try next. First, I've managed to export the 4 Keras models to 4 Protocol Buffer archives on disk. These contain the graph topology and weight files. They can also be loaded into a C++ program and evaluated directly, without having to write any duplicate network code. I still have to set up the proper TensorFlow include and library paths to be able to link against the underlying TF code. The imported TF graph will compute in floating point, but there are also methods to automate the float-to-integer conversion (called "quantization"); I haven't tried any of that yet. For the other tweaks, see my fork of your repo.

Rein Halbersma · Post by **Rein Halbersma** » Sat Nov 14, 2020 4:01 pm

Training on my CPU is about 2.7x slower (Xeon E5-1650v4, 6/12 C/T @ 3.6GHz), with almost 98% CPU utilization on all 12 hyperthreads. Training on the GPU didn't raise CPU utilization above 20%, and it also let me watch some Netflix while training without any hiccups.

Rein Halbersma · Post by **Rein Halbersma** » Sat Nov 14, 2020 4:33 pm

Madeleine Birchfield wrote: ↑Thu Nov 12, 2020 6:46 am Maybe first try international draughts. I'd like to see somebody break the dominance of Fabien Letouzey's Scan in international draughts, and nobody that I know of has yet implemented a draughts engine with neural networks yet.

Fabien might disagree with me, but you can call the Scan pattern-based eval a "proto-NN". It's basically a sparse input array connected directly, through a sigmoid, to the scalar output for training purposes, without any hidden layers. Removing the sigmoid in the engine code makes this a pure lookup- and integer-based eval. There's a tutorial on the TensorFlow site called "wide learning" (as opposed to "deep learning") which is very much in the spirit of Scan-based patterns.
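A minimal sketch of the "sparse inputs straight to a sigmoid" structure described above, i.e. a linear model with no hidden layers. The pattern indexing itself is not shown: `activeFeatures` stands in for whatever pattern indices an evaluator like Scan extracts, and the table layout, the scale parameter and the names are hypothetical, not Scan's actual code. The weights are shown as integers, as the engine would use them; during training they would normally be floating point.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// One weight per (pattern, configuration) feature; hypothetical layout.
struct PatternEval {
    std::vector<std::int32_t> table;

    // Engine-side eval: a pure lookup-and-add over the active features.
    std::int32_t score(const std::vector<int>& activeFeatures) const {
        std::int32_t s = 0;
        for (int f : activeFeatures) s += table[f];
        return s;
    }

    // Training-side prediction: the same sum squashed into (0, 1),
    // to be compared against the game result.
    double predict(const std::vector<int>& activeFeatures, double sigmoidScale) const {
        return 1.0 / (1.0 + std::exp(-score(activeFeatures) / sigmoidScale));
    }
};
```

Since the sigmoid is monotonic, dropping it in the engine doesn't change which of two positions scores higher, so the search can work on the raw integer sum.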

jonkr · Post by **jonkr** » Sat Nov 14, 2020 11:43 pm

Rein Halbersma wrote: ↑Sat Nov 14, 2020 3:39 pm This is really amazing! I've managed to both compile your engine and run your Python training code, it all works flawlessly. I have a GTX 1050 Ti with 4Gb RAM as GPU and that allows me to use a batch size of 1M (instead of your 15K). All the other parameter (learning schedule) I left untouched.

Training on the two PDN files with 224K positions on your website finished in 18 seconds. I found mean squared errors of 3.48%, 1.11%, 0.23% and 0.00066% for the 4 game phases.

Thanks, I'm glad it seems to all be in order and someone is already testing out the code!

Just the two test-match PDNs on my site are definitely too little data; for the actual nets that come with the program, I think I used about 12x as much data.
Quality of data is important too. The match vs Gui 1.1 is probably not the highest quality, but I think having a small amount of lopsided data is helpful. My guess is that training on (fast time control) games played by an earlier version of the net also helps correct some positions that the net happens to misevaluate, by providing counterexamples in the training data. The error value itself has helped me catch when one of my changes broke something in the process (way higher than expected, or different in some way from usual), but I can't predict playing strength from it. The current net is probably somewhat overfit to my test conditions (mainly 3-move openings vs Cake and sometimes older Gui versions, no book, 4-piece db).

Earlier this year Ed Gilbert emailed me suggesting I think about updating my checkers program, so when I was looking at something to apply neural nets to I finally got around to it.

Rein Halbersma · Post by **Rein Halbersma** » Tue Nov 17, 2020 5:23 pm

In your C++ header NeuralNet.h I see a commented-out SumIncremental() stub. Implementing it should speed up the eval considerably. A regular non-capture move requires one matrix column addition and one subtraction in the first hidden layer, instead of a summation over 121 of those columns. That could be a 60x speedup for that part. Is this something you plan to implement?
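A minimal sketch of the incremental update being suggested, assuming the 121-input, 192-unit first layer, int16 weights with the weights of input i stored contiguously (i.e. column i of the 192 x 121 matrix), and a 32-bit accumulator. `Accumulator`, `refresh` and `applyQuietMove` are hypothetical names; `SumIncremental()` in NeuralNet.h may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kInputs = 121;  // input count quoted above
constexpr int kHidden = 192;  // first-layer width from the model summary

// First-layer sums; the weights of input i live at W[i*kHidden .. i*kHidden+kHidden).
struct Accumulator {
    std::vector<std::int32_t> v = std::vector<std::int32_t>(kHidden);
};

// Full refresh: bias plus the weights of every active input.
void refresh(const std::vector<std::int16_t>& W, const std::vector<std::int16_t>& bias,
             const std::vector<int>& active, Accumulator& acc) {
    assert(W.size() == static_cast<std::size_t>(kInputs) * kHidden);
    for (int j = 0; j < kHidden; ++j) acc.v[j] = bias[j];
    for (int i : active)
        for (int j = 0; j < kHidden; ++j) acc.v[j] += W[i * kHidden + j];
}

// Quiet move: exactly one input turns off and one turns on, so one
// subtraction and one addition replace the full re-sum.
void applyQuietMove(const std::vector<std::int16_t>& W, int offInput, int onInput,
                    Accumulator& acc) {
    for (int j = 0; j < kHidden; ++j)
        acc.v[j] += W[onInput * kHidden + j] - W[offInput * kHidden + j];
}
```

Captures turn off additional inputs (one more subtraction per captured piece), and the search has to undo the update on unmake or keep a copy of the accumulator per ply, which is the extra state to track mentioned in the reply below.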

jonkr · Post by **jonkr** » Wed Nov 18, 2020 9:50 pm

I will probably try it out sometime this week in the checkers program and see if it seems worth it. It is more state to track, and the speedup is unlikely to be that big, but I expect I could find a clear speedup for the early/mid game. (To compute the first-layer values, the checkers net currently memcpys the biases, then does straight SIMD vector adds for each '1' input on the board, which is pretty quick compared to the rest of the net: a maximum of 24 inputs at the beginning, and that number starts to drop quickly.)

I made an Othello program based on the GuiNN code, but haven't tried to improve its strength yet, and found that I wasn't that happy with the state of the code (its ability to generalize, and it still lacks support for some common features). I expect decent results in Othello, though I'm guessing neural nets are not as well suited to Othello as to other games; I still plan on trying it as an interesting test.
So I went back to improving the checkers code, and plan to post an update once I verify I didn't break anything and it's stronger.

I also did more chess endgame training since my computer was idle and got up to -31 elo vs SF11 in general endgames, but (probably just bad luck) it then went 1.5 days without improvement, and I got too lazy to keep it running.
