First success with neural nets

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

econo
Posts: 2
Joined: Thu Aug 06, 2020 8:48 pm
Full name: Andrew Metrick

Re: First success with neural nets

Post by econo »

That is a useful benchmark, thank you. I have been googling around trying to get simple training time formulas as a function of net parameters, but so many places just say “It depends” that I have given up getting a general answer and have reached that data-gathering stage of just asking people with specific experience training chess-specific nets.
jonkr
Posts: 178
Joined: Wed Nov 13, 2019 1:36 am
Full name: Jonathan Kreuzer

Re: First success with neural nets

Post by jonkr »

I've had another successful result with the rook & pawn endgame neural net. It was touch and go with a random regression, but I finally got SlowDev scoring +4 elo versus Stockfish 11 in my rook endgame suite (so that's well outside the 95% error bars). So I can say with a good degree of confidence that if your engine is SF11 level or weaker, doing something like this is a viable path to some amount of elo gain. My next step will be to generalize and improve the related code, and do some other endgames.

First try was -63.5 elo vs SF11; then slightly smaller net and more training, -55; fixed-point speedup, -42.2; more training, -44.6 (random regression); more training, -32.4; first pass at SIMD, -23; more fully SIMD plus training, -9; stm-relative inputs for additional symmetry and more training, +4 elo.
I never finished the incremental updates; that's not the biggest deal in endgames only, but maybe more important than I think. The specific nets mean a bit more tracking is necessary, and I'll have to figure out the best way to handle the horizontal + vertical & stm symmetry. My current net is 320 inputs and 184x32x32x1.
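To picture the net described above, here is a minimal numpy sketch of a dense 320→184→32→32→1 forward pass. The weights are random placeholders, not trained values, and the layer interpretation of "320 inputs and 184x32x32x1" is an assumption:

```python
import numpy as np

# Layer sizes as described in the post: 320 board inputs, then 184x32x32x1.
# Weight values here are random placeholders; a real net loads trained weights.
LAYER_SIZES = [320, 184, 32, 32, 1]

rng = np.random.default_rng(0)
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(LAYER_SIZES, LAYER_SIZES[1:])]
biases = [np.zeros(n) for n in LAYER_SIZES[1:]]

def evaluate(inputs: np.ndarray) -> float:
    """Plain dense forward pass: ReLU on hidden layers, linear output."""
    x = inputs
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ w + b, 0.0)
    return float(x @ weights[-1] + biases[-1])

score = evaluate(np.zeros(320))
```

A net this small is cheap enough to evaluate at every node even without incremental updates, which matches the speed numbers reported in the thread.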

Much less certain, but I would guess that if the Stockfish team wanted to squeeze some additional elo out, this might help even in SF12. (Or any other strong program with computing resources.) Given the huge SF search advantage, and that my cpu net computation is not as optimized, it's possible the eval itself might already be in the ballpark or better. How much it's worth might not be that significant though for the added complication.

Andrew:
I do think adding specific nets may surprise some people with the amount of elo gain for programs not already at the very strongest level. Overall I am resigned that as a hobby I will never get that close to state of the art, so I won't know what's truly a good idea. One net does seem cleaner, but multiple shouldn't be a big issue once I clean up my code to generalize.

econo:
Most of the computer time by far was spent on playing the test games, which I feed back in to generate positions for training data (and of course to check results to see elo progress). I'd say maybe 5 days of computer time on a 12-core Ryzen if it was continuously running. Since I was developing too, some of that training was non-optimal; certainly the games where I was accidentally only setting inputs for one side of the board are questionable.

For the time to train the net in TensorFlow: loading data takes a minute or two, then about 5 minutes to train the neural net, although when to stop is arbitrary; I could have stopped much sooner.
To extract and export the training data (position inputs, position value) from the pgns maybe 4 mins (done in SlowDev).
I'm using 2 million positions to train.
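The TensorFlow step itself isn't shown in the thread. As an illustration of the kind of fit being described (position inputs → position value, minimizing squared error), here is a toy training loop in plain numpy with manual backprop; the sizes, data, and learning rate are all made up:

```python
import numpy as np

# Toy sketch of the training step (plain numpy, not the post's TensorFlow
# setup): fit (position inputs -> position value) pairs by gradient descent.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(512, 16)).astype(float)  # toy "board input" bits
y = X.mean(axis=1, keepdims=True)                     # toy target "value"

W1 = rng.standard_normal((16, 8)) * 0.1; b1 = np.zeros(8)
W2 = rng.standard_normal((8, 1)) * 0.1;  b2 = np.zeros(1)
lr = 0.1

losses = []
for _ in range(200):
    h = np.maximum(X @ W1 + b1, 0.0)       # hidden layer (ReLU)
    pred = h @ W2 + b2                     # linear output
    err = pred - y
    losses.append(float((err ** 2).mean()))
    # Backprop through the two dense layers.
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (h > 0)
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
```

With 2 million real positions the same loop shape applies, just batched; the few-minute training time quoted above is plausible for a net this small.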
jonkr
Posts: 178
Joined: Wed Nov 13, 2019 1:36 am
Full name: Jonathan Kreuzer

Re: First success with neural nets

Post by jonkr »

After thinking about this more and watching some games I'm slightly less excited about it, although I think my conclusions still hold. Initial excitement was from the process working well enough to meet my goal and beating SF11 in any test at all.

I was thinking about how many nets would be needed: I would need to get test positions and run training for all of them, generalize the code in many places and fix issues fitting everything together, and then that's only the late endgame. Easily doable, but the amount of work remaining compared to the amount of elo gain is less exciting. There probably is a balance though where some nets could be more generic. It is a good learning experience either way and still what I plan to do.

Also, in one game against a weaker opponent, Slow in a worse position traded into a lost rook endgame. I still need to make sure the eval fits nicely with the others, so it recognizes won/lost positions, scales their eval higher, and knows when to trade into them, but not so high that I get neural net trolling. On the positive side, there was one game where it was reporting near 0 for a drawn rook ending where the opponent was up an outside pawn, so it was better there. But reaching rook endgames on the board was very rare, although the big elo difference means this wasn't a good test for that. Better play for all late endgame positions and balanced evals should help earlier in the game too.
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: First success with neural nets

Post by mvanthoor »

Just for my own clarity about this: You are not using NNUE, but you have written your own neural network code, and have trained it yourself? I can write a neural network if I put my mind to it (I know how it's done), for something such as an image classifier, but in the case of chess, I have absolutely no *** clue on how to train it.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
chrisw
Posts: 4313
Joined: Tue Apr 03, 2012 4:28 pm

Re: First success with neural nets

Post by chrisw »

jonkr wrote: Sun Sep 27, 2020 6:06 pm I've had another successful result with the rook & pawn endgame neural net. [...] I'm using 2 million positions to train.
I may have missed it, but how are your tests vs SF being done?

What time controls for SF? Are both engines playing from each EPD twice, with swapped colours?
What is your NN output being used with? An AB search or just the bare eval at ply one? If search, what time controls?
jonkr
Posts: 178
Joined: Wed Nov 13, 2019 1:36 am
Full name: Jonathan Kreuzer

Re: First success with neural nets

Post by jonkr »

chrisw:
The time controls are 8 seconds + 0.08s increment; both engines have 5-man syzygy. I made an epd of about 10000 endgame positions filtered to only include ones with K+R+Ps on each side. These are then repeated from each side (using CuteChess, which also appends the games to a .pgn, from which I read results and take random positions per game for training, up to a max per game. The max started at 6, then went up to 12 as I decided I needed more, and is now down to 9 as I got more games played.) Faster time controls mean more data and more decisive results, but I'm not sure what time control is best.
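The per-game sampling described here can be sketched as follows; the function and variable names are illustrative, not SlowDev's actual code:

```python
import random

# Sketch of the per-game sampling the post describes: take up to MAX_PER_GAME
# random positions from each finished game as training data. Positions here
# are just placeholder strings standing in for board states from a .pgn.
MAX_PER_GAME = 9  # post: started at 6, went up to 12, settled on 9

def sample_positions(game_positions, rng):
    """Pick up to MAX_PER_GAME distinct random positions from one game."""
    k = min(MAX_PER_GAME, len(game_positions))
    return rng.sample(game_positions, k)

rng = random.Random(42)
game = [f"pos{i}" for i in range(40)]   # a 40-position game
picked = sample_positions(game, rng)
```

Capping the number of positions per game keeps long games from dominating the training set, which is presumably why the cap moved down again once more games were available.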

The neural net output is used with my regular search. The computation is pretty fast now, especially with the smaller net size, fixed-point SIMD, and the fact that it's only one specific type of endgame. I haven't added incremental updates of the first layer, but that should improve speed.
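The fixed-point idea mentioned here (and in the earlier "fixed-point speedup" step) can be illustrated like this; the scale factor and sizes are assumptions, and real engine code would do this with integer SIMD intrinsics:

```python
import numpy as np

# Hedged sketch of fixed-point evaluation: quantize float weights to int16
# with a scale factor, do the dot products in integers, rescale at the end.
SCALE = 64  # assumed quantization scale: 6 fractional bits

def quantize(w: np.ndarray) -> np.ndarray:
    return np.clip(np.round(w * SCALE), -32768, 32767).astype(np.int16)

rng = np.random.default_rng(0)
w_float = rng.standard_normal((32, 8)) * 0.5
w_q = quantize(w_float)

x = rng.integers(0, 2, size=32).astype(np.int16)   # binary board inputs
acc = x.astype(np.int32) @ w_q.astype(np.int32)    # integer dot products
out_fixed = acc.astype(float) / SCALE              # rescale to float
out_float = x.astype(float) @ w_float              # float reference
max_err = float(np.abs(out_fixed - out_float).max())
```

The quantization error stays small relative to eval resolution, and integer arithmetic maps directly onto SSE/AVX instructions, which is where the reported speedups come from.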

mvanthoor:
From my understanding of your question, the answer would be yes: it uses nothing from the NNUE code and isn't compatible, and I trained it from scratch myself. There are caveats though, depending on what you're asking:
  • I use TensorFlow to compute the NN weights from the training data. I initially wrote my own code to train the weights and it worked partially, but was slow and janky and would get stuck or blow up sometimes...
  • If NNUE is a general term for a concept, then maybe, or maybe it will fit in the future. I'm not doing incremental updates of the first layer yet, but I plan to. I don't think it makes sense to use NNUE as generally as "any neural net on a CPU", but the success of Stockfish 12 is what inspired me to try neural nets on the CPU.
  • It's not a Leela Chess Zero idea: I initially trained against Slow 2.3 until it beat that, then trained against Stockfish 11 until it beat that too (in the R+Ps positions).
OliverBr
Posts: 725
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: First success with neural nets

Post by OliverBr »

AndrewGrant wrote: Fri Sep 25, 2020 12:00 am One could speed it up by using a GPU (code would work just the same, I just don't happen to have a CUDA GPU)
I have a CUDA GPU, a GTX1080Ti. Unfortunately CUDA on MacOSX is only supported up to High Sierra, because nVidia and Apple don't cooperate anymore.

It's amazing with hardware. Leela completely owns every other engine with this hardware, including SF11.
If Ethereal had good CUDA support, it might handle SF11 on such hardware.
Chess Engine OliThink: http://brausch.org/home/chess
OliThink GitHub: https://github.com/olithink
AndrewGrant
Posts: 1752
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: First success with neural nets

Post by AndrewGrant »

OliverBr wrote: Tue Sep 29, 2020 2:49 am
AndrewGrant wrote: Fri Sep 25, 2020 12:00 am One could speed it up by using a GPU (code would work just the same, I just don't happen to have a CUDA GPU)
I have a CUDA GPU, a GTX1080Ti. [...] If Ethereal has good CUDA support, it might handle SF11 on such hardware.
I was referring to the training code, not to the engine itself.

Ethereal will never run on GPUs.
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
jonkr
Posts: 178
Joined: Wed Nov 13, 2019 1:36 am
Full name: Jonathan Kreuzer

Re: First success with neural nets

Post by jonkr »

I figured I should post an update after experimenting some more.
I generalized my K+R+Ps endgame code to handle any one piece (R, Q, B, N) with the same neural net structure. In the evaluation, if it's a one-piece endgame, it sends it off to the proper net.
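The dispatch being described might look like this sketch; `NETS` and `select_net` are hypothetical names, not SlowDev's actual code:

```python
# Sketch of routing a one-piece endgame to its own net, as the post describes.
# NETS and select_net are illustrative names only.
NETS = {"R": "rook_net", "Q": "queen_net", "B": "bishop_net", "N": "knight_net"}

def select_net(white_pieces: str, black_pieces: str):
    """Return the per-endgame net if each side has exactly king, pawns, and
    one matching piece of type R/Q/B/N; otherwise None (use normal eval)."""
    def lone_piece(pieces):
        majors = [p for p in pieces if p in "RQBN"]
        return majors[0] if len(majors) == 1 else None
    wp, bp = lone_piece(white_pieces), lone_piece(black_pieces)
    if wp is not None and wp == bp:
        return NETS[wp]
    return None

net = select_net("KRPPP", "KRPP")   # K+R+Ps vs K+R+Ps -> rook net
```

Returning `None` for unhandled material lets the engine fall back to its regular evaluation, which matches the later observation that many test positions never reach a net at all.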

After doing this I started training up the other networks, starting with the bishop (which had a bug where the training data was missing a piece, because I had missed one spot on export still hard-coded to rook). Once this was fixed, it quickly became able to beat Slow 2.3 by double-digit elo in the bishop endgame positions after only a couple of days. I was hesitant to keep training to beat SF11, since I only had about 6000 start positions and was worried about overfitting, and also wondered whether the results would transfer to separate positions in a variety of endgames.

So I trained the one-piece networks for queen and knight next, which also had fewer starting positions than the rook, and I only played up to about 1/3 as many games as with rooks. I made the games a bit quicker too, and it took about 1.5 days for each of them. This was enough to beat Slow 2.3 from these positions by 15 to 20 elo. But I did start looking at the evals of random positions while moving pieces around, and there were a lot of stupid ones, like small moves where the eval suddenly jumps from 0 to +1.4 for no reason and back. It's a bit surprising that it can still be better even with a large amount of eval nonsense, but it is.

I was becoming more curious to try testing endgames in general to:
1. verify the method was actually helping
2. get more training data from more game-like situations

I also wanted to see where it stood against SF11, so I figured a fair test would be to download the endgames.epd from the Stockfish site. (Notes: I didn't run the full set, I have random on, and openings are repeated from both sides too, of course.)

So the initial results were
20000 games Slow 2.3 vs SF11 -85.9
30000 games SlowDev vs SF11 -61.6

I was ready to be disappointed, so this result is actually pretty good; maybe I even got a bit lucky, or 2.3 got unlucky. But in terms of matching SF11 in endgames, it shows there is just a super long way to go. If the TC weren't super fast (8s + 0.08s) I assume it would be more drawish and closer.
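For reference, elo differences like those quoted in these matches follow from the standard logistic model relating match score to rating difference; a quick sketch:

```python
import math

# Standard logistic elo model: a score s in (0, 1) implies an elo difference
# of -400*log10(1/s - 1), and vice versa.
def elo_from_score(score: float) -> float:
    return -400.0 * math.log10(1.0 / score - 1.0)

def score_from_elo(elo: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

# e.g. the -61.6 elo result above corresponds to scoring about 41.2% vs SF11
s = score_from_elo(-61.6)
```

The error bars shrink with game count, which is why a 4.7 elo change needs tens of thousands of games before it stands out from noise.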

For data generation I set it up to read all the positions; the exporter then sorts them by net type and exports the board input / result data to the proper file. So if a game trades down to a handled type, or promotes into a Q+Ps ending or anything like that, the same game generates data for all stages. Feeding the data back in I got:

30000 SlowDev vs SF11 -58.1
20000 SlowDev vs SF11 -59.8
30000 SlowDev vs SF11 -56.9

So there is probably some learning going on, given an improvement of 4.7 elo in a 30000-game test since the first test. But it's also getting slower and noisier to test. And I did find in the individual net testing that sometimes a net randomly isn't as good in actual gameplay (random starting weights, random positions, etc.). I never did the splitting of data into training / test sets; maybe that would help in finding whether some of the learning is just overfitting.
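The train/validation split mentioned here is simple to add; a sketch, with the holdout fraction and names purely illustrative:

```python
import random

# Sketch of the train/validation split the post mentions never doing: hold
# out a slice of positions, train only on the rest, and compare losses on
# both sets to spot overfitting.
def split_data(samples, holdout_frac=0.1, seed=0):
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_hold = int(len(shuffled) * holdout_frac)
    return shuffled[n_hold:], shuffled[:n_hold]   # (train, validation)

data = list(range(10_000))   # stand-in for the millions of training positions
train, val = split_data(data)
```

If training loss keeps falling while validation loss rises, the net is memorizing its positions rather than learning endgame features, which would explain a net that tests well on its data but plays worse in real games.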

Also, with the variety of positions in the start data, a lot of the normal evaluation is used instead of the neural nets. So there is still a ton to do, and without much computing resources, testing just means waiting a long time for training.

At some point I'll try to cover more of the endgame with neural nets, and continue to improve the process. Doing some speed optimizations on the neural nets, I'm guessing I could get another +5 elo. Overall it all seems mostly successful; I will just need to be less lazy and put in more effort and computer time to do a better job making the nets and covering more of the endgame with net evals. The nets do seem better (and more general) than me trying to figure out terms by hand. So it is cool to successfully make use of neural nets, even though the process is a bit tedious at times since I can't put much computing power into it, and I'm past my initial excitement from having it work. (I expect an update someday, but not soon.)
Angrim
Posts: 97
Joined: Mon Jun 25, 2012 10:16 pm
Location: Forks, WA
Full name: Ben Nye

Re: First success with neural nets

Post by Angrim »

The basic idea of NNUE is to use an absurdly huge first layer, almost all of whose inputs are 0, and incremental updates to make that fast. So if you are using a fairly small first layer (and you are), then you are not using the NNUE idea.