Attempting to implement NNUE into my engine

Discussion of chess software programming and technical issues.

Moderator: Ras

sovaz1997
Posts: 292
Joined: Sun Nov 13, 2016 10:37 am

Attempting to implement NNUE into my engine

Post by sovaz1997 »

Hello guys! How are you?
I haven't been here for a while. I hope you're all doing well!

Recently I've been wanting to add an NNUE evaluation to my chess engine.
I decided to start with a simple 768xNx1 network.

I wrote my own trainer in PyTorch, and it works. Btw, it is very important that the dataset contains only quiet positions; otherwise the loss stays very high and the net plays very poorly. Of course I checked how the net works by comparing evals on test positions, and everything matches: after quantization I get the same eval values. I also split the dataset into training and validation sets to make sure the net is approximating correctly.
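For reference, a minimal sketch of what such a 768xNx1 net might look like in PyTorch. The class name, hidden size, and the clipped-ReLU activation are my assumptions (clipped ReLU is common in NNUE nets because it quantizes cleanly), not necessarily what was used here:

```python
import torch
import torch.nn as nn

class Nnue768(nn.Module):
    """Minimal 768xNx1 net: 12 piece planes x 64 squares -> N hidden -> 1 eval."""

    def __init__(self, hidden: int = 8):
        super().__init__()
        self.ft = nn.Linear(768, hidden)   # feature transformer
        self.out = nn.Linear(hidden, 1)    # single scalar eval

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Clipped ReLU: bounded activations map cleanly onto int8/int16
        # ranges when the net is quantized for the engine.
        h = torch.clamp(self.ft(x), 0.0, 1.0)
        return self.out(h)

model = Nnue768(hidden=8)
batch = torch.zeros(4, 768)    # 4 positions, one-hot piece-square features
print(model(batch).shape)      # torch.Size([4, 1])
```

The same forward pass, minus the clamp bounds, is what the quantized engine-side inference has to reproduce exactly for the "same eval values" check to hold.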

------

But in the end, it can't catch up with my regular HCE. The gap is still 200-300 Elo points with every net I tried (768x[8|16|...|256]x1); they all showed approximately the same playing strength. And yes, I implemented SIMD instructions (my small networks are even faster than the HCE); since I work on a Mac M2, I've only added NEON instructions so far.

-------

What I think:
1) my HCE was tuned with the Texel method, so it is already strong enough that I can't beat it with small networks;
2) maybe a 40 million position training dataset is too small? (Different network sizes show the same results because a larger NN needs more positions.) I started with CCRL positions evaluated by my HCE at a small node count (5000-20000), then I generated self-play games at 5000 nodes ±50% to avoid determinism and repeated positions. I don't see a big difference in quality of play; my own dataset was even slightly better.

--------

So, guys, it would be nice to hear about your experience; maybe I missed something, I don't know. At the moment I'm generating more positions to test my theory that there aren't enough of them to train bigger nets (768x32 and up), because even 768x8 shows decent results.
Zevra 2 is my chess engine. Binary, source and description here: https://github.com/sovaz1997/Zevra2
Zevra v2.6 is the latest version of Zevra: https://github.com/sovaz1997/Zevra2/releases
flok
Posts: 553
Joined: Tue Jul 03, 2018 10:19 am
Full name: Folkert van Heusden

Re: Attempting to implement NNUE into my engine

Post by flok »

sovaz1997 wrote: Mon Jan 27, 2025 7:56 pm 2) maybe a 40 million position training dataset is too small? (Different network sizes show the same results because a larger NN needs more positions.) I started with CCRL positions evaluated by my HCE at a small node count (5000-20000), then generated self-play games at 5000 nodes ±50% to avoid determinism from repeated positions. I don't see a big difference in quality of play; my own dataset was even better.
40M doesn't sound like a lot. I used 1 billion. For that I wrote a distributed generator, which some of my friends at https://nurdspace.nl/ ran on their PCs (one rented a 256-core machine for the occasion). That way I got the data in less than a weekend.
mar
Posts: 2639
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Attempting to implement NNUE into my engine

Post by mar »

1) a 768-256-1 net should beat your HCE easily, perhaps even a smaller one
in order to take performance out of the equation (and see whether you're on the right track), I'd run a fixed-node match to check if the eval itself, disregarding speed, is better

2) I trained my first successful net with 100m positions, but struggled with fewer
sovaz1997
Posts: 292
Joined: Sun Nov 13, 2016 10:37 am

Re: Attempting to implement NNUE into my engine

Post by sovaz1997 »

Thanks guys!
By the way, my 768x8 network trained on 150M positions beat a 768x128 trained on 40M.

So, after all, the amount of data matters.
I have another idea: generate games completely at random, picking uniformly among all legal moves. Maybe I will get a greater variety of positions, and that will let me train better on a smaller number of games.
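A sketch of that random-game generator using the python-chess library (the ply cap and function name are my assumptions; purely random games otherwise wander on for hundreds of plies):

```python
import random
import chess  # python-chess

def random_game_fens(max_plies=60, seed=None):
    """Play uniformly random legal moves and collect the visited positions."""
    rng = random.Random(seed)
    board = chess.Board()
    fens = []
    for _ in range(max_plies):
        if board.is_game_over():
            break
        board.push(rng.choice(list(board.legal_moves)))
        fens.append(board.fen())
    return fens

fens = random_game_fens(max_plies=20, seed=1)
print(len(fens))  # up to 20 positions from one random game
```

Each collected position would still need a label (e.g. a shallow HCE search) before it can go into the training set; random play only supplies the diversity.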
sovaz1997
Posts: 292
Joined: Sun Nov 13, 2016 10:37 am

Re: Attempting to implement NNUE into my engine

Post by sovaz1997 »

in order to take performance out of the equation (and see whether you're on the right track), I'd run a fixed-node match to check if the eval itself, disregarding speed, is better
Hmm, good idea!
That allows a really fast comparison of the network against the evaluation function, since performance is taken out of the picture.
I think it's also a very good idea to run NN vs NN at a fixed number of nodes when the NNs are the same size. Thanks!

And yeah, my HCE is still better than 768x8x1 for now. But I'm sure that's not the limit of 768x8x1. I will try training on more games, and the next step is to create a dataset from random games.

Currently I have this at fixed nodes:

Code: Select all

Zevra NNUE 768x8x1 | 140M games with 5000+-50% nodes | epoch 6

Score of Zevra NNUE 768x8x1 vs Zevra Classic: 56 - 221 - 43  [0.242] 320
...      Zevra NNUE 768x8x1 playing White: 32 - 107 - 21  [0.266] 160
...      Zevra NNUE 768x8x1 playing Black: 24 - 114 - 22  [0.219] 160
...      White vs Black: 146 - 131 - 43  [0.523] 320
Elo difference: -198.2 +/- 40.4, LOS: 0.0 %, DrawRatio: 13.4 %
SPRT: llr -2.95 (-100.3%), lbound -2.94, ubound 2.94 - H0 was accepted
that's about the same as at short TC, because the performance is roughly equal
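As a sanity check, the Elo difference in that log follows directly from the score fraction via the logistic model:

```python
import math

def elo_diff(wins, losses, draws):
    """Elo difference implied by a match score, using the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return 400.0 * math.log10(score / (1.0 - score))

# 56 - 221 - 43 from the fixed-node match above
print(round(elo_diff(56, 221, 43), 1))  # -198.2, matching the log
```

The ± error bar and the SPRT bounds come from the variance of that score estimate, which the match runner computes for you.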

768x32x1 gives about the same result. So I think this means there really aren't enough games, since I get the same strength after training different NN sizes.

--------
So, my current attempt: go from 140M to 200M positions. If the 768x8x1 net is better with 200M, that will mean I still haven't reached the limit of the small net even with such a large quantity of games.
I also suspected my trainer implementation (maybe it doesn't visit all the positions, or something else), but everything seems to be fine there. So the next idea to test is random games, in a smaller count. Maybe that will pay off (a good result would be the network winning with the same or less training data).

After that I will experiment with how the dataset is evaluated; right now I'm evaluating at 5000 nodes with my engine. But I'm not sure increasing that will help, because even CCRL games give worse results than evals at 5000 nodes.
JacquesRW
Posts: 119
Joined: Sat Jul 30, 2022 12:12 pm
Full name: Jamie Whiting

Re: Attempting to implement NNUE into my engine

Post by JacquesRW »

Such performance suggests a bug IMO.
As always, you will get better (and more interactive) help if you join relevant discord servers rather than ask here e.g. https://discord.com/invite/F6W6mMsTGN (engine programming)
sovaz1997
Posts: 292
Joined: Sun Nov 13, 2016 10:37 am

Re: Attempting to implement NNUE into my engine

Post by sovaz1997 »

JacquesRW wrote: Tue Jan 28, 2025 3:01 pm Such performance suggests a bug IMO.
As always, you will get better (and more interactive) help if you join relevant discord servers rather than ask here e.g. https://discord.com/invite/F6W6mMsTGN (engine programming)
Thank you! I'll try asking there too :)
I also think it's a non-obvious bug, but either way I need to gather more stats
sovaz1997
Posts: 292
Joined: Sun Nov 13, 2016 10:37 am

Re: Attempting to implement NNUE into my engine

Post by sovaz1997 »

I ran the last test, 200M vs 140M (768x8x1), at only 1 node per move, and got these results, which of course don't impress me :?

Code: Select all

Score of Zevra NNUE (Full & Latest) vs Zevra NNUE (Full): 1816 - 1638 - 1525  [0.518] 4979
...      Zevra NNUE (Full & Latest) playing White: 1002 - 818 - 670  [0.537] 2490
...      Zevra NNUE (Full & Latest) playing Black: 814 - 820 - 855  [0.499] 2489
...      White vs Black: 1822 - 1632 - 1525  [0.519] 4979
Elo difference: 12.4 +/- 8.0, LOS: 99.9 %, DrawRatio: 30.6 %
SPRT: llr 2.95 (100.2%), lbound -2.94, ubound 2.94 - H1 was accepted
So for now I think it really might be something else. I plan to try fully random games, but I'm not sure; maybe I really do have mistakes, given that other engines' development went more smoothly with similar dataset and net sizes
sovaz1997
Posts: 292
Joined: Sun Nov 13, 2016 10:37 am

Re: Attempting to implement NNUE into my engine

Post by sovaz1997 »

As always, you will get better (and more interactive) help if you join relevant discord servers rather than ask here e.g. https://discord.com/invite/F6W6mMsTGN (engine programming)
Yes, I already did. I haven't been here for a while, and now I see a lot of engine programming discussion happens on Discord. Thanks a lot!
rdhoffmann
Posts: 53
Joined: Fri Apr 21, 2023 3:46 pm
Full name: Richard Hoffmann

Re: Attempting to implement NNUE into my engine

Post by rdhoffmann »

The smallest network I trained was (768+64)x28x1, and it is already much better than my HCE attempts.

Note the extra 64 inputs: I use them for the side to move (+/- 1) and for trying out various ideas.
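If I understand the layout, the input vector could be built like this; the indexing scheme and the decision to fill all 64 extra slots with the side-to-move sign are my guesses, just to illustrate the idea:

```python
def make_input(pieces, stm_white):
    """Build a 768+64 input vector.

    pieces: list of (piece_index 0-11, square 0-63) pairs,
            i.e. 6 piece types x 2 colors = 12 planes of 64 squares.
    stm_white: True if white to move; the extra 64 slots get +1/-1.
    """
    x = [0.0] * (768 + 64)
    for piece, square in pieces:
        x[piece * 64 + square] = 1.0
    stm = 1.0 if stm_white else -1.0
    for i in range(768, 768 + 64):
        x[i] = stm
    return x

x = make_input([(0, 8), (6, 48)], stm_white=True)  # a white pawn and a black pawn
print(len(x), x[8], x[768])  # 832 1.0 1.0
```

Spare always-on inputs like these are a cheap way to experiment: the first layer learns a weight per slot, so unused ideas just train toward zero.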

One question though: why would such small networks need so many positions? I don't see how (or why) that would improve performance. There is only so much a small network can learn?