Attempting to implement NNUE into my engine

Discussion of chess software programming and technical issues.

Moderator: Ras

sovaz1997
Posts: 292
Joined: Sun Nov 13, 2016 10:37 am

Re: Attempting to implement NNUE into my engine

Post by sovaz1997 »

Today I found that scaling was not the cause.

It seems the cause was incorrectly working heuristics (possibly the hash), because if I leave only null move pruning and LMR enabled, I get about +70-100 Elo at 3+0.02 without eval scaling. Scaling helps, but that is only a consequence; I think that once everything works correctly, no scaling should be needed. It's still weird that the problem appears only when I use the NNUE evaluation.

So I will enable the heuristics one by one and test them.
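To make explicit what "eval scaling" refers to here: a minimal sketch (a hypothetical scale_nnue_eval() helper with a purely illustrative factor, not Zevra's actual code) would simply rescale the raw NNUE output so its centipawn range matches what the search margins were tuned against.

Code: Select all

/* Sketch only: rescale the raw NNUE output so its centipawn range roughly
 * matches the HCE range that the search heuristics (null move margins,
 * LMR thresholds, ...) were originally tuned for. */
static inline int scale_nnue_eval(int raw_nnue_cp) {
    return raw_nnue_cp * 3 / 4; /* the factor is purely illustrative */
}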
Zevra 2 is my chess engine. Binary, source and description here: https://github.com/sovaz1997/Zevra2
Zevra v2.6 is the latest version of Zevra: https://github.com/sovaz1997/Zevra2/releases
sovaz1997
Posts: 292
Joined: Sun Nov 13, 2016 10:37 am

Re: Attempting to implement NNUE into my engine

Post by sovaz1997 »

Done!
I did it. I simply clear the hash after every move, because I found some problems with the hash table (and this problem happened only with NNUE).

Code: Select all

tc=3+0.02
Score of Zevra Self (Gen 8) vs Zevra Classic: 320 - 214 - 84  [0.586] 618
...      Zevra Self (Gen 8) playing White: 142 - 135 - 32  [0.511] 309
...      Zevra Self (Gen 8) playing Black: 178 - 79 - 52  [0.660] 309
...      White vs Black: 221 - 313 - 84  [0.426] 618
Elo difference: 60.2 +/- 25.8, LOS: 100.0 %, DrawRatio: 13.6 %
SPRT: llr 2.95 (100.4%), lbound -2.94, ubound 2.94 - H1 was accepted
Next step: find the actual bug so I don't have to reset the hash after every move.
For now this is already some improvement over HCE with a 768x64 net trained fully from zero.
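For context, the clear-hash-after-every-move workaround amounts to something like the sketch below; the names (TTEntry, tt_clear, on_new_move) are hypothetical and not Zevra's actual code.

Code: Select all

#include <stdint.h>
#include <string.h>

/* Hypothetical transposition table entry; the fields are illustrative. */
typedef struct {
    uint64_t key;
    int16_t  score;
    uint8_t  depth;
    uint8_t  flags;
    uint16_t move;
} TTEntry;

static TTEntry *tt_table;   /* allocated elsewhere */
static size_t   tt_entries; /* number of entries   */

/* Wipe the whole table so no entry scored with a stale eval survives. */
static void tt_clear(void) {
    if (tt_table)
        memset(tt_table, 0, tt_entries * sizeof(TTEntry));
}

/* Brute-force workaround: call this whenever a new root position arrives. */
void on_new_move(void) {
    tt_clear();
}

Clearing the table like this throws away useful entries on every move, which is exactly why finding the underlying bug is the better long-term fix.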
Zevra 2 is my chess engine. Binary, source and description here: https://github.com/sovaz1997/Zevra2
Zevra v2.6 is the latest version of Zevra: https://github.com/sovaz1997/Zevra2/releases
sovaz1997
Posts: 292
Joined: Sun Nov 13, 2016 10:37 am

Re: Attempting to implement NNUE into my engine

Post by sovaz1997 »

Hello!
I finished training the 768x256 net on 1 billion self-play positions and have the following results:

Code: Select all

[In progress...]
tc=60+0.6; Hash=64
Score of Zevra NNUE vs Zevra v2.5: 1256 - 531 - 293  [0.674] 2080
...      Zevra NNUE playing White: 598 - 325 - 117  [0.631] 1040
...      Zevra NNUE playing Black: 658 - 206 - 176  [0.717] 1040
...      White vs Black: 804 - 983 - 293  [0.457] 2080
Elo difference: 126.4 +/- 14.6, LOS: 100.0 %, DrawRatio: 14.1 %
The next plan is to continue improving the search and to add AVX support (so far I have implemented only NEON).
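For the record, an AVX2 port of the hidden-layer dot product could look roughly like the sketch below. It is only an illustration under my own assumptions (int16 accumulator and weights, hidden size a multiple of 16, hypothetical function name), not Zevra's code.

Code: Select all

#include <immintrin.h>
#include <stdint.h>

/* Sketch: AVX2 counterpart of a NEON int16 multiply-accumulate loop.
 * Applies a clipped ReLU to the accumulator, multiplies by the output
 * weights and sums into 32-bit lanes. Compile with -mavx2. */
int32_t hidden_dot_avx2(const int16_t *acc, const int16_t *weights, int hidden_size) {
    __m256i sum = _mm256_setzero_si256();
    for (int i = 0; i < hidden_size; i += 16) {
        __m256i a = _mm256_loadu_si256((const __m256i *)(acc + i));
        __m256i w = _mm256_loadu_si256((const __m256i *)(weights + i));
        a = _mm256_max_epi16(a, _mm256_setzero_si256()); /* ReLU        */
        a = _mm256_min_epi16(a, _mm256_set1_epi16(127)); /* clip to 127 */
        sum = _mm256_add_epi32(sum, _mm256_madd_epi16(a, w));
    }
    /* Horizontal sum of the eight 32-bit lanes. */
    __m128i s = _mm_add_epi32(_mm256_castsi256_si128(sum),
                              _mm256_extracti128_si256(sum, 1));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(1, 0, 3, 2)));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(s);
}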
Zevra 2 is my chess engine. Binary, source and description here: https://github.com/sovaz1997/Zevra2
Zevra v2.6 is the latest version of Zevra: https://github.com/sovaz1997/Zevra2/releases
lithander
Posts: 912
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Attempting to implement NNUE into my engine

Post by lithander »

+126 is a solid improvement! Congratz on the release of 2.6! How did you decide that this is enough for now? Have you hit a plateau or are you just sick of generating more data and want to move on to more interesting things?

I vaguely remember that when many devs switched to NNUE they often reported higher Elo gains than I got with Leorik 3.0. But at the same time my net just didn't gain anymore with more data or a larger neuron count. I think long term it can hurt an engine a lot when new features underperform, but at the same time it's hard to know what the peak possible performance even is.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
Engin
Posts: 1000
Joined: Mon Jan 05, 2009 7:40 pm
Location: Germany
Full name: Engin Üstün

Re: Attempting to implement NNUE into my engine

Post by Engin »

sovaz1997 wrote: Mon Jan 27, 2025 7:56 pm Hello guys! How are you?
I haven't been here for a while. I hope you're all doing well!

Recently I've been wanting to add an NNUE evaluation to my chess engine.
I decided to start with a simple 768xNx1 network.

I wrote my own trainer in PyTorch, and it works. By the way, it is very important that the dataset contains only quiet positions; otherwise the loss will be very large and the net will play very poorly. Of course I checked how the net works by comparing evals on test positions. Everything works well; with quantization I get the same eval values. I also split the dataset into training and validation sets to be sure the net approximates correctly.
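A rough sketch of that quiet-position filter (hypothetical is_quiet() helper; Position, evaluate() and qsearch() stand in for the engine's own code) might look like this:

Code: Select all

#include <stdlib.h>

/* Declarations standing in for the engine's own definitions. */
typedef struct Position Position;
int evaluate(const Position *pos);
int qsearch(Position *pos, int alpha, int beta);

#define INF          32000
#define QUIET_MARGIN 30   /* centipawns; purely illustrative threshold */

/* Keep a position for the training set only if the quiescence search agrees
 * with the static eval, i.e. nothing tactical is hanging in the position. */
int is_quiet(Position *pos) {
    int static_eval = evaluate(pos);
    int qs_eval     = qsearch(pos, -INF, INF);
    return abs(static_eval - qs_eval) <= QUIET_MARGIN;
}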

------

But in the end, it cannot catch up with my usual HCE. The gap is still 200-300 Elo points with every net I tried to train (768x[8|16|...|256]x1); they all showed approximately the same playing strength. And yes, I implemented SIMD instructions (my small networks even run faster than the HCE). Because I work on a Mac M2, I have added NEON instructions only for now.

-------

What I think:
1) Because my HCE was tuned with the Texel method, it is already quite strong, so I can't beat it with small networks;
2) Maybe a 40 million position training dataset is too small? (Different network sizes show the same results, because a larger NN requires more positions.) I started with CCRL positions evaluated by HCE at low node counts (5000-20000); later I generated self-play games at 5000 nodes +-50% so the games don't repeat deterministically. I don't see a big difference in quality of play; my own dataset was even better.

--------

So, guys, it would be nice to hear about your experience; maybe I missed something, I don't know. At the moment I'm generating more positions to test my theory that there aren't enough positions to train the bigger nets (768x32 and larger), since even 768x8 already shows decent results.
It's not about the number of training positions or the hidden layer size. Of course the speed should not drop unnecessarily for the eval, but more important is that the network learns by reinforcement learning: when it gets a score from depth X it should learn that, and in the second generation of self-play it should learn from depth X + 1, so that with more depth searched it looks forward more accurately.

I am now training the way AlphaZero did: self-play, learning deeper and deeper search scores to evaluate positions beyond the leaf nodes of the qsearch.

Then in some generation X it should beat the HCE version or do even better.
Ciekce
Posts: 184
Joined: Sun Oct 30, 2022 5:26 pm
Full name: Conor Anstey

Re: Attempting to implement NNUE into my engine

Post by Ciekce »

Engin wrote: Mon Mar 24, 2025 1:43 am It's not about the number of training positions or the hidden layer size. Of course the speed should not drop unnecessarily for the eval, but more important is that the network learns by reinforcement learning: when it gets a score from depth X it should learn that, and in the second generation of self-play it should learn from depth X + 1, so that with more depth searched it looks forward more accurately.

I am now training the way AlphaZero did: self-play, learning deeper and deeper search scores to evaluate positions beyond the leaf nodes of the qsearch.

Then in some generation X it should beat the HCE version or do even better.
literally no one does this
nocturn9x
Posts: 33
Joined: Mon Aug 05, 2024 1:28 pm
Full name: Mattia Giambirtone

Re: Attempting to implement NNUE into my engine

Post by nocturn9x »

Ciekce wrote: Mon Mar 24, 2025 6:17 pm
Engin wrote: Mon Mar 24, 2025 1:43 am It's not about the number of training positions or the hidden layer size. Of course the speed should not drop unnecessarily for the eval, but more important is that the network learns by reinforcement learning: when it gets a score from depth X it should learn that, and in the second generation of self-play it should learn from depth X + 1, so that with more depth searched it looks forward more accurately.

I am now training the way AlphaZero did: self-play, learning deeper and deeper search scores to evaluate positions beyond the leaf nodes of the qsearch.

Then in some generation X it should beat the HCE version or do even better.
literally no one does this
Yeah I am also very confused as to what they're talking about
Engin
Posts: 1000
Joined: Mon Jan 05, 2009 7:40 pm
Location: Germany
Full name: Engin Üstün

Re: Attempting to implement NNUE into my engine

Post by Engin »

Well, it's not only learning the evaluation of a position the way an HCE eval does (material, king safety, open files, bishop pair...), it's also learning what is really going on on the board, tactical stuff included, and it can look ahead deeper than the previous search did.
sovaz1997
Posts: 292
Joined: Sun Nov 13, 2016 10:37 am

Re: Attempting to implement NNUE into my engine

Post by sovaz1997 »

lithander wrote: Tue Feb 25, 2025 1:17 pm +126 is a solid improvement! Congratz on the release of 2.6! How did you decide that this is enough for now? Have you hit a plateau or are you just sick of generating more data and want to move on to more interesting things?

I vaguely remember that when many devs switched to NNUE they often reported higher Elo gains than I got with Leorik 3.0. But at the same time my net just didn't gain anymore with more data or a larger neuron count. I think long term it can hurt an engine a lot when new features underperform, but at the same time it's hard to know what the peak possible performance even is.
Hi, thank you!
Yeah, it just took a really long time to generate and train on billions of positions, because I don't have a PC with CUDA and used my own trainer on a Mac M2.

I plan to improve the search when I have time.
Zevra 2 is my chess engine. Binary, source and description here: https://github.com/sovaz1997/Zevra2
Zevra v2.6 is the latest version of Zevra: https://github.com/sovaz1997/Zevra2/releases
sovaz1997
Posts: 292
Joined: Sun Nov 13, 2016 10:37 am

Re: Attempting to implement NNUE into my engine

Post by sovaz1997 »

Engin wrote: Mon Mar 24, 2025 1:43 am
sovaz1997 wrote: Mon Jan 27, 2025 7:56 pm Hello guys! How are you?
I haven't been here for a while. I hope you're all doing well!

Recently I've been wanting to add an NNUE evaluation to my chess engine.
I decided to start with a simple 768xNx1 network.

I wrote my own trainer in PyTorch, and it works. By the way, it is very important that the dataset contains only quiet positions; otherwise the loss will be very large and the net will play very poorly. Of course I checked how the net works by comparing evals on test positions. Everything works well; with quantization I get the same eval values. I also split the dataset into training and validation sets to be sure the net approximates correctly.

------

But in the end, it cannot catch up with my usual HCE. The gap is still 200-300 Elo points with every net I tried to train (768x[8|16|...|256]x1); they all showed approximately the same playing strength. And yes, I implemented SIMD instructions (my small networks even run faster than the HCE). Because I work on a Mac M2, I have added NEON instructions only for now.

-------

What I think:
1) Because my HCE was tuned with the Texel method, it is already quite strong, so I can't beat it with small networks;
2) Maybe a 40 million position training dataset is too small? (Different network sizes show the same results, because a larger NN requires more positions.) I started with CCRL positions evaluated by HCE at low node counts (5000-20000); later I generated self-play games at 5000 nodes +-50% so the games don't repeat deterministically. I don't see a big difference in quality of play; my own dataset was even better.

--------

So, guys, it would be nice to hear about your experience; maybe I missed something, I don't know. At the moment I'm generating more positions to test my theory that there aren't enough positions to train the bigger nets (768x32 and larger), since even 768x8 already shows decent results.
It's not about the number of training positions or the hidden layer size. Of course the speed should not drop unnecessarily for the eval, but more important is that the network learns by reinforcement learning: when it gets a score from depth X it should learn that, and in the second generation of self-play it should learn from depth X + 1, so that with more depth searched it looks forward more accurately.

I am now training the way AlphaZero did: self-play, learning deeper and deeper search scores to evaluate positions beyond the leaf nodes of the qsearch.

Then in some generation X it should beat the HCE version or do even better.
Thank you!
I have already resolved these problems :)
v2.6 is stronger than v2.5 by 120-180 Elo (depending on architecture).

My search just didn't want to work correctly with the new eval, so I adapted some parameters; the hash also had artifacts.
Zevra 2 is my chess engine. Binary, source and description here: https://github.com/sovaz1997/Zevra2
Zevra v2.6 is the latest version of Zevra: https://github.com/sovaz1997/Zevra2/releases