Attempting to implement NNUE into my engine
-
- Posts: 4607
- Joined: Tue Apr 03, 2012 4:28 pm
- Location: Midi-Pyrénées
- Full name: Christopher Whittington
Re: Attempting to implement NNUE into my engine
Doubtful. You already said you use a qsearch, so by definition tactical positions won't register in the training data.
-
- Posts: 1000
- Joined: Mon Jan 05, 2009 7:40 pm
- Location: Germany
- Full name: Engin Üstün
Re: Attempting to implement NNUE into my engine
I did not say that it won't...
How would you know that it is not registered?
Yes, but the neural network will also pick up what is going on tactically in the positions, not only the evaluation of the position.
Why did the Stockfish team train the NNUE at depths like 8 or 9 or even 12, then?
I do not even know how the neural network can learn this, but it seems to me that it learns everything that can possibly be known about what is going on in a position.
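To make the disagreement concrete: the usual "quiet only" filter keeps a position exactly when a capture-only quiescence search cannot improve on the static eval. A minimal sketch of that filter, assuming python-chess and a material-only eval purely for illustration:

import chess

# Illustrative material values in centipawns.
PIECE_VALUES = {chess.PAWN: 100, chess.KNIGHT: 320, chess.BISHOP: 330,
                chess.ROOK: 500, chess.QUEEN: 900, chess.KING: 0}

def static_eval(board: chess.Board) -> int:
    # Material-only eval from the side to move's point of view
    # (a stand-in for a real engine's static evaluation).
    score = 0
    for piece_type, value in PIECE_VALUES.items():
        score += value * (len(board.pieces(piece_type, chess.WHITE))
                          - len(board.pieces(piece_type, chess.BLACK)))
    return score if board.turn == chess.WHITE else -score

def qsearch(board: chess.Board, alpha: int, beta: int) -> int:
    # Capture-only quiescence search with a stand-pat cutoff.
    stand_pat = static_eval(board)
    if stand_pat >= beta:
        return stand_pat
    alpha = max(alpha, stand_pat)
    for move in board.legal_moves:
        if not board.is_capture(move):
            continue
        board.push(move)
        score = -qsearch(board, -beta, -alpha)
        board.pop()
        if score >= beta:
            return score
        alpha = max(alpha, score)
    return alpha

def is_quiet(board: chess.Board) -> bool:
    # If resolving captures cannot change the score, nothing tactical
    # is hanging, and the position can be used as a training target.
    return qsearch(board, -30000, 30000) == static_eval(board)

Positions with hanging pieces or pending recaptures are exactly the ones this filter drops, so they never become training targets; the open question in this exchange is how much tactical signal still reaches the net through the search-based labels.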
-
- Posts: 1000
- Joined: Mon Jan 05, 2009 7:40 pm
- Location: Germany
- Full name: Engin Üstün
Re: Attempting to implement NNUE into my engine
sovaz1997 wrote: ↑Mon Jan 27, 2025 7:56 pm
Hello guys! How are you?
I haven't been here for a while. I hope you're all doing well!
Recently I've been wanting to add an NNUE evaluation to my chess engine.
I decided to start with a simple 768xNx1 network.
I wrote my own trainer in PyTorch, and it works. Btw, it is very important that the dataset contains only quiet positions; otherwise the loss will be very large and the net will play very poorly. Of course I checked how the net works by comparing evals on test positions. Everything works well; with quantization I get the same eval values. I also split the dataset into training and validation sets to be sure the net is approximating correctly.
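A minimal sketch of such a 768xNx1 net in PyTorch, assuming the common 768-feature encoding (12 piece types x 64 squares) fed as a dense 0/1 vector; the hidden size, clipped ReLU, and MSE loss are illustrative choices, not necessarily the exact trainer described above:

import torch
import torch.nn as nn

HIDDEN = 64  # the "N" in 768xNx1

class SimpleNNUE(nn.Module):
    def __init__(self, hidden: int = HIDDEN):
        super().__init__()
        self.ft = nn.Linear(768, hidden)  # input layer over piece-square features
        self.out = nn.Linear(hidden, 1)   # single scalar eval output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Clipping activations to [0, 1] keeps them in a fixed range,
        # which makes integer quantization afterwards straightforward.
        return self.out(torch.clamp(self.ft(x), 0.0, 1.0))

model = SimpleNNUE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One illustrative step on random data; a real trainer would stream
# (features, eval) batches built from the quiet-position dataset.
features = torch.rand(256, 768).round()  # dummy 0/1 feature vectors
targets = torch.randn(256, 1)            # dummy eval targets
opt.zero_grad()
loss = loss_fn(model(features), targets)
loss.backward()
opt.step()

Since the input vectors are mostly zeros, a bigger trainer would typically feed sparse feature indices instead of dense 768-vectors, but the dense form is the easiest to check against a quantized engine-side implementation.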
------
But in the end it has not been possible to catch up with the regular HCE. The gap is still 200-300 Elo points with any of the nets; I tried training 768x[8|16|...|256]x1, and they all showed approximately the same playing strength. And yes, I implemented SIMD instructions (my small networks even run faster than the HCE); because I work on a Mac M2, I have added NEON instructions only for now.
-------
What I think:
1) because my HCE was tuned with the Texel method, it is already quite strong, so I can't beat it with small networks;
2) maybe a 40-million-position training dataset is too small? (Different networks show the same results, but a larger NN requires more positions.) I started with CCRL positions and evaluated them with the HCE at small node counts (5000-20000); after that I generated self-play games at 5000 nodes ±50% (see the sketch below) so that repeated positions don't produce deterministic games. I don't see a big difference in quality of play; my own dataset was even a bit better.
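The node-count jitter from point 2), sketched with python-chess; ENGINE_PATH is a placeholder for any UCI engine binary:

import random
import chess
import chess.engine

ENGINE_PATH = "./my_engine"  # placeholder: path to a UCI engine

def selfplay_fens(base_nodes: int = 5000, max_plies: int = 200) -> list:
    # Vary the node budget by +-50% per move so that games starting
    # from the same position do not repeat deterministically.
    board = chess.Board()
    fens = []
    with chess.engine.SimpleEngine.popen_uci(ENGINE_PATH) as engine:
        while not board.is_game_over() and len(fens) < max_plies:
            nodes = int(base_nodes * random.uniform(0.5, 1.5))
            result = engine.play(board, chess.engine.Limit(nodes=nodes))
            board.push(result.move)
            fens.append(board.fen())
    return fens

Scoring the collected FENs with the HCE at small node counts then yields the (position, eval) pairs the trainer consumes.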
--------
So, guys, it would be nice to hear about your experience. Maybe I missed something, I don't know. At the moment I'm generating more positions to test my theory that there aren't enough positions to train bigger nets (768x32 and bigger), because even 768x8 shows decent results.
Yeah, that's a big problem for me too: if you try very small networks, they don't learn and play very weakly, but they are very fast; if you set up bigger networks, they do learn, but then when playing a game they slow the search down extremely.
The whole effort is for nothing until we find a balance between speed and enough knowledge for the neural network.
I mean, even the Stockfish NNUE, with 2x256x32x32x1 at the end, is quite large; I tried an even smaller net and that works too.
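To make the speed-versus-knowledge trade-off concrete, a back-of-the-envelope weight count for the sizes discussed in this thread (dense weights only, biases ignored):

def weight_count(layers):
    # Dense weights between consecutive layers; biases ignored.
    return sum(a * b for a, b in zip(layers, layers[1:]))

# The small nets from this thread: 768 inputs, one hidden layer, one output.
for hidden in (8, 16, 32, 64, 128, 256):
    print(f"768x{hidden}x1: {weight_count([768, hidden, 1]):>8,} weights")

# Stockfish-style 2x256x32x32x1: the two 256-wide halves form the
# accumulator, which is updated incrementally as moves are made, so per
# node the engine mostly pays for the layers after it.
print(f"after the 2x256 accumulator: {weight_count([2 * 256, 32, 32, 1]):,} weights")

That is why the big net can still be fast: nearly all of its weights sit in the incrementally updated accumulator, while only about 17k weights are evaluated from scratch at each node.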