I've been trying to create an NNUE trainer for quite some time now, and I've found very little information online explaining the full training process.
By the way, this NNUE is for eggnog-chess-engine (https://github.com/LeviGibson/eggnog-chess-engine).
The training repository is: https://github.com/LeviGibson/nnue-trainer-2
Finally, I've found a formula that produces decent results, and I would love to get some expert opinions on how to improve it!
One big difference between my networks and other networks is the size. My networks don't use HalfKP; they just have 768 features, one for each of the 12 piece types (both colours) on each of the 64 squares. There's also no traditional two-half accumulator, just one big layer.
There are also a few extra features, such as material and side to move.
The accumulator I use is also really small: only 128 neurons. I do get a better network with a larger accumulator, but the extra processing time makes it not worth it.
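In PyTorch terms, the whole thing looks roughly like this (a simplified sketch; the layer names and the exact extra-feature count are placeholders, not the real code from the repo):

```python
import torch
import torch.nn as nn

N_PIECE_SQ = 768   # 12 pieces x 64 squares
N_EXTRA = 2        # placeholder for the extra features (material, side to move)
ACC_SIZE = 128     # the single small accumulator layer

class TinyNNUE(nn.Module):
    def __init__(self):
        super().__init__()
        # One big accumulator instead of the usual two-half (us/them) design
        self.accumulator = nn.Linear(N_PIECE_SQ + N_EXTRA, ACC_SIZE)
        self.output = nn.Linear(ACC_SIZE, 1)

    def forward(self, x):
        # Clipped ReLU, the usual NNUE activation
        acc = torch.clamp(self.accumulator(x), 0.0, 1.0)
        return self.output(acc)
```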
I've tried using traditional features with a split accumulator, but for some reason it just doesn't seem to work as well. Is there something special I should be doing during training? Are there hyperparameters I'm not using?
For the training data, I'm taking PGNs from the Lichess database and analysing them with Stockfish at depth 9 (as recommended in the nnue-pytorch repo wiki).
The thing that finally made the NNUE not terrible was filtering out tactical positions. Stockfish analyses each position with MultiPV enabled, and if there are only one or two good moves in the position, it gets discarded. The code for this can be found here.
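In rough pseudocode, the filter does something like this (a sketch using python-chess, not the actual generator code; the thresholds are illustrative):

```python
import chess
import chess.engine

def is_quiet(board, engine, depth=9, multipv=4, window_cp=80):
    """Keep a position only if several moves score close to the best move."""
    infos = engine.analyse(board, chess.engine.Limit(depth=depth), multipv=multipv)
    scores = [info["score"].relative.score(mate_score=10000) for info in infos]
    if len(scores) < 3:
        return False                          # too few legal moves to judge
    best = scores[0]                          # MultiPV results come best-first
    near_best = sum(1 for s in scores if best - s <= window_cp)
    return near_best >= 3                     # discard "only move" positions

# engine = chess.engine.SimpleEngine.popen_uci("stockfish")
# keep = is_quiet(chess.Board(some_fen), engine)
```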
Is this how other NNUE trainers pick out quiet positions?
This process of generating training data is rather slow, and I've only been able to generate about 20 million positions so far. I'm digging up old laptops from my basement so they can all generate training data. Each time I train with more data, the network gets a whole lot better. How much training data should I ideally have? And is there a more efficient way to generate it?
The data generator's output is stored in CSV format: (FEN, EVALUATION).
The FENs are turned into features on the fly during training, in Python. I've read that a lot of trainers link to C++ and generate the features there. Is there a resource that explains how to do this? I'm sure it would greatly speed up training.
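For context, the on-the-fly encoding is along these lines (a simplified sketch using python-chess and NumPy; the index layout here is an assumption, and the repo's real code differs):

```python
import numpy as np
import chess

def fen_to_features(fen):
    # 768 slots: one per (piece type, colour, square) combination
    board = chess.Board(fen)
    x = np.zeros(768, dtype=np.float32)
    for square, piece in board.piece_map().items():
        piece_index = (piece.piece_type - 1) * 2 + (0 if piece.color == chess.WHITE else 1)
        x[piece_index * 64 + square] = 1.0
    return x

# Each CSV row is "FEN,EVALUATION", e.g. (illustrative):
# "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3,25"
```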
Thanks so much for reading through this. I'm having so much fun working on this project.
