I've been trying to create an NNUE trainer for quite some time now, and I've found very little information online explaining the full training process.
By the way, this NNUE is for eggnog-chess-engine (https://github.com/LeviGibson/eggnog-chess-engine).
The training repository is: https://github.com/LeviGibson/nnue-trainer-2
Finally, I've found a formula that produces decent results, and I would love to get some expert opinions on how to improve it!
One big difference between my networks and other networks is the size. My networks don't use HalfKP; they just have 768 features, one for each of the 12 piece types (both colours) on each of the 64 squares. There's also no traditional two-half accumulator, just one big layer.
There are also a few extra features, such as material and side to move.
The accumulator I use is also really small: only 128 neurons. I do get a better network with a larger accumulator, but the extra processing time makes it not worth it.
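In PyTorch terms, the whole thing looks roughly like this (a simplified sketch; the layer names and the exact extra-feature count are placeholders, not the real code from the repo):

```python
import torch
import torch.nn as nn

N_PIECE_SQ = 768   # 12 pieces x 64 squares
N_EXTRA = 2        # placeholder for the extra features (material, side to move)
ACC_SIZE = 128     # the single small accumulator layer

class TinyNNUE(nn.Module):
    def __init__(self):
        super().__init__()
        # One big accumulator instead of the usual two-half (us/them) design
        self.accumulator = nn.Linear(N_PIECE_SQ + N_EXTRA, ACC_SIZE)
        self.output = nn.Linear(ACC_SIZE, 1)

    def forward(self, x):
        # Clipped ReLU, the usual NNUE activation
        acc = torch.clamp(self.accumulator(x), 0.0, 1.0)
        return self.output(acc)
```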
I've tried using traditional features with a split accumulator, but for some reason it just doesn't seem to work as well. Is there something special I should be doing during training? Are there hyperparameters I'm not using?
For the training data, I'm taking PGNs from the Lichess database and analysing them with Stockfish at depth 9 (as recommended in the nnue-pytorch repo wiki).
The thing that finally made the NNUE not terrible was filtering out tactical positions. Stockfish analyses each position with MultiPV enabled, and if there are only one or two good moves in the position, it gets discarded. The code for this can be found here.
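In rough pseudocode, the filter does something like this (a sketch using python-chess, not the actual generator code; the thresholds are illustrative):

```python
import chess
import chess.engine

def is_quiet(board, engine, depth=9, multipv=4, window_cp=80):
    """Keep a position only if several moves score close to the best move."""
    infos = engine.analyse(board, chess.engine.Limit(depth=depth), multipv=multipv)
    scores = [info["score"].relative.score(mate_score=10000) for info in infos]
    if len(scores) < 3:
        return False                          # too few legal moves to judge
    best = scores[0]                          # MultiPV results come best-first
    near_best = sum(1 for s in scores if best - s <= window_cp)
    return near_best >= 3                     # discard "only move" positions

# engine = chess.engine.SimpleEngine.popen_uci("stockfish")
# keep = is_quiet(chess.Board(some_fen), engine)
```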
Is this how other NNUE trainers pick out quiet positions?
This process of generating training data is rather slow, and I've only been able to generate about 20 million positions so far. I'm digging up old laptops from my basement so they can all generate training data. Each time I train with more data, the network gets a whole lot better. How much training data should I ideally have? And is there a more efficient way to generate it?
The data generator's output is stored in CSV format: (FEN, EVALUATION).
The FENs are turned into features on the fly during training, in Python. I've read that a lot of trainers link to C++ and generate the features there. Is there a resource that explains how to do this? I'm sure it would greatly speed up training.
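For context, the on-the-fly encoding is along these lines (a simplified sketch using python-chess and NumPy; the index layout here is an assumption, and the repo's real code differs):

```python
import numpy as np
import chess

def fen_to_features(fen):
    # 768 slots: one per (piece type, colour, square) combination
    board = chess.Board(fen)
    x = np.zeros(768, dtype=np.float32)
    for square, piece in board.piece_map().items():
        piece_index = (piece.piece_type - 1) * 2 + (0 if piece.color == chess.WHITE else 1)
        x[piece_index * 64 + square] = 1.0
    return x

# Each CSV row is "FEN,EVALUATION", e.g. (illustrative):
# "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3,25"
```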
Thanks so much for reading through this. I'm having so much fun working on this project.
