Feedback on NNUE trainer (Keras/Tensorflow)

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Dann Corbit, Harvey Williamson

LeviGibson
Posts: 11
Joined: Sat Aug 07, 2021 3:41 pm
Full name: Levi Gibson

Feedback on NNUE trainer (Keras/Tensorflow)

Post by LeviGibson »

Hello TalkChess!
I've been trying to create an NNUE trainer for quite some time now, trying to make it tick. I've found very little information online explaining the full training process.
By the way, this NNUE is for eggnog-chess-engine (https://github.com/LeviGibson/eggnog-chess-engine).
The training repository is: https://github.com/LeviGibson/nnue-trainer-2
Finally I've found a formula that produces decent results, and I would love to get some expert opinions on how to improve it!

One big difference between my networks and other networks is the size. My networks don't use HalfKP, they just have 768 features for the 12 pieces and 64 squares. It also doesn't have a traditional 2-half accumulator, it just has one big layer.
There are also a few extra features such as material and side to move.
The accumulator I use is also really small, only 128 neurons. I do get a better network with a larger accumulator, but the extra processing time makes it not worth it.
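
For illustration, here is a minimal Keras sketch of an architecture like the one described above; the exact layer sizes, the number of extra features, and the clipped-ReLU activation are assumptions for the example, not details taken from the actual trainer repo.

```python
# Minimal sketch of a single-accumulator network like the one described above.
# Sizes, the extra-feature count and the activations are assumptions.
from tensorflow import keras

N_PIECE_SQ = 768   # 12 piece types x 64 squares
N_EXTRA = 2        # e.g. material balance and side to move (assumed)

inputs = keras.Input(shape=(N_PIECE_SQ + N_EXTRA,), name="features")

# One "big" accumulator layer of 128 neurons with a clipped ReLU,
# the activation NNUE inference code typically quantises against.
acc = keras.layers.Dense(128, name="accumulator")(inputs)
acc = keras.layers.ReLU(max_value=1.0)(acc)

# Small head producing a single scalar evaluation.
hidden = keras.layers.Dense(32, name="hidden")(acc)
hidden = keras.layers.ReLU(max_value=1.0)(hidden)
output = keras.layers.Dense(1, name="eval")(hidden)

model = keras.Model(inputs, output)
model.compile(optimizer=keras.optimizers.Adam(1e-3), loss="mse")
```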

I've tried using traditional features with a split accumulator, but for some reason it just doesn't seem to work as well. Is there something special I should be doing for training? Some hyperparameters I'm not using?


For the training data, I'm taking PGNs from the lichess database and analysing them with Stockfish at depth 9 (as recommended in the nnue-pytorch repo wiki).
The thing that finally made the NNUE not terrible was selecting less tactical positions. For each position, Stockfish analyses it with MultiPV enabled, and if there are only one or two good moves in the position, it is discarded. The code for this can be found here.
Is this how other NNUE trainers pick out quiet positions?
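
For reference, here is a rough python-chess sketch of that kind of MultiPV filter; the depth, the centipawn window, the "enough good moves" threshold and the Stockfish path are all assumptions, and the actual code in the repo may differ.

```python
# Sketch of a MultiPV-based "not too tactical" filter (thresholds assumed).
import chess
import chess.engine

def is_quiet_enough(board, engine, multipv=4, window_cp=50, min_good_moves=3):
    """Keep a position only if several moves score within window_cp
    centipawns of the best move, i.e. it is not a one-move tactic."""
    infos = engine.analyse(board, chess.engine.Limit(depth=9), multipv=multipv)
    scores = [info["score"].pov(board.turn).score(mate_score=10000)
              for info in infos]
    if len(scores) < min_good_moves:
        return False                      # too few legal moves to judge
    best = max(scores)
    good = sum(1 for s in scores if best - s <= window_cp)
    return good >= min_good_moves

engine = chess.engine.SimpleEngine.popen_uci("stockfish")   # path assumed
print(is_quiet_enough(chess.Board(), engine))
engine.quit()
```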

This process of generating training data is rather slow, and I've only been able to generate about 20 million training positions. I'm digging up old laptops from my basement so they can all generate training data. Each time I train with more data, the network gets a whole lot better. How much training data should I ideally have? Also is there a more efficient way to generate training data?


The data generator's outputs are stored in CSV format (FEN, EVALUATION).
The FENs are turned into features on the fly during training using Python. I've read that a lot of trainers somehow link to C++ and generate features there. Is there a resource that tells me how to do this? I'm sure it would greatly speed up the training process.
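
For comparison, a minimal pure-Python sketch of the on-the-fly feature extraction for a plain 768-feature input (the piece/colour/square index layout is an assumption); the usual speed-up is to move exactly this loop into C and batch it.

```python
# Sketch of FEN -> 768-feature vector (index layout is an assumption).
import numpy as np
import chess

def fen_to_features(fen):
    board = chess.Board(fen)
    x = np.zeros(768, dtype=np.float32)
    for square, piece in board.piece_map().items():
        # 0..5 white P..K, 6..11 black P..K (assumed ordering)
        piece_index = (piece.piece_type - 1) + (0 if piece.color == chess.WHITE else 6)
        x[piece_index * 64 + square] = 1.0
    return x

# The starting position should activate exactly 32 features.
assert fen_to_features(chess.STARTING_FEN).sum() == 32
```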

Thanks so much for reading through this, I'm having so much fun working on this project. :D
Graham Banks
Posts: 41179
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Feedback on NNUE trainer (Keras/Tensorflow)

Post by Graham Banks »

How strong in CCRL terms do you estimate EggNog 4 to be?
gbanksnz at gmail.com
dkappe
Posts: 1620
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: Feedback on NNUE trainer (Keras/Tensorflow)

Post by dkappe »

A few things:

1. You may want the game result. This allows you to use the result to modify the eval.
2. You may want the move made in the game. If it was a check or capture, you can filter out that position.
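
For illustration, a small sketch of both ideas; the blend weight and the eval-to-expected-score scale are assumptions, not recommended values:

```python
# 1) Blend the search eval with the game result; 2) skip positions where the
# move actually played was a capture or gave check. Constants are assumptions.
import math
import chess

def blended_target(eval_cp, game_result, lam=0.7, scale=400.0):
    """eval_cp and game_result are both from White's point of view;
    game_result is 1.0 / 0.5 / 0.0 for win / draw / loss."""
    expected_score = 1.0 / (1.0 + math.exp(-eval_cp / scale))  # eval -> [0, 1]
    return lam * expected_score + (1.0 - lam) * game_result

def keep_position(board, played_move):
    """Filter out positions where the game move was a capture or a check."""
    return not (board.is_capture(played_move) or board.gives_check(played_move))
```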

Lots of luck.
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
LeviGibson
Posts: 11
Joined: Sat Aug 07, 2021 3:41 pm
Full name: Levi Gibson

Re: Feedback on NNUE trainer (Keras/Tensorflow)

Post by LeviGibson »

Graham Banks wrote: Mon Sep 05, 2022 12:37 am How strong in CCRL terms do you estimate EggNog 4 to be?
Quite weak at the moment. Somewhere around 2400. I haven't done extensive tests since I implemented my own networks.
Here it is on lichess: https://lichess.org/@/eggnog-chess-engine
Graham Banks
Posts: 41179
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: Feedback on NNUE trainer (Keras/Tensorflow)

Post by Graham Banks »

LeviGibson wrote: Mon Sep 05, 2022 2:13 am
Graham Banks wrote: Mon Sep 05, 2022 12:37 am How strong in CCRL terms do you estimate EggNog 4 to be?
Quite weak at the moment. Somewhere around 2400. I haven't done extensive tests since I implemented my own networks.
Here it is on lichess: https://lichess.org/@/eggnog-chess-engine
Thanks for replying. You can ignore my email. :)
gbanksnz at gmail.com
chrisw
Posts: 4290
Joined: Tue Apr 03, 2012 4:28 pm

Re: Feedback on NNUE trainer (Keras/Tensorflow)

Post by chrisw »

LeviGibson wrote: Sun Sep 04, 2022 7:20 pm Hello TalkChess!
I've been trying to create an NNUE trainer for quite some time now, trying to make it tick. I've found very little information online explaining the full training process.
By the way, this NNUE is for eggnog-chess-engine (https://github.com/LeviGibson/eggnog-chess-engine).
The training repository is: https://github.com/LeviGibson/nnue-trainer-2
Finally I've found a formula that produces decent results, and I would love to get some expert opinions on how to improve it!

One big difference between my networks and other networks is the size. My networks don't use HalfKP, they just have 768 features for the 12 pieces and 64 squares. It also doesn't have a traditional 2-half accumulator, it just has one big layer.

that’s fine, start off small and un/complex and add expansion ideas later

There are also a few extra features such as material and side to move.

the NN should be able to work those out for itself. As and when you do a standard(?) two-part accumulator, stm is basically included. Material may be helpful
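
For illustration, a rough Keras sketch of such a two-part accumulator; the sizes and the shared-weight, side-to-move-first concatenation are assumptions modelled on the usual HalfKP-style layout, not on eggnog's code:

```python
# Two-perspective accumulator sketch: the same dense layer is applied to the
# side-to-move features and to the opponent's features, and the halves are
# concatenated stm-first, so side to move is encoded by the ordering itself.
from tensorflow import keras

stm_in = keras.Input(shape=(768,), name="stm_features")
nstm_in = keras.Input(shape=(768,), name="nstm_features")

accumulator = keras.layers.Dense(128, name="shared_accumulator")
crelu = keras.layers.ReLU(max_value=1.0)

both = keras.layers.Concatenate()([crelu(accumulator(stm_in)),
                                   crelu(accumulator(nstm_in))])
hidden = keras.layers.Dense(32, activation="relu")(both)
out = keras.layers.Dense(1, name="eval")(hidden)

model = keras.Model([stm_in, nstm_in], out)
```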

The accumulator I use is also really small, only 128 neurons. I do get a better network with a larger accumulator, but the extra processing time makes it not worth it.


Doubling up the neurons will store more knowledge, but there's not much point at the stage where you only have a few million training positions. It also increases training time, which sounds significant in your case.



I've tried using traditional features with a split accumulator, but for some reason it just doesn't seem to work as well. Is there something special I should be doing for training? Some hyperparameters I'm not using?


For the training data, I'm taking PGNs from the lichess database and analysing them with Stockfish at depth 9 (as recommended in the nnue-pytorch repo wiki).
The thing that finally made the NNUE not terrible was selecting less tactical positions. For each position, Stockfish analyses it with MultiPV enabled, and if there are only one or two good moves in the position, it is discarded. The code for this can be found here.
Is this how other NNUE trainers pick out quiet positions?

It seems most published trainers just junk positions where the best move is a capture.


This process of generating training data is rather slow, and I've only been able to generate about 20 million training positions. I'm digging up old laptops from my basement so they can all generate training data. Each time I train with more data, the network gets a whole lot better. How much training data should I ideally have? Also is there a more efficient way to generate training data?

A few tens of millions will give you, as a minimum, a non-dumb network. A few billion is better. As for efficient generation, you could produce positions from self-play, or skip the process and download ready-produced sets. Position generation and labelling is a significant bottleneck.


The data generator's outputs are stored in CSV format (FEN, EVALUATION).
The FENs are turned into features on the fly during training using Python. I've read that a lot of trainers somehow link to C++ and generate features there. Is there a resource that tells me how to do this? I'm sure it would greatly speed up the training process.


See the nnue-pytorch documentation on GitHub. Even if you're proficient in C and know how to interface C code with Python, it's a non-trivial task to write code for this. Doing it in Python is dog-slow and another major bottleneck.
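
For what it's worth, one relatively simple route is a small C feature generator called from Python via ctypes rather than a full binding. A sketch of the Python side only; the library name libfeatures.so and the C function signature int fill_features(const char *fen, float *out, int len) are hypothetical:

```python
# Calling a hypothetical compiled C feature generator via ctypes.
import ctypes
import numpy as np

lib = ctypes.CDLL("./libfeatures.so")                        # hypothetical library
lib.fill_features.argtypes = [
    ctypes.c_char_p,
    np.ctypeslib.ndpointer(dtype=np.float32, flags="C_CONTIGUOUS"),
    ctypes.c_int,
]
lib.fill_features.restype = ctypes.c_int

def fen_to_features(fen, n_features=768):
    out = np.zeros(n_features, dtype=np.float32)
    lib.fill_features(fen.encode("ascii"), out, n_features)  # C fills the buffer
    return out
```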


Thanks so much for reading through this, I'm having so much fun working on this project. :D
Cool! You’ll probably find it possible to have a life during data processing downtime!