How do NNUEs self train?


eboatwright
Posts: 41
Joined: Tue Jan 09, 2024 8:38 pm
Full name: E Boatwright

How do NNUEs self train?

Post by eboatwright »

Hello,

I've been trying to learn as much as I can about NNUE before starting to implement it in my engine.
I know that some people train the neural network to "replicate" evaluations from the HCE (or other NNUEs), but I don't understand how an engine can "self-train".

I wouldn't think you can have the randomly initialized network create its own pairs of (position, evaluation) training data, because it doesn't know anything yet, and just feeding the game's result into the loss function seems unlikely to work. So how does it work?

~ Thanks in advance
Creator of Maxwell
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Re: How do NNUEs self train?

Post by syzygy »

eboatwright wrote: Wed Feb 14, 2024 3:15 am I've been trying to learn as much as I can about NNUE before starting to implement it in my engine.
I know that some people train the neural network to "replicate" evaluations from the HCE (or other NNUEs), but I don't understand how an engine can "self-train".

I wouldn't think you can have the randomly initialized network create its own pairs of (position, evaluation) training data, because it doesn't know anything yet, and just feeding the game's result into the loss function seems unlikely to work. So how does it work?
It will learn from the outcome of the games. I guess it takes a while before a randomly initialized network starts to get a clue, though.
hgm
Posts: 27855
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: How do NNUEs self train?

Post by hgm »

Even random movers have a better chance of accidentally checkmating the opponent when they have more and stronger material. So if you reward wins by upping the evaluation of the winning side's positions in that game, those positions will on average contain a material advantage, and the network will learn to value it. This already makes it play better, so the material advantage changes hands less rapidly and starts to correlate even better with the game result.
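Concretely, that reward step can be as simple as labeling every position of a finished game with its result. A minimal sketch (the helper name and data layout here are hypothetical, just for illustration; 1.0 = white win, 0.5 = draw, 0.0 = black win):

Code: Select all

# Turn one finished self-play game into (position, target) training pairs.
# `positions` is a list of (features, side_to_move) the game passed through;
# `result` is 1.0 for a white win, 0.5 for a draw, 0.0 for a black win.
def label_game(positions, result):
    pairs = []
    for features, side_to_move in positions:
        # Store the target from the side to move's perspective, so the
        # network always predicts "how good is this for me".
        target = result if side_to_move == 'white' else 1.0 - result
        pairs.append((features, target))
    return pairs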
eboatwright
Posts: 41
Joined: Tue Jan 09, 2024 8:38 pm
Full name: E Boatwright

Re: How do NNUEs self train?

Post by eboatwright »

Ohh, so you do just apply the result of the game, that's interesting! Should any extra data be saved between games, or just the tuned weights?
Creator of Maxwell
lithander
Posts: 881
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: How do NNUEs self train?

Post by lithander »

Using a random network as your evaluation function means you basically have no idea what a good position is. But the search still knows the rules of chess, so only legal moves can be chosen. So after a sequence of basically random moves you stumble on positions that have no legal moves. They are won for White, won for Black, or drawn. That's all you need to bootstrap the process.

0.) Start with a random network as your evaluation function
1.) Play thousands of selfplay matches and record them
2.) Create a list of labeled positions; each move in a match creates a position and the outcome of the game is the label
3.) Use millions of (position, label) data pairs to train a network that predicts the outcome (label) based on a position
4.) The resulting network should now do a little better as your evaluation function
5.) Go back to Step 1
eboatwright wrote: Wed Feb 14, 2024 10:36 pm Ohh, so you do just apply the result of the game, that's interesting! Should any extra data be saved between games, or just the tuned weights?
What you save is a bunch of PGNs or whatever format you choose to record millions of matches. Then filter that data and create millions of labeled positions in the format that your trainer expects. It's all pretty compute-heavy, but nothing a modern PC with ~100 GB of free disk space can't handle.
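As a skeleton, the whole loop might look like this (every function here is a placeholder for whatever your engine, data format, and trainer actually provide, so this is a sketch of the structure, not runnable as-is):

Code: Select all

# Skeleton of the self-play bootstrap loop described above.
net = random_network()                         # step 0
for iteration in range(NUM_ITERATIONS):
    games = play_selfplay_games(net, n=10_000) # step 1: play and record
    data = []
    for game in games:                         # step 2: label positions
        for features in game.positions:
            data.append((features, game.result))
    net = train_network(data)                  # steps 3 and 4: fit a new net
    # step 5: loop around and generate data with the (hopefully) stronger net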
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
eboatwright
Posts: 41
Joined: Tue Jan 09, 2024 8:38 pm
Full name: E Boatwright

Re: How do NNUEs self train?

Post by eboatwright »

lithander wrote: Thu Feb 15, 2024 1:57 pm Using a random network as your evaluation function means you basically have no idea what a good position is. But the search still knows the rules of chess, so only legal moves can be chosen. So after a sequence of basically random moves you stumble on positions that have no legal moves. They are won for White, won for Black, or drawn. That's all you need to bootstrap the process.

0.) Start with a random network as your evaluation function
1.) Play thousands of selfplay matches and record them
2.) Create a list of labeled positions; each move in a match creates a position and the outcome of the game is the label
3.) Use millions of (position, label) data pairs to train a network that predicts the outcome (label) based on a position
4.) The resulting network should now do a little better as your evaluation function
5.) Go back to Step 1
eboatwright wrote: Wed Feb 14, 2024 10:36 pm Ohh, so you do just apply the result of the game, that's interesting! Should any extra data be saved between games, or just the tuned weights?
What you save is a bunch of PGNs or whatever format you choose to record millions of matches. Then filter that data and create millions of labeled positions in the format that your trainer expects. It's all pretty compute-heavy, but nothing a modern PC with ~100 GB of free disk space can't handle.
Thank you so much!! That's awesome. I've been working on an implementation for a few days now: I've got a decently fast network that takes 768 (piece, square) inputs -> 256 -> 1, and basic gradient descent implemented.

But with my naive implementation (recalculating the delta for every weight individually) I worked out it would take over a month just to train one epoch of 5 million positions! :? (I got ~19 million (FEN, eval) pairs from a Lichess database for starting out.)
So I'm currently stuck on trying to implement back-propagation.
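For what it's worth, the whole point of back-propagation is to avoid touching every weight individually: one forward pass saves the intermediate values, and one backward pass reuses them to get every gradient at once via the chain rule. A minimal sketch for a 768 -> 256 -> 1 net in plain numpy (ReLU hidden layer, sigmoid output, squared-error loss; this is an illustration of the technique, not Maxwell's actual code, and real trainers add batching, clipped ReLU, and an optimizer like Adam):

Code: Select all

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.01, (256, 768))    # input -> hidden weights
b1 = np.zeros(256)
w2 = rng.normal(0, 0.01, 256)           # hidden -> output weights
b2 = 0.0

def train_step(x, target, lr=0.01):
    global W1, b1, w2, b2
    # Forward pass, keeping intermediates for the backward pass.
    z1 = W1 @ x + b1
    h = np.maximum(z1, 0.0)             # ReLU
    z2 = w2 @ h + b2
    out = 1.0 / (1.0 + np.exp(-z2))     # sigmoid, so out is in (0, 1)
    loss = (out - target) ** 2

    # Backward pass: chain rule, layer by layer. A couple of matrix
    # products instead of re-evaluating the net once per weight.
    dz2 = 2.0 * (out - target) * out * (1.0 - out)  # dL/dz2
    dw2 = dz2 * h                       # dL/dw2
    dh = dz2 * w2                       # dL/dh
    dz1 = dh * (z1 > 0.0)               # ReLU derivative is 0 or 1
    dW1 = np.outer(dz1, x)              # dL/dW1

    # Plain gradient-descent update.
    W1 -= lr * dW1
    b1 -= lr * dz1
    w2 -= lr * dw2
    b2 -= lr * dz2
    return loss

# Usage: x is the 768-dim 0/1 (piece, square) vector, target in [0, 1].
x = np.zeros(768); x[0] = 1.0
print(train_step(x, 0.5))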
Creator of Maxwell
jdart
Posts: 4368
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: How do NNUEs self train?

Post by jdart »

This whole process is very computation-intensive. I have something like 100 cores over several machines to do the training data generation.

For model training I am using a fork of the Stockfish Python trainer, which runs on the GPU. I have an RTX 3080 for that; it is reasonably fast.
eboatwright
Posts: 41
Joined: Tue Jan 09, 2024 8:38 pm
Full name: E Boatwright

Re: How do NNUEs self train?

Post by eboatwright »

jdart wrote: Thu Feb 15, 2024 4:57 pm This whole process is very computation-intensive. I have something like 100 cores over several machines to do the training data generation.

For model training I am using a fork of the Stockfish Python trainer, which runs on the GPU. I have an RTX 3080 for that; it is reasonably fast.
Yeah, my computational power is definitely lacking, but I'm doing this to learn, so I'd like to write all the training code myself.
My lack of back-propagation is definitely a huge problem. I'm still scratching my head over how to get it implemented, but we'll see hahaha
Creator of Maxwell
eboatwright
Posts: 41
Joined: Tue Jan 09, 2024 8:38 pm
Full name: E Boatwright

Re: How do NNUEs self train?

Post by eboatwright »

Alright, so I've pushed what I have so far onto my dev branch:
https://github.com/eboatwright/Maxwell/ ... rc/nnue.rs

I'm gonna start by training a network on this data from Lichess: https://database.lichess.org/#evals
Although eventually I do want to fully self-train the final network :D
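One wrinkle when training on engine evaluations instead of game results: centipawn scores are usually squashed through a sigmoid into [0, 1] so they live on the same scale as a win/draw/loss label. A small sketch (the scaling constant K is a tunable assumption; around 400 is just a common starting point, not a standard):

Code: Select all

import math

# Squash a centipawn evaluation into [0, 1] so it is comparable to a
# game-result label (1 = win, 0.5 = draw, 0 = loss). K is tunable.
K = 400.0

def cp_to_target(cp):
    return 1.0 / (1.0 + math.exp(-cp / K))

print(cp_to_target(0))      # 0.5: equal position
print(cp_to_target(400))    # ~0.73: clearly better for the side to move
print(cp_to_target(-1200))  # ~0.05: nearly lost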
Creator of Maxwell
eboatwright
Posts: 41
Joined: Tue Jan 09, 2024 8:38 pm
Full name: E Boatwright

Re: How do NNUEs self train?

Post by eboatwright »

Although after thinking it through some more, I think I might just end up learning PyTorch to train the network; all this derivative calculation and back-propagation is going waaaayyy over my head :mrgreen:
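For reference, the same 768 -> 256 -> 1 model is only a few lines in PyTorch, with autograd doing all the derivative bookkeeping. A hedged sketch (layer sizes match the net described above; everything else is a plain-vanilla default, not a recommendation):

Code: Select all

import torch
import torch.nn as nn

# 768 -> 256 -> 1 network; autograd handles the backward pass.
model = nn.Sequential(
    nn.Linear(768, 256),
    nn.ReLU(),
    nn.Linear(256, 1),
    nn.Sigmoid(),           # output in (0, 1), same scale as WDL labels
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_batch(features, targets):
    # features: (batch, 768) float tensor; targets: (batch, 1) in [0, 1]
    optimizer.zero_grad()
    loss = loss_fn(model(features), targets)
    loss.backward()         # back-propagation via autograd
    optimizer.step()
    return loss.item()

# Dummy usage with random data:
print(train_batch(torch.rand(64, 768), torch.rand(64, 1)))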
Creator of Maxwell