lithander wrote: ↑Sat Feb 17, 2024 12:17 pm
Maybe you shouldn't start with neural networks right away but with "texel tuning" some piece-square tables from labeled positions. It uses the same general principles as gradient descent, but without hidden layers you don't need backpropagation. It's simpler and thus easier to understand and write from scratch!
It's also easily and quickly computable on the CPU. Many engines (including Leorik) do that kind of tuning for their HCE. Skipping that step makes it harder to really understand NNUE evals imo.
Generally, you'll want to minimize the error between the prediction of your network/HCE eval and the labels over the entire training dataset. So you typically convert from the linear eval scale (centipawns) into winning probabilities.
Code: Select all
double MeanSquareError(List<Data> data, float scalingCoefficient)
{
    double squaredErrorSum = 0;
    foreach (Data entry in data)
    {
        //compare the predicted winning probability against the game-result label
        float eval = Evaluation(entry.Position);
        float error = entry.Result - Sigmoid(eval, scalingCoefficient);
        squaredErrorSum += error * error;
    }
    return squaredErrorSum / data.Count;
}
You can do that with a sigmoid function like this:
Code: Select all
float Sigmoid(float eval, float scalingCoefficient)
{
    //maps an eval given in centipawns to a winning probability in [-1..1]
    return (float)(2 / (1 + Math.Exp(-(eval / scalingCoefficient))) - 1);
}
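To make the "no hidden layers, no backpropagation" point concrete, here's a minimal sketch of one gradient-descent step over the flattened PST weights. This isn't Leorik's actual tuner: Features() is a hypothetical helper that returns one entry per PST weight, counting that piece-square feature in the position (+1 for a white piece on that square, -1 for a black one), and it reuses the Sigmoid above. Because the eval is just a dot product of weights and features, the derivative of the squared error with respect to each weight can be written out directly.
Code: Select all
void GradientStep(float[] weights, List<Data> data, float scalingCoefficient, float learningRate)
{
    double[] gradient = new double[weights.Length];
    foreach (Data entry in data)
    {
        //hypothetical feature extractor: one entry per PST weight
        float[] features = Features(entry.Position);

        //linear eval: dot product of PST weights and features
        float eval = 0;
        for (int i = 0; i < weights.Length; i++)
            eval += weights[i] * features[i];

        float predicted = Sigmoid(eval, scalingCoefficient);
        float error = entry.Result - predicted;

        //d(error^2)/dw_i = -error * (1 - predicted^2) / scalingCoefficient * features[i]
        float common = -error * (1 - predicted * predicted) / scalingCoefficient;
        for (int i = 0; i < weights.Length; i++)
            gradient[i] += common * features[i];
    }

    //step against the gradient averaged over the dataset
    for (int i = 0; i < weights.Length; i++)
        weights[i] -= learningRate * (float)(gradient[i] / data.Count);
}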
Yeah, I knew about Texel Tuning, but I just thought it'd be way more fun to tackle NNUE. I understand most of how it all works: I just found this series by "The Coding Train", watched the sections about back-prop, and I'm actually really close to getting it working (he explains things like he's explaining to a 5-year-old, and that was just enough to get it into my head lol).
That's interesting....
So then the error is applied to every PST value, just like you apply gradients to weights?
And I might be getting this wrong, but here's what I got from this:
During training, the NN's output is fed through a sigmoid (but offset so it's between -1 and 1 instead of 0 and 1) and is a win probability, because you're training it off the self-play data, which gives the game's outcome, instead of an individual evaluation.
(Simple enough, you're training the network to predict the outcome of the game)
But then in the engine, either the sigmoid is removed and the output is multiplied by a constant scaling value, or the output is mapped from win probability back to centipawns by doing the opposite of this Sigmoid function you gave:
Code: Select all
float Sigmoid(float eval, float scalingCoefficient)
{
    //maps an eval given in centipawns to a winning probability in [-1..1]
    return (float)(2 / (1 + Math.Exp(-(eval / scalingCoefficient))) - 1);
}
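If I solve that Sigmoid for eval I get eval = scalingCoefficient * ln((1 + p) / (1 - p)), so I'm guessing the inverse mapping would look something like this (just my own sketch, assuming the same [-1..1] scale):
Code: Select all
float WinProbabilityToCentipawns(float p, float scalingCoefficient)
{
    //p is the [-1..1] network output; clamp slightly to avoid log(0) at the extremes
    p = Math.Clamp(p, -0.999f, 0.999f);
    //inverse of the Sigmoid above: eval = k * ln((1 + p) / (1 - p))
    return scalingCoefficient * (float)Math.Log((1 + p) / (1 - p));
}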
Thanks again for all your help! Sorry if that's just completely wrong and I completely misunderstood haha