NN Accuracy Against Own Training Data

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

towforce
Posts: 12544
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

NN Accuracy Against Own Training Data

Post by towforce »

There's no reason why anyone should know this - I'm asking just in case somebody does.

An NN is trained against a large number of positions and those positions' evaluations.

If the trained NN were tested against its own training data, how accurate would its output be?

This could be answered in several different ways, any of which would be useful - for example:

1. "60% of the evaluations are within 0.5 of the evaluations in the training data"

2. "The average difference between the training data evaluations and the NN's evaluations is 0.6, and the standard deviation is 0.4"

Any information, or even any guesses, would be welcome!
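For concreteness, the kind of measurement I mean would look something like this (a hypothetical sketch - net, positions and target_evals are placeholders for a real net and its training data):

Code: Select all

import numpy as np

# Hypothetical sketch: measure a trained net's eval error against its
# own training data; net(), positions and target_evals are placeholders.
def training_set_accuracy(net, positions, target_evals, tol=0.5):
    predicted = np.array([net(p) for p in positions])
    errors = np.abs(predicted - np.array(target_evals))
    within_tol = float(np.mean(errors <= tol))   # e.g. "60% within 0.5"
    return within_tol, float(errors.mean()), float(errors.std())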
Human chess is partly about tactics and strategy, but mostly about memory
towforce
Posts: 12544
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: NN Accuracy Against Own Training Data

Post by towforce »

Interesting that nobody is willing or able even to make a guess.

In some ways, this is the very essence of choosing an NN architecture or deciding how many training positions to use - but it looks as if developers are concerning themselves only with Elo rating. In the long term, having this information would boost Elo more quickly, IMO.

Given that there are normally far more training positions than the NN has capacity to encode, it seems likely that the NN will, for a large number of positions, score very badly against its own training data (assuming the NN doesn't manage to find simplifying algorithms that give it a very good "understanding" of chess - and the available evidence strongly suggests that this isn't happening).

I am guessing that in training, unusual positions get very little reinforcement, whereas "normal" types of position, examples of which are numerous in the training data, will get a lot of reinforcement. In these types of positions, I would expect the NN's evaluation to be "reasonably close" to the training data.
Human chess is partly about tactics and strategy, but mostly about memory
dkappe
Posts: 1632
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: NN Accuracy Against Own Training Data

Post by dkappe »

The nnue and leela (and all other) training frameworks use a loss function that, in a rough sense, measures the accuracy. There's a large body of literature on this, which I would encourage you to read.

One of many reasons that high accuracy can be bad is overfitting, where the network matches the training data really well at the cost of performance on data it hasn't seen. There are many cases (e.g. filtering in-check or capturing positions out of the data) where the loss function gets worse but the Elo improves. The situation is even more complicated with leela networks, where there may be a trade-off between the different heads - value, policy, etc.

So, accuracy is not the be-all and end-all of training, especially given that the hyperparameters can give very different loss function curves. The reason people don't bang on about loss is that it's not important except in the most general way, i.e. it starts high and gradually goes down. Again, I'd encourage you to read up on the subject.
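For illustration only - this isn't the actual nnue or leela trainer code, just the rough shape such a value loss often takes (MSE in win-probability space; the 400-centipawn scale is an assumed constant):

Code: Select all

import torch

def value_loss(pred_cp, target_cp, scale=400.0):
    # squash centipawn evals into win probabilities first, so that
    # huge evals in clearly won positions don't dominate the loss
    pred_wp = torch.sigmoid(pred_cp / scale)
    target_wp = torch.sigmoid(target_cp / scale)
    return torch.mean((pred_wp - target_wp) ** 2)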
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
towforce
Posts: 12544
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK
Full name: Graham Laight

Re: NN Accuracy Against Own Training Data

Post by towforce »

dkappe wrote: Sun Sep 10, 2023 6:19 pm The nnue and leela (and all other) training frameworks use a loss function that, in a rough sense, measures the accuracy. There's a large body of literature on this, which I would encourage you to read.

One of many reasons that high accuracy can be bad is overfitting, where the network matches the training data really well at the cost of performance on data it hasn't seen. There are many cases (e.g. filtering in-check or capturing positions out of the data) where the loss function gets worse but the Elo improves. The situation is even more complicated with leela networks, where there may be a trade-off between the different heads - value, policy, etc.

So, accuracy is not the be-all and end-all of training, especially given that the hyperparameters can give very different loss function curves. The reason people don't bang on about loss is that it's not important except in the most general way, i.e. it starts high and gradually goes down. Again, I'd encourage you to read up on the subject.

Thank you - that's a helpful answer! 8-)

With the caveat that it's the only article I've read on the subject, there's a good article on loss here - link.

You've pointed out that it's risky to overfit the data. At this point, I should say what I would want:

1. A "good enough" fit of as much of the training data as possible

2. The smallest size of NN (or other algorithm) possible

A small algorithm (too small to have "knowledge" of every position in the training data) that fits most of the data "well enough" would necessarily have captured some important chess knowledge that applies across a large number of position types.

If I were trying to do this using a deep-learning NN, my first thought would be:

1. Train an NN against all the data

2. Discard the positions from the data for which the NN gives an unacceptably poor evaluation

2a. Come back and do something with these positions later (maybe more NNs?)

3. Pick a smaller NN architecture, take the smaller training data set, and...

4. Go back to step 1

This might be a prohibitively time-consuming process, but at the end of it, you'd have a set of small, fast NNs, each of which encodes a position type concisely and sufficiently accurately.
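A hypothetical sketch of that loop - train() and the list of architectures stand in for a real training framework:

Code: Select all

# Hypothetical sketch of the filter-and-shrink loop described above;
# train() stands in for a real training framework.

def filter_and_shrink(data, architectures, max_err=1.0):
    experts, leftovers = [], []
    for arch in architectures:            # steps 3/4: ever-smaller nets
        net = train(arch, data)           # step 1: train on current data
        kept = [(p, t) for p, t in data if abs(net(p) - t) <= max_err]
        bad  = [(p, t) for p, t in data if abs(net(p) - t) >  max_err]
        experts.append(net)
        leftovers.extend(bad)             # step 2a: revisit these later
        data = kept                       # step 2: keep only good fits
    return experts, leftovers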

The final step would be to build and train an NN to select which concise NN to use for a given position.
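That selector could be an ordinary classifier over the expert nets - again hypothetical, with n_features standing in for whatever board encoding is used. It could be trained on labels saying which expert gave the lowest error for each training position:

Code: Select all

import torch.nn as nn

class ExpertSelector(nn.Module):
    # a small gate that picks which concise expert net to use per position
    def __init__(self, n_features, n_experts):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(n_features, 64),
            nn.ReLU(),
            nn.Linear(64, n_experts))     # one logit per expert net

    def forward(self, features):
        return self.gate(features).argmax(dim=-1)  # chosen expert's index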

Another idea (no idea how practical this would be) would be to shape the training as you go, a bit like an artist shaping a piece of pottery: train the NN against a smallish number of positions until it can evaluate them all well, then add more positions; if this results in poor performance against the original positions, train against them some more. Basically, keep adding positions, but keep forcing the NN to remain able to evaluate the positions it already knew well enough (maybe a school teaching children is a bit like this!) - see the sketch after this list. This would ultimately only be possible if:

1. There are simple patterns to be found which work well over a large number of positions

2. It's possible to get an NN to move towards finding these patterns (I suspect that the NN would keep on finding local optima, and fail to reach the global patterns, assuming they exist)
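A hypothetical sketch of the shaping idea - train() and good_enough() are placeholders for a real training loop and a "still evaluates the old positions well" test:

Code: Select all

# Hypothetical sketch of "shaping" the training set as you go;
# train() and good_enough() are placeholders.

def incremental_train(net, position_chunks, good_enough, max_rounds=10):
    seen = []
    for chunk in position_chunks:     # keep adding positions...
        seen.extend(chunk)
        train(net, chunk)
        rounds = 0
        # ...but force the net to stay good on what it already knew
        while not good_enough(net, seen) and rounds < max_rounds:
            train(net, seen)
            rounds += 1
    return net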

Of course, I think there might be a better way without using NNs at all. I would love to know what proportion of NNUE's or Lc0's evaluations of positions from their own training data is "unacceptably bad".
Human chess is partly about tactics and strategy, but mostly about memory
JoAnnP38
Posts: 253
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: NN Accuracy Against Own Training Data

Post by JoAnnP38 »

I am curious about the answer to this as well. After tuning my HCE, I measure its accuracy over the training data, and to date I still have not achieved 52% accuracy, but I am getting closer. If accuracy is correlated with strength, I'll be curious to hear this same stat for an NNUE.
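In case anyone wants to compare numbers, here is one guess at how such a percentage might be computed - Texel-style, mapping each eval to a win probability and counting a hit when it lands close enough to the game result; the K scale and tolerance are pure assumptions:

Code: Select all

# One guess at a training-data "accuracy" percentage for an HCE:
# map each eval to a win probability (Texel-style sigmoid, K=400
# assumed) and count a hit when that probability lands within a
# tolerance of the actual game result (1, 0.5 or 0).

def hce_accuracy(evals_cp, results, k=400.0, tol=0.25):
    hits = 0
    for cp, result in zip(evals_cp, results):
        win_prob = 1.0 / (1.0 + 10.0 ** (-cp / k))
        if abs(win_prob - result) <= tol:
            hits += 1
    return hits / len(evals_cp)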