I'm implementing a neural network to replace the evaluation of my engine.
I've built a dataset of 200M FENs, each labeled with the score of a depth-4 search by my engine.
For the optimization algorithm I have used Adagrad and plain gradient descent; the loss function is the sum of squared errors.
Training the net with these methods, I always end up with a large error.
The net trained this way gives a mean error of 200 cp, which is very high.
How can I improve the training?
I have tried shuffling the positions between iterations, but that gives only a very small improvement, and Adagrad gives results similar to plain gradient descent.
Do you have any suggestions to lower the error of the neural network?
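For what it's worth, the setup described above (a net regressing to search scores under a sum-squared-error loss, trained by plain gradient descent) can be sketched as follows. Everything here is a placeholder, not the actual engine: a single linear layer stands in for the network, and synthetic data stands in for the FEN features and depth-4 search scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the described setup: the inputs would be features
# extracted from a FEN, the target the depth-4 search score.  Both
# are synthetic here, and the "net" is a single linear layer.
X = rng.normal(size=(1000, 32))                    # 1000 "positions"
true_w = rng.normal(size=32)
y = X @ true_w + rng.normal(scale=1.0, size=1000)  # noisy "search scores"

w = np.zeros(32)
lr = 0.05

def sse(w):
    """Sum-squared-error loss over the whole set."""
    r = X @ w - y
    return float(r @ r)

losses = [sse(w)]
for epoch in range(50):
    grad = 2.0 * X.T @ (X @ w - y) / len(X)  # mean gradient of the SSE
    w -= lr * grad                           # plain gradient-descent step
    losses.append(sse(w))
```

If a loop like this does not drive the loss down steadily on a small synthetic problem, the bug is in the training code rather than in the data.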
Train a neural network evaluation
Moderators: hgm, Rebel, chrisw
-
- Posts: 217
- Joined: Fri Apr 11, 2014 10:45 am
- Full name: Fabio Gobbato
-
- Posts: 536
- Joined: Thu Mar 09, 2006 3:01 pm
Re: Train a neural network evaluation
More information about the network architecture and hyper-parameters (learning rate, etc) might provide some clues.
Just a quick stab, but depth 4 does not seem very deep relative to what the SF-NNUE Discord posts talk about.
It might not give enough improvement over your current eval.
-
- Posts: 70
- Joined: Tue Dec 31, 2019 2:52 am
- Full name: Kieren Pearson
Re: Train a neural network evaluation
I’m currently doing the same thing so here’s a few pointers.
First, use someone else's dataset before using your own. It's just one more thing that can go wrong, and actually getting quiet positions that are good for training is a fairly complex problem. I recommend the zurichess set with 725K positions; I used it successfully for Texel tuning, and it's working well for training nets at the moment, so I know that set works.
If Adagrad and gradient descent are giving really similar results, stick with GD until you start getting good results. No need to overcomplicate things until you're sure the training code is bug-free.
Are you using a sigmoid transformation (look up Texel tuning if you're unfamiliar)? I think it's probably better because the error isn't then dominated by extreme values.
Most likely, though, the issue is the training data. As mentioned, depth 4 may not be high enough. I would try fewer positions with a deeper search. Unless your network is quite large or deep, I would start simple with far fewer positions.
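The sigmoid transformation mentioned above is, in Texel-style tuning, a squashing of centipawn scores into a 0..1 win probability before computing the error. A small sketch; the scaling constant k = 400 is just a common placeholder, not a value fitted to any particular engine:

```python
import math

def cp_to_winprob(cp, k=400.0):
    """Texel-style squashing: map a centipawn score to a 0..1
    win probability.  k controls the slope and is normally fitted
    to the engine's own data; 400 here is just a placeholder."""
    return 1.0 / (1.0 + 10.0 ** (-cp / k))

# Squared error in probability space: a 600 cp disagreement between
# two already-winning scores (+900 vs +1500) costs almost nothing,
# while the same 600 cp gap around 0 dominates, which is exactly the
# intended weighting.
err_extreme = (cp_to_winprob(1500) - cp_to_winprob(900)) ** 2
err_middle  = (cp_to_winprob(300)  - cp_to_winprob(-300)) ** 2
```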
-
- Posts: 130
- Joined: Fri Jun 17, 2016 4:14 pm
- Location: Colorado, USA
- Full name: John Stanback
Re: Train a neural network evaluation
I am also trying to implement a NN eval for Wasp. I have an EPD file of about 30M positions from Wasp vs other engines, with depth ~15 scores and the game result. I randomly choose positions from this file and then do a 25-ply "playout" where, at each ply, one move is chosen at random and played. At each position I back-propagate the error between the standard Wasp eval and the NN eval to tweak the weights. I think the crazy random positions are necessary since the NN needs to train the weights for things like a white knight on a8, which will rarely happen in a real game. After I train the NN with about 500M of these random positions, it seems to help a bit to train with maybe 20-50M of the real game positions using a lower learning rate. I'm using a sigmoid activation and back-propagate the error as per a tutorial I found. I don't know what learning rate is typical, but I've mostly been using about 5e-4 for the random positions and 1e-4 for the real positions.
My simple NN of two layers with 4 and 2 nodes seems to be stuck a bit over 100 Elo worse than the normal Wasp eval. The inputs to this NN are similar to the features in the normal Wasp eval (i.e. PSTs, mobility counts, counts of attacks near the king, passed pawns on rank N that can/cannot advance), so it seems like it should be able to match the strength of Wasp. It trains really fast, maybe 500K positions/sec, so it takes only about 20 minutes to train a net from scratch, MUCH faster than my terribly slow code for Texel tuning. For some reason it doesn't help at all to increase the network size. It's been a lot of fun to experiment with, but I don't know how much more progress I'll make. It would be nice to find a way to train with something other than the normal Wasp eval as the target.
John
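The playout idea described above can be sketched roughly like this; `legal_moves` and `make_move` are hypothetical stand-ins for the engine's own move generator and make-move routines, not real APIs:

```python
import random

def random_playout(pos, legal_moves, make_move, plies=25, rng=random):
    """From a sampled position, play up to `plies` uniformly random
    moves, yielding each intermediate position.  `legal_moves` and
    `make_move` are stand-ins for the engine's own move generator
    and make-move routines (hypothetical interface)."""
    for _ in range(plies):
        moves = legal_moves(pos)
        if not moves:          # mate/stalemate ends the playout early
            break
        pos = make_move(pos, rng.choice(moves))
        yield pos

# Toy usage: "positions" are just integers and every position has
# two successors, so the playout always runs the full 25 plies.
walk = list(random_playout(0, lambda p: [p + 1, p + 2],
                           lambda p, m: m))
```

Each yielded position would then be scored by the standard eval and used as a training target for the net.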
-
- Posts: 217
- Joined: Fri Apr 11, 2014 10:45 am
- Full name: Fabio Gobbato
Re: Train a neural network evaluation
brianr wrote: ↑Tue Sep 01, 2020 3:21 pm
More information about the network architecture and hyper-parameters (learning rate, etc) might provide some clues.
Just a quick stab, but depth 4 does not seem very deep relative to what the SF-NNUE Discord posts talk about.
It might not be enough improvement from your current eval.
I'm trying a network similar to Stockfish NNUE, with the only difference that I have added the castling rights as an input.
The learning rate is 0.0001; I have tried various values and this seems the best.
I could try a deeper search, but the problem is that the error is quite high, and after 200M positions it doesn't seem to drop much further.
I've also tried different starting weights, with only small differences in the training results.
-
- Posts: 4319
- Joined: Tue Apr 03, 2012 4:28 pm
Re: Train a neural network evaluation
Fabio Gobbato wrote: ↑Tue Sep 01, 2020 2:25 pm
I'm implementing a neural network to replace the evaluation of my engine.
I've built a dataset of 200M FENs, each labeled with the score of a depth-4 search by my engine.
For the optimization algorithm I have used Adagrad and plain gradient descent; the loss function is the sum of squared errors.
Training the net with these methods, I always end up with a large error.
The net trained this way gives a mean error of 200 cp, which is very high.
How can I improve the training?
I have tried shuffling the positions between iterations, but that gives only a very small improvement, and Adagrad gives results similar to plain gradient descent.
Do you have any suggestions to lower the error of the neural network?
It's unclear what range of centipawn values your training and test sets have, but if you were training on win probability (0.0 to 1.0 range), then a completely untrained (random) net is going to come back with an average "error" of about 0.3, which could well translate to your 200 cp. So it's entirely possible the training process is completely broken somewhere, somehow; check first that any training at all is taking place before looking for improvements. Some architectures simply cannot extract any sense from the data presented, or there may just be a bug in the training code. I suggest you substitute in a known-working input format, a known-working layer architecture, a known-working learning rate and so on, and if that works, try substituting in your own way of organising the NN.
Ah! Edit: you're using something similar to NNUE, so disregard the above suggestion. Are you able to measure the "error" of the functional NNUE? It would at least give you some sort of target to aim at.
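The ~0.3 figure for a completely untrained net can be checked numerically: if both the guess and the target are spread uniformly over [0, 1], the expected absolute error is exactly 1/3. A quick Monte Carlo sketch:

```python
import random

random.seed(0)
n = 200_000
# Model an untrained net as a uniform random guess in [0, 1];
# targets are likewise spread over [0, 1].
errs = [abs(random.random() - random.random()) for _ in range(n)]
mean_abs_err = sum(errs) / n   # E|U1 - U2| = 1/3 for independent uniforms
```

An observed error near 0.3 on win-probability targets is therefore consistent with no learning having happened at all.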
-
- Posts: 536
- Joined: Thu Mar 09, 2006 3:01 pm
Re: Train a neural network evaluation
I am very familiar with Leela type networks and far less so with the NNUE nets.
That said, your LR seems extremely low for initial training (although no batch size was mentioned).
I suggest asking in the SF-NNUE Discord but I'm not sure how joining that works.
-
- Posts: 217
- Joined: Fri Apr 11, 2014 10:45 am
- Full name: Fabio Gobbato
Re: Train a neural network evaluation
chrisw wrote: ↑Tue Sep 01, 2020 6:13 pm
It's unclear what range of centipawn values your training and test sets have, but if you were training on win probability (0.0 to 1.0 range), then a completely untrained (random) net is going to come back with an average "error" of about 0.3, which could well translate to your 200 cp. So it's entirely possible the training process is completely broken somewhere, somehow; check first that any training at all is taking place before looking for improvements. Some architectures simply cannot extract any sense from the data presented, or there may just be a bug in the training code. I suggest you substitute in a known-working input format, a known-working layer architecture, a known-working learning rate and so on, and if that works, try substituting in your own way of organising the NN.
Ah! Edit: you're using something similar to NNUE, so disregard the above suggestion. Are you able to measure the "error" of the functional NNUE? It would at least give you some sort of target to aim at.
I don't think there is a bug, because if I train the net on only 10 positions the error goes down to 0. The problem comes when I train on 200M positions.
I have also tried win probability; the error drops to 10%, which is more or less a pawn, and it's difficult to get further improvements.
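For reference, the "10% is about a pawn" reading can be checked by inverting the Texel-style sigmoid. With the common (placeholder) scaling constant k = 400, a 0.10 probability gap around an equal position corresponds to roughly 70 cp; the exact figure depends on the fitted k:

```python
import math

def winprob_to_cp(p, k=400.0):
    """Inverse of the Texel sigmoid: win probability back to
    centipawns.  k=400 is a placeholder; a fitted k may differ."""
    return -k * math.log10(1.0 / p - 1.0)

# A 10% probability error around an equal position:
cp_gap = winprob_to_cp(0.60) - winprob_to_cp(0.50)
```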
-
- Posts: 4319
- Joined: Tue Apr 03, 2012 4:28 pm
Re: Train a neural network evaluation
Fabio Gobbato wrote: ↑Tue Sep 01, 2020 8:29 pm
I don't think there is a bug, because if I train the net on only 10 positions the error goes down to 0. The problem comes when I train on 200M positions.
I have also tried win probability; the error drops to 10%, which is more or less a pawn, and it's difficult to get further improvements.
10% is quite good.
-
- Posts: 217
- Joined: Fri Apr 11, 2014 10:45 am
- Full name: Fabio Gobbato
Re: Train a neural network evaluation
brianr wrote: ↑Tue Sep 01, 2020 6:39 pm
I am very familiar with Leela type networks and far less so with the NNUE nets.
That said, your LR seems extremely low for initial training (although no batch size was mentioned).
I suggest asking in the SF-NNUE Discord but I'm not sure how joining that works.
I update the weights after every sample, so the batch size is 1; I'm not sure whether that could be a problem.
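A batch size of 1 can indeed be part of the problem: averaging the gradient over a mini-batch damps the per-sample noise, which usually allows a larger stable learning rate. A toy sketch on a synthetic least-squares problem; none of these numbers come from the thread:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4096, 16))         # toy features
y = X @ rng.normal(size=16)             # noiseless toy targets

def train(batch_size, lr, epochs=5):
    """Mini-batch gradient descent on a toy least-squares problem."""
    w = np.zeros(16)
    idx = np.arange(len(X))
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, len(X), batch_size):
            b = idx[start:start + batch_size]
            # averaging over the batch damps per-sample gradient noise
            grad = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return float(np.mean((X @ w - y) ** 2))

mse_batched = train(batch_size=64, lr=0.05)
```

With batch size 1, a step this large would be driven entirely by single-sample gradients and could easily be unstable, which may be why only a very small learning rate appeared to work.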