Train a neural network evaluation

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: Train a neural network evaluation

Post by brianr »

I have never seen a batch size less than 1K.

The learning rate should be scaled with the batch size.
So, with a batch size of 1,024, an LR=.0001 works out to roughly .1 per single sample, which is close to your value.
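That scaling can be sketched as follows (a rule-of-thumb illustration only, assuming the loss is summed over the batch; `per_sample_lr` is just an illustrative name):

```python
def per_sample_lr(batch_lr: float, batch_size: int) -> float:
    """Equivalent per-sample learning rate, assuming the loss is
    summed (not averaged) over the batch."""
    return batch_lr * batch_size

# An LR of .0001 at batch size 1,024 corresponds to roughly .1 per sample:
print(per_sample_lr(0.0001, 1024))  # ~0.1024
```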

There are rapid learning approaches with very high initial LRs.
See:

Most Leela training is done with a stepped LR schedule, for example:
.1 for 80K steps
.01 for 60K
.001 for 40K
.0001 for 20K
for a total of 200K steps.
Sometimes the initial LR is .2 or other values.
This schedule trains a reasonable Leela net, but several million steps are needed for a strong net.
All of this is specific to supervised learning (SL), not reinforcement learning (RL) with Leela type nets, which are quite different from NNUE nets.
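A stepped schedule like the one above is easy to implement by hand; this is a minimal sketch using the example boundaries (80K/60K/40K/20K steps), not Leela's actual training code:

```python
def stepped_lr(step: int) -> float:
    """Return the learning rate for a given training step
    under a stepped decay schedule."""
    # (cumulative step boundary, LR) pairs: .1 for the first 80K steps,
    # then .01 for 60K, .001 for 40K, .0001 for the last 20K.
    schedule = [(80_000, 0.1), (140_000, 0.01), (180_000, 0.001), (200_000, 0.0001)]
    for boundary, lr in schedule:
        if step < boundary:
            return lr
    return schedule[-1][1]  # hold the final LR past 200K steps

print(stepped_lr(0), stepped_lr(100_000), stepped_lr(199_999))  # 0.1 0.01 0.0001
```

Most frameworks provide this directly (e.g. a multi-step LR scheduler), so in practice you would configure the boundaries rather than write the loop yourself.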

Experiment with larger batch sizes initially, then reduce the value.
With an initial LR=.0001 and a 1K batch size, progress will be quite slow.
jstanback
Posts: 130
Joined: Fri Jun 17, 2016 4:14 pm
Location: Colorado, USA
Full name: John Stanback

Re: Train a neural network evaluation

Post by jstanback »

I am also updating the weights on every position, so I guess that's a batch size of one. Do LC0 and SFNNUE start by using random positions for tuning, or some other method that gets enough variety so that all the weights get tuned correctly?

John
brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: Train a neural network evaluation

Post by brianr »

I don't know about SF-NNUE and the like.

For the main Leela project (which is RL), until quite recently all games were played from the starting position, beginning with randomly initialized weights.
The nets are then trained on random samples from those games, in which the starting position and the most likely first moves are of course over-represented. There have been some experiments using games with opening books; I'm not sure how that is going.

Variety in the played games is created by various technical parameters.
Sample positions are chosen randomly after skipping N positions, with a default of SKIP=32.
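One simple way to implement that kind of skip-based sampling is sketched below (an illustration of the idea, not Leela's actual pipeline; the function name and the uniform gap distribution are my own choices):

```python
import random

SKIP = 32  # default skip value mentioned above

def sample_positions(game):
    """Yield roughly every SKIP-th position from a game,
    with a random gap so samples are decorrelated."""
    i = random.randrange(SKIP)               # random start in the first window
    while i < len(game):
        yield game[i]
        i += random.randrange(1, 2 * SKIP)   # random gap, averaging ~SKIP

# Example: sample from a 200-position game (positions here are just indices).
picked = list(sample_positions(list(range(200))))
```

The random gap (rather than a fixed stride of exactly 32) helps avoid always sampling the same phase of the game.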

There is some evidence that Leela type nets train better with weaker play early and moving towards stronger play as the nets get "smarter".
David Carteau
Posts: 121
Joined: Sat May 24, 2014 9:09 am
Location: France
Full name: David Carteau

Re: Train a neural network evaluation

Post by David Carteau »

Fabio Gobbato wrote: Tue Sep 01, 2020 2:25 pm (...)
The net trained in this way gives a mean error of 200cp that is very high.
(...)
I have exactly the same problem here while trying to implement my own NNUE trainer (see here)! For my part, I'm using the static eval from the nn-82215d0fd0df.nnue net as the target, with my own dataset built from CCRL games (already used to tune v0.7).

That's not easy!
User avatar
Fabio Gobbato
Posts: 217
Joined: Fri Apr 11, 2014 10:45 am
Full name: Fabio Gobbato

Re: Train a neural network evaluation

Post by Fabio Gobbato »

How do you choose the positions for the training? Is it better to use only quiet positions?
Karlo Bala
Posts: 373
Joined: Wed Mar 22, 2006 10:17 am
Location: Novi Sad, Serbia
Full name: Karlo Balla

Re: Train a neural network evaluation

Post by Karlo Bala »

How good is the Zurichess set with 725K positions?
I immediately spotted positions 6 and 10 which are IMO incorrectly marked as draws.

[d]8/1R6/1p6/3pkp2/P6p/1P1K3P/4r1P1/8 b - - c9

[d]r4rk1/1p2ppbp/pP4p1/q7/1n2P3/1Q6/PP1BNPPP/R4RK1 b - -
Best Regards,
Karlo Balla Jr.