Pytorch NNUE training


gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Pytorch NNUE training

Post by gladius »

The adventure continues! So far, I've managed to validate that we can import a quantized SF NNUE net into the pytorch float32 representation, and export it back without any loss (even the same sha for the nnue net). So the quantization/dequantization code looks solid. That's done with https://github.com/glinscott/nnue-pytor ... ze.py#L128. Also, validated that the evals from the SF NNUE nets imported to pytorch match the evals from the SF code. So the baseline stuff is looking fairly solid.
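
To make the round-trip check concrete, here is a minimal sketch of the "quantize, dequantize, re-quantize, compare" idea. This is not the actual serialize.py code: the scale constant, tensor shape, and function names are placeholders.

Code: Select all

import torch

# Illustrative only: the real exporter uses the net's actual scales and shapes.
SCALE = 127.0

def quantize(weights: torch.Tensor) -> torch.Tensor:
    # float32 -> int16, the direction used when writing a .nnue file
    return torch.round(weights * SCALE).to(torch.int16)

def dequantize(qweights: torch.Tensor) -> torch.Tensor:
    # int16 -> float32, the direction used when importing a .nnue file
    return qweights.to(torch.float32) / SCALE

# A lossless round trip means quantize(dequantize(q)) == q for every tensor,
# which is what makes the re-exported net byte-identical (same sha).
q_original = torch.randint(-1000, 1000, (16, 32), dtype=torch.int16)
assert torch.equal(quantize(dequantize(q_original)), q_original)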

Now, for the bad news! The initial nets trained against 10m d5 games were completely disastrous, about -700 elo or so. For comparison, the nodchip trainer gets to about -350 elo or so on the same dataset. The nodchip trainer also has this concept of a factorizer, which generalizes the positions fed into the net. I will need to look deeper into that.

For next steps, going to actually train on a reasonable amount of games - 500m or so, and then see how we are doing.
OfekShochat
Posts: 50
Joined: Thu Oct 15, 2020 10:19 am
Full name: ghostway

Re: Pytorch NNUE training

Post by OfekShochat »

That is great! Thanks gladius

gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Pytorch NNUE training

Post by gladius »

An initial training run by vondele on 256M positions is looking much more promising (also takes about 10 minutes to do an epoch - a pass through the 256M positions, which is great). Getting closer! Note that all training runs so far have been with lambda = 1.0 (train to the evaluation, not the game result).

Code: Select all

Rank Name      Elo   +/-   Games   Score   Draws
   1 master    253     5   16205   81.1%   30.9%
   2 epoch7     -6     4   16205   49.2%   46.3%
   3 epoch6     -8     4   16205   48.8%   46.7%
   4 epoch3    -29     4   16206   45.9%   46.4%
   5 epoch0   -190     5   16205   25.0%   34.1%
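
For reference, lambda = 1.0 means the training target is built entirely from the search evaluation. A minimal sketch of how such a blended target could be formed is below; the sigmoid scaling constant and function names are assumptions, not the trainer's actual loss code.

Code: Select all

import torch

def blended_target(search_eval_cp: torch.Tensor,
                   game_result: torch.Tensor,
                   lambda_: float) -> torch.Tensor:
    # search_eval_cp: teacher evaluation in centipawns
    # game_result:    1.0 win, 0.5 draw, 0.0 loss from the side to move's view
    # lambda_ = 1.0 -> train purely to the evaluation (as in the runs above)
    # lambda_ = 0.0 -> train purely to the game result
    # The 410 centipawn-to-probability scale is a placeholder value.
    eval_target = torch.sigmoid(search_eval_cp / 410.0)
    return lambda_ * eval_target + (1.0 - lambda_) * game_result

# With lambda_ = 1.0 the game result column is ignored entirely.
print(blended_target(torch.tensor([0.0, 150.0, -300.0]),
                     torch.tensor([0.5, 1.0, 0.0]), lambda_=1.0))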
elcabesa
Posts: 855
Joined: Sun May 23, 2010 1:32 pm

Re: Pytorch NNUE training

Post by elcabesa »

I'm studying your code and Nodchip trainer.

In the nodchip trainer, the position from which the feature list is calculated doesn't seem to be the one stored in the gensfen bin file, but rather the position resulting from a qsearch. I think this helps with resolving recaptures and other non-quiet positions fed to the learner.

Hope this can help
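
To illustrate the idea, here is a minimal sketch of resolving a position with a capture-only search before extracting features. It uses python-chess and a toy material count purely so it runs on its own; the real trainer uses the engine's own qsearch and evaluation.

Code: Select all

import chess

# Toy piece values so the sketch is self-contained.
VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
          chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material(board: chess.Board) -> int:
    # Material balance from the side to move's point of view.
    return sum(v * (len(board.pieces(p, board.turn))
                    - len(board.pieces(p, not board.turn)))
               for p, v in VALUES.items())

def qsearch(board: chess.Board, alpha=-10**9, beta=10**9):
    # Capture-only negamax; returns (score, resolved quiet position).
    best_score, best_board = material(board), board.copy()
    if best_score >= beta:
        return best_score, best_board
    alpha = max(alpha, best_score)
    for move in list(board.generate_legal_captures()):
        board.push(move)
        score, resolved = qsearch(board, -beta, -alpha)
        board.pop()
        if -score > best_score:
            best_score, best_board = -score, resolved
            if best_score >= beta:
                break
            alpha = max(alpha, best_score)
    return best_score, best_board

# The trainer would compute the feature list from `quiet_pos`, not from the
# position as stored in the .bin file.
_, quiet_pos = qsearch(chess.Board(
    "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 0 3"))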
David Carteau
Posts: 121
Joined: Sat May 24, 2014 9:09 am
Location: France
Full name: David Carteau

Re: Pytorch NNUE training

Post by David Carteau »

gladius wrote: Sat Nov 14, 2020 10:30 pm An initial training run by vondele on 256M positions is looking much more promising (also takes about 10 minutes to do an epoch - a pass through the 256M positions, which is great).
Wow, it takes my trainer about... 3 days to perform only one epoch (360M positions) ! Great job !
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Pytorch NNUE training

Post by gladius »

elcabesa wrote: Sun Nov 15, 2020 9:37 am I'm studying your code and Nodchip trainer.

In the nodchip trainer, the position from which the feature list is calculated doesn't seem to be the one stored in the gensfen bin file, but rather the position resulting from a qsearch. I think this helps with resolving recaptures and other non-quiet positions fed to the learner.

Hope this can help
Yes, this is a good point. The data was generated using a new option that Sopel added, called `ensure_quiet`, which writes only quiet positions out to the file. Training on non-quiet positions is thought to be worse (once we get this working super well, I'm curious how much Elo difference it will make, though).
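
As a rough illustration of what "quiet" filtering means, one could skip positions that are in check or that have a free capture available. This is only a hypothetical sketch using python-chess, not the actual ensure_quiet logic, whose exact criteria aren't shown here.

Code: Select all

import chess

def looks_quiet(board: chess.Board) -> bool:
    # Hypothetical filter, not the actual ensure_quiet test: reject positions
    # in check and positions where an undefended piece can be captured.
    if board.is_check():
        return False
    for move in board.generate_legal_captures():
        if not board.attackers(not board.turn, move.to_square):
            return False  # free capture available -> not quiet
    return True

# positions = [pos for pos in positions if looks_quiet(pos)]  # usage sketch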
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: Pytorch NNUE training

Post by Rein Halbersma »

Have any people out here tried to directly import Keras/Tensorflow or PyTorch trained models (i.e. graphs + weights, not just weights) into C++?

For Keras/Tensorflow there is model.save() on the Python side and then LoadSavedModel on the C++ side to load a graph + weights and link against TF C++ libs. Then you get a Graph Session on which you can call run() to do a single eval() call. For PyTorch there is TorchScript to do similar things.

I wonder if such a route would be competitive compared to hand-written C++ evals that only import the trained weights but re-implement the network graph. At least for Tensorflow there seems to be some virtual function call overhead.
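
For the PyTorch/TorchScript route, the Python half looks roughly like the sketch below; the toy network is only a stand-in (the real NNUE architecture and sizes differ), and the saved file can then be loaded from C++ with torch::jit::load and run without any Python.

Code: Select all

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    # Toy stand-in for an NNUE-style net; sizes and layout are placeholders.
    def __init__(self, num_features=41024, hidden=256):
        super().__init__()
        self.ft = nn.Linear(num_features, hidden)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, us, them):
        acc = torch.cat([torch.clamp(self.ft(us), 0, 1),
                         torch.clamp(self.ft(them), 0, 1)], dim=1)
        return self.out(acc)

model = TinyNet().eval()
example = (torch.zeros(1, 41024), torch.zeros(1, 41024))
traced = torch.jit.trace(model, example)  # records graph + weights
traced.save("tinynet.pt")                 # torch::jit::load("tinynet.pt") in C++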
AndrewGrant
Posts: 1753
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Pytorch NNUE training

Post by AndrewGrant »

gladius wrote: Sat Nov 14, 2020 1:53 am The adventure continues! So far, I've managed to validate that we can import a quantized SF NNUE net into the pytorch float32 representation, and export it back without any loss (even the same sha for the nnue net). So the quantization/dequantization code looks solid. That's done with https://github.com/glinscott/nnue-pytor ... ze.py#L128. Also, validated that the evals from the SF NNUE nets imported to pytorch match the evals from the SF code. So the baseline stuff is looking fairly solid.

Now, for the bad news! The initial nets trained against 10m d5 games were completely disastrous, about -700 elo or so. For comparison, the nodchip trainer gets to about -350 elo or so on the same dataset. The nodchip trainer also has this concept of a factorizer, which generalizes the positions fed into the net. I will need to look deeper into that.

For next steps, going to actually train on a reasonable amount of games - 500m or so, and then see how we are doing.
I can give you a hint and say that the factorizer is extremely important; I don't think networks can even begin to compete with master networks without it.
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Pytorch NNUE training

Post by Daniel Shawul »

Rein Halbersma wrote: Sun Nov 15, 2020 10:04 pm Have any people out here tried to directly import Keras/Tensorflow or PyTorch trained models (i.e. graphs + weights, not just weights) into C++?

For Keras/Tensorflow there is model.save() on the Python side and then LoadSavedModel on the C++ side to load a graph + weights and link against TF C++ libs. Then you get a Graph Session on which you can call run() to do a single eval() call. For PyTorch there is TorchScript to do similar things.

I wonder if such a route would be competitive compared to hand-written C++ evals that only import the trained weights but re-implement the network graph. At least for Tensorflow there seems to be some virtual function call overhead.
It is not competitive at all -- NNUE via TensorFlow C++ inference code is 300x slower than my hand-written eval.
https://github.com/dshawul/Scorpio/comm ... 3a4aaab826
I use this approach for bigger networks like ResNets, and it works fine there even on the CPU.
Usually evaluating the neural network takes much more time than the TensorFlow C++ overhead of 20 ms/call (which is the killer for NNUE).
gladius
Posts: 568
Joined: Tue Dec 12, 2006 10:10 am
Full name: Gary Linscott

Re: Pytorch NNUE training

Post by gladius »

AndrewGrant wrote: Sun Nov 15, 2020 10:31 pm
gladius wrote: Sat Nov 14, 2020 1:53 am The adventure continues! So far, I've managed to validate that we can import a quantized SF NNUE net into the pytorch float32 representation, and export it back without any loss (even the same sha for the nnue net). So the quantization/dequantization code looks solid. That's done with https://github.com/glinscott/nnue-pytor ... ze.py#L128. Also, validated that the evals from the SF NNUE nets imported to pytorch match the evals from the SF code. So the baseline stuff is looking fairly solid.

Now, for the bad news! The initial nets trained against 10m d5 games were completely disastrous, about -700 elo or so. For comparison, the nodchip trainer gets to about -350 elo or so on the same dataset. The nodchip trainer also has this concept of a factorizer, which generalizes the positions fed into the net. I will need to look deeper into that.

For next steps, going to actually train on a reasonable amount of games - 500m or so, and then see how we are doing.
I can give you a hint and say that the factorizer is extremely important; I don't think networks can even begin to compete with master networks without it.
Interesting! One of the experiments I had lined up was disabling the factorizer on the nodchip trainer and seeing how it did. But I’ll take your word for it :). I had already started implementing it; there are some really cool tricks the Shogi folks pulled off: zeroing the initial weights for the factored features, and then just summing them in at the end when quantizing. Very insightful technique!
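
For anyone following along, that trick reads roughly like the sketch below: the factor (generalized) feature weights start at zero during training and get summed into their corresponding real features when the net is quantized for export. The mapping used here is a made-up placeholder; the real feature-to-factor mapping is more involved.

Code: Select all

import torch

def coalesce_for_export(real_weights: torch.Tensor,
                        factor_weights: torch.Tensor,
                        real_to_factors: list) -> torch.Tensor:
    # real_weights:    (num_real_features, hidden) rows of the feature transformer
    # factor_weights:  (num_factor_features, hidden) rows for the generalized features
    # real_to_factors: for each real feature, the factor indices it belongs to
    coalesced = real_weights.clone()
    for real_idx, factor_idxs in enumerate(real_to_factors):
        for f in factor_idxs:
            coalesced[real_idx] += factor_weights[f]
    return coalesced

# Tiny example: 4 real features, 2 factor features, hidden size 8.
real = torch.randn(4, 8)
factors = torch.zeros(2, 8)     # zero-initialized, as described above
mapping = [[0], [0], [1], [1]]  # placeholder mapping
exported = coalesce_for_export(real, factors, mapping)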

Latest experiments have us about -200 elo from master, so a long way to go, but it’s at least going in the right direction.