Pytorch NNUE training

gladius
Posts: 565
Joined: Tue Dec 12, 2006 9:10 am
Full name: Gary Linscott

Re: Pytorch NNUE training

Post by gladius » Sat Nov 14, 2020 12:53 am

The adventure continues! So far, I've managed to validate that we can import a quantized SF NNUE net into the pytorch float32 representation, and export it back without any loss (even the same sha for the nnue net). So the quantization/dequantization code looks solid. That's done with https://github.com/glinscott/nnue-pytor ... ze.py#L128. Also, validated that the evals from the SF NNUE nets imported to pytorch match the evals from the SF code. So the baseline stuff is looking fairly solid.
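For anyone following along, here is a minimal sketch of what that round trip looks like. The scale factor, shapes, and names below are illustrative assumptions, not the actual constants in the repo's serializer.

Code:

import numpy as np

# Illustrative per-layer scale; the real serializer uses fixed scales chosen
# to match Stockfish's integer evaluation, and they differ per layer.
SCALE = 127.0

def quantize(weights_fp32):
    # Round the float32 training weights into the int16 range a .nnue file stores.
    return np.clip(np.round(weights_fp32 * SCALE), -32768, 32767).astype(np.int16)

def dequantize(weights_int16):
    # Recover the float32 representation used by the PyTorch model.
    return weights_int16.astype(np.float32) / SCALE

# dequantize() maps each integer to a distinct float and quantize() rounds back
# to the nearest integer, so importing a quantized net and re-exporting it can
# reproduce the file bit for bit (hence the identical sha).
w_q = np.random.randint(-500, 500, size=(256, 1024)).astype(np.int16)
assert np.array_equal(quantize(dequantize(w_q)), w_q)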

Now, for the bad news! The initial nets trained against 10m d5 games were completely disastrous, about -700 elo or so. For comparison, the nodchip trainer gets to about -350 elo or so on the same dataset. The nodchip trainer also has this concept of a factorizer, which generalizes the positions fed into the net. I will need to look deeper into that.

For next steps, going to actually train on a reasonable amount of games - 500m or so, and then see how we are doing.

OfekShochat
Posts: 20
Joined: Thu Oct 15, 2020 8:19 am
Full name: Ofek Shochat

Re: Pytorch NNUE training

Post by OfekShochat » Sat Nov 14, 2020 5:27 am

That is great! Thanks, gladius.

gladius
Posts: 565
Joined: Tue Dec 12, 2006 9:10 am
Full name: Gary Linscott

Re: Pytorch NNUE training

Post by gladius » Sat Nov 14, 2020 9:30 pm

An initial training run by vondele on 256M positions is looking much more promising (also takes about 10 minutes to do an epoch - a pass through the 256M positions, which is great). Getting closer! Note that all training runs so far have been with lambda = 1.0 (train to the evaluation, not the game result).

Code:

Rank Name         Elo    +/-   Games   Score   Draws
   1 master       253      5   16205   81.1%   30.9%
   2 epoch7        -6      4   16205   49.2%   46.3%
   3 epoch6        -8      4   16205   48.8%   46.7%
   4 epoch3       -29      4   16206   45.9%   46.4%
   5 epoch0      -190      5   16205   25.0%   34.1%
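For reference, the lambda above blends the sigmoid-squashed search evaluation with the game outcome when forming the training target. A rough sketch of the idea follows; the scaling constant and the mean-squared loss are assumptions for illustration, not necessarily the exact choices in the trainer.

Code:

import torch

def training_target(search_eval_cp, game_result, lambda_=1.0, scaling=600.0):
    # search_eval_cp: engine score in centipawns (tensor)
    # game_result:    1.0 win, 0.5 draw, 0.0 loss, from the side to move's view
    # lambda_ = 1.0 trains purely to the eval, 0.0 purely to the game result.
    eval_wdl = torch.sigmoid(search_eval_cp / scaling)  # squash cp score into [0, 1]
    return lambda_ * eval_wdl + (1.0 - lambda_) * game_result

def loss(model_eval_cp, search_eval_cp, game_result, lambda_=1.0, scaling=600.0):
    pred = torch.sigmoid(model_eval_cp / scaling)
    target = training_target(search_eval_cp, game_result, lambda_, scaling)
    return torch.mean((pred - target) ** 2)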

elcabesa
Posts: 848
Joined: Sun May 23, 2010 11:32 am

Re: Pytorch NNUE training

Post by elcabesa » Sun Nov 15, 2020 8:37 am

I'm studying your code and the nodchip trainer.

In the nodchip trainer, the position from which the feature list is calculated doesn't seem to be the one stored in the gensfen bin file, but rather the position resulting from a qsearch. I think this helps resolve recaptures and other non-quiet positions fed into the learner.

hope this can help
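A rough pseudocode sketch of that idea, with qsearch_pv and the position methods as placeholders rather than actual trainer APIs:

Code:

def position_for_training(stored_position):
    # Instead of featurizing the raw position stored in the gensfen .bin file,
    # first resolve pending captures/checks with a quiescence search, follow its
    # principal variation, and extract features from the quiet leaf instead.
    pos = stored_position
    for move in qsearch_pv(pos):
        pos = pos.apply(move)
    return pos  # feed this position, not stored_position, to the feature extractor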

David Carteau
Posts: 85
Joined: Sat May 24, 2014 7:09 am
Location: France
Full name: David Carteau

Re: Pytorch NNUE training

Post by David Carteau » Sun Nov 15, 2020 9:07 am

gladius wrote:
Sat Nov 14, 2020 9:30 pm
An initial training run by vondele on 256M positions is looking much more promising (also takes about 10 minutes to do an epoch - a pass through the 256M positions, which is great).
Wow, it takes my trainer about... 3 days to perform only one epoch (360M positions)! Great job!

gladius
Posts: 565
Joined: Tue Dec 12, 2006 9:10 am
Full name: Gary Linscott

Re: Pytorch NNUE training

Post by gladius » Sun Nov 15, 2020 4:17 pm

elcabesa wrote:
Sun Nov 15, 2020 8:37 am
I'm studying your code and the nodchip trainer.

In the nodchip trainer, the position from which the feature list is calculated doesn't seem to be the one stored in the gensfen bin file, but rather the position resulting from a qsearch. I think this helps resolve recaptures and other non-quiet positions fed into the learner.

hope this can help
Yes, this is a good point. The data was generated using a new option that Sopel added, called `ensure_quiet`, which writes only quiet positions out to the file. Training on non-quiet positions is thought to be worse (once we get this working super well, I'm curious how much elo difference it will make, though).
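As a rough illustration of the quiet-only idea (the predicate below is one plausible notion of quiet, not necessarily the exact check `ensure_quiet` performs, and the position/search functions are placeholders):

Code:

def is_quiet(pos, qsearch, static_eval, margin=0):
    # Treat a position as quiet when a quiescence search cannot improve on the
    # static evaluation, i.e. no captures or checks change the assessment.
    return (not pos.in_check()) and abs(qsearch(pos) - static_eval(pos)) <= margin

def write_quiet_positions(positions, out, qsearch, static_eval):
    for pos in positions:
        if is_quiet(pos, qsearch, static_eval):
            out.write(pos.to_packed_sfen())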

Rein Halbersma
Posts: 698
Joined: Tue May 22, 2007 9:13 am

Re: Pytorch NNUE training

Post by Rein Halbersma » Sun Nov 15, 2020 9:04 pm

Has anyone out here tried to directly import Keras/Tensorflow or PyTorch trained models (i.e. graphs + weights, not just the weights) into C++?

For Keras/Tensorflow there is model.save() on the Python side and then LoadSavedModel on the C++ side to load a graph + weights and link against TF C++ libs. Then you get a Graph Session on which you can call run() to do a single eval() call. For PyTorch there is TorchScript to do similar things.

I wonder if such a route would be competitive compared to hand-written C++ evals that only import the trained weights but re-implement the network graph. At least for Tensorflow there seems to be some virtual function call overhead.
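To make the PyTorch route concrete, here is a minimal sketch of the TorchScript side. The toy two-layer network is only illustrative (nothing NNUE-specific); the C++ calls in the trailing comment are the standard LibTorch ones.

Code:

import torch
import torch.nn as nn

class TinyEval(nn.Module):
    # A toy evaluation network, standing in for whatever model was trained.
    def __init__(self, n_features=41024):
        super().__init__()
        self.ft = nn.Linear(n_features, 256)
        self.out = nn.Linear(256, 1)

    def forward(self, x):
        return self.out(torch.clamp(self.ft(x), 0.0, 1.0))

model = TinyEval()
example = torch.zeros(1, 41024)
# trace() records the graph; the saved file contains graph + weights together.
torch.jit.trace(model, example).save("tiny_eval.pt")

# C++ side (LibTorch), no Python interpreter needed at eval time:
#   torch::jit::script::Module module = torch::jit::load("tiny_eval.pt");
#   torch::Tensor score = module.forward({input_tensor}).toTensor();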

AndrewGrant
Posts: 876
Joined: Tue Apr 19, 2016 4:08 am
Location: U.S.A
Full name: Andrew Grant

Re: Pytorch NNUE training

Post by AndrewGrant » Sun Nov 15, 2020 9:31 pm

gladius wrote:
Sat Nov 14, 2020 12:53 am
The adventure continues! So far, I've managed to validate that we can import a quantized SF NNUE net into the pytorch float32 representation, and export it back without any loss (even the same sha for the nnue net). So the quantization/dequantization code looks solid. That's done with https://github.com/glinscott/nnue-pytor ... ze.py#L128. Also, validated that the evals from the SF NNUE nets imported to pytorch match the evals from the SF code. So the baseline stuff is looking fairly solid.

Now, for the bad news! The initial nets trained against 10m d5 games were completely disastrous, about -700 elo or so. For comparison, the nodchip trainer gets to about -350 elo or so on the same dataset. The nodchip trainer also has this concept of a factorizer, which generalizes the positions fed into the net. I will need to look deeper into that.

For next steps, going to actually train on a reasonable amount of games - 500m or so, and then see how we are doing.
I can give you a hint and say that the factorizer is absolutely essential; I don't think networks can even begin to compete with master networks without it.

Daniel Shawul
Posts: 4066
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia

Re: Pytorch NNUE training

Post by Daniel Shawul » Sun Nov 15, 2020 9:33 pm

Rein Halbersma wrote:
Sun Nov 15, 2020 9:04 pm
Has anyone out here tried to directly import Keras/Tensorflow or PyTorch trained models (i.e. graphs + weights, not just the weights) into C++?

For Keras/Tensorflow there is model.save() on the Python side and then LoadSavedModel on the C++ side to load a graph + weights and link against TF C++ libs. Then you get a Graph Session on which you can call run() to do a single eval() call. For PyTorch there is TorchScript to do similar things.

I wonder if such a route would be competitive compared to hand-written C++ evals that only import the trained weights but re-implement the network graph. At least for Tensorflow there seems to be some virtual function call overhead.
It is not competitive at all: NNUE via TensorFlow C++ inference code is 300x slower than my hand-written eval.
https://github.com/dshawul/Scorpio/comm ... 3a4aaab826
I use this approach for bigger networks like ResNets, and it works fine there even on the CPU.
Usually, evaluating the neural network takes much more time than the TensorFlow C++ overhead of ~20 ms/call (which is the killer for NNUE).

gladius
Posts: 565
Joined: Tue Dec 12, 2006 9:10 am
Full name: Gary Linscott

Re: Pytorch NNUE training

Post by gladius » Mon Nov 16, 2020 6:11 am

AndrewGrant wrote:
Sun Nov 15, 2020 9:31 pm
gladius wrote:
Sat Nov 14, 2020 12:53 am
The adventure continues! So far, I've managed to validate that we can import a quantized SF NNUE net into the pytorch float32 representation, and export it back without any loss (even the same sha for the nnue net). So the quantization/dequantization code looks solid. That's done with https://github.com/glinscott/nnue-pytor ... ze.py#L128. Also, validated that the evals from the SF NNUE nets imported to pytorch match the evals from the SF code. So the baseline stuff is looking fairly solid.

Now, for the bad news! The initial nets trained against 10m d5 games were completely disastrous, about -700 elo or so. For comparison, the nodchip trainer gets to about -350 elo or so on the same dataset. The nodchip trainer also has this concept of a factorizer, which generalizes the positions fed into the net. I will need to look deeper into that.

For next steps, going to actually train on a reasonable amount of games - 500m or so, and then see how we are doing.
I can give you a hint and say that the factorizer is absolutely essential; I don't think networks can even begin to compete with master networks without it.
Interesting! One of the experiments I had lined up was disabling the factorizer on the nodchip trainer and seeing how it did. But I’ll take your word for it :). I had already started implementing it; there are some really cool tricks the Shogi folks pulled off: zeroing the initial weights for the factored features, and then just summing them into the real features at the end when quantizing. Very insightful technique!
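To sketch that coalescing step (shapes and names below are assumptions about a HalfKP-style layout, not the nodchip trainer's actual code): the king-independent "virtual" weights start at zero, train alongside the real features, and at export time are simply added into every real feature they generalize before quantization.

Code:

import numpy as np

def coalesce_factored_weights(real_w, factored_w, num_king_sq):
    # real_w:     (num_king_sq * num_piece_sq, hidden) full HalfKP-style weights
    # factored_w: (num_piece_sq, hidden) weights of the generalized feature,
    #             initialized to zero at the start of training
    num_piece_sq, hidden = factored_w.shape
    merged = real_w.reshape(num_king_sq, num_piece_sq, hidden).copy()
    merged += factored_w[np.newaxis, :, :]  # add the virtual weight into each real one
    return merged.reshape(-1, hidden)       # quantize this merged tensor as usual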

Latest experiments have us about -200 elo from master, so a long way to go, but it’s at least going in the right direction.
