For now I have no GPU, so I run on CPU only.
batch_size is 256 with 8 workers => it takes more than 1h for 10M sfens.
Unless "step" in the tensorboard view is not equivalent to sfens...
So the question is: do I need a GPU, or is there something else going wrong?
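For what it's worth, if a tensorboard "step" is one optimizer update per batch (an assumption; it depends on what the trainer actually logs), converting between steps and positions is just a multiplication. A minimal sketch:

```python
# Assumption (not verified against the trainer's logging code):
# one tensorboard "step" = one optimizer update on one full batch,
# so positions seen = steps * batch_size.
def steps_to_sfens(steps: int, batch_size: int = 256) -> int:
    return steps * batch_size

def sfens_to_steps(sfens: int, batch_size: int = 256) -> int:
    return sfens // batch_size

print(sfens_to_steps(10_000_000))  # 10M sfens -> ~39k steps at batch 256
```

If the x-axis were batches rather than positions, 10M "steps" would really mean ~2.56 billion positions, which would explain the apparent slowness.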
Pytorch NNUE training
Posts: 1871 · Joined: Sat Nov 25, 2017 2:28 pm · Location: France
Posts: 121 · Joined: Sat May 24, 2014 9:09 am · Location: France · Full name: David Carteau
Re: Pytorch NNUE training
Hi Vivien,
Until now, I was using only the CPU to train my networks, at an average speed of 3 days to process 360 million positions (i.e. my full set of training data, one "epoch"). I've just adapted my PyTorch script to use the GPU, and I'll report the performance gain. As I understand it, a "step" is performed to compute the gradients at the end of a "batch" (i.e. a subset of your training data).
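That batch/step relationship can be illustrated with a toy example (plain NumPy, a one-parameter linear model standing in for the real network; none of this is the actual NNUE trainer): one gradient computation and one weight update, i.e. one "step", per batch.

```python
import numpy as np

# Toy minibatch SGD: one "step" = one gradient update over one batch.
# Fits y = 3x with a single weight w by MSE; data/model are placeholders.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 3.0 * x

w, lr, batch_size = 0.0, 0.1, 256
steps = 0
for start in range(0, len(x), batch_size):
    xb, yb = x[start:start + batch_size], y[start:start + batch_size]
    grad = np.mean(2.0 * (w * xb - yb) * xb)  # dMSE/dw averaged over the batch
    w -= lr * grad                            # the optimizer "step"
    steps += 1

print(steps, round(w, 3))  # 4 steps for 1000 samples at batch size 256
```

So steps per epoch = ceil(positions / batch_size); larger batches mean fewer, more expensive steps.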
Regards,
David
Download Orion: https://www.orionchess.com/
Posts: 121 · Joined: Sat May 24, 2014 9:09 am · Location: France · Full name: David Carteau
Re: Pytorch NNUE training
I had recent success training an NNUE-style network for my Orion NNUE experiment with a batch size of... 256k! Maybe my choice was far from optimal... I found this size by trying to get decent convergence while tuning both the 'batch size' and 'learning rate' parameters. I will try other combinations later. There's a lot of randomness (and trial and error) in fixing these values, bearing in mind that you can also choose the optimisation algorithm (SGD, Adam, etc.).

gladius wrote: ↑Mon Nov 16, 2020 10:25 pm
> Batch size is 8192.

AndrewGrant wrote: ↑Mon Nov 16, 2020 9:38 pm
> What was the batch size for this? That makes a big difference. Batchsize=1, that is damn fast. Batchsize=1M, that is damn slow.
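For co-tuning those two parameters, one commonly cited heuristic is the linear scaling rule: when the batch size grows by a factor k, grow the learning rate by k as well, so the expected update per sample stays comparable. A sketch (the base values here are illustrative assumptions, not the settings anyone in this thread actually used):

```python
# Linear scaling heuristic: lr scales proportionally with batch size.
# base_lr and base_batch below are made-up reference values for
# illustration, not settings taken from this thread.
def scaled_lr(base_lr: float, base_batch: int, batch: int) -> float:
    return base_lr * batch / base_batch

print(scaled_lr(1e-3, 8192, 256 * 1024))  # 8192 -> 256k batch: 32x the LR
```

It is only a starting point; very large batches often still need warmup or a different optimizer to converge well.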
Download Orion: https://www.orionchess.com/
Posts: 533 · Joined: Sun Sep 06, 2020 4:40 am · Full name: Connor McMonigle
Re: Pytorch NNUE training
I've configured tensorboard such that the x-axis of the loss graph corresponds to the number of packed FENs observed. My training code is slow but, even with a modest GPU, you'd get far superior performance. I would say you need a GPU.
Posts: 855 · Joined: Sun May 23, 2010 1:32 pm
Re: Pytorch NNUE training
I'll try to re-ask this question: what is a factorizer? After looking inside the code it seems to be a "quantizer". Am I wrong?

elcabesa wrote: ↑Mon Nov 16, 2020 7:05 pm
> sorry but what is a factorizer?

gladius wrote: ↑Mon Nov 16, 2020 5:47 pm
> Cool, thanks for the tip! Well, it will be interesting to compare to a "standard" run.

AndrewGrant wrote: ↑Mon Nov 16, 2020 8:02 am
> I implemented a factorizer in what I'm doing. I never looked at SF for it; I used a sort of "clean room design" method where the Seer author told me about the idea and had me draw some diagrams. But at the end of the day, any sort of factorization seems like an easy winner. Maybe multiple factorizers wins. Maybe absolute distance wins.

gladius wrote: ↑Mon Nov 16, 2020 7:11 am
> Interesting! One of the experiments I had lined up was disabling the factorizer on the nodchip trainer and seeing how it did. But I'll take your word for it. I had already started implementing it; there are some really cool tricks the Shogi folks pulled off - zeroing the initial weights for the factored features, and then just summing them at the end when quantizing. Very insightful technique!
> Latest experiments have us about -200 elo from master, so a long way to go, but it's at least going in the right direction.

> Not sure. At some point I might attempt to find out, as I can "drag and drop" factorizers into my code atm with a few minutes' effort.
Posts: 568 · Joined: Tue Dec 12, 2006 10:10 am · Full name: Gary Linscott
Re: Pytorch NNUE training
This was on a V100. Very impressive speed for your trainer! The GPU path is not exploiting sparsity for backprop, so we take a hit there.

AndrewGrant wrote: ↑Tue Nov 17, 2020 12:49 am
> Interesting. Just about the same speed as me, running on the CPU with very many threads in a C program. Can I ask what your system is?

gladius wrote: ↑Mon Nov 16, 2020 10:25 pm
> Batch size is 8192.
Posts: 568 · Joined: Tue Dec 12, 2006 10:10 am · Full name: Gary Linscott
Re: Pytorch NNUE training
It's a method to augment/generalize the data the net is training on. Since halfkp splits out piece positions by king position, the net loses some generalization. The factorizer adds those "simple" features (e.g. pure piece position, not dependent on king) during training, and then sums up all the values of the relevant features when exporting the net. The cool part is that you only need them while training - not at inference time, which is a really nice speed win.

elcabesa wrote: ↑Tue Nov 17, 2020 1:05 pm
> I try to reask this question: what is a factorizer? After looking inside the code it seems a "quantizer". Am I wrong?
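The export-time summation described above can be sketched in a few lines. This is a toy illustration with made-up tiny dimensions, not the real halfkp feature set: each real (king, piece-square) feature has a king-independent "factored" parent whose learned weight gets folded into it when the net is exported.

```python
import numpy as np

# Toy factorizer export: during training, each real (king, psq) feature
# also fires a shared king-independent psq feature; at export time the
# shared weight is folded into every real feature that maps to it.
# Sizes are illustrative, not real halfkp dimensions.
n_kings, n_psq, n_out = 4, 6, 8
rng = np.random.default_rng(0)
real_w = rng.normal(size=(n_kings * n_psq, n_out))   # per-king weights
factored_w = rng.normal(size=(n_psq, n_out))         # king-independent weights

# Export: real feature (k, p) inherits the weight of its parent p.
export_w = real_w + np.tile(factored_w, (n_kings, 1))

# After folding, inference needs only the real features.
row = 1 * n_psq + 2   # real feature (king=1, psq=2) has parent psq=2
print(np.allclose(export_w[row], real_w[row] + factored_w[2]))  # True
```

This is why the factorizer costs nothing at inference time: the extra features exist only as separate weight tables during training, and the fold-in is a plain addition at export.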
Posts: 1871 · Joined: Sat Nov 25, 2017 2:28 pm · Location: France
Re: Pytorch NNUE training
Can you say a little more about how this is done theoretically, please, without affecting the net topology being used?

gladius wrote: ↑Tue Nov 17, 2020 4:40 pm
> It's a method to augment/generalize the data the net is training on. Since halfkp splits out piece positions by king position, the net loses some generalization. The factorizer adds those "simple" features (e.g. pure piece position, not dependent on king) during training, and then sums up all the values of the relevant features when exporting the net. The cool part is that you only need them while training - not at inference time, which is a really nice speed win.
Posts: 4185 · Joined: Tue Mar 14, 2006 11:34 am · Location: Ethiopia
Re: Pytorch NNUE training
Isn't 256000 a really big mini-batch size? That almost turns mini-batch gradient descent into batch gradient descent.

David Carteau wrote: ↑Tue Nov 17, 2020 9:33 am
> I had recent success while training a NNUE-style network for my Orion NNUE experiment with a batch size of... 256k!
Posts: 1756 · Joined: Tue Apr 19, 2016 6:08 am · Location: U.S.A. · Full name: Andrew Grant
Re: Pytorch NNUE training
Gary,
I'm trying to output my networks in the SF format. You have a couple of different versions going in your repo, but do https://github.com/glinscott/nnue-pytor ... rialize.py and https://github.com/glinscott/nnue-pytor ... r/model.py correspond to what would match Nodchip's outputs?
I've trained a network using the same loss as your model, and the loss looks good. However, loading the weights into Stockfish is failing. I can confirm I am not having an off-by-one issue: I can print the weights/biases as I output them, and they match when I print them as I read them into Stockfish. Out of desperation, I tried all possible variations of transposing the matrices, to no avail. The result of games is -infinite Elo for the updated network, which implies a failure to load or quantize the weights correctly.
Can you confirm that those two .py files produce working networks? Perhaps I need a fresh set of eyes, but porting to this format should have been trivial.
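One way to narrow a failure like this down is to check the quantize/dequantize round-trip in isolation, before any engine loading is involved. A sketch with assumed placeholder constants (the scale and integer width here are illustrative, not Stockfish's actual quantization parameters):

```python
import numpy as np

# Round-trip sanity check for a quantized export path.
# SCALE and the int16 range are placeholder assumptions, not the
# actual Stockfish NNUE constants.
SCALE = 127.0

def quantize(w: np.ndarray) -> np.ndarray:
    return np.clip(np.round(w * SCALE), -32768, 32767).astype(np.int16)

def dequantize(q: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) / SCALE

rng = np.random.default_rng(0)
w = rng.uniform(-2.0, 2.0, size=(16, 8)).astype(np.float32)
err = np.abs(dequantize(quantize(w)) - w).max()
print(err <= 0.5 / SCALE + 1e-6)  # True: error bounded by half a quantization step
```

If the round-trip error is bounded but the engine still plays at -infinite Elo, the problem is more likely layout (row- vs column-major, layer ordering, header bytes) than the quantization math itself.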
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )