Pytorch NNUE training

Posts: 561
Joined: Tue Dec 12, 2006 9:10 am
Full name: Gary Linscott

Re: Pytorch NNUE training

Post by gladius » Sun Nov 22, 2020 3:17 am

We've been trying a bunch of experiments, but haven't made a huge amount of forward progress. We are at about -80 Elo relative to master, so still a long way off. This is all still training on d5 data though, so I'm hopeful that higher-depth data will be a big win (unfortunately, it takes a really long time to generate 2B fens, which is how many we are training on for d5!).

Made a few discoveries - the scaling factor of 600 in the loss function was actually adjusted in the nodchip trainer to be `1.0 / PawnValueEg / 4.0 * std::log(10.0)` (where `std::log(10.0)` is 2.302585092994046), i.e. a scaling factor of roughly 361 in place of 600. We tried training with that, and unfortunately it didn't help much.
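For anyone who wants to check the arithmetic, here is a minimal sketch in plain Python. The value `PawnValueEg = 208` is my assumption (the post doesn't state it); I picked it because it reproduces the "roughly 361" figure. The helper names `nodchip_scale` and `win_probability` are made up for illustration and are not taken from either trainer.

```python
import math

# Assumption: PawnValueEg = 208. Not stated in the post; chosen because it
# reproduces the "roughly 361" figure quoted above.
PAWN_VALUE_EG = 208


def nodchip_scale(pawn_value_eg: int = PAWN_VALUE_EG) -> float:
    """Multiplier from the expression 1.0 / PawnValueEg / 4.0 * log(10)."""
    return 1.0 / pawn_value_eg / 4.0 * math.log(10.0)


def win_probability(score: float, divisor: float) -> float:
    """Map an engine score to (0, 1) with a sigmoid, dividing by `divisor` first."""
    return 1.0 / (1.0 + math.exp(-score / divisor))


scale = nodchip_scale()
divisor = 1.0 / scale  # the number comparable to the 600 used in the loss

print(f"multiplier = {scale:.6f}, equivalent divisor = {divisor:.1f}")
for d in (600.0, divisor):
    print(f"divisor {d:6.1f}: score 100 -> {win_probability(100.0, d):.4f}")
```

A smaller divisor makes the sigmoid steeper, so the same score maps to a probability further from 0.5.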

We also measured the loss function we were using by converting the best SF net to pytorch format and then running different lambda values over it. With the scale factor at 600, the best SF net had a worse (higher) loss overall, which is a bad sign, as it means our loss function is not representative of Elo (although it is correlated). So a big open question is whether we can find a loss function that maps well to Elo.

Just today, noobpwnftw discovered a bug in the game outcomes recorded in the training data, so any training runs with lambda != 1.0 are a bit suspect.
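To make the lambda discussion concrete, here is a rough sketch of a lambda-blended loss in plain Python (no pytorch, so it runs anywhere). It assumes the loss is a cross-entropy against the sigmoid-scaled teacher score, blended with a cross-entropy against the game outcome, so that lambda = 1.0 ignores the outcome entirely. That form, the function names, and the sample positions are all assumptions for illustration, not the trainer's actual code or data.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy(target: float, pred: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy between a target probability and a prediction."""
    return -(target * math.log(pred + eps)
             + (1.0 - target) * math.log(1.0 - pred + eps))


def blended_loss(net_score: float, teacher_score: float, outcome: float,
                 lam: float, scale: float = 600.0) -> float:
    """Assumed form: lam weights the teacher-score term, (1 - lam) the outcome term."""
    q = sigmoid(net_score / scale)      # net's predicted win probability
    p = sigmoid(teacher_score / scale)  # target from the search score (e.g. d5)
    return lam * cross_entropy(p, q) + (1.0 - lam) * cross_entropy(outcome, q)


def mean_loss(samples, lam: float, scale: float) -> float:
    """Average blended loss over (net_score, teacher_score, outcome) tuples."""
    return sum(blended_loss(n, t, o, lam, scale) for n, t, o in samples) / len(samples)


# Hypothetical positions: (net score, teacher score, outcome in {0, 0.5, 1}).
samples = [(120.0, 90.0, 1.0), (-40.0, 10.0, 0.5), (300.0, 250.0, 1.0), (-200.0, -150.0, 0.0)]

for scale in (600.0, 361.0):
    for lam in (0.0, 0.5, 1.0):
        print(f"scale={scale:5.0f} lambda={lam:.1f} loss={mean_loss(samples, lam, scale):.4f}")
```

Running the same sweep over two different nets and comparing the numbers against their measured Elo is essentially the check described above.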

Sopel is adding support for multi-GPU training, so we are hopefully going to be going through the data at lightspeed soon!
