One of the most dubious aspects of NNUE for Chess is the use of King-Piece-Square tables. These are a legacy from Shogi, which is all about King Safety: placement relative to the King is far more important there than in Chess, and it is also very common for the King to be chased all over a crowded board by check-drops. Shogi even has special rules for what happens when the Kings reach the promotion zones.
For Chess this is very unusual, though, and many of the combinations of King and piece location will virtually never occur before the late end-game. This leaves them basically undefined in the training process.
I suspect that for Chess it would be much better to drop the King part and use normal Piece-Square Tables instead, but then supplement those with an equal number of Piece-King-relative-square tables (PRT[pieceType][square-kingSquare]). That would give a much better generalization for King-Safety terms. You could always keep a few KPST tables for King locations at or adjacent to the castling destinations, and map all other King locations to the King-relative tables.
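For illustration, a minimal sketch of what such an input indexing could look like (the names, sizes and bucket choices below are made up, not taken from any existing engine):

```cpp
#include <cstdio>

// Hypothetical input layout: plain PST features plus King-relative features,
// with a handful of dedicated KPST buckets only for "normal" King squares.
constexpr int PIECE_TYPES = 10;        // 5 piece types x 2 colours, Kings excluded (illustrative)
constexpr int SQUARES     = 64;
constexpr int REL_SQUARES = 15 * 15;   // file/rank offsets in -7..+7

// Plain piece-square feature: independent of the King.
int pstIndex(int pieceType, int sq) {
    return pieceType * SQUARES + sq;
}

// King-relative feature: indexed by the offset between piece square and King square.
int prtIndex(int pieceType, int sq, int kingSq) {
    int df = (sq % 8) - (kingSq % 8) + 7;   // -7..+7 mapped to 0..14
    int dr = (sq / 8) - (kingSq / 8) + 7;
    return PIECE_TYPES * SQUARES            // skip past the plain PST block
         + pieceType * REL_SQUARES + (df * 15 + dr);
}

// A few full KPST buckets only for King squares at or near the usual castling
// destinations; every other King location falls through to the relative tables.
int kingBucket(int kingSq) {
    switch (kingSq) {
        case 5:  case 6:  case 7:  return 0;  // around f1/g1/h1
        case 1:  case 2:  case 3:  return 1;  // around b1/c1/d1
        case 61: case 62: case 63: return 2;  // around f8/g8/h8
        case 57: case 58: case 59: return 3;  // around b8/c8/d8
        default: return -1;                   // no dedicated KPST bucket
    }
}

int main() {
    // White knight on f3 (square 21), white King on g1 (square 6):
    printf("PST index:   %d\n", pstIndex(1, 21));
    printf("PRT index:   %d\n", prtIndex(1, 21, 6));
    printf("King bucket: %d\n", kingBucket(6));
    return 0;
}
```

The point of the split is that the relative-offset table is shared across all King locations, so King-Safety patterns get trained by every game instead of only by the games where the King happened to stand on that exact square.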
How much work is it to train an NNUE?
Moderator: Ras
-
hgm
- Posts: 28419
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
-
AndrewGrant
- Posts: 1960
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: How much work is it to train an NNUE?
In this case I was not referring to the King-Piece aspect of it, but rather the fact that the nets (during training) have inputs for (King on A1, King on A2, ... King on H8). Those inputs are not a part of the nets (after training); they are collapsed into some of the other weights. However, that is not (and perhaps cannot be) done in a mathematically sound way.
hgm wrote: ↑Thu Feb 11, 2021 8:48 pm
One of the most dubious aspects of NNUE for Chess is the use of King-Piece-Square tables. These are a legacy from Shogi, which is all about King Safety: placement relative to the King is far more important there than in Chess, and it is also very common for the King to be chased all over a crowded board by check-drops. Shogi even has special rules for what happens when the Kings reach the promotion zones.
For Chess this is very unusual, though, and many of the combinations of King and piece location will virtually never occur before the late end-game. This leaves them basically undefined in the training process.
I suspect that for Chess it would be much better to drop the King part and use normal Piece-Square Tables instead, but then supplement those with an equal number of Piece-King-relative-square tables (PRT[pieceType][square-kingSquare]). That would give a much better generalization for King-Safety terms. You could always keep a few KPST tables for King locations at or adjacent to the castling destinations, and map all other King locations to the King-relative tables.
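To get a feel for why such a collapse cannot be exact (this is purely an illustration with made-up sizes and weights, not the actual trainer code): a training-time "King on X" input contributes its weight once per position, while the king-piece-square weights it gets folded into are summed once per piece, so any fold has to assume some fixed piece count.

```cpp
#include <cstdio>

// Illustrative only: a tiny HalfKP-like first layer with one output neuron,
// plus an extra king-square-only input that exists during training.
constexpr int KING_SQS   = 64;
constexpr int PIECE_SQS  = 64;   // a single piece type, to keep the sketch small
constexpr int AVG_PIECES = 16;   // assumed piece count used for the fold

double halfkpW[KING_SQS][PIECE_SQS];   // weights kept after training
double kingOnlyW[KING_SQS];            // training-time input to be folded away

// First-layer sum for a position given its king square and piece squares.
double accumulate(int ksq, const int* pieceSqs, int n, bool useKingOnlyInput) {
    double sum = useKingOnlyInput ? kingOnlyW[ksq] : 0.0;
    for (int i = 0; i < n; ++i) sum += halfkpW[ksq][pieceSqs[i]];
    return sum;
}

int main() {
    // Dummy "trained" weights.
    for (int k = 0; k < KING_SQS; ++k) {
        kingOnlyW[k] = 0.1 * k;
        for (int s = 0; s < PIECE_SQS; ++s) halfkpW[k][s] = 0.01;
    }

    int ksq = 6;                                                  // king on g1
    int pieces12[] = {8, 9, 10, 11, 12, 13, 14, 15, 1, 2, 3, 4};  // 12 active pieces

    double before = accumulate(ksq, pieces12, 12, true);

    // "Collapse": spread the king-only weight over all piece-square weights
    // for that king square, assuming AVG_PIECES pieces per position.
    for (int k = 0; k < KING_SQS; ++k)
        for (int s = 0; s < PIECE_SQS; ++s)
            halfkpW[k][s] += kingOnlyW[k] / AVG_PIECES;

    double after = accumulate(ksq, pieces12, 12, false);

    printf("with extra input: %.3f   after fold: %.3f\n", before, after);
    return 0;
}
```

Running the sketch shows the two sums disagreeing whenever the real piece count differs from the assumed one, which is the sense in which the fold is only an approximation.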
-
connor_mcmonigle
- Posts: 544
- Joined: Sun Sep 06, 2020 4:40 am
- Full name: Connor McMonigle
Re: How much work is it to train an NNUE?
(To clarify, by "training tools", I was referring both to the facilities for generating data and those for producing models)
Yes, tweaking hyperparameters related to data generation and training can require a great deal of time. Perhaps it's my personal bias creeping in here, but I assign little value to the time involved in tweaking parameters relative to the value I assign to people writing original software/implementing novel ideas.
If you look at the Leela discord, you'll see a fair bit of hyperparameter tweaking as well as a great deal of experimentation with novel ideas (micronats, KLD thresholding, value repair, SE blocks, memory layers, etc.). In fact, I'd argue the majority of the discussion and a significant portion of the allocated time have related to the latter category.
It's not as if Albert is starting from scratch with his hyperparameter search, either. Albert undoubtedly benefits directly (as he is using all of their training tools with little to no modification) from both the Leela team's (data generation) and the SF team's (model production) experimentation with different hyperparameters.
-
dkappe
- Posts: 1632
- Joined: Tue Aug 21, 2018 7:52 pm
- Full name: Dietrich Kappe
Re: How much work is it to train an NNUE?
As the father of “value repair” (I called them dodgy positions in my Ender experiment), I found it an ongoing source of frustration that there was so little movement on innovations. I first mentioned the idea in my wiki towards the end of 2018 (https://github.com/dkappe/leela-chess-w ... ndgame-Net) but had encouraged its adoption before then in the discord.
connor_mcmonigle wrote: ↑Thu Feb 11, 2021 9:03 pm
(To clarify, by "training tools", I was referring both to the facilities for generating data and those for producing models)
Yes, tweaking hyperparameters related to data generation and training can require a great deal of time. Perhaps it's my personal bias creeping in here, but I assign little value to the time involved in tweaking parameters relative to the value I assign to people writing original software/implementing novel ideas.
If you look at the Leela discord, you'll see a fair bit of hyperparameter tweaking as well as a great deal of experimentation with novel ideas (micronats, KLD thresholding, value repair, SE blocks, memory layers, etc.). In fact, I'd argue the majority of the discussion and a significant portion of the allocated time have related to the latter category.
It's not as if Albert is starting from scratch with his hyperparameter search, either. Albert undoubtedly benefits directly (as he is using all of their training tools with little to no modification) from both the Leela team's (data generation) and the SF team's (model production) experimentation with different hyperparameters.
I do value the writing of data generation and training code less, perhaps because I find it easy and not at all mysterious.
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
-
Collingwood
- Posts: 89
- Joined: Sat Nov 09, 2019 3:24 pm
- Full name: .
Re: How much work is it to train an NNUE?
But Komodo Dragon is doing everything its own way, isn't it? So that's already a big difference, apart from any advertising.
Ozymandias wrote: ↑Thu Feb 11, 2021 5:24 pm
Being close to the top isn't a failure. It all depends on how you portray your product.
For example, Dragon doesn't advertise as the new #1. Chessbase is doing it at the very top of their website in a hard-to-miss banner:
Now, is it? Doesn't look like it, so they're setting themselves up to fail.
-
Collingwood
- Posts: 89
- Joined: Sat Nov 09, 2019 3:24 pm
- Full name: .
Re: How much work is it to train an NNUE?
The amount of computer time, electricity, etc. is not what matters when you're judging the value added. That is something he seemingly refuses to understand.
Modern Times wrote: ↑Thu Feb 11, 2021 10:48 am
You vastly underestimate the amount of work Albert and chessbase put into this.
-
AndrewGrant
- Posts: 1960
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: How much work is it to train an NNUE?
I believe there has not been any official commentary about Komodo Dragon's NNUE training process.
Collingwood wrote: ↑Thu Feb 11, 2021 11:48 pm
But Komodo Dragon is doing everything its own way, isn't it? So that's already a big difference, apart from any advertising.
Ozymandias wrote: ↑Thu Feb 11, 2021 5:24 pm
Being close to the top isn't a failure. It all depends on how you portray your product.
For example, Dragon doesn't advertise as the new #1. Chessbase is doing it at the very top of their website in a hard-to-miss banner:
Now, is it? Doesn't look like it, so they're setting themselves up to fail.
-
Ferdy
- Posts: 4851
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: How much work is it to train an NNUE?
There are two cases: one, you train from scratch or from non-NNUE data, which takes weeks depending on resources; two, you do reinforcement, that is, use data generated with an existing NNUE, which takes only about a day to reach decent strength on a 4-core/8-thread machine.
Gabor Szots wrote: ↑Thu Feb 11, 2021 8:58 am
To develop a chess engine usually takes several months, years or even a lifetime. But how much work is it to take an existing engine and replace its NNUE with a different one?
In my naive view, to make an NNUE you collect a huge amount of games, determine which features of positions you want to analyze, then let your computer do the rest while you are having your holidays. When you return, a new NNUE is waiting for you to use.
Which means, at least for me, that FF2 has taken Stockfish's development work of years and put in a couple of days work of its own. Which approximates 99 % Stockfish, 1 % ChessBase.
What is the reality?
Stockfish code is free; an NNUE nowadays can be done in one day, or a couple of days if you want to get more Elo. The ChessBase interface is the expensive part.
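As a rough picture of the second case (just a sketch: the material-count "evaluation" stands in for a real search with an engine that already has an NNUE, and the text format stands in for whatever your trainer actually reads):

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Placeholder for "score the position with an engine that already has an NNUE".
// In reality this would be a fixed-depth or fixed-node search; here it is a
// simple material count so the sketch compiles and runs on its own.
int evaluateWithExistingNet(const std::string& fen) {
    static const std::string vals = "PNBRQpnbrq";
    static const int cp[] = {100, 300, 300, 500, 900, -100, -300, -300, -500, -900};
    int score = 0;
    for (char c : fen.substr(0, fen.find(' ')))
        for (size_t i = 0; i < vals.size(); ++i)
            if (c == vals[i]) score += cp[i];
    return score;
}

int main() {
    // Positions would normally come from self-play games of the existing engine.
    std::vector<std::string> positions = {
        "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
        "rnbqkb1r/pppppppp/5n2/8/8/5N2/PPPPPPPP/RNBQKB1R w KQkq - 2 2",
    };

    // Write (position, score) pairs that a trainer could fit a new net to.
    FILE* out = fopen("training_data.txt", "w");
    if (!out) return 1;
    for (const auto& fen : positions)
        fprintf(out, "%s;%d\n", fen.c_str(), evaluateWithExistingNet(fen));
    fclose(out);

    printf("wrote %zu labelled positions\n", positions.size());
    return 0;
}
```

A real setup would of course get its positions from self-play and its labels from a search, but the shape of the job is the same: label a pile of positions with the existing net and fit the new one to them, which is why it can finish in a day on modest hardware.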
-
Modern Times
- Posts: 3771
- Joined: Thu Jun 07, 2012 11:02 pm
Re: How much work is it to train an NNUE?
In your opinion.
Collingwood wrote: ↑Thu Feb 11, 2021 11:52 pm
The amount of computer time, electricity, etc. is not what matters when you're judging the value added. That is something he seemingly refuses to understand.
Modern Times wrote: ↑Thu Feb 11, 2021 10:48 am
You vastly underestimate the amount of work Albert and chessbase put into this.
-
Modern Times
- Posts: 3771
- Joined: Thu Jun 07, 2012 11:02 pm
Re: How much work is it to train an NNUE?
But for how long? The huge success of Stockfish has quite possibly contributed to killing off commercial engines. It is only due to the exceptional skill and dedication of three people that Komodo continues to exist, but for how much longer? As for doing everything its own way, that is impossible to say, since it is closed source. At the very least, they will be studying every Stockfish release to see if any of the ideas work for them.
Collingwood wrote: ↑Thu Feb 11, 2021 11:48 pm
But Komodo Dragon is doing everything its own way, isn't it? So that's already a big difference, apart from any advertising.