One of the most dubious aspects of NNUE for Chess is the use of King-Piece-Square tables. These are a legacy from Shogi, which is all about King Safety: placement relative to the King is far more important there than in Chess, and it is also very common for the King to be chased all over a crowded board by check-drops. Shogi even has special rules for what happens when the Kings reach the promotion zones.
For Chess this is very unusual, though, and many of the combinations of King and piece location will virtually never occur before the late end-game. This leaves them basically undefined in the training process.
I suspect that for Chess it would be much better to drop the King part and use normal Piece-Square Tables instead, but then supplement those with an equal number of Piece-King-relative-square tables (PRT[pieceType][square-kingSquare]). That would give a much better generalization for King-Safety terms. You could always keep a few KPST tables for King locations at or adjacent to the castling destinations, and map all other King locations to the King-relative tables.
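For illustration, a minimal sketch of what such an input indexing could look like (the names, sizes and bucket choices below are made up, not taken from any existing engine):

```cpp
#include <cstdio>

// Hypothetical input layout: plain PST features plus King-relative features,
// with a handful of dedicated KPST buckets only for "normal" King squares.
constexpr int PIECE_TYPES = 10;        // 5 piece types x 2 colours, Kings excluded (illustrative)
constexpr int SQUARES     = 64;
constexpr int REL_SQUARES = 15 * 15;   // file/rank offsets in -7..+7

// Plain piece-square feature: independent of the King.
int pstIndex(int pieceType, int sq) {
    return pieceType * SQUARES + sq;
}

// King-relative feature: indexed by the offset between piece square and King square.
int prtIndex(int pieceType, int sq, int kingSq) {
    int df = (sq % 8) - (kingSq % 8) + 7;   // -7..+7 mapped to 0..14
    int dr = (sq / 8) - (kingSq / 8) + 7;
    return PIECE_TYPES * SQUARES            // skip past the plain PST block
         + pieceType * REL_SQUARES + (df * 15 + dr);
}

// A few full KPST buckets only for King squares at or near the usual castling
// destinations; every other King location falls through to the relative tables.
int kingBucket(int kingSq) {
    switch (kingSq) {
        case 5:  case 6:  case 7:  return 0;  // around f1/g1/h1
        case 1:  case 2:  case 3:  return 1;  // around b1/c1/d1
        case 61: case 62: case 63: return 2;  // around f8/g8/h8
        case 57: case 58: case 59: return 3;  // around b8/c8/d8
        default: return -1;                   // no dedicated KPST bucket
    }
}

int main() {
    // White knight on f3 (square 21), white King on g1 (square 6):
    printf("PST index:   %d\n", pstIndex(1, 21));
    printf("PRT index:   %d\n", prtIndex(1, 21, 6));
    printf("King bucket: %d\n", kingBucket(6));
    return 0;
}
```

The point of the split is that the relative-offset table is shared across all King locations, so King-Safety patterns get trained by every game instead of only by the games where the King happened to stand on that exact square.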
How much work is it to train an NNUE?
Moderator: Ras
-
hgm
- Posts: 28419
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
-
AndrewGrant
- Posts: 1960
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: How much work is it to train an NNUE?
In this case I was not referring to the King-Piece aspect of it, but rather the fact that the nets (during training) have inputs for (King on A1, King on A2, ... King on H8). Those inputs are not a part of the nets (after training); they are collapsed into some of the other weights. However, that is not (and perhaps cannot be) done in a mathematically sound way.
hgm wrote: ↑Thu Feb 11, 2021 8:48 pm
One of the most dubious aspects of NNUE for Chess is the use of King-Piece-Square tables. These are a legacy from Shogi, which is all about King Safety: placement relative to the King is far more important there than in Chess, and it is also very common for the King to be chased all over a crowded board by check-drops. Shogi even has special rules for what happens when the Kings reach the promotion zones.
For Chess this is very unusual, though, and many of the combinations of King and piece location will virtually never occur before the late end-game. This leaves them basically undefined in the training process.
I suspect that for Chess it would be much better to drop the King part and use normal Piece-Square Tables instead, but then supplement those with an equal number of Piece-King-relative-square tables (PRT[pieceType][square-kingSquare]). That would give a much better generalization for King-Safety terms. You could always keep a few KPST tables for King locations at or adjacent to the castling destinations, and map all other King locations to the King-relative tables.
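To get a feel for why such a collapse cannot be exact (this is purely an illustration with made-up sizes and weights, not the actual trainer code): a training-time "King on X" input contributes its weight once per position, while the king-piece-square weights it gets folded into are summed once per piece, so any fold has to assume some fixed piece count.

```cpp
#include <cstdio>

// Illustrative only: a tiny HalfKP-like first layer with one output neuron,
// plus an extra king-square-only input that exists during training.
constexpr int KING_SQS   = 64;
constexpr int PIECE_SQS  = 64;   // a single piece type, to keep the sketch small
constexpr int AVG_PIECES = 16;   // assumed piece count used for the fold

double halfkpW[KING_SQS][PIECE_SQS];   // weights kept after training
double kingOnlyW[KING_SQS];            // training-time input to be folded away

// First-layer sum for a position given its king square and piece squares.
double accumulate(int ksq, const int* pieceSqs, int n, bool useKingOnlyInput) {
    double sum = useKingOnlyInput ? kingOnlyW[ksq] : 0.0;
    for (int i = 0; i < n; ++i) sum += halfkpW[ksq][pieceSqs[i]];
    return sum;
}

int main() {
    // Dummy "trained" weights.
    for (int k = 0; k < KING_SQS; ++k) {
        kingOnlyW[k] = 0.1 * k;
        for (int s = 0; s < PIECE_SQS; ++s) halfkpW[k][s] = 0.01;
    }

    int ksq = 6;                                                  // king on g1
    int pieces12[] = {8, 9, 10, 11, 12, 13, 14, 15, 1, 2, 3, 4};  // 12 active pieces

    double before = accumulate(ksq, pieces12, 12, true);

    // "Collapse": spread the king-only weight over all piece-square weights
    // for that king square, assuming AVG_PIECES pieces per position.
    for (int k = 0; k < KING_SQS; ++k)
        for (int s = 0; s < PIECE_SQS; ++s)
            halfkpW[k][s] += kingOnlyW[k] / AVG_PIECES;

    double after = accumulate(ksq, pieces12, 12, false);

    printf("with extra input: %.3f   after fold: %.3f\n", before, after);
    return 0;
}
```

Running the sketch shows the two sums disagreeing whenever the real piece count differs from the assumed one, which is the sense in which the fold is only an approximation.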
-
connor_mcmonigle
- Posts: 544
- Joined: Sun Sep 06, 2020 4:40 am
- Full name: Connor McMonigle
Re: How much work is it to train an NNUE?
(To clarify, by "training tools", I was referring both to the facilities for generating data and those for producing models)
Yes, tweaking hyperparameters related to data generation and training can require a great deal of time. Perhaps it's my personal bias creeping in here, but I assign little value to the time involved in tweaking parameters relative to the value I assign to people writing original software/implementing novel ideas.
If you look at the Leela discord, you'll see a fair bit of hyperparameter tweaking as well as a great deal of experimentation with novel ideas (micronats, KLD thresholding, value repair, SE blocks, memory layers, etc.). In fact, I'd argue the majority of the discussion and a significant portion of the allocated time have related to the latter category.
It's not as if Albert is starting from scratch with his hyperparameter search, either. Albert undoubtedly benefits directly (as he is using all of their training tools with little to no modification) from both the Leela team's (data generation) and the SF team's (model production) experimentation with different hyperparameters.
-
dkappe
- Posts: 1632
- Joined: Tue Aug 21, 2018 7:52 pm
- Full name: Dietrich Kappe
Re: How much work is it to train an NNUE?
As the father of “value repair” (I called them dodgy positions in my Ender experiment), I found it an ongoing source of frustration that there was so little movement on innovations. I first mentioned the idea in my wiki towards the end of 2018 (https://github.com/dkappe/leela-chess-w ... ndgame-Net) but had encouraged its adoption before then in the discord.
connor_mcmonigle wrote: ↑Thu Feb 11, 2021 9:03 pm
(To clarify, by "training tools", I was referring both to the facilities for generating data and those for producing models)
Yes, tweaking hyperparameters related to data generation and training can require a great deal of time. Perhaps it's my personal bias creeping in here, but I assign little value to the time involved in tweaking parameters relative to the value I assign to people writing original software/implementing novel ideas.
If you look at the Leela discord, you'll see a fair bit of hyperparameter tweaking as well as a great deal of experimentation with novel ideas (micronats, KLD thresholding, value repair, SE blocks, memory layers, etc.). In fact, I'd argue the majority of the discussion and a significant portion of the allocated time have related to the latter category.
It's not as if Albert is starting from scratch with his hyperparameter search, either. Albert undoubtedly benefits directly (as he is using all of their training tools with little to no modification) from both the Leela team's (data generation) and the SF team's (model production) experimentation with different hyperparameters.
I do value the writing of data generation and training code less, perhaps because I find it easy and not at all mysterious.
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
-
Collingwood
- Posts: 89
- Joined: Sat Nov 09, 2019 3:24 pm
- Full name: .
Re: How much work is it to train an NNUE?
But Komodo Dragon is doing everything its own way, isn't it? So that's already a big difference, apart from any advertising.
Ozymandias wrote: ↑Thu Feb 11, 2021 5:24 pm
Being close to the top isn't a failure. It all depends on how you portray your product.
For example, Dragon doesn't advertise as the new #1. Chessbase is doing it at the very top of their website in a hard-to-miss banner:
Now, is it? Doesn't look like it, so they're setting themselves up to fail.
-
Collingwood
- Posts: 89
- Joined: Sat Nov 09, 2019 3:24 pm
- Full name: .
Re: How much work is it to train an NNUE?
The amount of computer time, electricity, etc. is not what matters when you're judging the value added. That is something he seemingly refuses to understand.
Modern Times wrote: ↑Thu Feb 11, 2021 10:48 am
You vastly underestimate the amount of work Albert and chessbase put into this.
-
AndrewGrant
- Posts: 1960
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: How much work is it to train an NNUE?
I believe there has not been any official commentary about Komodo Dragon's NNUE training process.
Collingwood wrote: ↑Thu Feb 11, 2021 11:48 pm
But Komodo Dragon is doing everything its own way, isn't it? So that's already a big difference, apart from any advertising.
Ozymandias wrote: ↑Thu Feb 11, 2021 5:24 pm
Being close to the top isn't a failure. It all depends on how you portray your product.
For example, Dragon doesn't advertise as the new #1. Chessbase is doing it at the very top of their website in a hard-to-miss banner:
Now, is it? Doesn't look like it, so they're setting themselves up to fail.
-
Ferdy
- Posts: 4851
- Joined: Sun Aug 10, 2008 3:15 pm
- Location: Philippines
Re: How much work is it to train an NNUE?
There are two cases: one, you train from scratch or from non-NNUE data, which takes weeks depending on resources; two, you do reinforcement, that is, use data generated with an existing NNUE, which takes only about a day to reach decent strength on a 4-core/8-thread machine.
Gabor Szots wrote: ↑Thu Feb 11, 2021 8:58 am
To develop a chess engine usually takes several months, years or even a lifetime. But how much work is it to take an existing engine and replace its NNUE with a different one?
In my naive view, to make an NNUE you collect a huge amount of games, determine which features of positions you want to analyze, then let your computer do the rest while you are having your holidays. When you return, a new NNUE is waiting for you to use.
Which means, at least for me, that FF2 has taken Stockfish's development work of years and put in a couple of days work of its own. Which approximates 99 % Stockfish, 1 % ChessBase.
What is the reality?
Stockfish code is free; an NNUE nowadays can be done in one day, or a couple of days if you want to get more Elo. The ChessBase interface is the expensive part.
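As a rough picture of the second case (just a sketch: the material-count "evaluation" stands in for a real search with an engine that already has an NNUE, and the text format stands in for whatever your trainer actually reads):

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Placeholder for "score the position with an engine that already has an NNUE".
// In reality this would be a fixed-depth or fixed-node search; here it is a
// simple material count so the sketch compiles and runs on its own.
int evaluateWithExistingNet(const std::string& fen) {
    static const std::string vals = "PNBRQpnbrq";
    static const int cp[] = {100, 300, 300, 500, 900, -100, -300, -300, -500, -900};
    int score = 0;
    for (char c : fen.substr(0, fen.find(' ')))
        for (size_t i = 0; i < vals.size(); ++i)
            if (c == vals[i]) score += cp[i];
    return score;
}

int main() {
    // Positions would normally come from self-play games of the existing engine.
    std::vector<std::string> positions = {
        "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
        "rnbqkb1r/pppppppp/5n2/8/8/5N2/PPPPPPPP/RNBQKB1R w KQkq - 2 2",
    };

    // Write (position, score) pairs that a trainer could fit a new net to.
    FILE* out = fopen("training_data.txt", "w");
    if (!out) return 1;
    for (const auto& fen : positions)
        fprintf(out, "%s;%d\n", fen.c_str(), evaluateWithExistingNet(fen));
    fclose(out);

    printf("wrote %zu labelled positions\n", positions.size());
    return 0;
}
```

A real setup would of course get its positions from self-play and its labels from a search, but the shape of the job is the same: label a pile of positions with the existing net and fit the new one to them, which is why it can finish in a day on modest hardware.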
-
Modern Times
- Posts: 3771
- Joined: Thu Jun 07, 2012 11:02 pm
Re: How much work is it to train an NNUE?
In your opinion.
Collingwood wrote: ↑Thu Feb 11, 2021 11:52 pm
The amount of computer time, electricity, etc. is not what matters when you're judging the value added. That is something he seemingly refuses to understand.
Modern Times wrote: ↑Thu Feb 11, 2021 10:48 am
You vastly underestimate the amount of work Albert and chessbase put into this.
-
Modern Times
- Posts: 3771
- Joined: Thu Jun 07, 2012 11:02 pm
Re: How much work is it to train an NNUE?
But for how long? The huge success of Stockfish has quite possibly contributed to killing off commercial engines. It is only due to the exceptional skill and dedication of three people that Komodo continues to exist, but for how much longer? As for doing everything its own way, that is impossible to say, since it is closed source. At the very least, they will be studying every Stockfish release to see if any of the ideas work for them.
Collingwood wrote: ↑Thu Feb 11, 2021 11:48 pm
But Komodo Dragon is doing everything its own way, isn't it? So that's already a big difference, apart from any advertising.