While a good NNUE network would most certainly "understand" pawn-king structure, I wonder if it would be possible to combine an NNUE with a pawn-king-specific net. To combine the two networks into a single scalar evaluation, concatenate the output of the pawn-king network with the output of one of the NNUE's hidden layers, then use the concatenated vector as the input to the next layer. A pawn-king hash table can then be reintroduced to cache the output of the pawn-king network.
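To make the splice concrete, here is a minimal PyTorch sketch. The layer widths (256, 64, 32) and names like `CombinedEval` are my own assumptions, not anything fixed by the idea; the only essential part is the `torch.cat` before the next linear layer:

```python
import torch
import torch.nn as nn

# Assumed widths -- not part of any existing NNUE code.
NNUE_HIDDEN = 256   # width of the NNUE hidden layer we splice into
PK_OUT = 64         # width of the pawn-king subnet's output

class CombinedEval(nn.Module):
    def __init__(self, nnue_trunk, pawn_king_net):
        super().__init__()
        self.nnue_trunk = nnue_trunk        # maps NNUE features -> (batch, NNUE_HIDDEN)
        self.pawn_king_net = pawn_king_net  # maps pawn/king planes -> (batch, PK_OUT)
        # The next layer sees the two outputs concatenated.
        self.head = nn.Sequential(
            nn.Linear(NNUE_HIDDEN + PK_OUT, 32),
            nn.ReLU(),
            nn.Linear(32, 1),               # single scalar evaluation
        )

    def forward(self, nnue_features, pk_planes):
        h = self.nnue_trunk(nnue_features)
        pk = self.pawn_king_net(pk_planes)
        return self.head(torch.cat([h, pk], dim=1))
```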
I made a diagram of how such a network could be structured. The regular NNUE input features are on the left and are incrementally updated as usual. In the diagram I used HalfKA features (11x64x64 = 45056), but it could really be any feature set. The pawn-king network is on the right; its input is four 8x8 planes for the pawns and kings of the side to move and the side not to move. The diagram only shows one convolutional layer with 32 filters, but ideally there would be more layers and filters. Because the output of the pawn-king net is cached, I would imagine it could afford to be somewhat expensive.
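For reference, here is roughly what the right-hand side of the diagram might look like, together with the caching idea. The single 3x3 conv layer with 32 filters matches the diagram; the output width, the `pawn_king_key` Zobrist key, and the `pawn_king_planes()` helper are hypothetical engine-side pieces I made up for the sketch:

```python
import torch
import torch.nn as nn

class PawnKingNet(nn.Module):
    """Conv subnet over the four 8x8 pawn/king planes (one layer of
    32 filters as in the diagram; in practice it could be deeper)."""
    def __init__(self, out_features=64):        # out_features is an assumption
        super().__init__()
        self.conv = nn.Conv2d(4, 32, kernel_size=3, padding=1)
        self.fc = nn.Linear(32 * 8 * 8, out_features)

    def forward(self, planes):                  # planes: (batch, 4, 8, 8)
        x = torch.relu(self.conv(planes))
        return torch.relu(self.fc(x.flatten(1)))

# Search-time caching, analogous to a classical pawn hash table.
# `position.pawn_king_key` (a Zobrist key over pawns and kings only)
# and `position.pawn_king_planes()` are hypothetical helpers.
pk_cache = {}

def pawn_king_output(net, position):
    key = position.pawn_king_key
    if key not in pk_cache:
        with torch.no_grad():
            pk_cache[key] = net(position.pawn_king_planes())
    return pk_cache[key]
```

Since entries are keyed only by pawn and king placement, hit rates should be very high, which is what would let the subnet be relatively expensive.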
Any thoughts on this concept? Does it hold any merit, or is it completely impractical?
