For example, see this model featured in the Stockfish nnue-pytorch documentation:
Code:
class NNUE(nn.Module):
    def __init__(self):
        super(NNUE, self).__init__()

        self.ft = nn.Linear(NUM_FEATURES, M)
        self.l1 = nn.Linear(2 * M, N)
        self.l2 = nn.Linear(N, K)

    # The inputs are a whole batch!
    # `stm` indicates whether white is the side to move. 1 = true, 0 = false.
    def forward(self, white_features, black_features, stm):
        w = self.ft(white_features)  # white's perspective
        b = self.ft(black_features)  # black's perspective

        # Remember that we order the accumulators for 2 perspectives based on who is to move.
        # So we blend two possible orderings by interpolating between `stm` and `1-stm` tensors.
        accumulator = (stm * torch.cat([w, b], dim=1)) + ((1 - stm) * torch.cat([b, w], dim=1))

        # Run the linear layers and use clamp_ as ClippedReLU
        l1_x = torch.clamp(accumulator, 0.0, 1.0)
        l2_x = torch.clamp(self.l1(l1_x), 0.0, 1.0)
        return self.l2(l2_x)
But it seems like the accumulator in this forward() method is doing something extra. Why, for example, can't we just pass in the features of the side to move, provided that the features are already flipped so that they are side-to-move relative?
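To make it concrete, here is the kind of simplification I have in mind (my own sketch, not from the docs; `stm_features` is a made-up name for the already-flipped, side-to-move-relative input):
Code:
# My own sketch (not from the nnue-pytorch docs): a single-perspective network
# where the input features are assumed to already be side-to-move relative.
# NUM_FEATURES, M, N, K are the same hyperparameters as in the model above.
class SimpleNNUE(nn.Module):
    def __init__(self):
        super(SimpleNNUE, self).__init__()
        self.ft = nn.Linear(NUM_FEATURES, M)
        self.l1 = nn.Linear(M, N)  # M instead of 2 * M, since there is only one perspective
        self.l2 = nn.Linear(N, K)

    def forward(self, stm_features):
        accumulator = self.ft(stm_features)  # only the mover's perspective
        l1_x = torch.clamp(accumulator, 0.0, 1.0)
        l2_x = torch.clamp(self.l1(l1_x), 0.0, 1.0)
        return self.l2(l2_x)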
I have the same problem when trying to understand some examples of inference code that I've seen, where the output is computed by summing crelu(us_features) + crelu(them_features). How does this differ from simply using us_features?
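Concretely, the pattern I mean looks something like this (my own paraphrase in PyTorch; `us_features`, `them_features`, `l1`, `l2` are just placeholder names, not taken from any particular engine):
Code:
import torch

def crelu(x):
    # clipped ReLU: clamp activations to [0, 1]
    return torch.clamp(x, 0.0, 1.0)

# us_features / them_features are the accumulator outputs (size M each) for the
# side to move and the opponent; l1 and l2 are the trained linear layers.
def evaluate(us_features, them_features, l1, l2):
    hidden = crelu(us_features) + crelu(them_features)  # the summing that confuses me
    return l2(crelu(l1(hidden)))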
Any help would be much appreciated!