How to understand 'perspective' in NNUE nets?

Discussion of chess software programming and technical issues.

Moderator: Ras

kelseyde123
Posts: 19
Joined: Fri Oct 06, 2023 1:10 am
Full name: Dan Kelsey

How to understand 'perspective' in NNUE nets?

Post by kelseyde123 »

I'm trying to write an NNUE trainer in PyTorch, and I think I'm failing to properly understand the concept of 'perspective' that is present in many networks.

For example, see this model featured in the Stockfish nnue-pytorch documentation:

Code: Select all

import torch
import torch.nn as nn

# NUM_FEATURES, M, N and K are architecture constants defined elsewhere in
# the documentation (feature count and layer sizes).

class NNUE(nn.Module):
    def __init__(self):
        super(NNUE, self).__init__()

        self.ft = nn.Linear(NUM_FEATURES, M)  # feature transformer, shared by both perspectives
        self.l1 = nn.Linear(2 * M, N)
        self.l2 = nn.Linear(N, K)

    # The inputs are a whole batch!
    # `stm` indicates whether white is the side to move. 1 = true, 0 = false.
    def forward(self, white_features, black_features, stm):
        w = self.ft(white_features) # white's perspective
        b = self.ft(black_features) # black's perspective

        # Remember that we order the accumulators for 2 perspectives based on who is to move.
        # So we blend two possible orderings by interpolating between `stm` and `1-stm` tensors.
        accumulator = (stm * torch.cat([w, b], dim=1)) + ((1 - stm) * torch.cat([b, w], dim=1))

        # Run the linear layers and use clamp as ClippedReLU
        l1_x = torch.clamp(accumulator, 0.0, 1.0)
        l2_x = torch.clamp(self.l1(l1_x), 0.0, 1.0)
        return self.l2(l2_x)
I understand that the eval must always be relative to the side-to-move, and that the network must output the same eval for symmetrical positions with the colours flipped.

But it seems like the accumulator in this forward() method is doing something extra here. Why, for example, can't we just pass in the features of the side to move, provided that the features are already flipped so as to be side-to-move relative?

I have the same problem when trying to understand some examples of inference code I've seen, where the output is computed from both crelu(us_features) and crelu(them_features). How does this differ from simply using us_features?
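To make that concrete, the single-perspective alternative I have in mind would look roughly like this (a hypothetical sketch, not taken from any particular engine):

Code: Select all

import torch
import torch.nn as nn

# Hypothetical single-perspective variant, for comparison with the model
# above: only the side-to-move's features go in, so l1 sees M inputs
# instead of 2 * M.
class SingleNNUE(nn.Module):
    def __init__(self, num_features, m, n, k):
        super().__init__()
        self.ft = nn.Linear(num_features, m)
        self.l1 = nn.Linear(m, n)  # m inputs, not 2 * m
        self.l2 = nn.Linear(n, k)

    def forward(self, stm_features):
        acc = torch.clamp(self.ft(stm_features), 0.0, 1.0)  # clamp as ClippedReLU
        return self.l2(torch.clamp(self.l1(acc), 0.0, 1.0))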

Any help would be much appreciated!
alvinypeng
Posts: 36
Joined: Thu Mar 03, 2022 7:29 am
Full name: Alvin Peng

Re: How to understand 'perspective' in NNUE nets?

Post by alvinypeng »

There are a few reasons for using both white features and black features. One reason is that the white features are not simply the black features mirrored. The white features use the white king's position to compute feature indices, whereas the black features use the black king's position. If you only used one side's features, you would lose information about where the other side's king is.
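As a rough sketch of what I mean (simplified HalfKP-style indexing; the real Stockfish layout differs in the details):

Code: Select all

# Each perspective keys its features on *its own* king square, so the two
# feature sets genuinely carry different information.

def orient(white_pov: bool, sq: int) -> int:
    # Vertically flip the square (a1 = 0 ... h8 = 63) for black's point of view.
    return sq if white_pov else sq ^ 56

def halfkp_index(white_pov: bool, king_sq: int, piece_sq: int,
                 piece_type: int, piece_is_white: bool) -> int:
    # piece_type: 0..4 for pawn..queen (kings are excluded in HalfKP).
    # 10 buckets = 5 piece types * 2 colours relative to the perspective.
    p_idx = piece_type * 2 + (piece_is_white != white_pov)
    return orient(white_pov, piece_sq) + 64 * p_idx + 640 * orient(white_pov, king_sq)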
kelseyde123
Posts: 19
Joined: Fri Oct 06, 2023 1:10 am
Full name: Dan Kelsey

Re: How to understand 'perspective' in NNUE nets?

Post by kelseyde123 »

Ah, that makes some sense, thanks.

I suppose one reason I was confused is that I'm not implementing a HalfKA/HalfKP architecture where the king position plays a role. Instead I've gone for a 768-input approach (64 squares * 6 pieces * 2 colours), with a hidden layer of 256.

However, other engines that use this architecture also seem to pass in both the white and black features, and maintain a separate set of output weights for each feature set.

I get that maintaining accumulators for both the white and black sides is essential to make the network 'efficiently updatable', since when the side to move switches you don't want to recalculate the hidden layer from scratch.
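(As I understand it, the incremental update is something like the following - a hypothetical sketch with made-up helper names:)

Code: Select all

import numpy as np

# Why the accumulator is "efficiently updatable": a move only adds/removes
# a couple of features, so we add/subtract the corresponding rows of the
# feature-transformer weight matrix instead of recomputing ft(x) from scratch.

def update_accumulator(acc: np.ndarray, ft_weights: np.ndarray,
                       removed: list[int], added: list[int]) -> None:
    # acc has shape (M,); ft_weights has shape (NUM_FEATURES, M).
    for f in removed:
        acc -= ft_weights[f]
    for f in added:
        acc += ft_weights[f]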

But if king position doesn't matter, then the black_features really are just a flipped version of the white_features.
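In other words, something like this (assuming a colour * 384 + piece * 64 + square layout with a vertical flip for black; exact layouts vary between engines):

Code: Select all

def feature_index(white_pov: bool, piece_type: int, piece_is_white: bool, sq: int) -> int:
    # piece_type: 0..5 for pawn..king; sq: 0..63 with a1 = 0.
    colour = 0 if piece_is_white == white_pov else 1  # "us" pieces come first
    sq = sq if white_pov else sq ^ 56                 # vertical flip for black's view
    return colour * 384 + piece_type * 64 + sq
With this scheme, black's feature vector for a position is exactly white's feature vector for the colour-flipped position.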

Therefore, what other reasons would there be for using both sets of features during inference, or indeed during training as in the example above from Stockfish's PyTorch trainer?
JacquesRW
Posts: 119
Joined: Sat Jul 30, 2022 12:12 pm
Full name: Jamie Whiting

Re: How to understand 'perspective' in NNUE nets?

Post by JacquesRW »

kelseyde123 wrote: Mon Jul 08, 2024 7:01 pm But if king position doesn't matter, then the black_features really are just a flipped version of the white_features.
That doesn't imply that the accumulator built from the non-stm features carries no distinctly valuable information. You have both colour accumulators available, so why not (try to) use them both?

A more practical reason is that it gains a significant amount of Elo, to the point where everyone just goes straight to perspective networks now - why bother with a non-perspective network? I recommend you test it yourself. Usually I'd link a few exact tests, but it's a bit ropey here because this was established ages ago and I can't be bothered to search back that far (back when we did not have all our testing conveniently available on OpenBench, so it'd take a while).

https://github.com/PGG106/Alexandria/pull/148 - bad example really, but the original net that replaced it when the author re-acquired the hardware to train their own nets again was stronger.
JacquesRW
Posts: 119
Joined: Sat Jul 30, 2022 12:12 pm
Full name: Jamie Whiting

Re: How to understand 'perspective' in NNUE nets?

Post by JacquesRW »

I trained two nets, (768 -> 256)x2 -> 1 and 768 -> 256 -> 1, identically, the only difference being dual vs. single perspective:

Code: Select all

Elo   | 89.48 +- 15.04 (95%)
Conf  | 8.0+0.08s Threads=1 Hash=32MB
Games | N: 1004 W: 435 L: 182 D: 387
Penta | [7, 59, 178, 190, 68]
https://chess.swehosting.se/test/7357/
To be honest, I wasn't expecting the difference to be quite that large.
kelseyde123
Posts: 19
Joined: Fri Oct 06, 2023 1:10 am
Full name: Dan Kelsey

Re: How to understand 'perspective' in NNUE nets?

Post by kelseyde123 »

Hi Jacques,

That's really interesting! I'm still struggling to grasp what extra information the flipped perspective gives that would account for that strength gain. I saw someone in another discussion mention that having both perspectives allows the net to 'learn tempo' - presumably because the output layer has separate weights for the 'us' and 'them' halves of the accumulator, so the net can bake in a side-to-move bonus - so maybe it's something to do with that.

Thanks for running the test. Once I fix all the bugs in my NNUE trainer, I'll start doing some tests of my own. :D
JacquesRW
Posts: 119
Joined: Sat Jul 30, 2022 12:12 pm
Full name: Jamie Whiting

Re: How to understand 'perspective' in NNUE nets?

Post by JacquesRW »

I recommend joining the following discord servers:
Engine Programming: https://discord.com/invite/F6W6mMsTGN
Stockfish: https://discord.com/invite/GWDRS3kU6R
They are massively more active than this forum and provide much more up-to-date information than here or the Chess Programming Wiki.

In particular you'll get the rundown on testing, which avoids situations like https://github.com/kelseyde/calvin-ches ... 08311d6612 in the future (note that neither the initial commit nor the revert had sufficient evidence of Elo gain/loss).
kelseyde123
Posts: 19
Joined: Fri Oct 06, 2023 1:10 am
Full name: Dan Kelsey

Re: How to understand 'perspective' in NNUE nets?

Post by kelseyde123 »

Thanks for the tip, I'll check out the Discord!