For example, see this model featured in the Stockfish nnue-pytorch documentation:
Code:
class NNUE(nn.Module):
    def __init__(self):
        super(NNUE, self).__init__()

        self.ft = nn.Linear(NUM_FEATURES, M)
        self.l1 = nn.Linear(2 * M, N)
        self.l2 = nn.Linear(N, K)

    # The inputs are a whole batch!
    # `stm` indicates whether white is the side to move. 1 = true, 0 = false.
    def forward(self, white_features, black_features, stm):
        w = self.ft(white_features)  # white's perspective
        b = self.ft(black_features)  # black's perspective

        # Remember that we order the accumulators for 2 perspectives based on who is to move.
        # So we blend two possible orderings by interpolating between `stm` and `1-stm` tensors.
        accumulator = (stm * torch.cat([w, b], dim=1)) + ((1 - stm) * torch.cat([b, w], dim=1))

        # Run the linear layers and use clamp_ as ClippedReLU
        l1_x = torch.clamp(accumulator, 0.0, 1.0)
        l2_x = torch.clamp(self.l1(l1_x), 0.0, 1.0)
        return self.l2(l2_x)
But it seems like the accumulator in this forward() method is doing something extra. Why, for example, can't we just pass in the features of the side to move, provided that the features are already flipped so that they are side-to-move relative?
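To make it concrete, here is the kind of simplification I have in mind (my own sketch, not from the docs; `stm_features` is a made-up name for the already-flipped, side-to-move-relative input):
Code:
# My own sketch (not from the nnue-pytorch docs): a single-perspective network
# where the input features are assumed to already be side-to-move relative.
# NUM_FEATURES, M, N, K are the same hyperparameters as in the model above.
class SimpleNNUE(nn.Module):
    def __init__(self):
        super(SimpleNNUE, self).__init__()
        self.ft = nn.Linear(NUM_FEATURES, M)
        self.l1 = nn.Linear(M, N)  # M instead of 2 * M, since there is only one perspective
        self.l2 = nn.Linear(N, K)

    def forward(self, stm_features):
        accumulator = self.ft(stm_features)  # only the mover's perspective
        l1_x = torch.clamp(accumulator, 0.0, 1.0)
        l2_x = torch.clamp(self.l1(l1_x), 0.0, 1.0)
        return self.l2(l2_x)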
I have the same problem when trying to understand some examples of inference code that I've seen, where the output is computed by summing crelu(us_features) + crelu(them_features). How does this differ from simply using us_features?
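Concretely, the pattern I mean looks something like this (my own paraphrase in PyTorch; `us_features`, `them_features`, `l1`, `l2` are just placeholder names, not taken from any particular engine):
Code:
import torch

def crelu(x):
    # clipped ReLU: clamp activations to [0, 1]
    return torch.clamp(x, 0.0, 1.0)

# us_features / them_features are the accumulator outputs (size M each) for the
# side to move and the opponent; l1 and l2 are the trained linear layers.
def evaluate(us_features, them_features, l1, l2):
    hidden = crelu(us_features) + crelu(them_features)  # the summing that confuses me
    return l2(crelu(l1(hidden)))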
Any help would be much appreciated!