Horizontal Mirroring
First I added horizontal mirroring. When you flip a chess position horizontally the evaluation should be pretty similar, as chess is symmetric in that regard. But an NNUE can't encode that symmetry and sees a position and its horizontally mirrored equivalent as distinct, so it has to learn the same patterns twice. With horizontal mirroring the position is flipped horizontally whenever the accumulator's king is on the e-h files. When a king crosses the center line between the d- and e-files the accumulator has to be recomputed from scratch, which is expensive, so nps goes down by 10%. But the increased accuracy of the evaluation is worth it. +20 Elo!
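A minimal sketch of the mirroring rule in Python, not Leorik's actual code. The square numbering (a1 = 0 ... h8 = 63), the 2 x 6 x 64 feature layout and all function names are assumptions made for illustration.

```python
# Assumed convention: squares are numbered 0..63 with a1 = 0, b1 = 1, ..., h8 = 63.

def file_of(square: int) -> int:
    """File index 0..7 (a..h) of a square."""
    return square & 7

def needs_mirroring(king_square: int) -> bool:
    """The position is flipped whenever the king sits on the e-h files."""
    return file_of(king_square) >= 4

def mirror(square: int) -> int:
    """Swap a<->h, b<->g, c<->f, d<->e while keeping the rank."""
    return square ^ 7

def feature_index(piece_type: int, is_own: bool, square: int, king_square: int) -> int:
    """Index into a hypothetical 2 * 6 * 64 input feature layout."""
    if needs_mirroring(king_square):
        square = mirror(square)
    color_offset = 0 if is_own else 6 * 64
    return color_offset + piece_type * 64 + square

def needs_refresh(king_from: int, king_to: int) -> bool:
    """A king move across the center line invalidates the accumulator."""
    return needs_mirroring(king_from) != needs_mirroring(king_to)
```

With this rule the king always ends up on the a-d files from the network's point of view, so only half of the king placements have to be learned.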
Output Buckets
Next were output buckets. This is similar to the "tapering" that is usually done in PSQT evaluations. Tapered evaluation means that instead of using one set of PSTs for the whole game, the engine uses separate tables for midgame and endgame, and the final evaluation is an interpolation between the values from the two tables based on the material on the board. Output buckets implement the same idea in a different way: they skip the interpolation and instead select one of several distinct evaluations, the buckets. This isn't as expensive as it sounds: the bulk of the weights in an NNUE file are feature weights (InputSize * Layer1Size) and that stays unchanged. Only the OutputWeights (Layer1Size) and OutputBias need to be duplicated per bucket. Now when we do the big dot-product on the accumulator (with a bit of clamping aka ClippedReLU²) we have different sets of weights to pick from, and we choose them based on the number of pieces on the board. The formula is
Code:
bucket = (pieceCount - 2) / DivCeil(32, #MaterialBuckets)

I tried to be more sophisticated than just counting pieces. For example I trained a net where I mapped the phase calculation from PeSTO, where pieces have different values (Pawn = 0, N & B = 1, R = 2, Q = 4), to buckets. But it didn't really make a difference in strength. The gain from output buckets is about 10 Elo, measured with a smallish 384 HL network (to make sure I have enough data to saturate all buckets).
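To make the bucket selection concrete, here is a small Python sketch of the formula together with a simplified output layer. It is only an illustration: it uses floats and a single accumulator, assumes the clamp range is [0, 1], and the function names are made up. A real engine would use quantized integers and both perspectives.

```python
def div_ceil(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)

def output_bucket(piece_count: int, num_buckets: int) -> int:
    """bucket = (pieceCount - 2) / DivCeil(32, #MaterialBuckets), integer division."""
    return (piece_count - 2) // div_ceil(32, num_buckets)

def evaluate(accumulator, output_weights, output_bias, piece_count):
    """Dot product of the activated accumulator with the selected bucket's weights.

    output_weights holds one list of Layer1Size weights per bucket,
    output_bias holds one bias per bucket.
    """
    bucket = output_bucket(piece_count, len(output_weights))
    weights = output_weights[bucket]
    total = output_bias[bucket]
    for a, w in zip(accumulator, weights):
        clipped = min(max(a, 0.0), 1.0)  # clamp to the assumed range [0, 1]
        total += clipped * clipped * w   # squared clipped ReLU
    return total
```

With 8 buckets the divisor is 4, so the two bare kings land in bucket 0 and the full starting position with 32 pieces lands in bucket 7.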
Input Buckets
Output buckets are not the only way to make the evaluation more context aware. Another application of buckets (that is also viable in PSQT evaluations) is king input buckets. The board is divided into regions, and the region (aka bucket) the king is in determines which set of accumulator weights is used. As in horizontal mirroring, a king move that crosses a boundary makes a full recomputation of the accumulator necessary. That, and the fact that the NNUE file size is proportional to the number of king buckets, means that you want only a handful of regions. There are plenty of viable layouts. Leorik uses:
Code:
0, 0, 1, 1, 1, 1, 0, 0,
2, 2, 3, 3, 3, 3, 2, 2,
2, 2, 3, 3, 3, 3, 2, 2,
4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4,
The improvements in evaluation accuracy can be worth the slowdown if there's enough training data available.
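The lookup can be sketched in Python as follows. The table is the layout shown above. Everything else is an assumption for illustration: that the first row is the accumulator side's home rank (index 0 = a1 from its own perspective), the 768 features per bucket, and the function names.

```python
# Layout from the post; assumed to be indexed with a1 = 0 from the
# perspective of the side the accumulator belongs to.
KING_BUCKETS = [
    0, 0, 1, 1, 1, 1, 0, 0,
    2, 2, 3, 3, 3, 3, 2, 2,
    2, 2, 3, 3, 3, 3, 2, 2,
    4, 4, 4, 4, 4, 4, 4, 4,
    4, 4, 4, 4, 4, 4, 4, 4,
    4, 4, 4, 4, 4, 4, 4, 4,
    4, 4, 4, 4, 4, 4, 4, 4,
    4, 4, 4, 4, 4, 4, 4, 4,
]
NUM_KING_BUCKETS = max(KING_BUCKETS) + 1
FEATURES_PER_BUCKET = 2 * 6 * 64  # hypothetical: colors * piece types * squares

def king_bucket(king_square: int) -> int:
    return KING_BUCKETS[king_square]

def feature_index(king_square: int, piece_type: int, is_own: bool, square: int) -> int:
    """Each bucket gets its own full block of input features."""
    base = king_bucket(king_square) * FEATURES_PER_BUCKET
    color_offset = 0 if is_own else 6 * 64
    return base + color_offset + piece_type * 64 + square

def needs_refresh(king_from: int, king_to: int) -> bool:
    """Only a king move into a different bucket forces a full recomputation."""
    return king_bucket(king_from) != king_bucket(king_to)

def input_weight_count(layer1_size: int) -> int:
    """The feature weights, and with them the file size, grow linearly with the bucket count."""
    return NUM_KING_BUCKETS * FEATURES_PER_BUCKET * layer1_size
```

Under these assumptions a king stepping from e1 to d1 stays in bucket 1 and the accumulator can be updated incrementally, while castling from e1 to g1 switches to bucket 0 and triggers a refresh.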
Combined, these new NNUE features made Leorik ~30 Elo stronger than version 3.1, both using a hidden layer of size 640.
