Devlog of Leorik

Post by **lithander** » Sat Dec 27, 2025 2:38 am

The other big improvement of version 3.2 over 3.1 comes from changes to the NNUE network architecture. After I showed that adding 1B positions sourced from FRC games didn't hurt my net's performance in standard chess, I also tried a few straightforward architectural improvements.

Horizontal Mirroring

First I added horizontal mirroring. When you flip a chess position horizontally, the evaluation should stay pretty similar, as chess is symmetric in that regard. But an NNUE can't encode that symmetry on its own and sees a position and its horizontally mirrored twin as distinct, so it has to learn the same patterns twice. With horizontal mirroring, the position is flipped horizontally whenever the accumulator's king is on the e-h files. When a king crosses the center line between the d- and e-files, the accumulator has to be recomputed from scratch. That's expensive, so nps goes down by 10%, but the increased accuracy of the evaluation is worth it. +20 Elo!
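A minimal Python sketch of the mirroring rule, for illustration only. It assumes squares indexed a1 = 0 through h8 = 63 and a plain 768-input feature layout (12 piece kinds x 64 squares); the function names are hypothetical and not taken from Leorik, and the vertical flip for Black's perspective is left out.

```python
# Hypothetical sketch: squares are indexed a1 = 0, b1 = 1, ..., h8 = 63,
# so file = sq % 8 (a = 0 ... h = 7) and rank = sq // 8.

def mirror_horizontally(sq: int) -> int:
    """Flip a square across the d/e boundary (a<->h, b<->g, c<->f, d<->e)."""
    return sq ^ 7


def king_on_right_half(king_sq: int) -> bool:
    """True if the accumulator's king stands on the e-h files."""
    return king_sq % 8 >= 4


def feature_index(king_sq: int, piece: int, sq: int) -> int:
    """Input feature for one accumulator, assuming 12 piece kinds x 64 squares.

    If the king is on the e-h files, the whole position is mirrored, so the
    net only ever sees its own king on the a-d files.
    """
    if king_on_right_half(king_sq):
        sq = mirror_horizontally(sq)
    return piece * 64 + sq


def needs_refresh(old_king_sq: int, new_king_sq: int) -> bool:
    """Crossing the d/e boundary flips every feature: rebuild the accumulator."""
    return king_on_right_half(old_king_sq) != king_on_right_half(new_king_sq)


# With the king on g1 (sq 6), a piece on e2 (sq 12) is fed in as d2 (sq 11).
print(feature_index(6, 0, 12))  # 11
print(needs_refresh(4, 3))      # True: e1 -> d1 crosses the boundary
print(needs_refresh(6, 7))      # False: g1 -> h1 stays on the same half
```

King moves that stay on one half can still be handled incrementally; only the boundary crossing forces the expensive full refresh that costs the nps.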

Output Buckets

Next were output buckets. This is similar to the "tapering" that is usually done in PSQT evaluations. Tapered evaluation means that instead of using one set of PSTs for the whole game, the engine uses separate tables for midgame and endgame, and based on the material on the board the final evaluation is an interpolation between the values from the two tables. Output buckets implement the same idea in a different way: they skip the interpolation and instead provide several distinct sets of output weights, the buckets, of which exactly one is used for a given position. This isn't as expensive as it sounds: the bulk of the weights in an NNUE file are feature weights (InputSize * Layer1Size), and those stay unchanged. Only the OutputWeights (Layer1Size) and OutputBias need to be duplicated per bucket. Now when we do the big dot product on the accumulator (with a bit of clamping and squaring, aka ClippedReLU² or SCReLU) we have different sets of weights to pick from, and we choose one based on the number of pieces on the board. The formula is

bucket = (pieceCount - 2) / DivCeil(32, #MaterialBuckets)
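The bucket selection and the bucketed output layer can be sketched as follows. This is a float-only simplification under stated assumptions: no quantization, a single accumulator vector (in practice this would be the two perspective accumulators concatenated), and SCReLU clamped to [0, 1]. Names like `output_bucket` and `evaluate` are hypothetical.

```python
def div_ceil(a: int, b: int) -> int:
    """Integer division rounding up."""
    return (a + b - 1) // b


def output_bucket(piece_count: int, num_buckets: int) -> int:
    """bucket = (pieceCount - 2) / DivCeil(32, #MaterialBuckets).

    piece_count includes both kings, so it ranges from 2 to 32 and the
    result always lies in 0 .. num_buckets - 1.
    """
    return (piece_count - 2) // div_ceil(32, num_buckets)


def screlu(x: float) -> float:
    """Squared clipped ReLU: clamp to [0, 1], then square (float version)."""
    c = min(max(x, 0.0), 1.0)
    return c * c


def evaluate(accumulator, output_weights, output_bias, piece_count):
    """Dot product of the activated accumulator with the selected bucket's weights.

    output_weights holds one weight vector per bucket and output_bias one bias
    per bucket; the feature weights feeding the accumulator are shared.
    """
    bucket = output_bucket(piece_count, len(output_bias))
    weights = output_weights[bucket]
    return sum(screlu(a) * w for a, w in zip(accumulator, weights)) + output_bias[bucket]


# With 8 buckets the divisor is 4: the start position (32 pieces) lands in
# bucket 7, a bare K vs K ending in bucket 0.
print([output_bucket(n, 8) for n in (32, 20, 10, 2)])  # [7, 4, 2, 0]
```

Only `output_weights` and `output_bias` grow with the bucket count, which is why the file size barely changes.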

The computational overhead is minimal because whenever the accumulator changes we have to redo the output computation anyway; the bucket only decides which weights that computation uses. And there's practically no scenario where the bucket changes (a piece leaves the board) but the accumulator doesn't.

I tried to be more sophisticated than just counting pieces. For example, I trained a net that mapped the game phase calculation from PeSTO, where pieces have different weights (Pawn = 0, N & B = 1, R = 2, Q = 4), to buckets. But it didn't really make a difference: either way, output buckets were worth about 10 Elo, measured with a smallish 384 HL network (to make sure I had enough data to saturate all buckets).
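The PeSTO-style alternative can be sketched like this. The phase weights are the ones quoted above (they sum to 24 in the starting position); how the phase was mapped onto buckets isn't stated in the post, so the linear `phase * num_buckets // 25` mapping here is a hypothetical choice.

```python
# Phase weights as quoted from PeSTO: Pawn = 0, N & B = 1, R = 2, Q = 4.
PHASE_WEIGHTS = {"P": 0, "N": 1, "B": 1, "R": 2, "Q": 4, "K": 0}
MAX_PHASE = 24  # start position: 8 minors * 1 + 4 rooks * 2 + 2 queens * 4


def game_phase(pieces: str) -> int:
    """Sum of phase weights; `pieces` lists the pieces of both colors, e.g. "KQkq"."""
    return sum(PHASE_WEIGHTS[p.upper()] for p in pieces)


def phase_bucket(pieces: str, num_buckets: int) -> int:
    """Hypothetical linear mapping of the phase (0 .. 24) onto the buckets."""
    phase = min(game_phase(pieces), MAX_PHASE)  # promotions can push it past 24
    return phase * num_buckets // (MAX_PHASE + 1)


start = "RNBQKBNR" + "P" * 8 + "rnbqkbnr" + "p" * 8
print(game_phase(start), phase_bucket(start, 8))  # 24 7
```

Unlike piece counting, this mapping ignores pawns entirely, so a pawn-heavy endgame and a bare-kings endgame can share a bucket.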

Input Buckets

Output buckets are not the only way to make the evaluation more context-aware. Another application of buckets (also viable in PSQT evaluations) is king input buckets. The board is divided into regions, and the region (aka bucket) the king is in determines which set of accumulator weights is used. As with horizontal mirroring, a king move across a bucket boundary makes a full recomputation of the accumulator necessary. That, and the fact that the NNUE file size scales with the number of king buckets, means you want only a handful of regions. There are plenty of viable layouts; Leorik uses:

	0, 0, 1, 1, 1, 1, 0, 0,
	2, 2, 3, 3, 3, 3, 2, 2,
	2, 2, 3, 3, 3, 3, 2, 2,
	4, 4, 4, 4, 4, 4, 4, 4,
	4, 4, 4, 4, 4, 4, 4, 4,
	4, 4, 4, 4, 4, 4, 4, 4,
	4, 4, 4, 4, 4, 4, 4, 4,
	4, 4, 4, 4, 4, 4, 4, 4,

(from Black's POV)
The improvements in evaluation accuracy can be worth the slowdown if there's enough training data available.
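A sketch of a king-bucket lookup for the layout above, combined with the horizontal mirroring described earlier in the post. Assumptions: the table is indexed a1 = 0 from White's side, Black's king square is flipped vertically (`sq ^ 56`) before the lookup, and each bucket holds a full 768-feature weight set. All function names are hypothetical.

```python
# Leorik's king bucket layout as quoted above; indexed here with a1 = 0,
# i.e. the first row is the accumulator owner's back rank (an assumption).
KING_BUCKETS = [
    0, 0, 1, 1, 1, 1, 0, 0,
    2, 2, 3, 3, 3, 3, 2, 2,
    2, 2, 3, 3, 3, 3, 2, 2,
] + [4] * 40  # ranks 4-8 all share bucket 4

NUM_KING_BUCKETS = 5
FEATURES_PER_BUCKET = 12 * 64  # assumed 768-input feature set per bucket


def relative_square(sq: int, white: bool) -> int:
    """Flip ranks for Black so both sides see their own back rank as row 0."""
    return sq if white else sq ^ 56


def king_bucket(king_sq: int, white: bool) -> int:
    return KING_BUCKETS[relative_square(king_sq, white)]


def needs_refresh(old_king_sq: int, new_king_sq: int, white: bool) -> bool:
    """Rebuild the accumulator if the king changes bucket or mirror half."""
    old_sq = relative_square(old_king_sq, white)
    new_sq = relative_square(new_king_sq, white)
    crosses_mirror = (old_sq % 8 >= 4) != (new_sq % 8 >= 4)
    return crosses_mirror or KING_BUCKETS[old_sq] != KING_BUCKETS[new_sq]


def input_weight_count(hidden_size: int) -> int:
    """Feature weights scale linearly with the number of king buckets."""
    return NUM_KING_BUCKETS * FEATURES_PER_BUCKET * hidden_size


print(king_bucket(4, True), king_bucket(60, False))  # 1 1 (e1 for White, e8 for Black)
print(needs_refresh(6, 7, True))   # False: g1 -> h1 stays in bucket 0, same half
print(needs_refresh(4, 12, True))  # True: e1 -> e2 moves from bucket 1 to 3
print(input_weight_count(640))     # 2457600
```

Because the layout is left-right symmetric, mirroring never changes which bucket applies, so the two refresh conditions can simply be checked independently.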

Combined, these new NNUE features made Leorik ~30 Elo stronger than version 3.1, with both versions using a hidden layer of 640 neurons.
