Devlog of Leorik

lithander · Post by **lithander** » Sat Dec 27, 2025 2:38 am

The other big improvement of version 3.2 over 3.1 comes from changes to the NNUE network architecture. After showing that adding 1B positions sourced from FRC games didn't hurt my net's performance in standard chess, I also tried a few straightforward architectural improvements.

Horizontal Mirroring

First I added horizontal mirroring. When you flip a chess position horizontally the evaluation should be pretty similar, as chess is symmetric in that regard. But an NNUE can't encode that symmetry and will see a position and its horizontally mirrored equivalent as distinct, so it has to learn the same patterns twice. With horizontal mirroring the position is flipped horizontally whenever the accumulator's king would be on the e-h files. When a king crosses the center line between the d- and e-files the accumulator has to be recomputed from scratch, which is expensive, so nps goes down by 10%. But the increased accuracy of the evaluation is worth it. +20 Elo!
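To make the mirroring rule concrete, here is a minimal sketch of how a feature index could be computed. It is not Leorik's actual code: the square numbering (a1 = 0, h1 = 7, h8 = 63), the piece encoding (0..11) and all function names are assumptions for illustration, and the usual vertical flip for Black's perspective is left out.

```python
def file_of(square: int) -> int:
    """File index 0..7 (a..h), assuming squares are numbered a1 = 0 .. h8 = 63."""
    return square & 7

def mirror_horizontally(square: int) -> int:
    """Swap files a<->h, b<->g, c<->f, d<->e and keep the rank."""
    return square ^ 7

def needs_mirroring(king_square: int) -> bool:
    """True when this perspective's king stands on one of the e-h files."""
    return file_of(king_square) >= 4

def feature_index(piece: int, square: int, king_square: int) -> int:
    """Input feature index for one piece (hypothetical 0..11 piece encoding)."""
    if needs_mirroring(king_square):
        square = mirror_horizontally(square)
    return piece * 64 + square

def king_move_needs_refresh(king_from: int, king_to: int) -> bool:
    """The accumulator must be rebuilt when the king changes board halves."""
    return needs_mirroring(king_from) != needs_mirroring(king_to)
```

With this numbering a king on e1 (square 4) triggers mirroring, so a piece on a1 is fed to the net as if it stood on h1. Castling short from e1 to g1 stays on the same half, while castling long to c1 crosses the center line and forces a refresh.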

Output Buckets

Next were output buckets. This is similar to the "tapering" that is usually done in PSQT evaluations. Tapered evaluation means that instead of using one set of PSTs for the whole game, the engine uses separate tables for midgame and endgame. Based on the material on the board, the final evaluation is an interpolation between the values from the two tables. Output buckets implement the same idea in a different way: they skip the interpolation and instead provide several distinct evaluations, the buckets. This isn't as expensive as it sounds: the bulk of the weights in an NNUE file are feature weights (InputSize * Layer1Size) and that stays unchanged. Only the OutputWeights (Layer1Size) and the OutputBias need to be duplicated per bucket. Now when we do the big dot-product on the accumulator (with a bit of clamping aka ClippedReLU²) we have different sets of weights to pick from, and we choose them based on the number of pieces on the board. The formula is

Code:

bucket = (pieceCount - 2) / DivCeil(32, #MaterialBuckets)

The computational overhead is minimal because whenever the accumulator changes we have to redo that computation anyway. And there's practically no scenario where a bucket changes (a piece leaves the board) and the accumulator doesn't.
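The bucket formula and the bucketed output layer can be sketched as follows. This is a simplified illustration, not Leorik's implementation: it uses floats instead of quantized integers, a single accumulator instead of one per side, and the function names and the default of 8 buckets are assumptions.

```python
def div_ceil(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)

def material_bucket(piece_count: int, num_buckets: int = 8) -> int:
    """bucket = (pieceCount - 2) / DivCeil(32, #MaterialBuckets), integer division."""
    return (piece_count - 2) // div_ceil(32, num_buckets)

def screlu(x: float) -> float:
    """Squared clipped ReLU: clamp to [0, 1], then square."""
    clamped = min(max(x, 0.0), 1.0)
    return clamped * clamped

def evaluate(accumulator, output_weights, output_bias, piece_count):
    """Dot product of the activated accumulator with the selected bucket's weights.

    output_weights holds one weight vector per bucket, output_bias one bias per bucket.
    """
    bucket = material_bucket(piece_count, len(output_weights))
    weights = output_weights[bucket]
    return sum(screlu(a) * w for a, w in zip(accumulator, weights)) + output_bias[bucket]
```

With 8 buckets the divisor is 4, so the two bare kings map to bucket 0 and the full set of 32 pieces maps to bucket 7. Only the small per-bucket vectors differ between buckets; the accumulator itself is shared.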

I tried to be more sophisticated than just counting pieces. For example, I trained a net where I mapped the phase calculation from PeSTO, where pieces have different values (Pawn = 0, N & B = 1, R = 2, Q = 4), to buckets. But it didn't really make a difference in strength. The gain from output buckets is about 10 Elo with a smallish 384 HL network (small to make sure I have enough data to saturate all buckets).
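For illustration, a phase-based bucket selection along those lines might look like the sketch below. The piece values are the ones quoted above and the cap of 24 is the phase total of the starting position; how the phase was actually mapped to buckets in that experiment is not stated, so the linear mapping, the piece-letter input format and the function names are assumptions.

```python
# Phase weights as quoted above: Pawn = 0, N & B = 1, R = 2, Q = 4 (kings count 0).
PHASE_VALUE = {"P": 0, "N": 1, "B": 1, "R": 2, "Q": 4, "K": 0}
MAX_PHASE = 24  # starting position: 4 knights + 4 bishops + 4 rooks * 2 + 2 queens * 4

def game_phase(pieces: str) -> int:
    """Sum the phase values of all pieces, given as letters like 'KQRkqr'."""
    phase = sum(PHASE_VALUE[p.upper()] for p in pieces)
    return min(phase, MAX_PHASE)  # promotions could push the sum past 24

def phase_bucket(pieces: str, num_buckets: int = 8) -> int:
    """Assumed linear mapping of phase 0..24 onto buckets 0..num_buckets-1."""
    return game_phase(pieces) * num_buckets // (MAX_PHASE + 1)
```

Unlike the plain piece count, this ignores pawns entirely, so a pawn endgame lands in bucket 0 no matter how many pawns remain.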

Input Buckets

Output buckets are not the only way to make the evaluation more context-aware. Another application of buckets (that is also viable in PSQT evaluations) is king input buckets. The board is divided into regions, and the king's presence in a region (aka bucket) determines which set of accumulator weights is used. As with horizontal mirroring, a boundary-crossing king move makes a full recomputation of the accumulator necessary. That, and the fact that the NNUE file size is proportional to the number of king buckets, means that you want only a handful of regions, and there are plenty of viable layouts. Leorik uses:

Code:

	0, 0, 1, 1, 1, 1, 0, 0,
	2, 2, 3, 3, 3, 3, 2, 2,
	2, 2, 3, 3, 3, 3, 2, 2,
	4, 4, 4, 4, 4, 4, 4, 4,
	4, 4, 4, 4, 4, 4, 4, 4,
	4, 4, 4, 4, 4, 4, 4, 4,
	4, 4, 4, 4, 4, 4, 4, 4,
	4, 4, 4, 4, 4, 4, 4, 4,

(from Black's POV)
The improvements in evaluation accuracy can be worth the slowdown if there's enough training data available.
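A lookup against the layout above could be sketched like this. The orientation is an assumption: the first printed row is read as the king's own back rank, squares are numbered a1 = 0 .. h8 = 63, and Black's king square is flipped vertically before the lookup. The function names are hypothetical.

```python
# The layout from above; row 0 is assumed to be the king's own back rank.
KING_BUCKETS = [
    0, 0, 1, 1, 1, 1, 0, 0,
    2, 2, 3, 3, 3, 3, 2, 2,
    2, 2, 3, 3, 3, 3, 2, 2,
    4, 4, 4, 4, 4, 4, 4, 4,
    4, 4, 4, 4, 4, 4, 4, 4,
    4, 4, 4, 4, 4, 4, 4, 4,
    4, 4, 4, 4, 4, 4, 4, 4,
    4, 4, 4, 4, 4, 4, 4, 4,
]

def king_bucket(king_square: int, is_white: bool) -> int:
    """Bucket for a king on a square numbered a1 = 0 .. h8 = 63."""
    # Flip vertically for Black so that rank 8 becomes row 0 of the table.
    relative = king_square if is_white else king_square ^ 56
    return KING_BUCKETS[relative]

def king_move_needs_refresh(king_from: int, king_to: int, is_white: bool) -> bool:
    """A king move that changes the bucket forces a full accumulator recomputation."""
    return king_bucket(king_from, is_white) != king_bucket(king_to, is_white)
```

The layout is left-right symmetric, so the lookup returns the same bucket whether or not the position has been mirrored horizontally. Under these assumptions castling from e1 to g1 moves the king from bucket 1 to bucket 0 and therefore triggers a refresh.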

Combined, these new NNUE features made Leorik ~30 Elo stronger than version 3.1, both using a hidden layer of 640 neurons.

Aleks Peshkov · Post by **Aleks Peshkov** » Sun Dec 28, 2025 5:33 am

Interesting. It seems NNUE architecture improvements gain less Elo than training data quantity and quality.

ColonelPhantom · Post by **ColonelPhantom** » Tue Dec 30, 2025 1:23 am

Thanks for sharing! For the horizontal mirroring, instead of forcing the king to be left-side and recreating the accumulator when it moves, would there be value in maintaining two accumulators instead, similar to what's usually done for side-to-move? It would slow things down, probably significantly, but removes the extra cost for king moves and adds capacity to the network (more output parameters) rather than just taking away (ignoring half the kingpos parameters).

I'm also curious about castling in the context of horizontal mirroring; if the player to move switched their king's board half, the opposing player gets into an impossible position where they moved their king into the queen's position but are still able to castle. Are castling rights encoded as inputs as well?
