Devlog of Leorik

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
lithander
Posts: 922
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

The other big improvement of version 3.2 over 3.1 comes from changes to the NNUE network architecture. After I showed that adding 1B positions sourced from FRC games didn't hurt the standard-chess performance of my net, I also tried a few straightforward architectural improvements.

Horizontal Mirroring

First I added horizontal mirroring. When you flip a chess position horizontally the evaluation should be pretty similar, as chess is symmetric in that regard. But an NNUE can't encode that symmetry and will see a position and its horizontally mirrored equivalent as distinct. So it has to learn the same patterns twice. With horizontal mirroring the position is flipped horizontally whenever the accumulator's king would be on the e-h files. When a king moves over the median the accumulator has to be computed from scratch, which is expensive, so nps goes down by about 10%, but the increased accuracy of the evaluation is worth it. +20 Elo!
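
The mirroring itself is just a transformation of the feature index. Here's a minimal C# sketch for a plain 768-input net (the identifiers are hypothetical, this is not Leorik's actual code):

Code: Select all

// Sketch: horizontal mirroring of the feature index for a plain
// 768-input net (64 squares * 6 pieces * 2 colors).
static int FeatureIndex(int square, int piece, int color, int kingSquare)
{
    // If the accumulator's king sits on the e-h files (file index 4..7),
    // mirror every square horizontally (a<->h, b<->g, ...).
    // XOR with 7 flips the file while keeping the rank.
    if ((kingSquare & 7) >= 4)
        square ^= 7;

    return color * 384 + piece * 64 + square;
}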

Output Buckets

Next were output buckets. This is similar to the "tapering" that is usually done in PSQT evaluation. Tapered evaluation means that instead of using one set of PSTs for the whole game, the system uses separate tables for midgame and endgame. Based on the material on the board, the final evaluation is an interpolation between the values from the two tables. Output buckets implement the same idea in a different way: they skip the interpolation and instead compute several distinct evaluations, the buckets. This isn't as expensive as it sounds: the bulk of the weights in an NNUE file are feature weights (InputSize * Layer1Size) and those stay unchanged. Only the OutputWeights (Layer1Size) and the OutputBias need to be duplicated per bucket. Now when we do the big dot-product on the accumulator (with a bit of clamping aka ClippedReLU²) we have different sets of weights to pick from, and we choose one based on the number of pieces on the board. The formula is

Code: Select all

bucket = (pieceCount - 2) / DivCeil(32, #MaterialBuckets)
The computational overhead is minimal because whenever the accumulator changes we have to redo that computation anyway. And there's practically no scenario where a bucket changes (a piece leaves the board) and the accumulator doesn't.
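
For illustration, a C# sketch of the bucket selection and the per-bucket output weights (the bucket count and identifiers are assumptions, not Leorik's actual values):

Code: Select all

// Sketch: selecting the output bucket from the piece count.
const int MaterialBuckets = 8; // assumed bucket count, for illustration only

static int DivCeil(int a, int b) => (a + b - 1) / b;

static int SelectBucket(int pieceCount)
{
    // pieceCount includes both kings, so it ranges from 2 to 32,
    // which maps to buckets 0 .. MaterialBuckets-1.
    return (pieceCount - 2) / DivCeil(32, MaterialBuckets);
}

// Only the output layer is duplicated per bucket:
// short[MaterialBuckets][Layer1Size] OutputWeights;
// short[MaterialBuckets]             OutputBias;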

I tried to be more sophisticated than just counting pieces. For example I trained a net where I mapped the phase calculation from PeSTO, where pieces have different values (Pawn = 0, N & B = 1, R = 2, Q = 4), to buckets. But it didn't really make a difference in strength, which is about 10 Elo with a smallish 384 HL network (kept small to make sure I have enough data to saturate all buckets).

Input Buckets

Output buckets are not the only way to make the evaluation more context-aware. Another application of buckets (one that is also viable in PSQT evaluations) is king input buckets. The board is divided into regions, and the king's presence in a region (aka bucket) determines which set of feature weights is used to build the accumulator. As with horizontal mirroring, a king move that crosses a region boundary makes a full recomputation of the accumulator necessary. That, and the fact that the NNUE file size grows proportionally with the number of king buckets, means that you want only a handful of regions, and there are plenty of viable layouts. Leorik uses:

Code: Select all

	0, 0, 1, 1, 1, 1, 0, 0,
	2, 2, 3, 3, 3, 3, 2, 2,
	2, 2, 3, 3, 3, 3, 2, 2,
	4, 4, 4, 4, 4, 4, 4, 4,
	4, 4, 4, 4, 4, 4, 4, 4,
	4, 4, 4, 4, 4, 4, 4, 4,
	4, 4, 4, 4, 4, 4, 4, 4,
	4, 4, 4, 4, 4, 4, 4, 4,
(from Black's POV)
The improvements in evaluation accuracy can be worth the slowdown if there's enough training data available.
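
Put together with mirroring, the bucket lookup is just a table access folded into the feature index. A sketch (identifiers are illustrative, perspective handling is simplified, and note that the table above is given from Black's POV):

Code: Select all

// Sketch: king input buckets combined with horizontal mirroring.
// kingBucket is the 64-entry table shown above (5 buckets).
static int FeatureIndex(int square, int piece, int color,
                        int kingSquare, int[] kingBucket)
{
    if ((kingSquare & 7) >= 4) // mirror when the king is on the e-h files
    {
        square ^= 7;
        kingSquare ^= 7;
    }
    int bucket = kingBucket[kingSquare];
    // Each bucket gets its own block of 768 feature weights.
    return bucket * 768 + color * 384 + piece * 64 + square;
}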

Combined, these new NNUE features made Leorik ~30 Elo stronger than version 3.1, both versions using a hidden layer size of 640.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
Aleks Peshkov
Posts: 970
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia
Full name: Aleks Peshkov

Re: Devlog of Leorik

Post by Aleks Peshkov »

Interesting. It seems NNUE improvements gain less Elo than training data quantity and quality.
ColonelPhantom
Posts: 8
Joined: Fri Mar 12, 2021 3:48 pm
Full name: Quinten Kock

Re: Devlog of Leorik

Post by ColonelPhantom »

Thanks for sharing! For the horizontal mirroring, instead of forcing the king to be left-side and recreating the accumulator when it moves, would there be value in maintaining two accumulators instead, similar to what's usually done for side-to-move? It would slow things down, probably significantly, but removes the extra cost for king moves and adds capacity to the network (more output parameters) rather than just taking away (ignoring half the kingpos parameters).

I'm also curious about castling in the context of horizontal mirroring; if the player to move switched their king's board half, the opposing player gets into an impossible position where they moved their king into the queen's position but are still able to castle. Are castling rights encoded as inputs as well?
User avatar
lithander
Posts: 922
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

ColonelPhantom wrote: Tue Dec 30, 2025 1:23 am I'm also curious about castling in the context of horizontal mirroring; if the player to move switched their king's board half, the opposing player gets into an impossible position where they moved their king into the queen's position but are still able to castle.
Even in the most basic NNUE architecture (without buckets or horizontal mirroring) you would maintain two separate accumulators: one for black and one for white. And you have two sets of output weights, e.g. Friend and Foe.
For the same position, if the side to move is black you'd compute eval = Dot(Black, Friend) + Dot(White, Foe),
and if it's white's turn you'd compute eval = Dot(White, Friend) + Dot(Black, Foe).

To make that work both accumulators have their friendly pieces on the same side. Just that 'friendly' means black pieces in one accu and the white pieces in the other. And only the friendly king is considered for mirroring.
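
In code that combination might look roughly like this (a sketch with hypothetical identifiers; clamping, quantization and the output bias are omitted):

Code: Select all

// Sketch: combining the two perspective accumulators with the
// Friend/Foe output weights, as described above.
static int Evaluate(short[] whiteAcc, short[] blackAcc,
                    short[] friendWeights, short[] foeWeights,
                    bool whiteToMove)
{
    short[] friend = whiteToMove ? whiteAcc : blackAcc;
    short[] foe    = whiteToMove ? blackAcc : whiteAcc;

    int sum = 0;
    for (int i = 0; i < friend.Length; i++)
        sum += friend[i] * friendWeights[i] + foe[i] * foeWeights[i];

    return sum; // scaling omitted
}
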
ColonelPhantom wrote: Tue Dec 30, 2025 1:23 am Are castling rights encoded as inputs as well?
No, usually you have 768 inputs, which is 64 squares * 6 pieces * 2 colors. You could probably add extra bits for castling rights and en passant and see if that gains Elo, but I never tried it.
ColonelPhantom wrote: Tue Dec 30, 2025 1:23 am Thanks for sharing! For the horizontal mirroring, instead of forcing the king to be left-side and recreating the accumulator when it moves, would there be value in maintaining two accumulators instead, similar to what's usually done for side-to-move? It would slow things down, probably significantly, but removes the extra cost for king moves and adds capacity to the network (more output parameters) rather than just taking away (ignoring half the kingpos parameters).
If I remember correctly the top engines have an accumulator cache: when they can't cheaply update an accumulator, because for example a king moved into a different bucket, they look up a cached accumulator state where the king is in the same bucket, compute the delta between the cached position and the current position, and then apply those adds/subs to the cached accumulator.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
User avatar
lithander
Posts: 922
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

I'm really close to wrapping up Leorik 3.2 and as I promised (myself), here's a summary of the new features and improvements I haven't yet talked about.

MultiPV

By default an engine only searches for the single best move in a position. To make Leorik more feature-complete and potentially viable for analysis I added the MultiPV UCI option. When you set MultiPV=3 the engine reports the top three principal variations. This allows users to see the evaluation gap between the best moves and to find interesting alternative lines.
The feature consumes a lot of extra computation, so when you aim for maximum strength it should be disabled: you lose about 100 Elo if you enable MultiPV=2. But in a depth-limited search (when the search is no longer constrained by time or node count) it becomes a strength-enhancing feature and can add about 30 Elo: by considering alternatives more thoroughly you just get better search results overall.
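
Usage follows the standard UCI protocol; a rough example session (the scores, moves and omitted info fields are placeholders, not actual Leorik output):

Code: Select all

setoption name MultiPV value 3
position startpos
go depth 15
info depth 15 multipv 1 score cp 30 pv e2e4 e7e5 g1f3
info depth 15 multipv 2 score cp 22 pv d2d4 d7d5 c2c4
info depth 15 multipv 3 score cp 15 pv g1f3 g8f6 d2d4
bestmove e2e4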

Pondering

Another feature to make the engine more feature-complete and potentially stronger is pondering. When enabled it allows the engine to think on the opponent's time (just like a human would). The engine makes a prediction about which move the opponent will play (the ponder move is usually just the 2nd move in the PV) and, while the opponent is thinking, it already starts to compute the best reply under the assumption that the opponent plays the predicted move. If they do ("ponder hit") the engine has a massive head start! In my tests that gave +70 Elo. But most rating lists and tournaments have it disabled, afaik, because thinking on the opponent's time doesn't come for free: it potentially slows down the opponent by putting load on the shared system.
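
The UCI flow, roughly (moves and clock values are placeholders): the engine reports its move together with the predicted reply via "bestmove ... ponder ...", the GUI sets up the position as if that reply had already been played and starts a ponder search, and if the prediction turns out right it sends "ponderhit" so the search continues on the engine's own clock (on a miss it sends "stop" and starts a fresh search instead).

Code: Select all

bestmove e2e4 ponder e7e5
position startpos moves e2e4 e7e5
go ponder wtime 60000 btime 60000
ponderhit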

Razoring (~11 Elo)

I have added Razoring with an adaptive margin based on statistics gathered during search. When a quiet move is best, we record how much it improved the static eval of the position. Razoring uses margin = mean + k * stddev (k = z for a one-sided P98 ≈ 2.054) to target ~2% false negatives. Usually you would use one or two tuned constants here, but I don't like tunable constants, and while looking for an alternative I found that this approach not only works but also produces a large variance in the margins the search uses, depending on the character of the position. So maybe one constant to fit them all is really not the best idea, and other engines might benefit from something similar as well. (I use the same idea of computing the margins dynamically from statistics for reverse futility pruning.)
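
A sketch of how such an adaptive margin could be maintained, using Welford's algorithm for the running mean and variance (this illustrates the idea, it is not Leorik's actual implementation):

Code: Select all

using System;

// Sketch: adaptive pruning margin = mean + z * stddev, where the statistics
// come from observations made during the search itself.
class AdaptiveMargin
{
    const double Z_P98 = 2.054; // one-sided 98th percentile of a normal distribution

    long _count;
    double _mean, _m2;

    // Call whenever a quiet move turned out best, with the amount by which
    // it improved on the static eval.
    public void Record(double improvement)
    {
        _count++;
        double delta = improvement - _mean;
        _mean += delta / _count;
        _m2 += delta * (improvement - _mean);
    }

    public double Margin
    {
        get
        {
            if (_count < 2)
                return double.MaxValue; // no data yet -> don't prune
            double stdDev = Math.Sqrt(_m2 / (_count - 1));
            return _mean + Z_P98 * stdDev;
        }
    }
}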

Dynamic reductions in Null Move Pruning (~12 Elo)

Null move pruning is one of the most powerful ways to thin out the search tree. Leorik's depth reduction used to be a constant 4 and is now based on a dynamic formula: R = 2 + 2 * (remaining / 4)
This patch performed well in my fast time control games but I feel like I should maybe have tested it at longer time controls as well.
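
Assuming "remaining" is the remaining search depth and integer arithmetic, the formula boils down to (a sketch, not Leorik's actual code):

Code: Select all

// Sketch: dynamic null-move reduction.
static int NullMoveReduction(int remainingDepth)
{
    return 2 + 2 * (remainingDepth / 4);
}
// e.g. depth 4 -> R = 4, depth 8 -> R = 6, depth 12 -> R = 8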

Time Control adjusts for Stability (~14 Elo)

I replaced the constant ABORT_RATIO = 0.5 in Leorik's time control code with a dynamically computed ratio in the range [0.33..1] that is based on the stability of the current iteration. Stability is a statistic I compute during search from the proportion of nodes spent on the best root move vs. the total nodes searched. When we spend most of the time on the PV and the alternatives are apparently easy to refute, we are unlikely to miss anything important by aborting the search of the next ply.
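
A hedged sketch of the idea (the linear mapping from stability to abort ratio below is an assumption for illustration; the post doesn't give Leorik's exact formula):

Code: Select all

using System;

// Sketch: map search stability to an abort ratio in [0.33 .. 1].
static double AbortRatio(long bestRootMoveNodes, long totalNodes)
{
    // Stability: share of the iteration spent on the best root move.
    double stability = (double)bestRootMoveNodes / totalNodes;

    // High stability -> abort earlier (smaller share of the time budget),
    // low stability  -> allow up to the full budget.
    return Math.Clamp(1.0 - 0.67 * stability, 0.33, 1.0);
}
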
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
Aleks Peshkov
Posts: 970
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia
Full name: Aleks Peshkov

Re: Devlog of Leorik

Post by Aleks Peshkov »

Dynamic eval margins seem to be either a novelty or an old forgotten idea. A rich field for further exploration.