Devlog of Leorik

Discussion of chess software programming and technical issues.

Moderator: Ras

lithander
Posts: 918
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

op12no2 wrote: Mon Mar 31, 2025 9:13 am Nice. Have you tried SqrRelu? I find it better than Screlu.
The S in SCReLU is "squared", so do you mean squared but not clipped?
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
op12no2
Posts: 554
Joined: Tue Feb 04, 2014 12:25 pm
Location: Gower, Wales
Full name: Colin Jenkins

Re: Devlog of Leorik

Post by op12no2 »

Yeah, sorry, I was using the bullet naming convention; squared but not clipped.

https://github.com/op12no2/lozza/blob/c ... a.js#L1994
lithander
Posts: 918
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

When I tried Squared-Clipped-ReLU vs Clipped-ReLU I couldn't get it to gain at first, because I didn't know how to implement it without losing too much speed. Only when I figured out how to keep using _mm256_madd_epi16 instead of widening to int too early did SCReLU manage to beat CReLU in practice.

So I'm using quantization, which means that instead of floats (32 bit) the values are represented as integers. Everything is multiplied by a quantization factor (e.g. 255) and then rounded to the nearest int. So if you have float values in the range [0..1] and choose a quantization factor of 255, all these values now neatly fit into 8 bits. (With a loss of precision, ofc.)
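A minimal sketch of that mapping (illustrative only, not Leorik's actual code):

Code: Select all

using System;

// map a float in [0..1] to an 8-bit integer using quantization factor 255
static byte Quantize(float v) =>
    (byte)Math.Round(Math.Clamp(v, 0f, 1f) * 255f);
// e.g. Quantize(0.5f) == 128, with the expected loss of precision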

When doing NNUE inference you'd normally compute the activation function and then multiply the result with the weight. For SCReLU you do this:

Code: Select all

f(x) = clamp(x, 0, 1)^2 * weight

With quantization, however, it's more efficient to do it like this:

Code: Select all

a = clamp(x, 0, 1)
f(x) = (a * weight) * a

And this only works because the clamped a is known to be in the range [0..255] (after quantization): if you also quantize the weights to the range [-127..127], then voila, the intermediate product doesn't overflow a short and you can use _mm256_madd_epi16 for twice the throughput of what you'd get if you really squared the clipped activation first. :shock: :idea:
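In C# the trick might look roughly like this (a sketch assuming AVX2 via System.Runtime.Intrinsics.X86 and the quantization ranges from above; names are illustrative, not Leorik's actual code):

Code: Select all

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static Vector256<int> ScreluMadd(Vector256<short> x, Vector256<short> w)
{
    // a = clamp(x, 0, 255): the "clipped" part in quantized units
    Vector256<short> a = Avx2.Min(Avx2.Max(x, Vector256<short>.Zero),
                                  Vector256.Create((short)255));
    // t = a * w still fits a short: |a * w| <= 255 * 127 = 32385
    Vector256<short> t = Avx2.MultiplyLow(a, w);
    // (a * w) * a with pairwise int32 accumulation: _mm256_madd_epi16
    return Avx2.MultiplyAddAdjacent(t, a);
}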

...in other words: this is complicated stuff, not because the math is hard, but because you also need to run these billions of multiply & add operations as fast as possible, or the small accuracy gain from a slightly superior activation function isn't worth the speed loss.

So, in my case I think SqrReLU won't help me improve, because I can't see a way to implement it as fast as the quantized SCReLU currently is. I need the clipping and quantization to squeeze everything into shorts!
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
op12no2
Posts: 554
Joined: Tue Feb 04, 2014 12:25 pm
Location: Gower, Wales
Full name: Colin Jenkins

Re: Devlog of Leorik

Post by op12no2 »

I only have a single perspective layer and use an int32 for accumulating the eval itself, and JavaScript has no native SIMD facilities, so my life is a lot easier in this respect :) I think Stormphrax has SqrReLU in a recent net - presumably on the final layer, based on what you have said (?).
lithander
Posts: 918
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

*New* Release of version 3.1

Post by lithander »

I just uploaded version 3.1 binaries to github: https://github.com/lithander/Leorik/releases/tag/3.1

Version 3.1 improves Leorik's NNUE evaluation by using a larger network (640 HL neurons) and adopting SCReLU activation. The network was trained from scratch over 19 generations. The network 640HL-S-5288M-Tmix-Q5-v19 that ships with version 3.1 was trained on 5.2B positions from generations 13 to 19. Search improvements include the addition of Correction History, increased reduction of late quiet moves, a dynamic threshold for identifying such moves, and the introduction of RFP with dynamic margins derived from NMP statistics.

I haven't run matches against 3.0 or other engines yet, so I'm pretty curious how big the strength increase is. Will post later about that.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
lithander
Posts: 918
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

*New* Release of version 3.1

Post by lithander »

Code: Select all

/cutechess-cli.exe -engine conf="Leorik-3.1" -engine conf="Leorik-3.0.1" 
 -each tc=5+0.1 -openings file="UHO_2024_8mvs_big_+095_+114.pgn" 
 -concurrency 11

Score of Leorik-3.1 vs Leorik-3.0.1: 2874 - 483 - 1643  [0.739] 5000
...      Leorik-3.1 playing White: 1988 - 50 - 462  [0.888] 2500
...      Leorik-3.1 playing Black: 886 - 433 - 1181  [0.591] 2500
...      White vs Black: 2421 - 936 - 1643  [0.648] 5000
Elo difference: 180.9 +/- 8.3, LOS: 100.0 %, DrawRatio: 32.9 %
+180 Elo selfplay, fast tc, unbalanced openings
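For reference, the reported Elo difference follows from the match score s (win = 1, draw = 0.5) via the standard logistic model; a one-line sketch:

Code: Select all

// Elo = -400 * log10(1/s - 1); with s = 0.739 this gives ~180.9
static double EloFromScore(double s) => -400.0 * Math.Log10(1.0 / s - 1.0);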
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
lithander
Posts: 918
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

The preliminary results from testing Leorik 3.1 for the CEGT 40/4 list confirm my previous estimate of +180 Elo, now against a wider range of opponents!

Meanwhile I have updated the Leorik binaries released on github to version 3.1.2, which is of similar strength but scales the internal eval by a factor before printing it in the UCI output. Outputting the raw eval values makes the engine look very dramatic, as scores are about 3x bigger than what you'd expect. That's because training a net is all about minimizing the prediction error; the training process is not at all aware of what a centipawn is.

Only after the release of 3.1 did I learn that most engines scale the internal eval before printing it, and also how to derive a good scaling factor, namely with this tool: https://github.com/official-stockfish/WDL_model

You can give it a bunch of pgn files containing games that your engine has played and calculate the stats for your engine like this:

Code: Select all

./scoreWDLstat.exe --matchEngine "Leorik-3.1"
Then run:

Code: Select all

python .\scoreWDL.py --NormalizeToPawnValue 100
100 is only correct if you had no scaling before! (Otherwise set it according to your scaling factor.)

...and within seconds you're presented with a scientific-looking visualization:
[image: WDL model fit]

...and in the log you'll find a line like const int NormalizeToPawnValue = 306; and while I think it's a bit confusingly named, it's the internal score at which the probability of a win/loss is 50%, which is (according to Stockfish) when you should report the position as +/- 100 cp! :D
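Applied to the UCI output, the scaling then boils down to something like this (a sketch using the factor from the fit above; illustrative, not Leorik's actual code):

Code: Select all

const int NormalizeToPawnValue = 306; // from the WDL model fit

// scale the internal score so that +306 internal prints as +100 cp
static int ToCentipawns(int internalEval) =>
    internalEval * 100 / NormalizeToPawnValue;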
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
lithander
Posts: 918
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

I have trained a number of networks, all compatible with the Leorik 3.1 binaries. You can download them from github here

Just make sure to place the nnue file you want beside the Leorik binary. Start the binary and confirm from the console output that the nnue file you wanted was loaded correctly!

I have included a graph showing how more neurons improve the accuracy of the networks (y-axis, less loss is better) but also require a longer training duration (x-axis, more epochs take longer). Finally, I have annotated the graphs with the relative Elo gain of doubling the HL size.

[image: network loss over training epochs for different HL sizes, annotated with relative Elo gains]
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
lithander
Posts: 918
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

Over the last couple of months I've worked on Leorik only sparingly. But as I see Leorik 3.1 struggle in Division 5, I feel like maybe it's time to release a new version soon. :P

During most of Leorik's development I avoided looking at other engines' source code, and so I made a few choices that are probably not great if you want to compete with the strongest engines: I use fail-hard search, I don't have a monolithic move-picker but play moves in a more rigid sequence (due to my staged movegen), all my reductions are multiples of two, there are barely any tunable constants, and I don't really have the hardware or patience for SPRT testing micro-optimizations... When I started to train nets from scratch, Leorik initially had no chess knowledge beyond what was implemented in code: the basic rules and its search implementation. All the chess knowledge the NNUE net now encodes is a reflection of how Leorik searches and is shaped by its biases, strengths and weaknesses. I feel like that gives Leorik a unique identity that I want to preserve. (One could argue the obsession with Elo kills engine diversity.)

So I was focusing mostly on adding features that make the engine more versatile or convenient to use. I added support for Chess960 (FRC & DFRC), MultiPV, pondering, and a few non-UCI debug commands like fen, eval, perft, flip and moves. I have done a few refactorings and optimizations of existing features; some of them just simplify the code, others should give a few Elo. I have also improved the time management by considering the stability of the current PV. And I have improved the NNUE architecture: it's now horizontally mirrored and has input and output buckets.

I think individual features like Chess960 support and the changed NNUE architecture may deserve their own posts. I'll try to write them before I release Leorik 3.2 in the upcoming days (or weeks)!
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
lithander
Posts: 918
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

The first thing I worked on after releasing version 3.1 was adding support for Fischer Random Chess aka Chess960 aka Freestyle Chess.

The only tricky part was castling. In standard chess there are only 4 castling moves: e8g8, e8c8, e1g1 and e1c1, and my move generator was hardcoded to try and emit each of these moves individually. If a castling move was generally available (encoded in the FEN by the "KQkq" part), then for that move there's a small set of squares that must be empty and not attacked. Straightforward, because there are just 4 possible moves to consider.

In FRC there are a lot more flavors of castling moves, including ones where the King doesn't actually move at all. And even if the king does move, encoding it as a normal king move could be ambiguous when it moves just one square: is it a regular move or is it meant to castle? For that reason the Chess960 variant encodes castling as King-takes-rook. The move generator has to check that...
1) all squares between the castling King's initial and final square (inclusive), and all squares between the castling Rook's initial and final square (inclusive), are vacant except for the King and the involved Rook and...
2) no square the King moves through, including the starting and final square, is under enemy attack (see the sketch below).
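A bitboard sketch of both conditions (the helper Between(a, b), returning the squares strictly between a and b, is hypothetical, as are all other names; this is not Leorik's actual code):

Code: Select all

static bool CastlingLegal(ulong occupied, ulong attacked,
                          int kingFrom, int kingTo, int rookFrom, int rookTo)
{
    // 1) all swept squares must be vacant, except for the king & rook themselves
    ulong swept = Between(kingFrom, kingTo) | Between(rookFrom, rookTo)
                | (1UL << kingTo) | (1UL << rookTo);
    ulong others = occupied & ~((1UL << kingFrom) | (1UL << rookFrom));
    if ((swept & others) != 0) return false;

    // 2) no square on the king's path (incl. start and end) may be attacked
    ulong kingPath = Between(kingFrom, kingTo)
                   | (1UL << kingFrom) | (1UL << kingTo);
    return (kingPath & attacked) == 0;
}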

To verify the refactored move generator I ran perft tests on a set of positions with known results: fisher.epd

Looking at these positions, it becomes clear that they are not standard FENs but X-FEN, where castling rights are given as KQkq only when they relate to the outermost rook of the affected side. If instead an inner rook is associated with that right, the traditional castling tag is replaced by the file letter of the involved rook, using upper case for White. For example 'C' in this position. Adding support for these encodings was the second challenge.
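A sketch of decoding one tag of such a castling field (the Outermost...RookFile helpers are hypothetical; this is not Leorik's actual code):

Code: Select all

// decode one castling tag to the file (0..7) of the rook it refers to
static int RookFile(char tag, ulong rooksOnBackRank) => char.ToUpper(tag) switch
{
    'K' => OutermostKingsideRookFile(rooksOnBackRank),  // classic tag: h-side rook
    'Q' => OutermostQueensideRookFile(rooksOnBackRank), // classic tag: a-side rook
    >= 'A' and <= 'H' => char.ToUpper(tag) - 'A',       // X-FEN: explicit file letter
    _ => throw new ArgumentException($"invalid castling tag '{tag}'")
};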

The third challenge was to refactor the NNUE "efficiently update" mechanism that activates and deactivates features on each move. Previously a castling move was encoded as a king move to the target square, which naturally deactivates and activates the correct king features. When the move carried a castling flag I used to deactivate and activate the rook as well.
With the new king-takes-rook encoding it works a little differently: any capture move will always deactivate the moving piece, deactivate the captured piece and activate a "new" piece. Usually the new piece is simply the moving piece on the new square. But for castling the move's new piece is None. After running the normal code the rook and king features have been deactivated and nothing has been activated. Even in FRC, *after* the castling move the king and rook are always on their traditional squares. So if a castling flag is present we just have to activate the king and rook on their respective squares. The commit shows it's less complicated than it sounds.
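As a sketch (the Accumulator type and the Square/King/Rook helpers are hypothetical, not Leorik's actual code), the castling fix-up after the generic capture handling might look like:

Code: Select all

// generic capture handling already ran: the king (mover) and rook (captured)
// features are deactivated and nothing was activated (the new piece is None)
static void FinishCastling(Accumulator acc, bool kingside, bool white)
{
    // regardless of the start squares, king and rook end on the classical squares
    int rank = white ? 0 : 7;
    int kingTo = Square(rank, kingside ? 6 : 2); // g- or c-file
    int rookTo = Square(rank, kingside ? 5 : 3); // f- or d-file
    acc.Activate(King(white), kingTo);
    acc.Activate(Rook(white), rookTo);
}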

Without any changes to the NNUE weights Leorik was already able to play FRC against other engines. My first opponent was Dumb 2.3, which was completely crushed.

Now I started to generate selfplay data from FRC and DFRC matches. I used opening books (DFRC_openings.epd and 3moves_FRC.epd) instead of playing the first X moves randomly from the standard starting position. Adding just about 500M new labeled positions improved the FRC performance by 110 Elo! Eventually I trained a net on 6.1B total positions, including 1B FRC and DFRC positions, and it showed improved performance in standard chess as well. So luckily there's no apparent need for separate weights for FRC and normal chess!

Once I release Leorik 3.2 I hope it will be tested by the people maintaining the CCRL 40/2 FRC list! I'm really curious how Leorik will place. :D
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess