Devlog of Leorik

Discussion of chess software programming and technical issues.

Moderator: Ras

lithander
Posts: 912
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

op12no2 wrote: Mon Mar 31, 2025 9:13 am Nice. Have you tried SqrRelu? I find it better than Screlu.
The S in SCReLU is "squared", so do you mean squared but not clipped?
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
op12no2
Posts: 540
Joined: Tue Feb 04, 2014 12:25 pm
Location: Gower, Wales
Full name: Colin Jenkins

Re: Devlog of Leorik

Post by op12no2 »

Yeah, sorry I was using the bullet naming convention; squared but not clipped.

https://github.com/op12no2/lozza/blob/c ... a.js#L1994
lithander
Posts: 912
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

When I tried Squared-Clipped-ReLU (SCReLU) against Clipped-ReLU (CReLU) I couldn't get it to gain at first, because I didn't know how to implement it without losing too much speed. Only when I figured out how to keep using _mm256_madd_epi16, instead of widening to int too early, did SCReLU manage to beat CReLU in practice.

So I'm using quantization, which means that instead of floats (32 bit) the values are represented as integers. Everything is multiplied by a quantization factor (e.g. 255) and then rounded to the nearest int. So if you have float values in the range [0..1] and choose a quantization factor of 255, all these values neatly fit into 8 bits. (with a loss of precision ofc)
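
Here's a minimal C# sketch of that idea (my own illustration, not Leorik's actual code), assuming the [0..1] range and the factor 255 from above:

Code: Select all

// quantize a float in [0..1] into a byte by scaling with Q = 255
const int Q = 255;

static byte Quantize(float value) =>
    (byte)Math.Round(Math.Clamp(value, 0f, 1f) * Q);

// recover an approximation of the original float
static float Dequantize(byte quantized) =>
    quantized / (float)Q;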

When doing NNUE inference you'd normally compute the activation function and then multiply the result by the weight. For SCReLU you do this:

Code: Select all

f(x) = clamp(x, 0, 1)^2 * weight

With quantization, however, it's more efficient to do it like this:

Code: Select all

a = clamp(x, 0, 1)
f(x) = (a * weight) * a

And this only works because the clamped a is known to be in the range [0..255], and if you quantize the weights to the range [-127..127] then voila, the product doesn't overflow a short and you can use _mm256_madd_epi16 for twice the throughput of what you'd get if you really squared the clipped activation. :shock: :idea:
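
To make the reordering concrete, here's a rough C# sketch of such an inner loop (my own illustration under the assumptions above, not Leorik's actual code; Avx2.MultiplyAddAdjacent is the .NET intrinsic that maps to _mm256_madd_epi16):

Code: Select all

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// dot product of SCReLU(acc) and weights, all quantized to int16
static unsafe int QuantizedScreluDot(short* acc, short* weights, int length)
{
    var zero = Vector256<short>.Zero;
    var ceil = Vector256.Create((short)255);
    var sum = Vector256<int>.Zero;
    for (int i = 0; i < length; i += Vector256<short>.Count)
    {
        var x = Avx.LoadVector256(acc + i);
        var w = Avx.LoadVector256(weights + i);
        // a = clamp(x, 0, 255): the clipped activation, still an int16
        var a = Avx2.Min(Avx2.Max(x, zero), ceil);
        // a * w stays within int16 because a <= 255 and |w| <= 127
        var aw = Avx2.MultiplyLow(a, w);
        // (a * w) * a, widened to int32 and pairwise-summed in one madd
        sum = Avx2.Add(sum, Avx2.MultiplyAddAdjacent(aw, a));
    }
    // horizontal sum of the eight int32 lanes
    int result = 0;
    for (int i = 0; i < Vector256<int>.Count; i++)
        result += sum.GetElement(i);
    return result;
}

The caller still has to divide the result by the quantization constants to get back to the original scale; in practice the trained values stay far enough from the int16 extremes that the int32 accumulation doesn't overflow.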

...in other words: this is complicated stuff not because the math is hard, but because you also need to run these billions of multiply & add operations as fast as possible, or the small precision gain from a slightly superior activation function isn't worth the speed loss.

So in my case I think SqrReLU won't help me improve, because I can't see a way to implement it as fast as quantized SCReLU is currently implemented. I need the clipping and quantization to squeeze everything into shorts!
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
op12no2
Posts: 540
Joined: Tue Feb 04, 2014 12:25 pm
Location: Gower, Wales
Full name: Colin Jenkins

Re: Devlog of Leorik

Post by op12no2 »

I only have a single perspective layer and use an int32 for accumulating the eval itself, and Javascript has no native SIMD facilities, so my life is a lot easier in this respect :) I think Stormphrax has SqrReLU in a recent net - presumably on the final layer, based on what you have said (?).
lithander
Posts: 912
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

*New* Release of version 3.1

Post by lithander »

I just uploaded version 3.1 binaries to github: https://github.com/lithander/Leorik/releases/tag/3.1

Version 3.1 improves Leorik's NNUE evaluation by using a larger network (640 hidden-layer neurons) and adopting SCReLU activation. The network was trained from scratch over 19 generations; the net 640HL-S-5288M-Tmix-Q5-v19 that ships with version 3.1 was trained on 5.2B positions from generations 13 to 19. Search improvements include the addition of Correction History, increased reduction of late quiet moves, a dynamic threshold for identifying such moves, and the introduction of RFP with dynamic margins derived from NMP statistics.
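
For readers unfamiliar with RFP (reverse futility pruning), here's a generic sketch of the technique; the margin and depth constants are made up for illustration, and Leorik's dynamic margins derived from NMP statistics are not shown here:

Code: Select all

// prune if the static eval beats beta by a depth-scaled margin,
// assuming the node would fail high anyway (illustrative constants)
const int MarginPerDepth = 100;

static bool TryReverseFutilityPrune(int staticEval, int beta, int depth, out int score)
{
    score = staticEval;
    return depth <= 7 && staticEval - MarginPerDepth * depth >= beta;
}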

I haven't run matches against 3.0 or other engines yet, so I'm pretty curious how big the strength increase is. I will post about that later.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
lithander
Posts: 912
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: *New* Release of version 3.1

Post by lithander »

Code: Select all

/cutechess-cli.exe -engine conf="Leorik-3.1" -engine conf="Leorik-3.0.1" 
 -each tc=5+0.1 -openings file="UHO_2024_8mvs_big_+095_+114.pgn" 
 -concurrency 11

Score of Leorik-3.1 vs Leorik-3.0.1: 2874 - 483 - 1643  [0.739] 5000
...      Leorik-3.1 playing White: 1988 - 50 - 462  [0.888] 2500
...      Leorik-3.1 playing Black: 886 - 433 - 1181  [0.591] 2500
...      White vs Black: 2421 - 936 - 1643  [0.648] 5000
Elo difference: 180.9 +/- 8.3, LOS: 100.0 %, DrawRatio: 32.9 %
+180 Elo selfplay, fast tc, unbalanced openings
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
lithander
Posts: 912
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

The preliminary results of testing Leorik 3.1 for the CEGT 40/4 list confirm my previous estimate of +180 Elo, now against a wider range of opponents!

Meanwhile I have updated the Leorik binaries released on github to version 3.1.2, which is of similar strength but scales the internal eval by a factor before printing it in the UCI output. Outputting the raw eval values makes the engine look very dramatic, as scores are about 3x bigger than what you'd expect. That's because training a net is all about minimizing the prediction error and is not at all aware of what a centipawn is.

Only after the release of 3.1 did I learn that most engines scale the internal eval before printing it, and also how to derive a good scaling factor, namely with this tool: https://github.com/official-stockfish/WDL_model

You can give it a bunch of pgn files containing games that your engine has played and calculate the stats for your engine like this:

Code: Select all

./scoreWDLstat.exe --matchEngine "Leorik-3.1"
Then run

Code: Select all

python .\scoreWDL.py --NormalizeToPawnValue 100
100 is only correct if you had no scaling before! (Otherwise set it according to your scaling factor.)

...and within seconds you're presented with a scientific-looking visualization.
[Image: fitted WDL model visualization]

...and in the log you'll find a line like const int NormalizeToPawnValue = 306; and while I think it's a bit confusingly named, it's the score at which the probability of a win/loss is 50%, which is (according to Stockfish) when you should evaluate the position as +/- 100 cp! :D
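
The rescaling itself then boils down to a one-liner; a sketch of how this could look (the method name is mine, not Leorik's):

Code: Select all

// an internal score of NormalizeToPawnValue (306 here) is reported
// as 100 cp over UCI
const int NormalizeToPawnValue = 306;

static int ToCentipawns(int internalEval) =>
    internalEval * 100 / NormalizeToPawnValue;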
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
lithander
Posts: 912
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

I have trained a number of networks, all compatible with the Leorik 3.1 binaries. You can download them from github here.

Just make sure to place the nnue file you want beside the Leorik binary. Start the binary and confirm from the console output that the nnue file you wanted was loaded correctly!

I have included a graph showing how more neurons improve the accuracy of the networks (y-axis, less loss is better) but also require a longer training duration (x-axis, more epochs take longer). Finally, I have annotated the graphs with the relative Elo gain from doubling the HL size.

[Image: network loss over training epochs for different HL sizes, annotated with relative Elo gains]
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess