Devlog of Leorik

Discussion of chess software programming and technical issues.

Moderator: Ras

Mike Sherwin
Posts: 930
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Devlog of Leorik

Post by Mike Sherwin »

The warmth of life has entered my tomb.

I'm glad you're back! :D
op12no2
Posts: 536
Joined: Tue Feb 04, 2014 12:25 pm
Location: Gower, Wales
Full name: Colin Jenkins

Re: Devlog of Leorik

Post by op12no2 »

lithander wrote: Fri Jan 17, 2025 2:10 pm It's definitely possible to get off the ground without "external" chess knowledge like EGTB or opening books, which is really cool.

It seems like it's not really important that the positions you train on are always correctly labeled and scored. (I completely ignore the score.) What is important is that the errors are evenly balanced. The first net, trained on my first batch of training data, showed really odd behaviour and was unfit to generate viable training data on its own. So I kept the temperature high at +/-100cp. Now it was basically a random mover with a tendency towards what the net says. And *that* did the trick.

So from then on the question was: when is the net good enough that I should set the temperature to zero? I haven't yet found a clear answer. So far the randomness in the training data still seems to do more good than harm.
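As an aside, a minimal sketch of what a temperature of +/-100cp can mean in practice: among the root moves of a shallow search, pick uniformly at random from those scoring within the temperature window of the best move. Written in C#; the ScoredMove type and Pick helper are hypothetical, not Leorik's or Lozza's actual API.

Code: Select all

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: a scored root move from a shallow search.
record ScoredMove(string Move, int ScoreCp);

static class TemperaturePicker
{
    static readonly Random Rng = new Random();

    // Pick uniformly among all root moves whose score is within temperatureCp of the best.
    // temperatureCp = 0 degenerates to "always play the best move"; a large value
    // approaches a random mover with a tendency towards what the net prefers.
    public static ScoredMove Pick(IReadOnlyList<ScoredMove> rootMoves, int temperatureCp)
    {
        int best = rootMoves.Max(m => m.ScoreCp);
        var candidates = rootMoves.Where(m => m.ScoreCp >= best - temperatureCp).ToList();
        return candidates[Rng.Next(candidates.Count)];
    }
}
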
I'm on generation 6 now via the random-weight-init 'boot' net, and testing successive generations the Elo gain has been (ish) +inf, +inf, +inf, +400, +130, +100. I haven't tested gen 6 against the existing net yet (the others were all -inf). I made the mistake (I think) of leaving lerp at 0.5, but I'll live with it for this experiment. I may restart and use lerp 1.0 (WDL), at least for the early generations, if this experiment fails.
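For readers unfamiliar with the lerp knob: in most NNUE training pipelines it interpolates between the game result (WDL) and the search score when building the training target, so lerp = 1.0 trains on pure WDL and 0.5 is an even mix. A sketch of that blend, assuming the usual sigmoid scaling; the 400cp scale and the Blend helper are illustrative, not Lozza's actual values.

Code: Select all

using System;

static class TrainingTarget
{
    // Map a centipawn score to [0,1]; the 400cp scale is a common choice, purely illustrative.
    static double Sigmoid(double scoreCp, double scale = 400.0) =>
        1.0 / (1.0 + Math.Exp(-scoreCp / scale));

    // result: 1.0 = win, 0.5 = draw, 0.0 = loss, from the side to move's point of view.
    // lerp = 1.0 trains on the game result only (pure WDL), lerp = 0.5 is an even mix.
    public static double Blend(double result, int scoreCp, double lerp) =>
        lerp * result + (1.0 - lerp) * Sigmoid(scoreCp);
}
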
lithander
Posts: 889
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

So v0 is basically a random mover. The networks that followed were trained as follows:

V2: 8 hidden layers, trained on 88M positions from v0 (no network) with temperature=100
Score of v2 vs v0: 568 - 7 - 599 [0.739] 1174

V3: 32 hidden layers, trained on 221M positions from v2 with temperature=100
Score of v3 vs v2: 474 - 0 - 7 [0.993] 481

V4: 32 hidden layers, trained on 424M positions from v2 & v3 all with temperature=100
Score of v4 vs v3: 708 - 14 - 22 [0.966] 744

Despite each new net obliterating its predecessor, v4 still has no chance against the reference:
Score of Leorik-3.0.11v4 vs Leorik-3.0.10: 1 - 171 - 1 [0.009] 173


V5: 32 hidden layers, trained on 700M positions from v2 - v4, all with temperature=100
Score of v5 vs v4: 326 - 7 - 15 [0.958] 348

I could only get those significant gains after I started to shuffle the positions! Still no chance against the reference:
Score of Leorik-3.0.11v5 vs Leorik-3.0.10: 7 - 662 - 46 [0.042] 715


V6: 32 hidden layers, trained on 479M positions from v3 - v4, all with temperature=100
Score of v6 vs v5: 197 - 42 - 64 [0.756] 303

Tested with 64HL and 128HL for the first time, but strength was still in the same ballpark as 32HL!

V7: 32 hidden layers, trained on 570M positions from v3 - v6, all with temperature=100
Score of v7 vs v6: 858 - 668 - 775 [0.541] 2301

V8: 32 hidden layers, trained on 496M positions from v4 - v7, all with temperature=100
Score of v8 vs v7: 698 - 444 - 714 [0.568] 1856

New nets are now only marginally better than their predecessors. This net is ~300 Elo weaker than the reference, but there are a few wins, too.
Leorik-3.0.11v8 vs Leorik-3.0.10: 42 - 604 - 175 [0.158] 821


V9: 32 hidden layers, trained on 629M positions from v4 - v8, all with temperature=100
Score of v9 vs v8: 546 - 333 - 632 [0.570] 1511

I also generated 44M positions with temperature=0! But neither mixing them in with the 629M high-temperature positions nor training on them in isolation helped. So I started to quiesce all positions with SEE, like QSearch would, which gives some +40 extra Elo over the unfiltered data.

V9-Q5: 32 hidden layers, trained on 642M positions from v4 - v8, temperature=mixed, QSearchDepth=5
Score of v9 vs v8: 3636 - 1586 - 3299 [0.620] 8521
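As an aside, a sketch of one way "quiescing with SEE like QSearch would" can work before a position is written to the training set: follow non-losing captures until the position is quiet. The IBoard/IMove types below are placeholders, not Leorik's actual interfaces.

Code: Select all

using System.Collections.Generic;
using System.Linq;

// Placeholder interfaces; Leorik's real board/move types and SEE look different.
interface IMove { }
interface IBoard
{
    IEnumerable<IMove> GenerateCaptures();
    int See(IMove capture);      // static exchange evaluation of the capture, in centipawns
    IBoard Play(IMove move);
}

static class QuiescePositions
{
    // Follow the best non-losing capture (SEE >= 0) for up to maxDepth plies so that
    // the stored position is quiet, roughly like QSearch would resolve it.
    public static IBoard Quiesce(IBoard board, int maxDepth = 5)
    {
        for (int ply = 0; ply < maxDepth; ply++)
        {
            var best = board.GenerateCaptures()
                            .Select(move => (Move: move, Gain: board.See(move)))
                            .Where(t => t.Gain >= 0)
                            .OrderByDescending(t => t.Gain)
                            .Select(t => t.Move)
                            .FirstOrDefault();
            if (best == null)
                return board;    // no winning or equal capture left: position is quiet
            board = board.Play(best);
        }
        return board;
    }
}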

V10: 32 hidden layers, trained on 547M positions from v5 - v9, temperature=mixed, QSearchDepth=5
Score of v10 vs v9: 819 - 558 - 1177 [0.551] 2554

Net strength is only increasing slowly now. I train with 60 superbatches instead of 20, which gives ~10 extra Elo. There is still a ~160 Elo gap to the reference, which uses a 256 hidden layer net.
Score of Leorik-3.0.11v10 vs Leorik-3.0.10: 169 - 866 - 545 [0.279] 1580
Is 32HL saturated, and should we use more hidden layers?


V11a: 32 hidden layers, trained on 674M positions from v6-v9, QSearchDepth=5, lots of new temp=0 games in the mix!
Score of v11 vs v10: 1150 - 1248 - 2405 [0.490] 4803
V11b: 64 hidden layers, trained on 674M positions as above
Score of v11 vs v10: 831 - 326 - 952 [0.620] 2109
V11c: 128 hidden layers, trained on 674M positions as above
Score v11 vs v10: 4941 - 1059 - 3857 [0.697] 9857

Strength of the 128 hidden layer net is now ~50 Elo below the reference with 256 hidden layers!
Score of Leorik-3.0.11v11 vs Leorik-3.0.10: 481 - 909 - 1342 [0.422] 2732


I feel like with more data, and eventually a transition to a 256 hidden layer network, I should be able to close the gap to the reference. But I probably need billions of positions. Preliminary results from v12, where I added 300M temp=100 positions, only gained ~20 Elo for the 128HL net.
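For context, a minimal sketch of the kind of network these 32/64/128/256 "HL" labels usually describe: a single hidden layer with that many neurons over piece-square inputs. The 768-feature encoding and clipped-ReLU activation are common choices assumed here for illustration, not necessarily Leorik's exact architecture.

Code: Select all

using System;

// Toy forward pass for a net with one hidden layer of `hidden` neurons over
// 768 piece-square inputs; no quantization, purely illustrative.
class TinyNet
{
    readonly int hidden;
    readonly float[,] w1;   // [768, hidden] input-to-hidden weights
    readonly float[] b1;    // [hidden] hidden biases
    readonly float[] w2;    // [hidden] hidden-to-output weights
    readonly float b2;      // output bias

    public TinyNet(float[,] w1, float[] b1, float[] w2, float b2)
    {
        this.w1 = w1; this.b1 = b1; this.w2 = w2; this.b2 = b2;
        hidden = b1.Length;
    }

    static float CReLU(float x) => Math.Clamp(x, 0f, 1f);

    // activeFeatures: indices of the (piece, square) features present on the board.
    public float Evaluate(ReadOnlySpan<int> activeFeatures)
    {
        var acc = (float[])b1.Clone();           // accumulator starts at the biases
        foreach (int f in activeFeatures)
            for (int j = 0; j < hidden; j++)
                acc[j] += w1[f, j];              // add the weights of each active feature

        float output = b2;
        for (int j = 0; j < hidden; j++)
            output += w2[j] * CReLU(acc[j]);     // activation, then the output layer
        return output;                           // score in the net's own units
    }
}
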
Last edited by lithander on Sun Jan 19, 2025 1:19 am, edited 1 time in total.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
op12no2
Posts: 536
Joined: Tue Feb 04, 2014 12:25 pm
Location: Gower, Wales
Full name: Colin Jenkins

Re: Devlog of Leorik

Post by op12no2 »

Interesting. I kept the hidden layer size the same (128) in this experiment but am gradually increasing the number of games played.

https://github.com/op12no2/lozza/blob/master/zero.md

Early days...
lithander
Posts: 889
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

Mike Sherwin wrote: Sat Jan 18, 2025 4:59 am The warmth of life has entered my tomb.

I'm glad you're back! :D
Hi Mike! Your interest in real-time learning is not forgotten. I recently implemented correction history (https://github.com/lithander/Leorik/com ... 11aa72af5a), and while I worked on it I thought that, instead of using a small hashtable and discriminating positions only by pawn structure, it could also be promising to keep track of correction PSQTs!
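For readers who haven't seen correction history: the usual form is a small table, indexed by pawn hash and side to move, that accumulates the difference between search score and static eval and nudges future static evals accordingly. A minimal sketch; the table size, grain and update weights are illustrative, not the values in the linked commit.

Code: Select all

using System;

// A sketch of the general correction-history idea, not Leorik's actual implementation.
class CorrectionHistory
{
    const int Entries = 16384;   // entries per side to move, indexed by pawn hash
    const int Grain = 256;       // fixed-point scale of stored corrections
    readonly int[,] table = new int[2, Entries];

    static int Index(ulong pawnHash) => (int)(pawnHash % Entries);

    // After a search returns searchScore for a position whose raw static eval was
    // staticEval, move the stored correction towards the observed error.
    public void Update(int sideToMove, ulong pawnHash, int staticEval, int searchScore, int depth)
    {
        int idx = Index(pawnHash);
        int error = (searchScore - staticEval) * Grain;
        int weight = Math.Min(depth + 1, 16);    // let deeper searches count more
        table[sideToMove, idx] =
            (table[sideToMove, idx] * (256 - weight) + error * weight) / 256;
    }

    // Applied to the raw static eval before the search uses it.
    public int Correct(int sideToMove, ulong pawnHash, int staticEval) =>
        staticEval + table[sideToMove, Index(pawnHash)] / Grain;
}
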
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
lithander
Posts: 889
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

op12no2 wrote: Sun Jan 19, 2025 1:16 am Interesting. I kept the hidden layer size the same (128) in this experiment but am gradually increasing the number of games played.

https://github.com/op12no2/lozza/blob/master/zero.md

Early days...
Interesting choice to discard the data at each generation. Considering that both quantity and quality matter, I have kept positions from multiple generations and phase out the data of old generations as new batches come in. Each generation only changes about ~100M positions in the pool.

It's less clean, but the bottleneck is data generation and I don't want to drag out the process longer than needed. Right now, as I have transitioned to 128 hidden layers, I add more than I remove to slowly grow the pool; ~500M positions don't seem to be enough to saturate that net. Also, as the strength difference between generations isn't that large anymore, the "old" data isn't really that bad, so I feel that your policy of starting each generation with clean data will work very well in the beginning but could potentially cost a lot of time in the end!
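A sketch of such a rolling pool, purely for illustration; the batch bookkeeping and budget policy are hypothetical, not Leorik's actual tooling.

Code: Select all

using System.Collections.Generic;
using System.Linq;

// Keep batches from several generations and phase out the oldest as new data arrives.
class TrainingPool
{
    record Batch(int Generation, long Positions, string Path);

    readonly Queue<Batch> batches = new();

    public long TotalPositions => batches.Sum(b => b.Positions);
    public IEnumerable<string> TrainingFiles => batches.Select(b => b.Path);

    // Add the newest generation's data, then drop the oldest batches until the pool
    // fits the budget again. Raising the budget over time lets the pool grow slowly.
    public void AddGeneration(int generation, long positions, string path, long budget)
    {
        batches.Enqueue(new Batch(generation, positions, path));
        while (TotalPositions > budget && batches.Count > 1)
            batches.Dequeue();   // the oldest generation leaves the pool first
    }
}
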
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
op12no2
Posts: 536
Joined: Tue Feb 04, 2014 12:25 pm
Location: Gower, Wales
Full name: Colin Jenkins

Re: Devlog of Leorik

Post by op12no2 »

When the net gets 'reasonably good' I will probably start keeping data. Presently it's a fairly quick process, and I'll have a few more threads soon, totalling 86. The only downside is that the family are wondering where their laptops have gone :)
lithander
Posts: 889
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

op12no2 wrote: Sun Jan 19, 2025 2:20 pm When the net gets 'reasonably good' I will probably start keeping data. Presently it's a fairly quick process, and I'll have a few more threads soon, totalling 86. The only downside is that the family are wondering where their laptops have gone :)
86 threads is impressive! I might have to requisition additional hardware, too, if I want to stay ahead. :lol:
My data generation tool is also multithreaded, but somehow setting it up with more threads than I have cores doesn't yield the expected results. While it scales linearly over the first 12 threads (I have a 5900X with 12 cores), going beyond that and utilizing 24 threads only generates about 10% more positions per second (3000 instead of 2800). :thinking:
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
Mike Sherwin
Posts: 930
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Devlog of Leorik

Post by Mike Sherwin »

lithander wrote: Sun Jan 19, 2025 11:48 am
Mike Sherwin wrote: Sat Jan 18, 2025 4:59 am The warmth of life has entered my tomb.

I'm glad you're back! :D
Hi Mike! Your interest in real-time learning is not forgotten. I recently implemented correction history (https://github.com/lithander/Leorik/com ... 11aa72af5a), and while I worked on it I thought that, instead of using a small hashtable and discriminating positions only by pawn structure, it could also be promising to keep track of correction PSQTs!
You made my day! :D I didn't think anyone heard me. Twelve/thirteen elo today. Twelve hundred elo tomorrow :!:

IMHO the key to +1200 Elo will be playing many shallow-searched games before the main search and storing the learned scores in the hash file. Or maybe just adjusting the PSQTs. Bringing in results from far beyond the horizon will, I believe, be key to getting the maximum Elo gain from real-time learning.

In RomiChess I did PSQT adjustment based on the history table back in 2006. It is still in the latest version. I'm just curious if what you did is the same.

Code: Select all

// Knight Tables
    moves = knight[sq] & ~aPieces;                              // reachable squares
    trgts = knight[sq] & (bRooks | bQueens | bKings);           // black rooks/queens/king attacked from sq
    wKnightTbl[sq] = PopCnt(moves) * 4 + PopCnt(trgts) * 8;     // mobility and target bonuses
    wKnightTbl[sq] += ((cenRow + cenRow) * (cenCol + cenCol) + row * 3); // centralization + advancement
    if(cenCol == 0) wKnightTbl[sq] -= 20;                       // extra penalty on the edge files
    // Learned adjustment: g = accumulated history score for a white knight on sq, n = sample count.
    // Once a square has more than 99 samples, half of its average history score is added to the PSQT.
    g = evalHistTblg[WN][sq]; n = evalHistTbln[WN][sq]; // <- //
    wKnightTbl[sq] += (n > 99 ? g/n/2 : 0); // <- //
lithander
Posts: 889
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

A 256 hidden layer net trained on a bit over 1B positions from generations v7 to v12 has already managed to beat my "old" 256 hidden layer network that Leorik 3.0 shipped with. That was way quicker than expected! More than half of the training data uses temperature=100, and the v7 net, for example, is 400 Elo weaker than my best net. So, considering all that, my reference net might not be quite as good a benchmark as I had thought :oops:

Code: Select all

--> 256HL-1047M-Tmixed-Q5-v13.nnue

Score of Leorik-3.0.11v13 vs Leorik-3.0.10: 2702 - 2285 - 5013  [0.521] 10000
...      Leorik-3.0.11v13 playing White: 1566 - 862 - 2572  [0.570] 5000
...      Leorik-3.0.11v13 playing Black: 1136 - 1423 - 2441  [0.471] 5000
...      White vs Black: 2989 - 1998 - 5013  [0.550] 10000
Elo difference: 14.5 +/- 4.8, LOS: 100.0 %, DrawRatio: 50.1 %
It's also interesting to note that I didn't have to retune any of the magic constants that are so prevalent in most chess engines, because I have tried to get rid of all of them, so that the engine is a bit more independent from the network it uses for eval.
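Purely as an illustration of how such constants can be decoupled from the eval's scale: one option is to measure the eval's typical magnitude and express margins as multiples of it, so a differently scaled net doesn't force retuning. A sketch of that general idea only, not Leorik's actual mechanism; names and numbers are hypothetical.

Code: Select all

using System;
using System.Linq;

// Margins as multiples of a measured eval unit instead of hard-coded centipawn numbers.
class EvalScale
{
    public double UnitCp { get; }

    // Estimate the eval's unit from a sample of static evals of typical positions
    // (sampleEvals is a hypothetical input), e.g. as the mean absolute value.
    public EvalScale(int[] sampleEvals)
    {
        UnitCp = Math.Max(1.0, sampleEvals.Average(e => (double)Math.Abs(e)));
    }

    // Swapping in a differently scaled net changes UnitCp, and the margin follows it.
    public int FutilityMargin(int depth) => (int)(0.9 * UnitCp * depth);
}
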
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess