Devlog of Leorik

Mike Sherwin · Post by **Mike Sherwin** » Thu Jun 02, 2022 5:55 pm

lithander wrote: ↑Thu Jun 02, 2022 9:57 am
Mike Sherwin wrote: ↑Wed Jun 01, 2022 10:27 pm Big improvement in playing style! Much more human like. Needs pawn storm code. This was a very interesting game!!
Thanks for playing the new version and glad to hear I'm making some progress in the direction of style!

A pawn storm.... is that what you did on the King Side? Moving a phalanx of pawns together? Leorik should have countered that better, you mean?

Yes, but also know when to launch one itself. It is usually a castle on opposite sides thing but in recent years top players are launching their G and H pawns at the enemy king position even if castled on the same side. How to teach a chess engine to do that sounds hard to do though.

dangi12012 · Post by **dangi12012** » Fri Jun 03, 2022 10:22 pm

Quick and dirty results from this thread:
http://www.talkchess.com/forum3/viewtop ... &start=100

Quote from different thread:

lithander wrote: ↑Wed May 18, 2022 8:55 pm
dangi12012 wrote: ↑Wed May 18, 2022 6:47 pm Even tho most of the time is spent in recusing through EvaluateTT a performance difference of 10% is significant and will yield a measurable elo difference.
Oh, you use Leorik! Then I have a few tips: There's a project called Leorik.Test and if you run that it will spend the same time budget (e.g. 100ms) on 300 different positions. Afterwards it will print the average NPS. I use that as a benchmark for changes that should not affect anything but the speed. You can run it from Visual Studio and the results are consistent enough but it's usually a few thousand NPS faster when you make a build.

Only when you change the search/eval of a real engine-vs-engine tournament would be necessary. I use cutechess-cli for that.

If you want to profile the move generation code in isolation I recommend the Leorik.Perft project. It's just move generation and nothing else and instead of the 6M nps I get in the engine it's more like 60M nps there. The performance implications of using PInvoke for sliders should be much more visible there!

Yields this results:
DEFAULT

Code: Select all

Searched 300 positions for 100ms each.
211098K nodes visited. Took 30,012 seconds!
7033K NPS.
Best move found in 83 / 300 positions!

Searched 300 positions for 100ms each.
210592K nodes visited. Took 30,012 seconds!
7017K NPS.
Best move found in 83 / 300 positions!

PEXT

Code: Select all

Searched 300 positions for 100ms each.
222767K nodes visited. Took 30,024 seconds!
7419K NPS.
Best move found in 83 / 300 positions!

Searched 300 positions for 100ms each.
220205K nodes visited. Took 30,02 seconds!
7335K NPS.
Best move found in 83 / 300 positions!

Just perft:

Code: Select all

LEOR: Total: 1361558651 Nodes, 18347ms, 74211K NPS
PEXT: Total: 1361558651 Nodes, 15970ms, 85253K NPS

If this is reproducable on multiple machines (tcec hardware would be fast pext compatible for instance) - its a good 5-15% increase in nps.
I suspect similar results for the C# port of Hyperbola qsc.
Now I gotta add here that i am beyond addicted to chessprogramming atm - which is not a good thing.

algerbrex · Post by **algerbrex** » Sat Jun 04, 2022 2:47 am

dangi12012 wrote: ↑Fri Jun 03, 2022 10:22 pm Now I gotta add here that i am beyond addicted to chessprogramming atm...

Well you're in good company than

dangi12012 · Post by **dangi12012** » Sun Jun 05, 2022 9:28 pm

Hey lithander:
Update on automatic cutechess-cli commit strength delta Assessment:
Working on it - promising simple command line in use already. But probably it will need a lot of workers to get the standard deviation low enogh to resolve one elo. +20 means the winrate shifts by 2.88% to 52.88%
+1 is just 0.14% delta or 0.0014 - Meaning that 3000 games are a good start.

Update on Movegen:
I retested with latest C# findings from this thread: http://www.talkchess.com/forum3/viewtop ... &start=110
and these are the results:

Default:

Code: Select all

Leorik Perft

1 OK! 1521ms, 78230K NPS
2 OK! 2423ms, 79907K NPS
3 OK! 2191ms, 81506K NPS
4 OK! 10039ms, 70328K NPS
5 OK! 0ms, 64065K NPS
6 OK! 2040ms, 80406K NPS

Total: 1361558651 Nodes, 18218ms, 74736K NPS

Fancy Magic

Code: Select all

Leorik Perft

1 OK! 1307ms, 91035K NPS
2 OK! 2052ms, 94359K NPS
3 OK! 1926ms, 92709K NPS
4 OK! 8225ms, 85836K NPS
5 OK! 0ms, 77390K NPS
6 OK! 1717ms, 95552K NPS

Total: 1361558651 Nodes, 15230ms, 89396K NPS

I dont want you to feel pressured to do anything - you are the author of the engine and you decide what you put into the sourcecode.
I am just playing around with C# a bit and its always fun to optimize for me personally. Here you have a nice free 20% bump to nps.
Have a nice Pfingstmontag.

dangi12012 · Post by **dangi12012** » Mon Jun 06, 2022 1:46 am

Update 2:
Further performance improvements. I really had to circumvent most things C# does by default in the name of performance.
This comes from improvements from language perspective: (so nothing algorithmic at all)

Code: Select all

Leorik Perft
1 OK! 1121ms, 106147K NPS
2 OK! 1764ms, 109788K NPS
3 OK! 1782ms, 100202K NPS
4 OK! 6973ms, 101243K NPS
5 OK! 0ms, 74083K NPS
6 OK! 1483ms, 110607K NPS

Total: 1361558651 Nodes, 13126ms, 103726K NPS

With this it now solves more position in the 100ms allowed.

Code: Select all

Searched 300 positions for 100ms each.
246146K nodes visited. Took 30,012 seconds!
8201K NPS.
Best move found in 86 / 300 positions!

Greetings -
Daniel

lithander · Post by **lithander** » Mon Jun 06, 2022 1:09 pm

dangi12012 wrote: ↑Mon Jun 06, 2022 1:46 am Further performance improvements. I really had to circumvent most things C# does by default in the name of performance.
This comes from improvements from language perspective: (so nothing algorithmic at all)
Code: Select all
Leorik Perft
1 OK! 1121ms, 106147K NPS
2 OK! 1764ms, 109788K NPS
3 OK! 1782ms, 100202K NPS
4 OK! 6973ms, 101243K NPS
5 OK! 0ms, 74083K NPS
6 OK! 1483ms, 110607K NPS

Total: 1361558651 Nodes, 13126ms, 103726K NPS

That's impressive! My ambition with Leorik is to write from scratch a strong C# engine (maybe the strongest? is that MadChess currently?) based on only idiomatic C# code, fully embracing the managed nature of the language (no unsafe code etc) and using only intrinsics that have a fall-back implementation to offer support for as many 64bit computers as possible.
And the reason I didn't chose a movegen approach like magics that relies on big lookup tables was that I feared that in an actual engine this would put too much load on the CPU caches. But maybe that's not a problem... I can only know when the engine has matured a bit more.

But I have already separated the Perft project from the Engine project in the solution and maybe it makes sense to include your uncompromising approach to maximum performance for reference purposes at the very least. I'm very curious to learn what exactly you had to do to achieve these speeds!

dangi12012 wrote: ↑Mon Jun 06, 2022 1:46 am With this it now solves more position in the 100ms allowed.
Code: Select all
Searched 300 positions for 100ms each.
246146K nodes visited. Took 30,012 seconds!
8201K NPS.
Best move found in 86 / 300 positions!

Improving the NPS from 7M to 8M in the actual engine just by virtue of faster movegen is no small feat! Or did you change anything else?

What is odd though is that the number of solved positions is 265/300 at 100ms on my system at the moment. I just ran it again to confirm. 86/300 looks like something important broke along the way.

dangi12012 wrote: ↑Sun Jun 05, 2022 9:28 pm I dont want you to feel pressured to do anything - you are the author of the engine and you decide what you put into the sourcecode.
I am just playing around with C# a bit and its always fun to optimize for me personally. Here you have a nice free 20% bump to nps.

I appreciate that stance! It's exactly what I hoped to communicate by making the repository public but not putting the code under any open source license. Feel free to look and experiment but I'd rather not have Leorik derivatives be published, outpacing my own development and forcing my hand. For a personal project having full creative control and all the time I wish to take is important to me. That's a very refreshing difference to the projects at work!

dangi12012 · Post by **dangi12012** » Tue Jun 07, 2022 9:48 pm

Github actions rock!
https://docs.github.com/en/actions/quickstart

I forked your repo privately to get started with this
(I wanna do more optimisations)

Writing to a special file can create a build summary automatically for every commit:

Code: Select all

string env = Environment.GetEnvironmentVariable("GITHUB_STEP_SUMMARY");
Console.WriteLine($"Step Summary {env}");
try
{
      File.AppendAllText(env, $"### {perfMessage} :rocket:\n");
}

Custom annotations:
https://docs.github.com/en/actions/usin ... ub-actions

The important part is that these workers can be setup with a single line of code on ubuntu, macos or windows.
For me for example - the worker is setup on a seperate testing machine that has a stable core clock - and nothing else is running.
This gets extremely consistent results - and github itself will send an email if there is a regression or similar for a commit.

Also testing if a commit will be faster on ARM, MacOS or similar is very easy with this setup.

Normally there would be many more steps for example building release packages and publishing them etc - but this is not enterprise software so this is not needed.

dangi12012 · Post by **dangi12012** » Thu Jun 09, 2022 12:12 am

Hey it took some time to create a pointer version. It was an interesting excercise since there are many ways to allocate memory other than arrays in C#
(stackalloc, Marshal.AllocHGlobal, MemoryBuffer, Memory, Span, Buffer, NativeMemory, CompilerServices.Unsafe as well as pinvoke) and all with different pros and cons.

Some classes live in arrays and cannot be structs - so not all code can become pointer arithmetic.

This is the speed delta when making everything unsafe (which is not recommended):
Pointer Version: https://1drv.ms/u/s!AoDIyNlDG2QTg8tIm6R ... A?e=mPWK2p
Safe Version: https://1drv.ms/u/s!AoDIyNlDG2QTg8tL281 ... A?e=JiEge7

I tip my hat to Thomas Jahn.

algerbrex · Post by **algerbrex** » Thu Jun 09, 2022 2:48 am

lithander wrote: ↑Mon Mar 28, 2022 6:43 pm And one iteration would take less than a second to compute...

Hmmm? Do you mean one full training epoch?

Just wondering because I finished implementing a rough sketch of a gradient descent tuner for Blunder this afternoon, and while it seems to be working okay-ish, computing the gradient is slow as hell.

Are you using a numerical approximation of the gradient? I sat down and derived the formula for computing partial derivatives of the cost function needed to compute the full gradient vector by hand, since of course, the gradient is just a vector of the partials of the inputs. And while the formula I got works, it means that to compute each partial for each weight, the whole number of positions has to be iterated through. Using a mini-batch approach speed things up a good bit, but not enough. 2000 iterations would take 2000 years

Are you using vanilla gradient descent or another flavor?

clayt · Post by **clayt** » Thu Jun 09, 2022 8:12 pm

Doing one epoch in under a second should be well within reason - I have a gradient descent tuner with 777 data points per board, and it does a full epoch of ~30M positions in about half a second. I'm using roughly the same approach as Leorik for my personal engine (using sigmoids to make error in strongly winning and losing positions not matter so much). Maybe there's a mistake in your computation of the gradient which makes it more convoluted than it has to be?

I strongly recommend using a sparse representation of each datapoint, as usually only 30 or so points in any one position will be nonzero.

Devlog of Leorik

Re: Devlog of Leorik - New Version 2.1

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik