Devlog of Leorik

lithander · Post by **lithander** » Tue Jan 04, 2022 3:39 pm

Unshackled from the constraints of minimalism and simplicity a bare-bones engine rises from it's grave, stronger than ever. All hail Leorik the Skeleton King!

When I started working on MinimalChess about a year ago I remember seeing a lot of devlog threads here were authors reused the same thread to chronicle their development but also discuss questions that were too small and too specific to warrant the opening of dedicated threads. I liked reading these journeys a lot and now that I have finally committed to write a new engine myself I wanted to open one of these threads, too. (I hope you don't mind.)

I'm far from releasing anything. But over the Christmas break I have written a bitboard based perft implementation from scratch. It's pseudo legal, doesn't use hashing or bulk counting (every move is played and then checked for legality) and computes perft on my test-set with 67M nodes per second. For a C# implementation I think it's pretty fast. Now even the most harmless looking code changes, like changing the const int values assigned to identify pieces (within the enum Piece) are causing a regression in speed. Seriously, I have no explanation for that but it's true.

So I think I'm ready to leave the perft-only state behind and reimplement the features of MinimalChess step by step. Hopefully for version 1.0 of Leorik I'll have an engine that plays like it's predecessor but several times faster.

This would be where I plan the first public release but it's still a long way to go. But I hope by committing to that goal publicly by opening this thread I will help myself to stay focused and motivated! =)

algerbrex · Post by **algerbrex** » Tue Jan 04, 2022 4:38 pm

Awesome news Thomas! I'm really glad to see you developing a new engine. I think you'll really enjoy working with bitboards. IMHO, they make implementing certain evaluation features much easier, which is one area I remember you saying you wanted to try to work more on in MinimalChess.

What method are you using to generate slider moves? Magics? Or something like Hyperbola Quintessence or rotated bitboards? Or the ray look-up method?

Also if you don't mind me asking, what does your hardware look like. I'm just curious since I've noticed recently, Blunder seems really slow compared to other engines in its class. On my AMD Ryzen 7 4000, I'm getting about 20M nodes per second, max, for perft(6) from the start position. That is with Zobrist hashing, however, but I'm not sure how much that slows things down.

But 67M nodes per second sounds very nice indeed, good work! I'm excited to see the first release!

lithander · Post by **lithander** » Tue Jan 04, 2022 5:11 pm

algerbrex wrote: ↑Tue Jan 04, 2022 4:38 pm What method are you using to generate slider moves? Magics? Or something like Hyperbola Quintessence or rotated bitboards? Or the ray look-up method?

With so many options to chose from on the wiki I thought I should first start with something original as a challenge to myself. And what I found was pleasently concise and fast and so I stuck with it.

Code: Select all

        
        //returns the index of the least significant bit of the bitboard, bb can't be 0
        public static int LSB(ulong bb) => BitOperations.TrailingZeroCount(bb);

        //resets the least significant bit of the bitboard
        public static ulong ClearLSB(ulong bb) => Bmi1.X64.ResetLowestSetBit(bb);
        //public static ulong ClearLSB(ulong bb) => bb & (bb - 1);

        const ulong HORIZONTAL = 0x00000000000000FFUL;
        const ulong VERTICAL = 0x0101010101010101UL;

        public static ulong GetDiagonalTargets(ulong occupation, int square)
        {
            ulong bbPiece = 1UL << square;
            ulong bbBlocker = occupation & ~bbPiece;
            //mask the bits below bbPiece
            ulong bbBelow = bbPiece - 1;
            return GenLines(Diagonals[square], Diagonals[square+64], bbBlocker, bbBelow);
        }

        public static ulong GetOrthogonalTargets(ulong occupation, int square)
        {
            ulong bbPiece = 1UL << square;
            ulong bbBlocker = occupation & ~bbPiece;
            //mask the bits below bbPiece
            ulong bbBelow = bbPiece - 1;
            //horizontal line through square
            ulong bbHorizontal = HORIZONTAL << (square & 56);
            //vertical line through square
            ulong bbVertical = VERTICAL << (square & 7);
            return GenLines(bbHorizontal, bbVertical, bbBlocker, bbBelow);
        }

        private static ulong GenLines(ulong bbLineA, ulong bbLineB, ulong bbBlocker, ulong bbBelow) =>
            GenLine(bbLineA, bbBlocker & bbLineA, bbBelow) |
            GenLine(bbLineB, bbBlocker & bbLineB, bbBelow);

        private static ulong GenLine(ulong bbLine, ulong bbBlocker, ulong bbBelow)
        {
            //MaskLow sets all low bits up to and including the lowest blocker above orgin, the rest are zeroed out.
            //MaskHigh sets all low bits up to and including the highest blocker below origin, the rest are zerored out.
            //The bits of the line that are different between the two masks are the valid targets (including the first blockers on each side)
            return (MaskLow(bbBlocker & ~bbBelow) ^ MaskHigh(bbBlocker & bbBelow)) & bbLine;
        }

        //identify the highest set bit and shift a mask so the bits below are set and the rest are zeroed
        private static ulong MaskHigh(ulong bb) => 0x7FFFFFFFFFFFFFFFUL >> BitOperations.LeadingZeroCount(bb | 1);

        //identify the lowest set bit and set all bits below while zeroing the rest
        private static ulong MaskLow(ulong bb) => bb ^ (bb - 1);

If you want to try it make sure to inline everything. And the Diagonals array just contains 128 bitboard with the diagonals and antidiagonals on an empty board.

You can compute the diagonals in code just like the orthogonals if you want to be 100% algorithmic but the lookup tables is slightly faster.

Code: Select all

            //diagonal line through square
            ulong bbDiagonal = VerticalShift(DIAGONAL, file - rank);
            //antidiagonal line through square
            ulong bbAntiDiagonal = VerticalShift(ANTIDIAGONAL, 7 - file - rank);

        //sign of 'ranks' decides between left shift or right shift. Then convert signed ranks to a positiver number of bits to shift by. Each rank has 8 bits e.g. 1 << 3 == 8
        private static ulong VerticalShift(in ulong bb, in int ranks) => ranks > 0 ? bb >> (ranks << 3) : bb << -(ranks << 3);

Also if you don't mind me asking, what does your hardware look like.

I use a Ryzen 3600 at 4.2 Ghz. Perft(6) on the startpos runs at 76287K NPS in my implementation.

Zobrist hashing is actually the next thing I'm going to add and I can report back with updated speeds.

algerbrex · Post by **algerbrex** » Tue Jan 04, 2022 5:56 pm

lithander wrote: ↑Tue Jan 04, 2022 5:11 pm With so many options to chose from on the wiki I thought I should first start with something original as a challenge to myself. And what I found was pleasently concise and fast and so I stuck with it.

Interesting, thanks. I may pop that into Blunder just for fun to compare the numbers between it and magic bitboards.

lithander wrote: ↑Tue Jan 04, 2022 5:11 pm I use a Ryzen 3600 at 4.2 Ghz. Perft(6) on the startpos runs at 76287K NPS in my implementation.

Cool. I'll probably look into doing a little tinkering with Blunder this week. Someone else pointed out to me it seemed unusually slow. So thanks for motivation

lithander wrote: ↑Tue Jan 04, 2022 5:11 pm Zobrist hashing is actually the next thing I'm going to add and I can report back with updated speeds.

That'd be nice, thanks, and good luck!

algerbrex · Post by **algerbrex** » Tue Jan 04, 2022 7:40 pm

lithander wrote: ↑Tue Jan 04, 2022 5:11 pm Zobrist hashing is actually the next thing I'm going to add and I can report back with updated speeds.

Just did a test myself. Got about a 14% speed-up with Zobrist hashing removed.

emadsen · Post by **emadsen** » Tue Jan 04, 2022 9:29 pm

lithander wrote: ↑Tue Jan 04, 2022 3:39 pm When I started working on MinimalChess about a year ago I remember seeing a lot of devlog threads here were authors reused the same thread to chronicle their development but also discuss questions that were too small and too specific to warrant the opening of dedicated threads. I liked reading these journeys a lot and now that I have finally committed to write a new engine myself I wanted to open one of these threads, too. (I hope you don't mind.)

But over the Christmas break I have written a bitboard based perft implementation from scratch.

Great news, Thomas. I look forward to reading your posts and tracking the progress of Leorik.

lithander · Post by **lithander** » Wed Jan 05, 2022 12:57 pm

emadsen wrote: ↑Tue Jan 04, 2022 9:29 pm Great news, Thomas. I look forward to reading your posts and tracking the progress of Leorik.

Thanks Erik! I look forward to when I can play against MadChess and actually win a few games!

lithander · Post by **lithander** » Wed Jan 05, 2022 2:57 pm

algerbrex wrote: ↑Tue Jan 04, 2022 7:40 pm
lithander wrote: ↑Tue Jan 04, 2022 5:11 pm Zobrist hashing is actually the next thing I'm going to add and I can report back with updated speeds.
Just did a test myself. Got about a 14% speed-up with Zobrist hashing removed.

I suppose you are computing the zobrist hash incrementally?

I was about to do that. But first I implemented the naive version where I compute the zobrist hash from scratch. It's good to have that to verify that the incremental one is working correctly. My implementation is terribly inefficient, looping over all the bits set in each of the bitboards and if I call it on every node than the perft speed tanks. But why would you ever want to know the hash for leaf nodes? In MinimalChess I'm using the hash for a transposition table but I don't set or query the TT in QSearch and QSearch is where the bulk of the work is spend.

So it made me wonder... when I make the zobrist calculation incremental I'll suddenly also do a lot of extra work in qsearch where I don't ever need it. Maybe I need two PlayMove() methods? One that does an incremental update of the zobrist hash and another one that doesn't and is used in QSearch?

But how important is a speedy incremental zobrist if you are not using it in the leaf nodes? Just exluding the leafs in perft from computing the hash (their perft count is always going to be 1... no point in storing that in a hash table) already makes the overall cost of zobrist-hash calculations pretty negligible. (And if you actually use it to store the perft result of positions you even get a nice speedup to 130M NPS on average on my test set.)

So, is it worth it to make the zobrist hashing incremental? Probably only if I can make it real fast and lightweight or optional so that I can skip it where it's not needed like in qsearch. Stuff gets complicated quickly!

algerbrex · Post by **algerbrex** » Wed Jan 05, 2022 3:52 pm

lithander wrote: ↑Wed Jan 05, 2022 2:57 pm I suppose you are computing the zobrist hash incrementally?

Yup.

lithander wrote: ↑Wed Jan 05, 2022 2:57 pm I was about to do that. But first I implemented the naive version where I compute the zobrist hash from scratch. It's good to have that to verify that the incremental one is working correctly. My implementation is terribly inefficient, looping over all the bits set in each of the bitboards and if I call it on every node than the perft speed tanks. But why would you ever want to know the hash for leaf nodes? In MinimalChess I'm using the hash for a transposition table but I don't set or query the TT in QSearch and QSearch is where the bulk of the work is spend.

So it made me wonder... when I make the zobrist calculation incremental I'll suddenly also do a lot of extra work in qsearch where I don't ever need it. Maybe I need two PlayMove() methods? One that does an incremental update of the zobrist hash and another one that doesn't and is used in QSearch?

Interesting idea! I've heard mixed opinions about querying the hash in QSearch, and it's something I'll have to get around to trying in Blunder.

However, my gut tells me that the speed up you might get from avoiding incrementally updating the zobrist hash in QSearch might not be significant enough to register as any kind of notable gain in strength.

But that's just my gut instinct, I don't have any hard data to support that idea. But I will say to make sure your move generator has the ability to generate only captures in QSearch. For a long time, I was lazy in Blunder and just generated all moves in QSearch and filtered out non-attacking moves. But I remember after I implemented a dedicated move generator method to only generate captures, I got a very nice speed and 30 Elo in playing strength increase.

Now you probably already knew this and are doing it, but stuff like this wasn't so obvious to me when I started

lithander wrote: ↑Wed Jan 05, 2022 2:57 pm But how important is a speedy incremental zobrist if you are not using it in the leaf nodes? Just exluding the leafs in perft from computing the hash (their perft count is always going to be 1... no point in storing that in a hash table) already makes the overall cost of zobrist-hash calculations pretty negligible. (And if you actually use it to store the perft result of positions you even get a nice speedup to 130M NPS on average on my test set.)

So, is it worth it to make the zobrist hashing incremental? Probably only if I can make it real fast and lightweight or optional so that I can skip it where it's not needed like in qsearch. Stuff gets complicated quickly!

So you're saying naively creating the zobrist hash from scratch but excluding this at the leaf nodes is only a negligible speed decrease? Interesting, although that makes sense. Sounds like something to test.

I remember in another thread Marcel stated that when he switched his engine to update the zobrist hash from scratch each time, he got a performance drop of about 67%, so maybe this percentage goes down quite a bit of excluding the leaf nodes?

Mike Sherwin · Post by **Mike Sherwin** » Wed Jan 05, 2022 11:25 pm

lithander wrote: ↑Wed Jan 05, 2022 2:57 pm
algerbrex wrote: ↑Tue Jan 04, 2022 7:40 pm
lithander wrote: ↑Tue Jan 04, 2022 5:11 pm Zobrist hashing is actually the next thing I'm going to add and I can report back with updated speeds.
Just did a test myself. Got about a 14% speed-up with Zobrist hashing removed.
I suppose you are computing the zobrist hash incrementally?

I was about to do that. But first I implemented the naive version where I compute the zobrist hash from scratch. It's good to have that to verify that the incremental one is working correctly. My implementation is terribly inefficient, looping over all the bits set in each of the bitboards and if I call it on every node than the perft speed tanks. But why would you ever want to know the hash for leaf nodes? In MinimalChess I'm using the hash for a transposition table but I don't set or query the TT in QSearch and QSearch is where the bulk of the work is spend.

So it made me wonder... when I make the zobrist calculation incremental I'll suddenly also do a lot of extra work in qsearch where I don't ever need it. Maybe I need two PlayMove() methods? One that does an incremental update of the zobrist hash and another one that doesn't and is used in QSearch?

But how important is a speedy incremental zobrist if you are not using it in the leaf nodes? Just exluding the leafs in perft from computing the hash (their perft count is always going to be 1... no point in storing that in a hash table) already makes the overall cost of zobrist-hash calculations pretty negligible. (And if you actually use it to store the perft result of positions you even get a nice speedup to 130M NPS on average on my test set.)

So, is it worth it to make the zobrist hashing incremental? Probably only if I can make it real fast and lightweight or optional so that I can skip it where it's not needed like in qsearch. Stuff gets complicated quickly!

MakeMove(t, m);
UpdateHash(t, m);
// call search
UpdateHash(t, m);
TakeBack(t, m);

Results were 202,817,680m/22.22s finishing depth 15 and a node rate of 9,127,708 nodes/sec.

MakeMove(t, m);
if (depth > 1) UpdateHash(t, m);
// call search
if (depth > 1) UpdateHash(t, m);
TakeBack(t, m);

Results were 204,152,123m/14.79s finishing depth 11 and a node rate of 13,803,389 nodes / sec.

Devlog of Leorik

Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik