Why C++ instead of C#?

Discussion of chess software programming and technical issues.

Moderator: Ras

spirch
Posts: 95
Joined: Fri Nov 09, 2012 12:36 am

Re: Why C++ instead of C#?

Post by spirch »

ok i cannot use these

Code: Select all

        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        private static ulong GenRook(int sq, ulong occupation)
        {
            var lMask = LateralMasks[sq];
            ulong index = Bmi2.X64.ParallelBitExtract(occupation, lMask);
            return Bmi2.X64.ParallelBitDeposit(AttackMapLateral[(sq * (128 * 256)) + (int)index], lMask);
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        private static ulong GenBishop(int sq, ulong occupation)
        {
            var dMask = DiagonalMasks[sq];
            ulong index = Bmi2.X64.ParallelBitExtract(occupation, dMask);
            return Bmi2.X64.ParallelBitDeposit(AttackMapDiag[(sq * (64 * 256)) + (int)index], dMask);
        }
i need to swap to the GenRook2 and GenBishop2

Code: Select all

Total: 1361558651 Nodes, 9044ms, 150543K NPS
:D good night now
spirch
Posts: 95
Joined: Fri Nov 09, 2012 12:36 am

Re: Why C++ instead of C#?

Post by spirch »

ok ok now bed time

@krunch

i redid some of my few optimization in your code (not all of them since i realllllly need to go to bed now)

Code: Select all

Total: 1361558651 Nodes, 7795ms, 174667K NPS
little improvement haha 8-)

going to continue tomorrow and post my changes
R. Tomasi
Posts: 307
Joined: Wed Sep 01, 2021 4:08 pm
Location: Germany
Full name: Roland Tomasi

Re: Why C++ instead of C#?

Post by R. Tomasi »

klx wrote: Wed Sep 29, 2021 6:32 am Actually, let me rethink that.

If Tomasi ports it to C++, no-one will be able to read the code. Someone else please.
:lol: guilty as charged.
User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Why C++ instead of C#?

Post by lithander »

krunch wrote: Wed Sep 29, 2021 4:48 am I decided to see how fast I could make the C# version.
  1. Illegal move checking in the current implementation takes 60% of the execution time, however we only really need to call it if we are in check, moving the king, or moving a pinned piece. These can be calculated per position rather than per move cutting the execution time in half.
  2. Created separate methods per piece type for quiet / capture move generation. (A single method with unsafe function pointers was an improvement but about 200ms slower)
  3. Cleaned up the make method
  4. Added a flag to MoveType for whether to perform the illegal move call (in check / pinned / king / EP) during move generation
  5. Combine quiet / capture move generation.
  6. Added uncompressed PEXT move list for bishops/rooks
  7. Change TMove to a struct union using FieldLayout explicit + a few other minor tweaks
  8. A quick multithreaded implementation which fires each move from the starting position as a separate Task
All changes were compared using BenchmarkDotNet to reduce the variability between runs:

Code: Select all

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1110 (2004/May2020Update/20H1)
Intel Core i5-6600K CPU 3.50GHz (Skylake), 1 CPU, 4 logical and 4 physical cores
.NET SDK=5.0.401
  [Host]   : .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT
  .NET 5.0 : .NET 5.0.10 (5.0.1021.41214), X64 RyuJIT

Job=.NET 5.0  Runtime=.NET 5.0

|                          Method |     Mean |    Error |   StdDev |         NPS |
|-------------------------------- |---------:|---------:|---------:|------------ |
|                      C_Baseline | 20.565 s | 0.1174 s | 0.1041 s |  66207K NPS |
|              CSharpV14_Baseline | 37.739 s | 0.0185 s | 0.0154 s |  36078K NPS |
|                 Spirch_Baseline | 30.626 s | 0.0101 s | 0.0089 s |  44457K NPS |
|               CSharpV14_Illegal | 16.644 s | 0.0065 s | 0.0054 s |  81806K NPS |
|      CSharpV14_PieceMoveMethods | 15.196 s | 0.0089 s | 0.0079 s |  89601K NPS |
|                  CSharpV14_Make | 14.716 s | 0.0192 s | 0.0170 s |  92519K NPS |
|       CSharpV14_MoveIllegalFlag | 13.190 s | 0.0099 s | 0.0087 s | 103225K NPS |
| CSharpV14_CombineQuietsCaptures | 11.240 s | 0.0258 s | 0.0215 s | 121131K NPS |
|                  CSharpV14_PEXT |  9.131 s | 0.0107 s | 0.0095 s | 149119K NPS |
|       CSharpV14_PEXTStructUnion |  8.666 s | 0.0183 s | 0.0171 s | 157119K NPS |
|    CSharpV14_PEXT_MultiThreaded |  3.368 s | 0.0669 s | 0.1289 s | 404307K NPS |
That's super impressive! And welcome to this little community project. ;)

I haven't had time to review your code yet and I really look forward to do this at a later time so maybe what I say now is baseless... but it sounds like you have turned a pseudo-legal move generator into one that only generates legal moves? In that case I think it becomes a bit of a problem regarding to the goal I originally had with porting an existing C move generator to C#. My intention was to compare the speed of a C with C# implementation that is solving the exact same problem. Comparing apples with apples. So that we can finally settle the age-old debate of how much slower C# really is than C in chess programming workloads.

It's not a surprise that legal move generators reach higher perft speed than pseudo-legal ones but in an actual chess engine that speed advantage becomes smaller again because you wouldn't have to check all moves for legality but only those that you actually want to play. So many engines use pseudo-legal move generators. I think both approaches are valid, it's not clear that the C version was missing a feature and we should add it to the C and C# version. The same is true when we talk about multithreading. Or to a lesser degree regarding the use of PEXT.

Up to this point all changes I made to the C# version were only performance improvements. This is also the reason why the same changes were not helpful for the C version because the superior compiler was already doing the same or better optimizations there. We basically did manually in C# what in the C version is done already by the compiler: Optimizing the code to make it faster.

I don't want to stop anybody from making significant algorithmic changes. I find that as interesting and worthwhile as everybody else! But in that case I think we should consider to keep the versions separated AND we should also make an analogue version in C that's receiving the same changes so that we have again a C and a C# implementation that is algorithmically identical and can be used to compare the speed of C# and C.
Last edited by lithander on Wed Sep 29, 2021 2:29 pm, edited 5 times in total.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Why C++ instead of C#?

Post by lithander »

klx wrote: Wed Sep 29, 2021 4:29 am
R. Tomasi wrote: Wed Sep 29, 2021 12:33 am I modified the C source code to reflect the changes of the latest C# version (early outs in Illegal).
Does that actually help though? Only 4% of moves are illegal, so you almost always have to do all calculations anyway, and now you have to do extra work to check for early exit.

I tried it early on, and found that it led to a slight drop in performance (except for one of the positions). I see the same with the version you posted.
You don't have to do all calculations anyway otherwise there'd be not such a significant improvement in C#. (more than one second faster)

By first computing the mask and making sure that the mask is not zero you can often skip computing the 2nd part of the expression where you check the mask against the actual moves. GenBishop() and GenRook especially are not cheap and it's worthwile to skip calculating those when you don't need them because when you would AND them with 0 the result cant be greater 0.

This is just another example of an expression that the C compiler had already optimized but the C# compiler didn't. And so I had do that manually. If a & b == 0 I'm done and the if-condition will evaluate to false and I can "early out" of the if-condition, not the entire function. This happens a lot, not just in 4% of the moves.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Why C++ instead of C#?

Post by lithander »

The source as posted by krunch using Bmi2: 23734ms, 57365K NPS
Using the old way of generating sliders GenRook2 & GenBishop2: Nodes, 9575ms, 142198K NPS

Zen2 sucks! :P
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
spirch
Posts: 95
Joined: Fri Nov 09, 2012 12:36 am

Re: Why C++ instead of C#?

Post by spirch »

not a lot of changes but enough to drop a little bit more than a second

no BMI2, BMI1 only

@krunch
@lithander

Code: Select all

/*
 This perft implementation is based on QBBEngine by Fabio Gobbato and ported to C# by Thomas Jahn
 
 The purpose is to compare the speed differences of C# and C in chess-programming workload.
*/

using System;
using System.Buffers.Binary;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics.X86;
using System.Text;

namespace QBBLatest
{
    public class QbbPerft
    {
        const int MAX_PLY = 32;
        const int WHITE = 0;
        const int BLACK = 8;

        /* define the move type, for example
           KING|CASTLE is a castle move
           PAWN|CAPTURE|EP is an enpassant move
           PAWN|PROMO|CAPTURE is a promotion with a capture */

        /* define the piece type: empty, pawn, knight, bishop, rook, queen, king */
        [Flags]
        public enum TPieceType : byte { EMPTY, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, PIECE_MASK = 0x07, CASTLE = 0x40, PROMO = 0x20, EP = 0x10, CAPTURE = 0x08, CHECKMOVEILLEGAL = 0x80 }

        /* move structure */
        [StructLayout(LayoutKind.Explicit)]
        public struct TMove
        {
            [FieldOffset(0)]
            public int Move;
            [FieldOffset(0)]
            public TPieceType MoveType;
            [FieldOffset(1)]
            public byte From;
            [FieldOffset(2)]
            public byte To;
            [FieldOffset(3)]
            public TPieceType Promotion;
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private static int NewMove(TPieceType MoveType, byte From, byte To) => (byte)MoveType | (From << 8) | (To << 16);

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private static int NewMovePromo(TPieceType MoveType, byte From, byte To, TPieceType Promotion) => (byte)MoveType | (From << 8) | (To << 16) | ((byte)Promotion << 24);

        static TMove[][] MovesLists;

        static QbbPerft()
        {
            MovesLists = new TMove[MAX_PLY][];
            for (int i = 0; i < MAX_PLY; i++)
            {
                MovesLists[i] = new TMove[225];
            }

            PreComputePEXTAttacks();
        }

        /*
        Board structure definition
        PM,P0,P1,P2 are the 4 bitboards that contain the whole board
        PM is the bitboard with the side to move pieces
        P0,P1 and P2: with these bitboards you can obtain every type of pieces and every pieces combinations.
        */
        public struct TBoard
        {
            public ulong PM;
            public ulong P0;
            public ulong P1;
            public ulong P2;
            public byte CastleFlags; /* ..sl..SL  short long opponent SHORT LONG side to move */
            public byte EnPassant; /* enpassant column, =8 if not set */
            public byte STM; /* side to move */
        }

        static TBoard[] Game = new TBoard[MAX_PLY];
        static int iPosition;
        private static TBoard Position;

        public static ulong[] LateralMasks;
        public static ulong[] LateralMasksF;
        public static ulong[] LateralMasksR;
        public static ulong[] DiagonalMasks;
        public static ulong[] DiagonalMasksD;
        public static ulong[] DiagonalMasksA;

        public static ushort[] AttackMapDiag;
        public static ushort[] AttackMapLateral;

        public static readonly ulong[] RankMaskBase = { 0x00000000000000FF, 0x000000000000FF00, 0x0000000000FF0000, 0x00000000FF000000, 0x000000FF00000000, 0x0000FF0000000000, 0x00FF000000000000, 0xFF00000000000000 };
        public static readonly ulong[] FileMaskBase = { 0x0101010101010101, 0x0202020202020202, 0x0404040404040404, 0x0808080808080808, 0x1010101010101010, 0x2020202020202020, 0x4040404040404040, 0x8080808080808080 };
        public static readonly ulong[] DiagonalMaskBase = { 0x0000000000000001, 0x0000000000000102, 0x0000000000010204, 0x0000000001020408, 0x0000000102040810, 0x0000010204081020, 0x0001020408102040, 0x0102040810204080, 0x0204081020408000, 0x0408102040800000, 0x0810204080000000, 0x1020408000000000, 0x2040800000000000, 0x4080000000000000, 0x8000000000000000 };
        public static readonly ulong[] AntiDiagonalMaskBase = { 0x0000000000000080, 0x0000000000008040, 0x0000000000804020, 0x0000000080402010, 0x0000008040201008, 0x0000804020100804, 0x0080402010080402, 0x8040201008040201, 0x4020100804020100, 0x2010080402010000, 0x1008040201000000, 0x0804020100000000, 0x0402010000000000, 0x0201000000000000, 0x0100000000000000 };

        public static void PreComputePEXTAttacks()
        {
            LateralMasks = new ulong[65];
            LateralMasksF = new ulong[65];
            LateralMasksR = new ulong[65];
            DiagonalMasks = new ulong[65];
            DiagonalMasksD = new ulong[65];
            DiagonalMasksA = new ulong[65];
            AttackMapDiag = new ushort[64 * 64 * 256];
            AttackMapLateral = new ushort[64 * 128 * 256];

            for (int sq = 0; sq < 64; sq++)
            {
                int diagOffset = (sq / 8) + (sq % 8);
                int antiDiagOffset = (sq / 8) + 7 - (sq % 8);
                LateralMasksF[sq] = FileMaskBase[sq % 8];
                LateralMasksR[sq] = RankMaskBase[sq / 8];
                LateralMasks[sq] = FileMaskBase[sq % 8] | RankMaskBase[sq / 8];
                DiagonalMasksD[sq] = DiagonalMaskBase[diagOffset];
                DiagonalMasksA[sq] = AntiDiagonalMaskBase[antiDiagOffset];
                DiagonalMasks[sq] = DiagonalMaskBase[diagOffset] | AntiDiagonalMaskBase[antiDiagOffset];

                for (int occupiedOne = 0; occupiedOne < 256; occupiedOne++)
                {
                    for (int occupiedTwo = 0; occupiedTwo < 256; occupiedTwo++)
                    {
                        var dMask = DiagonalMasks[sq];
                        var lMask = LateralMasks[sq];

                        ulong aPosition = Bmi2.X64.ParallelBitDeposit((ulong)occupiedOne, AntiDiagonalMaskBase[antiDiagOffset]) | (1UL << sq);
                        ulong dPosition = Bmi2.X64.ParallelBitDeposit((ulong)occupiedTwo, DiagonalMaskBase[diagOffset]) | (1UL << sq);
                        ulong rPosition = Bmi2.X64.ParallelBitDeposit((ulong)occupiedOne, RankMaskBase[sq / 8]) | (1UL << sq);
                        ulong fPosition = Bmi2.X64.ParallelBitDeposit((ulong)occupiedTwo, FileMaskBase[sq % 8]) | (1UL << sq);

                        ulong index = Bmi2.X64.ParallelBitExtract((aPosition | dPosition), dMask);
                        ulong index2 = Bmi2.X64.ParallelBitExtract((rPosition | fPosition), lMask);

                        ulong dVal = Bmi2.X64.ParallelBitExtract(GenBishop(sq, aPosition | dPosition), dMask);
                        ulong lVal = Bmi2.X64.ParallelBitExtract(GenRook(sq, rPosition | fPosition), lMask);

                        AttackMapDiag[sq * (64 * 256) + (int)index] = (ushort)dVal;
                        AttackMapLateral[sq * (128 * 256) + (int)index2] = (ushort)lVal;
                    }
                }
            }
        }

        /* array of bitboards that contains all the knight destination for every square */
        public static readonly ulong[] KnightDest = {
            0x0000000000020400UL,0x0000000000050800UL,0x00000000000a1100UL,0x0000000000142200UL,
            0x0000000000284400UL,0x0000000000508800UL,0x0000000000a01000UL,0x0000000000402000UL,
            0x0000000002040004UL,0x0000000005080008UL,0x000000000a110011UL,0x0000000014220022UL,
            0x0000000028440044UL,0x0000000050880088UL,0x00000000a0100010UL,0x0000000040200020UL,
            0x0000000204000402UL,0x0000000508000805UL,0x0000000a1100110aUL,0x0000001422002214UL,
            0x0000002844004428UL,0x0000005088008850UL,0x000000a0100010a0UL,0x0000004020002040UL,
            0x0000020400040200UL,0x0000050800080500UL,0x00000a1100110a00UL,0x0000142200221400UL,
            0x0000284400442800UL,0x0000508800885000UL,0x0000a0100010a000UL,0x0000402000204000UL,
            0x0002040004020000UL,0x0005080008050000UL,0x000a1100110a0000UL,0x0014220022140000UL,
            0x0028440044280000UL,0x0050880088500000UL,0x00a0100010a00000UL,0x0040200020400000UL,
            0x0204000402000000UL,0x0508000805000000UL,0x0a1100110a000000UL,0x1422002214000000UL,
            0x2844004428000000UL,0x5088008850000000UL,0xa0100010a0000000UL,0x4020002040000000UL,
            0x0400040200000000UL,0x0800080500000000UL,0x1100110a00000000UL,0x2200221400000000UL,
            0x4400442800000000UL,0x8800885000000000UL,0x100010a000000000UL,0x2000204000000000UL,
            0x0004020000000000UL,0x0008050000000000UL,0x00110a0000000000UL,0x0022140000000000UL,
            0x0044280000000000UL,0x0088500000000000UL,0x0010a00000000000UL,0x0020400000000000UL,
        };

        /* The same for the king */
        public static readonly ulong[] KingDest = {
            0x0000000000000302UL,0x0000000000000705UL,0x0000000000000e0aUL,0x0000000000001c14UL,
            0x0000000000003828UL,0x0000000000007050UL,0x000000000000e0a0UL,0x000000000000c040UL,
            0x0000000000030203UL,0x0000000000070507UL,0x00000000000e0a0eUL,0x00000000001c141cUL,
            0x0000000000382838UL,0x0000000000705070UL,0x0000000000e0a0e0UL,0x0000000000c040c0UL,
            0x0000000003020300UL,0x0000000007050700UL,0x000000000e0a0e00UL,0x000000001c141c00UL,
            0x0000000038283800UL,0x0000000070507000UL,0x00000000e0a0e000UL,0x00000000c040c000UL,
            0x0000000302030000UL,0x0000000705070000UL,0x0000000e0a0e0000UL,0x0000001c141c0000UL,
            0x0000003828380000UL,0x0000007050700000UL,0x000000e0a0e00000UL,0x000000c040c00000UL,
            0x0000030203000000UL,0x0000070507000000UL,0x00000e0a0e000000UL,0x00001c141c000000UL,
            0x0000382838000000UL,0x0000705070000000UL,0x0000e0a0e0000000UL,0x0000c040c0000000UL,
            0x0003020300000000UL,0x0007050700000000UL,0x000e0a0e00000000UL,0x001c141c00000000UL,
            0x0038283800000000UL,0x0070507000000000UL,0x00e0a0e000000000UL,0x00c040c000000000UL,
            0x0302030000000000UL,0x0705070000000000UL,0x0e0a0e0000000000UL,0x1c141c0000000000UL,
            0x3828380000000000UL,0x7050700000000000UL,0xe0a0e00000000000UL,0xc040c00000000000UL,
            0x0203000000000000UL,0x0507000000000000UL,0x0a0e000000000000UL,0x141c000000000000UL,
            0x2838000000000000UL,0x5070000000000000UL,0xa0e0000000000000UL,0x40c0000000000000UL
        };

        /* masks for finding the pawns that can capture with an enpassant (in move generation) */
        static readonly ulong[] EnPassant = {
            0x0000000200000000UL,0x0000000500000000UL,0x0000000A00000000UL,0x0000001400000000UL,
            0x0000002800000000UL,0x0000005000000000UL,0x000000A000000000UL,0x0000004000000000UL
        };

        /* masks for finding the pawns that can capture with an enpassant (in make move) */
        static readonly ulong[] EnPassantM = {
            0x0000000002000000UL,0x0000000005000000UL,0x000000000A000000UL,0x0000000014000000UL,
            0x0000000028000000UL,0x0000000050000000UL,0x00000000A0000000UL,0x0000000040000000UL
        };

        /*
        reverse a bitboard:
        A bitboard is an array of byte: Byte0,Byte1,Byte2,Byte3,Byte4,Byte5,Byte6,Byte7
        after this function the bitboard will be: Byte7,Byte6,Byte5,Byte4,Byte3,Byte2,Byte1,Byte0
        The board is saved always with the side to move in the low significant bits of the bitboard, so this function
        is used to change the side to move
        */
        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        public static ulong RevBB(ulong bb) => BinaryPrimitives.ReverseEndianness(bb);

        /* return the index of the most significant bit of the bitboard, bb must always be !=0 */
        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        public static ulong MSB(ulong bb) => 63 ^ Lzcnt.X64.LeadingZeroCount(bb);

        /* return the index of the least significant bit of the bitboard, bb must always be !=0 */
        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        public static ulong LSB(ulong bb) => Bmi1.X64.TrailingZeroCount(bb);

        /* extract the least significant bit of the bitboard */
        public static ulong ExtractLSB(ulong bb) => Bmi1.X64.ExtractLowestSetBit(bb);

        /* reset the least significant bit of bb */
        public static ulong ClearLSB(ulong bb) => Bmi1.X64.ResetLowestSetBit(bb);

        /* Macro to check and reset the castle rights:
           CastleSM: short castling side to move
           CastleLM: long castling side to move
           CastleSO: short castling opponent
           CastleLO: long castling opponent
         */
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static bool CanCastleSM() => (Position.CastleFlags & 0x02) != 0;
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static bool CanCastleLM() => (Position.CastleFlags & 0x01) != 0;
        static void ResetCastleSM() => Position.CastleFlags &= 0xFD;
        static void ResetCastleLM() => Position.CastleFlags &= 0xFE;
        static void ResetCastleSO() => Position.CastleFlags &= 0xDF;
        static void ResetCastleLO() => Position.CastleFlags &= 0xEF;

        /* these Macros are used to calculate the bitboard of a particular kind of piece
           P2 P1 P0
            0  0  0    empty
            0  0  1    pawn
            0  1  0    knight
            0  1  1    bishop
            1  0  0    rook
            1  0  1    queen
            1  1  0    king
        */
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Occupation(ref TBoard position) => position.P0 | position.P1 | position.P2; /* board occupation */

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Pawns(ref TBoard position) => Bmi1.X64.AndNot(position.P2, Bmi1.X64.AndNot(position.P1, position.P0));

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Knights(ref TBoard position) => Bmi1.X64.AndNot(position.P0, Bmi1.X64.AndNot(position.P2, position.P1));

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Bishops(ref TBoard position) => position.P0 & position.P1;

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Rooks(ref TBoard position) => Bmi1.X64.AndNot(position.P0, Bmi1.X64.AndNot(position.P1, position.P2));

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Queens(ref TBoard position) => position.P0 & position.P2;

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong QueenOrRooks(ref TBoard position) => Bmi1.X64.AndNot(position.P1, position.P2);

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong QueenOrBishops(ref TBoard position) => position.P0 & (position.P2 | position.P1);

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Kings(ref TBoard position) => position.P1 & position.P2; /* a bitboard with the 2 kings */

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong SideToMove(ref TBoard position) => position.PM;

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static byte EnPass(ref TBoard position) => position.EnPassant;

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Opposing(ref TBoard position) => position.PM ^ (position.P0 | position.P1 | position.P2);

        /* get the piece type giving the square */
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Piece(int sq) => ((Position.P2 >> sq) & 1) << 2 |
                                              ((Position.P1 >> sq) & 1) << 1 |
                                              ((Position.P0 >> sq) & 1);

        /* calculate the square related to the opponent */
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static int OppSq(int sp) => sp ^ 56;
        /* Absolute Square, we need this macro to return the move in long algebric notation  */
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static int AbsSq(int sq, int col) => col == WHITE ? sq : OppSq(sq);

        /* get the corresponding string to the given move  */
        static string MoveToStr(TMove move, byte tomove)
        {
            Span<char> promo = stackalloc[] { ' ', ' ', 'n', 'b', 'r', 'q' };
            StringBuilder result = new StringBuilder(6);
            result.Append((char)('a' + AbsSq(move.From, tomove) % 8));
            result.Append((char)('1' + AbsSq(move.From, tomove) / 8));
            result.Append((char)('a' + AbsSq(move.To, tomove) % 8));
            result.Append((char)('1' + AbsSq(move.To, tomove) / 8));
            result.Append(promo[(byte)move.Promotion]);
            return result.ToString();
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        static void ChangeSide()
        {
            ref TBoard position = ref Position;
            position.PM = BinaryPrimitives.ReverseEndianness(position.PM ^ Occupation(ref position));
            position.P0 = BinaryPrimitives.ReverseEndianness(position.P0);
            position.P1 = BinaryPrimitives.ReverseEndianness(position.P1);
            position.P2 = BinaryPrimitives.ReverseEndianness(position.P2);/* reverse the board */
            position.CastleFlags = (byte)((position.CastleFlags >> 4) | (position.CastleFlags << 4));/* roll the castle rights */
            position.STM ^= BLACK; /* change the side to move */
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        private static ulong GenRook2(int sq, ulong occupation)
        {
            var lMask = LateralMasks[sq];
            ulong index = Bmi2.X64.ParallelBitExtract(occupation, lMask);
            return Bmi2.X64.ParallelBitDeposit(AttackMapLateral[(sq * (128 * 256)) + (int)index], lMask);
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        private static ulong GenBishop2(int sq, ulong occupation)
        {
            var dMask = DiagonalMasks[sq];
            ulong index = Bmi2.X64.ParallelBitExtract(occupation, dMask);
            return Bmi2.X64.ParallelBitDeposit(AttackMapDiag[(sq * (64 * 256)) + (int)index], dMask);
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        private static ulong GenRook(int sq, ulong occupation)
        {
            occupation ^= 1UL << sq; /* remove the selected piece from the occupation */

            return ((0x8080808080808080UL >> (int)(LSB((0x0101010101010101UL << sq) & (occupation | 0xFF00000000000000UL)) ^ 63)) & (0x0101010101010101UL << (int)MSB((0x8080808080808080UL >> (sq ^ 63)) & (occupation | 0x00000000000000FFUL)))) |
                   ((0xFF00000000000000UL >> (int)(LSB((0x00000000000000FFUL << sq) & (occupation | 0x8080808080808080UL)) ^ 63)) & (0x00000000000000FFUL << (int)MSB((0xFF00000000000000UL >> (sq ^ 63)) & (occupation | 0x0101010101010101UL))));
            /* From every direction find the first piece and from that piece put a mask in the opposite direction.
               Put togheter all the 4 masks and remove the moving piece */
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        private static ulong GenBishop(int sq, ulong occupation)
        {
            /* it's the same as the rook */
            occupation ^= 1UL << sq;

            return ((0x8040201008040201UL >> (int)(LSB((0x8040201008040201UL << sq) & (occupation | 0xFF80808080808080UL)) ^ 63)) & (0x8040201008040201UL << (int)MSB((0x8040201008040201UL >> (sq ^ 63)) & (occupation | 0x01010101010101FFUL)))) |
                   ((0x8102040810204081UL >> (int)(LSB((0x8102040810204081UL << sq) & (occupation | 0xFF01010101010101UL)) ^ 63)) & (0x8102040810204081UL << (int)MSB((0x8102040810204081UL >> (sq ^ 63)) & (occupation | 0x80808080808080FFUL))));
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GenerateKingMoves(ref TBoard position, ulong occupation, ulong opposing, ref int index, TMove[] moves)
        {
            ulong destinations;
            for (ulong pieces = Kings(ref position) & SideToMove(ref position); pieces != 0; pieces = ClearLSB(pieces))
            {
                byte square = (byte)LSB(pieces);

                ulong kingDest = KingDest[square];

                // generate captures
                for (destinations = opposing & kingDest; destinations != 0; destinations = ClearLSB(destinations))
                {
                    moves[index++].Move = NewMove(MoveType: TPieceType.KING | TPieceType.CAPTURE | TPieceType.CHECKMOVEILLEGAL, From: square, To: (byte)LSB(destinations));
                }

                // generate moves
                for (destinations = ~occupation & kingDest; destinations != 0; destinations = ClearLSB(destinations))
                {
                    moves[index++].Move = NewMove(MoveType: TPieceType.KING | TPieceType.CHECKMOVEILLEGAL, From: square, To: (byte)LSB(destinations));
                }
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GenerateKnightMoves(ref TBoard position, ulong occupation, ulong opposing, ref int index, TMove[] moves, TPieceType inCheck, ulong pinnedSquares)
        {
            ulong destinations;
            for (ulong pieces = Knights(ref position) & SideToMove(ref position); pieces != 0; pieces = ClearLSB(pieces))
            {
                byte square = (byte)LSB(pieces);
                TPieceType pinned = ((pinnedSquares & (1UL << square)) != 0) ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY;

                ulong knightDest = KnightDest[square];

                // generate captures
                for (destinations = opposing & knightDest; destinations != 0; destinations = ClearLSB(destinations))
                {
                    moves[index++].Move = NewMove(MoveType: TPieceType.KNIGHT | TPieceType.CAPTURE | inCheck | pinned, From: square, To: (byte)LSB(destinations));
                }

                // generate moves
                for (destinations = ~occupation & knightDest; destinations != 0; destinations = ClearLSB(destinations))
                {
                    moves[index++].Move = NewMove(MoveType: TPieceType.KNIGHT | inCheck | pinned, From: square, To: (byte)LSB(destinations));
                }
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GenerateRookMoves(ref TBoard position, ulong occupation, ulong opposing, ref int index, TMove[] moves, TPieceType inCheck, ulong pinnedSquares)
        {
            ulong destinations;
            for (ulong pieces = Rooks(ref position) & SideToMove(ref position); pieces != 0; pieces = ClearLSB(pieces))
            {
                byte square = (byte)LSB(pieces);
                TPieceType pinned = ((pinnedSquares & (1UL << square)) != 0) ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY;

                ulong rookDest = GenRook(square, occupation);

                // generate captures
                for (destinations = opposing & rookDest; destinations != 0; destinations = ClearLSB(destinations))
                {
                    moves[index++].Move = NewMove(MoveType: TPieceType.ROOK | TPieceType.CAPTURE | inCheck | pinned, From: square, To: (byte)LSB(destinations));
                }

                // generate moves
                for (destinations = ~occupation & rookDest; destinations != 0; destinations = ClearLSB(destinations))
                {
                    moves[index++].Move = NewMove(MoveType: TPieceType.ROOK | inCheck | pinned, From: square, To: (byte)LSB(destinations));
                }
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GenerateBishopMoves(ref TBoard position, ulong occupation, ulong opposing, ref int index, TMove[] moves, TPieceType inCheck, ulong pinnedSquares)
        {
            ulong destinations;
            for (ulong pieces = Bishops(ref position) & SideToMove(ref position); pieces != 0; pieces = ClearLSB(pieces))
            {
                byte square = (byte)LSB(pieces);
                TPieceType pinned = ((pinnedSquares & (1UL << square)) != 0) ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY;

                ulong bishopDest = GenBishop(square, occupation);

                // generate captures
                for (destinations = opposing & bishopDest; destinations != 0; destinations = ClearLSB(destinations))
                {
                    moves[index++].Move = NewMove(MoveType: TPieceType.BISHOP | TPieceType.CAPTURE | inCheck | pinned, From: square, To: (byte)LSB(destinations));
                }

                // generate moves
                for (destinations = ~occupation & bishopDest; destinations != 0; destinations = ClearLSB(destinations))
                {
                    moves[index++].Move = NewMove(MoveType: TPieceType.BISHOP | inCheck | pinned, From: square, To: (byte)LSB(destinations));
                }
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GenerateQueenMoves(ref TBoard position, ulong occupation, ulong opposing, ref int index, TMove[] moves, TPieceType inCheck, ulong pinnedSquares)
        {
            ulong destinations;
            for (ulong pieces = Queens(ref position) & SideToMove(ref position); pieces != 0; pieces = ClearLSB(pieces))
            {
                byte square = (byte)LSB(pieces);
                TPieceType pinned = ((pinnedSquares & (1UL << square)) != 0) ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY;

                ulong queenDest = (GenRook(square, occupation) | GenBishop(square, occupation));

                // generate captures
                for (destinations = opposing & queenDest; destinations != 0; destinations = ClearLSB(destinations))
                {
                    moves[index++].Move = NewMove(MoveType: TPieceType.QUEEN | TPieceType.CAPTURE | inCheck | pinned, From: square, To: (byte)LSB(destinations));
                }

                // generate moves
                for (destinations = ~occupation & queenDest; destinations != 0; destinations = ClearLSB(destinations))
                {
                    moves[index++].Move = NewMove(MoveType: TPieceType.QUEEN | inCheck | pinned, From: square, To: (byte)LSB(destinations));
                }
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private static bool Illegal(TMove move, ulong checks, int kingsq)
        {
            ref TBoard position = ref Position;
            ulong From = 1UL << move.From;
            ulong king = 1UL << move.To;
            ulong newoccupation = (Occupation(ref position) ^ From) | king;
            ulong newopposing = Bmi1.X64.AndNot(king, Opposing(ref position));

            if ((move.MoveType & TPieceType.PIECE_MASK) == TPieceType.KING)
            {
                kingsq = move.To;
                checks = (KnightDest[kingsq] & Knights(ref position)) | ((((king << 9) & 0xFEFEFEFEFEFEFEFEUL) | ((king << 7) & 0x7F7F7F7F7F7F7F7FUL))) & Pawns(ref position) | (KingDest[kingsq] & Kings(ref position));
            }
            else if ((move.MoveType & TPieceType.EP) > 0)
            {
                // Handle cases where taking EP exposes the king to check
                ulong epTo = (king >> 8);
                newoccupation ^= epTo;
                newopposing ^= epTo;
            }

            return (checks & newopposing) > 0 ||
                   (GenBishop(kingsq, newoccupation) & QueenOrBishops(ref position) & newopposing) > 0 ||
                   (GenRook(kingsq, newoccupation) & QueenOrRooks(ref position) & newopposing) > 0;
        }


        [MethodImpl(MethodImplOptions.AggressiveOptimization)]
        private static void GenerateMoves(TMove[] moves, ref int index, TPieceType isInCheck, ulong pinnedSquares)
        {
            ref TBoard position = ref Position;
            ulong occupation = Occupation(ref position);
            ulong opposing = Opposing(ref position);

            // generate moves from king to knight
            GenerateKingMoves(ref position, occupation, opposing, ref index, moves);
            GenerateQueenMoves(ref position, occupation, opposing, ref index, moves, isInCheck, pinnedSquares);
            GenerateRookMoves(ref position, occupation, opposing, ref index, moves, isInCheck, pinnedSquares);
            GenerateBishopMoves(ref position, occupation, opposing, ref index, moves, isInCheck, pinnedSquares);
            GenerateKnightMoves(ref position, occupation, opposing, ref index, moves, isInCheck, pinnedSquares);

            ulong pawns = Pawns(ref position) & SideToMove(ref position);
            GeneratePawnCaptures(moves, ref index, isInCheck, pinnedSquares, opposing, pawns);
            GeneratePawnPromotions(moves, ref index, isInCheck, pinnedSquares, occupation, opposing, pawns);
            GenerateEnPassant(moves, ref index, position, pawns);
            GeneratePawnPush(moves, ref index, isInCheck, pinnedSquares, position, occupation);
            GenerateCastling(moves, ref index, position, occupation, opposing);
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GenerateEnPassant(TMove[] moves, ref int index, TBoard position, ulong pawns)
        {
            byte enPass;
            if ((enPass = EnPass(ref position)) != 8)
            {
                for (ulong enpassant = pawns & EnPassant[enPass]; enpassant != 0; enpassant = ClearLSB((enpassant)))
                    moves[index++].Move = NewMove(
                        MoveType: TPieceType.PAWN | TPieceType.EP | TPieceType.PROMO | TPieceType.CHECKMOVEILLEGAL,
                        From: (byte)LSB(enpassant),
                        To: (byte)(40 + enPass)
                    );
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GenerateCastling(TMove[] moves, ref int index, TBoard position, ulong occupation, ulong opposing)
        {
            /* check if long castling is possible */
            if (CanCastleLM()
            && (occupation & 0x0EUL) == 0
            && (((((ExtractLSB(0x1010101010101000UL & occupation) /* column e */
                 | ExtractLSB(0x0808080808080800UL & occupation) /*column d */
                 | ExtractLSB(0x0404040404040400UL & occupation) /*column c */
                 | ExtractLSB(0x00000000000000E0UL & occupation)  /* row 1 */) & QueenOrRooks(ref position))
                 | ((ExtractLSB(0x0000000102040800UL & occupation) /*antidiag from e1/e8 */
                 | ExtractLSB(0x0000000001020400UL & occupation) /*antidiag from d1/d8 */
                 | ExtractLSB(0x0000000000010200UL & occupation) /*antidiag from c1/c8 */
                 | ExtractLSB(0x0000000080402000UL & occupation) /*diag from e1/e8 */
                 | ExtractLSB(0x0000008040201000UL & occupation) /*diag from d1/d8 */
                 | ExtractLSB(0x0000804020100800UL & occupation) /*diag from c1/c8 */) & QueenOrBishops(ref position))
                 | (0x00000000003E7700UL & Knights(ref position))
                 | (0x0000000000003E00UL & Pawns(ref position))
                 | (Kings(ref position) & 0x0000000000000600UL)) & opposing) == 0))
                /* check if c1/c8 d1/d8 e1/e8 are not attacked */
                moves[index++].Move = NewMove
            (
                MoveType: TPieceType.KING | TPieceType.CASTLE,
                From: 4,
                To: 2
            );

            /* check if short castling is possible */
            if (CanCastleSM()
            && (occupation & 0x60UL) == 0
            && (((((ExtractLSB(0x1010101010101000UL & occupation) /* column e */
                 | ExtractLSB(0x2020202020202000UL & occupation) /* column f */
                 | ExtractLSB(0x4040404040404000UL & occupation) /* column g */
                 | 1UL << (int)MSB(0x000000000000000FUL & (occupation | 0x1UL))/* row 1 */) & QueenOrRooks(ref position))
                 | ((ExtractLSB(0x0000000102040800UL & occupation) /* antidiag from e1/e8 */
                 | ExtractLSB(0x0000010204081000UL & occupation) /*antidiag from f1/f8 */
                 | ExtractLSB(0x0001020408102000UL & occupation) /*antidiag from g1/g8 */
                 | ExtractLSB(0x0000000080402000UL & occupation) /*diag from e1/e8 */
                 | ExtractLSB(0x0000000000804000UL & occupation) /*diag from f1/f8 */
                 | 0x0000000000008000UL /*diag from g1/g8 */) & QueenOrBishops(ref position))
                 | (0x0000000000F8DC00UL & Knights(ref position))
                 | (0x000000000000F800UL & Pawns(ref position))
                 | (Kings(ref position) & 0x0000000000004000UL)) & opposing) == 0))
                /* check if e1/e8 f1/f8 g1/g8 are not attacked */
                moves[index++].Move = NewMove
                    (
                        MoveType: TPieceType.KING | TPieceType.CASTLE,
                        From: 4,
                        To: 6
                    );
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GeneratePawnPush(TMove[] moves, ref int index, TPieceType isInCheck, ulong kingRay, TBoard position, ulong occupation)
        {
            ulong push1 = (((Pawns(ref position) & SideToMove(ref position)) << 8) & ~occupation) & 0x00FFFFFFFFFFFFFFUL;
            ulong push2 = (push1 << 8) & ~occupation & 0x00000000FF000000UL;
            byte square;
            /* one pawns push */
            for (; push1 != 0; push1 = ClearLSB(push1))
            {
                square = (byte)LSB(push1);
                moves[index++].Move = NewMove
                (
                    MoveType: TPieceType.PAWN | isInCheck | ((kingRay & (1UL << (square - 8))) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY),
                    From: (byte)(square - 8),
                    To: square
                );
            }

            /* double pawns pushes */
            for (; push2 != 0; push2 = ClearLSB(push2))
            {
                square = (byte)LSB(push2);
                moves[index++].Move = NewMove
                (
                    MoveType: TPieceType.PAWN | isInCheck | ((kingRay & (1UL << (square - 16))) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY),
                    From: (byte)(square - 16),
                    To: square
                );
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GeneratePawnCaptures(TMove[] moves, ref int index, TPieceType isInCheck, ulong kingRay, ulong opposing, ulong pawns)
        {
            byte square;
            TPieceType pinned;
            ulong lsb;
            /* Generate pawns right captures */
            for (lsb = (pawns << 9) & 0x00FEFEFEFEFEFEFEUL & opposing; lsb != 0; lsb = ClearLSB(lsb))
            {
                square = (byte)LSB(lsb);
                pinned = (kingRay & (1UL << (square - 9))) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY; ;
                moves[index++].Move = NewMove(
                    MoveType: TPieceType.PAWN | TPieceType.CAPTURE | isInCheck | pinned,
                    From: (byte)(square - 9),
                    To: square
                );
            }

            /* Generate pawns left captures */
            for (lsb = (pawns << 7) & 0x007F7F7F7F7F7F7FUL & opposing; lsb != 0; lsb = ClearLSB(lsb))
            {
                square = (byte)LSB(lsb);
                pinned = (kingRay & (1UL << (square - 7))) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY; ;
                moves[index++].Move = NewMove(
                    MoveType: TPieceType.PAWN | TPieceType.CAPTURE | isInCheck | pinned,
                    From: (byte)(square - 7),
                    To: (byte)square
                );
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GeneratePawnPromotions(TMove[] moves, ref int index, TPieceType isInCheck, ulong kingRay, ulong occupation, ulong opposing, ulong pawns)
        {
            byte square;
            TPieceType pinned;
            TPieceType piece;
            ulong lsb;
            /* Generate pawns promotions */
            if ((pawns & 0x00FF000000000000UL) != 0)
            {
                /* promotions with left capture */
                for (lsb = (pawns << 9) & 0xFE00000000000000UL & opposing; lsb != 0; lsb = ClearLSB(lsb))
                {
                    square = (byte)LSB(lsb);
                    pinned = (kingRay & (1UL << (square - 9))) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY; ;
                    for (piece = TPieceType.QUEEN; piece >= TPieceType.KNIGHT; piece--)
                    {
                        moves[index++].Move = NewMovePromo(
                            MoveType: TPieceType.PAWN | TPieceType.PROMO | TPieceType.CAPTURE | isInCheck | pinned,
                            From: (byte)(square - 9),
                            To: (byte)square,
                            Promotion: piece
                        );
                    }
                }

                /* promotions with right capture */
                for (lsb = (pawns << 7) & 0x7F00000000000000UL & opposing; lsb != 0; lsb = ClearLSB(lsb))
                {
                    square = (byte)LSB(lsb);
                    pinned = (kingRay & (1UL << (square - 7))) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY; ;
                    for (piece = TPieceType.QUEEN; piece >= TPieceType.KNIGHT; piece--)
                        moves[index++].Move = NewMovePromo(
                            MoveType: TPieceType.PAWN | TPieceType.PROMO | TPieceType.CAPTURE | isInCheck | pinned,
                            From: (byte)(square - 7),
                            To: (byte)square,
                            Promotion: piece
                        );
                }
                /* no capture promotions */
                for (lsb = ((pawns << 8) & ~occupation) & 0xFF00000000000000UL; lsb != 0; lsb = ClearLSB(lsb))
                {
                    square = (byte)LSB(lsb);
                    pinned = (kingRay & (1UL << (square - 8))) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY; ;
                    for (piece = TPieceType.QUEEN; piece >= TPieceType.KNIGHT; piece--)
                        moves[index++].Move = NewMovePromo(
                            MoveType: TPieceType.PAWN | TPieceType.PROMO | isInCheck | pinned,
                            From: (byte)(square - 8),
                            To: (byte)square,
                            Promotion: piece
                        );
                }
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void Make(TMove move)
        {
            ref TBoard position = ref Position;
            Game[iPosition++] = position;
            ulong part = 1UL << move.From;
            ulong dest = 1UL << move.To;
            position.EnPassant = 8;

            if ((move.MoveType & TPieceType.CAPTURE) != 0)
            {
                /* Delete the captured piece */
                position.P0 &= ~dest;
                position.P1 &= ~dest;
                position.P2 &= ~dest;

                if (move.To == 63) ResetCastleSO(); /* captured the opponent king side rook */
                else if (move.To == 56) ResetCastleLO(); /* captured the opponent quuen side rook */
            }

            switch (move.MoveType & TPieceType.PIECE_MASK)
            {
                case TPieceType.PAWN:
                    if ((move.MoveType & (TPieceType.EP | TPieceType.PROMO)) != 0)
                    {
                        /* EnPassant */
                        if ((move.MoveType & TPieceType.EP) != 0)
                        {
                            position.PM ^= part | dest;
                            position.P0 ^= part | dest;
                            position.P0 ^= dest >> 8; //delete the captured pawn
                        }
                        else
                        {
                            position.PM ^= part | dest;
                            position.P0 ^= part;
                            position.P0 |= (ulong)((int)move.Promotion & 1) << move.To;
                            position.P1 |= (ulong)(((int)move.Promotion >> 1) & 1) << move.To;
                            position.P2 |= (ulong)((int)move.Promotion >> 2) << move.To;
                        }
                    }
                    else /* capture or push */
                    {
                        position.PM ^= part | dest;
                        position.P0 ^= part | dest;

                        if (move.To == move.From + 16 && (EnPassantM[move.To & 0x07] & Pawns(ref position) & Opposing(ref position)) != 0)
                            position.EnPassant = (byte)(move.To & 0x07); /* save enpassant column */
                    }

                    ChangeSide();
                    return;
                case TPieceType.KNIGHT:
                    position.PM ^= part | dest;
                    position.P1 ^= part | dest;
                    ChangeSide();
                    return;
                case TPieceType.BISHOP:
                    position.PM ^= part | dest;
                    position.P0 ^= part | dest;
                    position.P1 ^= part | dest;
                    ChangeSide();
                    return;
                case TPieceType.ROOK:
                    position.PM ^= part | dest;
                    position.P2 ^= part | dest;
                    if (move.From == 7)
                        ResetCastleSM(); //king side rook moved
                    else if (move.From == 0)
                        ResetCastleLM(); // queen side rook moved
                    ChangeSide();
                    return;
                case TPieceType.QUEEN:
                    position.PM ^= part | dest;
                    position.P0 ^= part | dest;
                    position.P2 ^= part | dest;

                    ChangeSide();
                    return;
                case TPieceType.KING:
                    position.PM ^= part | dest;
                    position.P1 ^= part | dest;
                    position.P2 ^= part | dest;
                    ResetCastleSM(); /* update the castle rights */
                    ResetCastleLM();

                    if ((move.MoveType & TPieceType.CASTLE) != 0)
                    {
                        if (move.To == 6)
                        {
                            position.PM ^= 0x00000000000000A0UL;
                            position.P2 ^= 0x00000000000000A0UL;
                        } /* short castling */
                        else
                        {
                            position.PM ^= 0x0000000000000009UL;
                            position.P2 ^= 0x0000000000000009UL;
                        } /* long castling */
                    }
                    ChangeSide();
                    return;
            }
        }

        private static void LoadPosition(string fen)
        {
            /* Clear the board */
            ref TBoard pos = ref Position;
            pos.P0 = pos.P1 = pos.P2 = pos.PM = 0;
            pos.EnPassant = 8;
            pos.STM = WHITE;
            pos.CastleFlags = 0;

            /* translate the fen to the relative position */
            byte pieceside = WHITE;
            ulong piece = (ulong)TPieceType.PAWN;
            byte sidetomove = WHITE;
            int square = 0;
            int cursor;
            for (cursor = 0; fen[cursor] != ' '; cursor++)
            {
                char cur = fen[cursor];
                if (cur >= '1' && cur <= '8')
                    square += cur - '0';
                else if (cur != '/')
                {
                    int bit = OppSq(square);
                    if (cur == 'p') { piece = (ulong)TPieceType.PAWN; pieceside = BLACK; }
                    else if (cur == 'n') { piece = (ulong)TPieceType.KNIGHT; pieceside = BLACK; }
                    else if (cur == 'b') { piece = (ulong)TPieceType.BISHOP; pieceside = BLACK; }
                    else if (cur == 'r') { piece = (ulong)TPieceType.ROOK; pieceside = BLACK; }
                    else if (cur == 'q') { piece = (ulong)TPieceType.QUEEN; pieceside = BLACK; }
                    else if (cur == 'k') { piece = (ulong)TPieceType.KING; pieceside = BLACK; }
                    else if (cur == 'P') { piece = (ulong)TPieceType.PAWN; pieceside = WHITE; }
                    else if (cur == 'N') { piece = (ulong)TPieceType.KNIGHT; pieceside = WHITE; }
                    else if (cur == 'B') { piece = (ulong)TPieceType.BISHOP; pieceside = WHITE; }
                    else if (cur == 'R') { piece = (ulong)TPieceType.ROOK; pieceside = WHITE; }
                    else if (cur == 'Q') { piece = (ulong)TPieceType.QUEEN; pieceside = WHITE; }
                    else if (cur == 'K') { piece = (ulong)TPieceType.KING; pieceside = WHITE; }
                    pos.P0 |= (piece & 1) << bit; //001
                    pos.P1 |= ((piece >> 1) & 1) << bit; //010
                    pos.P2 |= (piece >> 2) << bit; //100
                    if (pieceside == WHITE)
                    {
                        pos.PM |= (1UL << bit);
                        piece |= BLACK;
                    }
                    square++;
                }
            }

            cursor++; /* read the side to move  */
            if (fen[cursor] == 'w')
                sidetomove = WHITE;
            else if (fen[cursor] == 'b')
                sidetomove = BLACK;
            cursor += 2;
            if (fen[cursor] != '-') /* read the castle rights */
            {
                for (; fen[cursor] != ' '; cursor++)
                {
                    char cur = fen[cursor];
                    if (cur == 'K') pos.CastleFlags |= 0x02;
                    else if (cur == 'Q') pos.CastleFlags |= 0x01;
                    else if (cur == 'k') pos.CastleFlags |= 0x20;
                    else if (cur == 'q') pos.CastleFlags |= 0x10;
                }
                cursor++;
            }
            else cursor += 2;
            if (fen[cursor] != '-') /* read the enpassant column */
            {
                pos.EnPassant = (byte)(fen[cursor] - 'a');
                //cursor++;
            }
            if (sidetomove == BLACK)
                ChangeSide();
        }

        private static long Perft(int depth)
        {
            long total = 0;
            int index = 0;
            var moveList = MovesLists[depth];
            ref TBoard position = ref Position;
            ulong king = Kings(ref position) & SideToMove(ref position);
            byte kingsq = (byte)LSB(king);
            ulong occupation = Occupation(ref position);
            ulong opposing = Opposing(ref position);
            ulong kingRayDiag = GenBishop(kingsq, occupation);
            ulong kingRayLat = GenRook(kingsq, occupation);

            ulong checks = (KnightDest[kingsq] & Knights(ref position)) | ((((king << 9) & 0xFEFEFEFEFEFEFEFEUL) | ((king << 7) & 0x7F7F7F7F7F7F7F7FUL)) & Pawns(ref position)) | (KingDest[kingsq] & Kings(ref position));

            TPieceType inCheck = (kingRayDiag & QueenOrBishops(ref position) & opposing) != 0 ||
                (kingRayLat & QueenOrRooks(ref position) & opposing) != 0 ||
                (checks & opposing) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY;

            ulong opposingDiag = QueenOrBishops(ref position) & opposing;
            ulong opposingLat = QueenOrRooks(ref position) & opposing;

            ulong kingRayD = 0;
            ulong kingRayA = 0;
            ulong kingRayF = 0;
            ulong kingRayR = 0;

            if ((DiagonalMasksA[kingsq] & opposingDiag) != 0)
            {
                kingRayA = DiagonalMasksA[kingsq] & kingRayDiag;
            }

            if ((DiagonalMasksD[kingsq] & opposingDiag) != 0)
            {
                kingRayD = DiagonalMasksD[kingsq] & kingRayDiag;
            }

            if ((LateralMasksF[kingsq] & opposingLat) != 0)
            {
                kingRayF = LateralMasksF[kingsq] & kingRayLat;
            }

            if ((LateralMasksR[kingsq] & opposingLat) != 0)
            {
                kingRayR = LateralMasksR[kingsq] & kingRayLat;
            }

            ulong pinnedSquares = (kingRayD | kingRayA | kingRayF | kingRayR);

            GenerateMoves(moveList, ref index, inCheck, pinnedSquares);

            for (int i = 0; i < index; i++)
            {
                if ((moveList[i].MoveType & TPieceType.CHECKMOVEILLEGAL) != 0 && Illegal(moveList[i], checks, kingsq))
                    continue;
                if (depth > 1)
                {
                    Make(moveList[i]);

                    total += Perft(depth - 1);
                    Position = Game[--iPosition];
                }
                else
                    total++;
            }

            return total;
        }

        struct PerftResult
        {
            public double Duration;
            public long Nodes;

            public PerftResult(double t, long n)
            {
                Duration = t;
                Nodes = n;
            }

            public static PerftResult operator +(PerftResult a, PerftResult b) => new(a.Duration + b.Duration, a.Nodes + b.Nodes);
        }

        private static PerftResult TestPerft(string fen, int depth, int expectedResult)
        {
            LoadPosition(fen);

            long t0 = Stopwatch.GetTimestamp();
            long count = Perft(depth);
            long t1 = Stopwatch.GetTimestamp();
            double dt = (t1 - t0) / (double)Stopwatch.Frequency;
            double ms = (1000 * dt);
            if (expectedResult != count)
            {
                Console.WriteLine($"ERROR in Perft({fen}, {depth})");
                Console.WriteLine($"Computed result: {count}");
                Console.WriteLine($"Expected result: {expectedResult}");
            }
            else
                Console.WriteLine($"OK! {(int)ms}ms, {(int)(count / ms)}K NPS");
            return new PerftResult(dt, count);
        }

        public static void Main()
        {
            Console.WriteLine("QBB Perft in C#");
            Console.WriteLine("https://github.com/lithander/QBB-Perft/tree/v1.4");
            Console.WriteLine();

            PerftResult totalTime = new PerftResult(0, 0);
            totalTime += TestPerft("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", 6, 119060324); //Start Position
            totalTime += TestPerft("r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1", 5, 193690690);
            totalTime += TestPerft("8/2p5/3p4/KP5r/1R3p1k/8/4P1P1/8 w - - 0 1", 7, 178633661);
            totalTime += TestPerft("r3k2r/Pppp1ppp/1b3nbN/nP6/BBP1P3/q4N2/Pp1P2PP/R2Q1RK1 w kq - 0 1", 6, 706045033);
            totalTime += TestPerft("rnbqkb1r/pp1p1ppp/2p5/4P3/2B5/8/PPP1NnPP/RNBQK2R w KQkq - 0 6", 3, 53392);
            totalTime += TestPerft("r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10", 5, 164075551);
            Console.WriteLine();
            Console.WriteLine($"Total: {totalTime.Nodes} Nodes, {(int)(1000 * totalTime.Duration)}ms, {(int)(totalTime.Nodes / totalTime.Duration / 1000)}K NPS");
            Console.WriteLine("Press any key to quit");//stop command prompt from closing automatically on windows
            //Console.ReadKey();
        }

        //**** Debug Helpers not in the original C code

        private static void PrintPosition(TBoard pos)
        {
            PrintBB(pos.PM, "PM");
            PrintBB(pos.P0, "P0");
            PrintBB(pos.P1, "P1");
            PrintBB(pos.P2, "P2");
            Console.WriteLine("- - - -");
            PrintBB(Pawns(ref pos), "Pawns");
            PrintBB(Knights(ref pos), "Knights");
            PrintBB(Bishops(ref pos), "Bishops");
            PrintBB(Rooks(ref pos), "Roosk");
            PrintBB(Queens(ref pos), "Queens");
            PrintBB(Kings(ref pos), "Kings");

            Console.WriteLine($"CastleFlags: {pos.CastleFlags}");  /* ..sl..SL  short long opponent SHORT LONG side to move */
            Console.WriteLine($"EnPassant column: {pos.EnPassant} (8 if not set)");
            Console.WriteLine($"SideToMove: {pos.STM}"); /* side to move */
            Console.WriteLine();
        }

        static void PrintBB(ulong bb, string label)
        {
            if (label != null)
                Console.WriteLine(label);
            Console.WriteLine(Convert.ToString((long)bb, 16).PadLeft(16, '0'));
            Console.WriteLine("----------------");
            byte[] bbBytes = BitConverter.GetBytes(bb);
            Array.Reverse(bbBytes);
            foreach (byte bbByte in bbBytes)
            {
                string line = Convert.ToString(bbByte, 2).PadLeft(8, '0');
                line = line.Replace('1', 'X');
                line = line.Replace('0', '.');
                var chars = line.ToCharArray();
                Array.Reverse(chars);
                Console.WriteLine(string.Join(' ', chars));
            }
            Console.WriteLine();
        }

        private static long Divide(int depth)
        {
            long total = 0;
            int index = 0;
            var moveList = MovesLists[depth];
            ref TBoard position = ref Position;
            ulong king = Kings(ref position) & SideToMove(ref position);
            byte kingsq = (byte)LSB(king);
            ulong occupation = Occupation(ref position);
            ulong opposing = Opposing(ref position);
            ulong kingRayDiag = GenBishop(kingsq, occupation);
            ulong kingRayLat = GenRook(kingsq, occupation);

            ulong checks = (KnightDest[kingsq] & Knights(ref position)) | ((((king << 9) & 0xFEFEFEFEFEFEFEFEUL) | ((king << 7) & 0x7F7F7F7F7F7F7F7FUL)) & Pawns(ref position)) | (KingDest[kingsq] & Kings(ref position));

            TPieceType inCheck = (kingRayDiag & QueenOrBishops(ref position) & opposing) != 0 ||
                (kingRayLat & QueenOrRooks(ref position) & opposing) != 0 ||
                (checks & opposing) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY;

            ulong opposingDiag = QueenOrBishops(ref position) & opposing;
            ulong opposingLat = QueenOrRooks(ref position) & opposing;

            ulong kingRayD = 0;
            ulong kingRayA = 0;
            ulong kingRayF = 0;
            ulong kingRayR = 0;

            if ((DiagonalMasksA[kingsq] & opposingDiag) != 0)
            {
                kingRayA = DiagonalMasksA[kingsq] & kingRayDiag;
            }

            if ((DiagonalMasksD[kingsq] & opposingDiag) != 0)
            {
                kingRayD = DiagonalMasksD[kingsq] & kingRayDiag;
            }

            if ((LateralMasksF[kingsq] & opposingLat) != 0)
            {
                kingRayF = LateralMasksF[kingsq] & kingRayLat;
            }

            if ((LateralMasksR[kingsq] & opposingLat) != 0)
            {
                kingRayR = LateralMasksR[kingsq] & kingRayLat;
            }

            ulong pinnedSquares = (kingRayD | kingRayA | kingRayF | kingRayR);

            GenerateMoves(moveList, ref index, inCheck, pinnedSquares);

            for (int i = 0; i < index; i++)
            {
                if ((moveList[i].MoveType & TPieceType.CHECKMOVEILLEGAL) != 0 && Illegal(moveList[i], checks, kingsq))
                    continue;

                long nodes = 1;
                if (depth > 1)
                {
                    Make(moveList[i]);

                    nodes = Perft(depth - 1);
                    Position = Game[--iPosition];
                }

                total += nodes;

                Console.WriteLine($"  {MoveToStr(moveList[i], Position.STM)}:    {nodes:N0}");
            }

            return total;
        }

    }
}
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Why C++ instead of C#?

Post by mvanthoor »

lithander wrote: Wed Sep 29, 2021 2:12 pm ...
Today I forked your repository and if I find some time somewhere I'll try and port the C version as it now stands to Rust Even though I think that there will be very little difference between a Rust-compiled version and a Clang-compiled version... it may be useful for completeness' sake. The Rust version would be here.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Why C++ instead of C#?

Post by mvanthoor »

lithander wrote: Wed Sep 29, 2021 2:24 pm This is just another example of an expression that the C compiler had already optimized but the C# compiler didn't.
I've tried many... _MANY_ optimizations I found across the internet to make my engine faster. "Use powers of 2 for the number of TT-entries, so you can use a bitmask, which is faster than modulo", for example. Tried it. Failed. Then I found a post on Talkchess that modulo can be replaced by bitshifting, and this works for any number of TT entries. I would not be surprised if LLVM (rustc's compiler backend) knows this trick, and already applies it in place of the modulo operator.

This way I've encountered many different optimizations that in the end didn't give me any improvements. The only things that really gave me an improvement is removing vectors that live on the heap and replace them by a struct that keeps an array and a counter, on the stack. I now only use vectors in the places where 1) speed doesn't matter (parsing incoming FEN-strings for example) or 2) I need the variable size (appending child-pv to parent-pv). I even tried to optimize 2) by initializing the vector with a certain capacity, and I tried a huge array which I then handled as if it was a vector, but it didn't bring any improvement because the PV (apparently) doesn't change enough to make an impact.

So I basically stopped trying to hand-optimize my code. I just take care to not allocate and de-allocate memory in hotspot functions, and up until now, I'm fine.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
spirch
Posts: 95
Joined: Fri Nov 09, 2012 12:36 am

Re: Why C++ instead of C#?

Post by spirch »

this one i made a kind of a big change, maybe it will be rejected

i removed the capture flag and refactoring nearly every move method

i moved the capture detection in the make method

this save a few hundred millisecond

Code: Select all

/*
 This perft implementation is based on QBBEngine by Fabio Gobbato and ported to C# by Thomas Jahn
 
 The purpose is to compare the speed differences of C# and C in chess-programming workload.
*/

using System;
using System.Buffers.Binary;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics.X86;
using System.Text;

namespace QBBLatest
{
    public class QbbPerft
    {
        const int MAX_PLY = 32;
        const int WHITE = 0;
        const int BLACK = 8;

        /* define the move type, for example
           KING|CASTLE is a castle move
           PAWN|CAPTURE|EP is an enpassant move
           PAWN|PROMO|CAPTURE is a promotion with a capture */

        /* define the piece type: empty, pawn, knight, bishop, rook, queen, king */
        [Flags]
        public enum TPieceType : byte { EMPTY, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, PIECE_MASK = 0x07, CASTLE = 0x40, PROMO = 0x20, EP = 0x10, /*CAPTURE = 0x08,*/ CHECKMOVEILLEGAL = 0x80 }

        /* move structure */
        [StructLayout(LayoutKind.Explicit)]
        public struct TMove
        {
            [FieldOffset(0)]
            public int Move;
            [FieldOffset(0)]
            public TPieceType MoveType;
            [FieldOffset(1)]
            public byte From;
            [FieldOffset(2)]
            public byte To;
            [FieldOffset(3)]
            public TPieceType Promotion;
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private static int NewMove(TPieceType MoveType, byte From, byte To) => (byte)MoveType | (From << 8) | (To << 16);

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private static int NewMovePromo(TPieceType MoveType, byte From, byte To, TPieceType Promotion) => (byte)MoveType | (From << 8) | (To << 16) | ((byte)Promotion << 24);

        static TMove[][] MovesLists;

        static QbbPerft()
        {
            MovesLists = new TMove[MAX_PLY][];
            for (int i = 0; i < MAX_PLY; i++)
            {
                MovesLists[i] = new TMove[225];
            }

            PreComputePEXTAttacks();
        }

        /*
        Board structure definition
        PM,P0,P1,P2 are the 4 bitboards that contain the whole board
        PM is the bitboard with the side to move pieces
        P0,P1 and P2: with these bitboards you can obtain every type of pieces and every pieces combinations.
        */
        public struct TBoard
        {
            public ulong PM;
            public ulong P0;
            public ulong P1;
            public ulong P2;
            public byte CastleFlags; /* ..sl..SL  short long opponent SHORT LONG side to move */
            public byte EnPassant; /* enpassant column, =8 if not set */
            public byte STM; /* side to move */
        }

        static TBoard[] Game = new TBoard[MAX_PLY];
        static int iPosition;
        private static TBoard Position;

        public static ulong[] LateralMasks;
        public static ulong[] LateralMasksF;
        public static ulong[] LateralMasksR;
        public static ulong[] DiagonalMasks;
        public static ulong[] DiagonalMasksD;
        public static ulong[] DiagonalMasksA;

        public static ushort[] AttackMapDiag;
        public static ushort[] AttackMapLateral;

        public static readonly ulong[] RankMaskBase = { 0x00000000000000FF, 0x000000000000FF00, 0x0000000000FF0000, 0x00000000FF000000, 0x000000FF00000000, 0x0000FF0000000000, 0x00FF000000000000, 0xFF00000000000000 };
        public static readonly ulong[] FileMaskBase = { 0x0101010101010101, 0x0202020202020202, 0x0404040404040404, 0x0808080808080808, 0x1010101010101010, 0x2020202020202020, 0x4040404040404040, 0x8080808080808080 };
        public static readonly ulong[] DiagonalMaskBase = { 0x0000000000000001, 0x0000000000000102, 0x0000000000010204, 0x0000000001020408, 0x0000000102040810, 0x0000010204081020, 0x0001020408102040, 0x0102040810204080, 0x0204081020408000, 0x0408102040800000, 0x0810204080000000, 0x1020408000000000, 0x2040800000000000, 0x4080000000000000, 0x8000000000000000 };
        public static readonly ulong[] AntiDiagonalMaskBase = { 0x0000000000000080, 0x0000000000008040, 0x0000000000804020, 0x0000000080402010, 0x0000008040201008, 0x0000804020100804, 0x0080402010080402, 0x8040201008040201, 0x4020100804020100, 0x2010080402010000, 0x1008040201000000, 0x0804020100000000, 0x0402010000000000, 0x0201000000000000, 0x0100000000000000 };

        public static void PreComputePEXTAttacks()
        {
            LateralMasks = new ulong[65];
            LateralMasksF = new ulong[65];
            LateralMasksR = new ulong[65];
            DiagonalMasks = new ulong[65];
            DiagonalMasksD = new ulong[65];
            DiagonalMasksA = new ulong[65];
            AttackMapDiag = new ushort[64 * 64 * 256];
            AttackMapLateral = new ushort[64 * 128 * 256];

            for (int sq = 0; sq < 64; sq++)
            {
                int diagOffset = (sq / 8) + (sq % 8);
                int antiDiagOffset = (sq / 8) + 7 - (sq % 8);
                LateralMasksF[sq] = FileMaskBase[sq % 8];
                LateralMasksR[sq] = RankMaskBase[sq / 8];
                LateralMasks[sq] = FileMaskBase[sq % 8] | RankMaskBase[sq / 8];
                DiagonalMasksD[sq] = DiagonalMaskBase[diagOffset];
                DiagonalMasksA[sq] = AntiDiagonalMaskBase[antiDiagOffset];
                DiagonalMasks[sq] = DiagonalMaskBase[diagOffset] | AntiDiagonalMaskBase[antiDiagOffset];

                for (int occupiedOne = 0; occupiedOne < 256; occupiedOne++)
                {
                    for (int occupiedTwo = 0; occupiedTwo < 256; occupiedTwo++)
                    {
                        var dMask = DiagonalMasks[sq];
                        var lMask = LateralMasks[sq];

                        ulong aPosition = Bmi2.X64.ParallelBitDeposit((ulong)occupiedOne, AntiDiagonalMaskBase[antiDiagOffset]) | (1UL << sq);
                        ulong dPosition = Bmi2.X64.ParallelBitDeposit((ulong)occupiedTwo, DiagonalMaskBase[diagOffset]) | (1UL << sq);
                        ulong rPosition = Bmi2.X64.ParallelBitDeposit((ulong)occupiedOne, RankMaskBase[sq / 8]) | (1UL << sq);
                        ulong fPosition = Bmi2.X64.ParallelBitDeposit((ulong)occupiedTwo, FileMaskBase[sq % 8]) | (1UL << sq);

                        ulong index = Bmi2.X64.ParallelBitExtract((aPosition | dPosition), dMask);
                        ulong index2 = Bmi2.X64.ParallelBitExtract((rPosition | fPosition), lMask);

                        ulong dVal = Bmi2.X64.ParallelBitExtract(GenBishop(sq, aPosition | dPosition), dMask);
                        ulong lVal = Bmi2.X64.ParallelBitExtract(GenRook(sq, rPosition | fPosition), lMask);

                        AttackMapDiag[sq * (64 * 256) + (int)index] = (ushort)dVal;
                        AttackMapLateral[sq * (128 * 256) + (int)index2] = (ushort)lVal;
                    }
                }
            }
        }

        /* array of bitboards that contains all the knight destination for every square */
        public static readonly ulong[] KnightDest = {
            0x0000000000020400UL,0x0000000000050800UL,0x00000000000a1100UL,0x0000000000142200UL,
            0x0000000000284400UL,0x0000000000508800UL,0x0000000000a01000UL,0x0000000000402000UL,
            0x0000000002040004UL,0x0000000005080008UL,0x000000000a110011UL,0x0000000014220022UL,
            0x0000000028440044UL,0x0000000050880088UL,0x00000000a0100010UL,0x0000000040200020UL,
            0x0000000204000402UL,0x0000000508000805UL,0x0000000a1100110aUL,0x0000001422002214UL,
            0x0000002844004428UL,0x0000005088008850UL,0x000000a0100010a0UL,0x0000004020002040UL,
            0x0000020400040200UL,0x0000050800080500UL,0x00000a1100110a00UL,0x0000142200221400UL,
            0x0000284400442800UL,0x0000508800885000UL,0x0000a0100010a000UL,0x0000402000204000UL,
            0x0002040004020000UL,0x0005080008050000UL,0x000a1100110a0000UL,0x0014220022140000UL,
            0x0028440044280000UL,0x0050880088500000UL,0x00a0100010a00000UL,0x0040200020400000UL,
            0x0204000402000000UL,0x0508000805000000UL,0x0a1100110a000000UL,0x1422002214000000UL,
            0x2844004428000000UL,0x5088008850000000UL,0xa0100010a0000000UL,0x4020002040000000UL,
            0x0400040200000000UL,0x0800080500000000UL,0x1100110a00000000UL,0x2200221400000000UL,
            0x4400442800000000UL,0x8800885000000000UL,0x100010a000000000UL,0x2000204000000000UL,
            0x0004020000000000UL,0x0008050000000000UL,0x00110a0000000000UL,0x0022140000000000UL,
            0x0044280000000000UL,0x0088500000000000UL,0x0010a00000000000UL,0x0020400000000000UL,
        };

        /* The same for the king */
        public static readonly ulong[] KingDest = {
            0x0000000000000302UL,0x0000000000000705UL,0x0000000000000e0aUL,0x0000000000001c14UL,
            0x0000000000003828UL,0x0000000000007050UL,0x000000000000e0a0UL,0x000000000000c040UL,
            0x0000000000030203UL,0x0000000000070507UL,0x00000000000e0a0eUL,0x00000000001c141cUL,
            0x0000000000382838UL,0x0000000000705070UL,0x0000000000e0a0e0UL,0x0000000000c040c0UL,
            0x0000000003020300UL,0x0000000007050700UL,0x000000000e0a0e00UL,0x000000001c141c00UL,
            0x0000000038283800UL,0x0000000070507000UL,0x00000000e0a0e000UL,0x00000000c040c000UL,
            0x0000000302030000UL,0x0000000705070000UL,0x0000000e0a0e0000UL,0x0000001c141c0000UL,
            0x0000003828380000UL,0x0000007050700000UL,0x000000e0a0e00000UL,0x000000c040c00000UL,
            0x0000030203000000UL,0x0000070507000000UL,0x00000e0a0e000000UL,0x00001c141c000000UL,
            0x0000382838000000UL,0x0000705070000000UL,0x0000e0a0e0000000UL,0x0000c040c0000000UL,
            0x0003020300000000UL,0x0007050700000000UL,0x000e0a0e00000000UL,0x001c141c00000000UL,
            0x0038283800000000UL,0x0070507000000000UL,0x00e0a0e000000000UL,0x00c040c000000000UL,
            0x0302030000000000UL,0x0705070000000000UL,0x0e0a0e0000000000UL,0x1c141c0000000000UL,
            0x3828380000000000UL,0x7050700000000000UL,0xe0a0e00000000000UL,0xc040c00000000000UL,
            0x0203000000000000UL,0x0507000000000000UL,0x0a0e000000000000UL,0x141c000000000000UL,
            0x2838000000000000UL,0x5070000000000000UL,0xa0e0000000000000UL,0x40c0000000000000UL
        };

        /* masks for finding the pawns that can capture with an enpassant (in move generation) */
        static readonly ulong[] EnPassant = {
            0x0000000200000000UL,0x0000000500000000UL,0x0000000A00000000UL,0x0000001400000000UL,
            0x0000002800000000UL,0x0000005000000000UL,0x000000A000000000UL,0x0000004000000000UL
        };

        /* masks for finding the pawns that can capture with an enpassant (in make move) */
        static readonly ulong[] EnPassantM = {
            0x0000000002000000UL,0x0000000005000000UL,0x000000000A000000UL,0x0000000014000000UL,
            0x0000000028000000UL,0x0000000050000000UL,0x00000000A0000000UL,0x0000000040000000UL
        };

        /*
        reverse a bitboard:
        A bitboard is an array of byte: Byte0,Byte1,Byte2,Byte3,Byte4,Byte5,Byte6,Byte7
        after this function the bitboard will be: Byte7,Byte6,Byte5,Byte4,Byte3,Byte2,Byte1,Byte0
        The board is saved always with the side to move in the low significant bits of the bitboard, so this function
        is used to change the side to move
        */
        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        public static ulong RevBB(ulong bb) => BinaryPrimitives.ReverseEndianness(bb);

        /* return the index of the most significant bit of the bitboard, bb must always be !=0 */
        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        public static ulong MSB(ulong bb) => 63 ^ Lzcnt.X64.LeadingZeroCount(bb);

        /* return the index of the least significant bit of the bitboard, bb must always be !=0 */
        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        public static ulong LSB(ulong bb) => Bmi1.X64.TrailingZeroCount(bb);

        /* extract the least significant bit of the bitboard */
        public static ulong ExtractLSB(ulong bb) => Bmi1.X64.ExtractLowestSetBit(bb);

        /* reset the least significant bit of bb */
        public static ulong ClearLSB(ulong bb) => Bmi1.X64.ResetLowestSetBit(bb);

        /* Macro to check and reset the castle rights:
           CastleSM: short castling side to move
           CastleLM: long castling side to move
           CastleSO: short castling opponent
           CastleLO: long castling opponent
         */
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static bool CanCastleSM() => (Position.CastleFlags & 0x02) != 0;
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static bool CanCastleLM() => (Position.CastleFlags & 0x01) != 0;
        static void ResetCastleSM() => Position.CastleFlags &= 0xFD;
        static void ResetCastleLM() => Position.CastleFlags &= 0xFE;
        static void ResetCastleSO() => Position.CastleFlags &= 0xDF;
        static void ResetCastleLO() => Position.CastleFlags &= 0xEF;

        /* these Macros are used to calculate the bitboard of a particular kind of piece
           P2 P1 P0
            0  0  0    empty
            0  0  1    pawn
            0  1  0    knight
            0  1  1    bishop
            1  0  0    rook
            1  0  1    queen
            1  1  0    king
        */
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Occupation(ref TBoard position) => position.P0 | position.P1 | position.P2; /* board occupation */

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Pawns(ref TBoard position) => Bmi1.X64.AndNot(position.P2, Bmi1.X64.AndNot(position.P1, position.P0));

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Knights(ref TBoard position) => Bmi1.X64.AndNot(position.P0, Bmi1.X64.AndNot(position.P2, position.P1));

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Bishops(ref TBoard position) => position.P0 & position.P1;

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Rooks(ref TBoard position) => Bmi1.X64.AndNot(position.P0, Bmi1.X64.AndNot(position.P1, position.P2));

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Queens(ref TBoard position) => position.P0 & position.P2;

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong QueenOrRooks(ref TBoard position) => Bmi1.X64.AndNot(position.P1, position.P2);

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong QueenOrBishops(ref TBoard position) => position.P0 & (position.P2 | position.P1);

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Kings(ref TBoard position) => position.P1 & position.P2; /* a bitboard with the 2 kings */

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong SideToMove(ref TBoard position) => position.PM;

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static byte EnPass(ref TBoard position) => position.EnPassant;

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Opposing(ref TBoard position) => position.PM ^ (position.P0 | position.P1 | position.P2);

        /* get the piece type giving the square */
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static ulong Piece(int sq) => ((Position.P2 >> sq) & 1) << 2 |
                                              ((Position.P1 >> sq) & 1) << 1 |
                                              ((Position.P0 >> sq) & 1);

        /* calculate the square related to the opponent */
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static int OppSq(int sp) => sp ^ 56;
        /* Absolute Square, we need this macro to return the move in long algebric notation  */
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        static int AbsSq(int sq, int col) => col == WHITE ? sq : OppSq(sq);

        /* get the corresponding string to the given move  */
        static string MoveToStr(TMove move, byte tomove)
        {
            Span<char> promo = stackalloc[] { ' ', ' ', 'n', 'b', 'r', 'q' };
            StringBuilder result = new StringBuilder(6);
            result.Append((char)('a' + AbsSq(move.From, tomove) % 8));
            result.Append((char)('1' + AbsSq(move.From, tomove) / 8));
            result.Append((char)('a' + AbsSq(move.To, tomove) % 8));
            result.Append((char)('1' + AbsSq(move.To, tomove) / 8));
            result.Append(promo[(byte)move.Promotion]);
            return result.ToString();
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        static void ChangeSide()
        {
            ref TBoard position = ref Position;
            position.PM = BinaryPrimitives.ReverseEndianness(position.PM ^ Occupation(ref position));
            position.P0 = BinaryPrimitives.ReverseEndianness(position.P0);
            position.P1 = BinaryPrimitives.ReverseEndianness(position.P1);
            position.P2 = BinaryPrimitives.ReverseEndianness(position.P2);/* reverse the board */
            position.CastleFlags = (byte)((position.CastleFlags >> 4) | (position.CastleFlags << 4));/* roll the castle rights */
            position.STM ^= BLACK; /* change the side to move */
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        private static ulong GenRook2(int sq, ulong occupation)
        {
            var lMask = LateralMasks[sq];
            ulong index = Bmi2.X64.ParallelBitExtract(occupation, lMask);
            return Bmi2.X64.ParallelBitDeposit(AttackMapLateral[(sq * (128 * 256)) + (int)index], lMask);
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        private static ulong GenBishop2(int sq, ulong occupation)
        {
            var dMask = DiagonalMasks[sq];
            ulong index = Bmi2.X64.ParallelBitExtract(occupation, dMask);
            return Bmi2.X64.ParallelBitDeposit(AttackMapDiag[(sq * (64 * 256)) + (int)index], dMask);
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        private static ulong GenRook(int sq, ulong occupation)
        {
            occupation ^= 1UL << sq; /* remove the selected piece from the occupation */

            return ((0x8080808080808080UL >> (int)(LSB((0x0101010101010101UL << sq) & (occupation | 0xFF00000000000000UL)) ^ 63)) & (0x0101010101010101UL << (int)MSB((0x8080808080808080UL >> (sq ^ 63)) & (occupation | 0x00000000000000FFUL)))) |
                   ((0xFF00000000000000UL >> (int)(LSB((0x00000000000000FFUL << sq) & (occupation | 0x8080808080808080UL)) ^ 63)) & (0x00000000000000FFUL << (int)MSB((0xFF00000000000000UL >> (sq ^ 63)) & (occupation | 0x0101010101010101UL))));
            /* From every direction find the first piece and from that piece put a mask in the opposite direction.
               Put togheter all the 4 masks and remove the moving piece */
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
        private static ulong GenBishop(int sq, ulong occupation)
        {
            /* it's the same as the rook */
            occupation ^= 1UL << sq;

            return ((0x8040201008040201UL >> (int)(LSB((0x8040201008040201UL << sq) & (occupation | 0xFF80808080808080UL)) ^ 63)) & (0x8040201008040201UL << (int)MSB((0x8040201008040201UL >> (sq ^ 63)) & (occupation | 0x01010101010101FFUL)))) |
                   ((0x8102040810204081UL >> (int)(LSB((0x8102040810204081UL << sq) & (occupation | 0xFF01010101010101UL)) ^ 63)) & (0x8102040810204081UL << (int)MSB((0x8102040810204081UL >> (sq ^ 63)) & (occupation | 0x80808080808080FFUL))));
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GenerateKingMoves(ref TBoard position, ulong occupation, ref int index, TMove[] moves)
        {
            for (ulong pieces = Kings(ref position) & SideToMove(ref position); pieces != 0; pieces = ClearLSB(pieces))
            {
                byte square = (byte)LSB(pieces);

                // generate moves
                for (ulong destinations = occupation & KingDest[square]; destinations != 0; destinations = ClearLSB(destinations))
                {
                    moves[index++].Move = NewMove(MoveType: TPieceType.KING | TPieceType.CHECKMOVEILLEGAL, From: square, To: (byte)LSB(destinations));
                }
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GenerateKnightMoves(ref TBoard position, ulong occupation, ref int index, TMove[] moves, TPieceType moveType, ulong pinnedSquares)
        {
            for (ulong pieces = Knights(ref position) & SideToMove(ref position); pieces != 0; pieces = ClearLSB(pieces))
            {
                byte square = (byte)LSB(pieces);
                TPieceType finalMoveType = moveType | (((pinnedSquares & (1UL << square)) != 0) ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY);

                // generate moves
                for (ulong destinations = occupation & KnightDest[square]; destinations != 0; destinations = ClearLSB(destinations))
                {
                    moves[index++].Move = NewMove(MoveType: finalMoveType, From: square, To: (byte)LSB(destinations));
                }
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GenerateRookMoves(ref TBoard position, ulong combine, ulong occupation, ref int index, TMove[] moves, TPieceType moveType, ulong pinnedSquares)
        {
            for (ulong pieces = Rooks(ref position) & SideToMove(ref position); pieces != 0; pieces = ClearLSB(pieces))
            {
                byte square = (byte)LSB(pieces);
                TPieceType finalMoveType = moveType | (((pinnedSquares & (1UL << square)) != 0) ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY);

                // generate moves
                for (ulong destinations = combine & GenRook(square, occupation); destinations != 0; destinations = ClearLSB(destinations))
                {
                    moves[index++].Move = NewMove(MoveType: finalMoveType, From: square, To: (byte)LSB(destinations));
                }
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GenerateBishopMoves(ref TBoard position, ulong combine, ulong occupation, ref int index, TMove[] moves, TPieceType moveType, ulong pinnedSquares)
        {
            for (ulong pieces = Bishops(ref position) & SideToMove(ref position); pieces != 0; pieces = ClearLSB(pieces))
            {
                byte square = (byte)LSB(pieces);
                TPieceType finalMoveType = moveType | (((pinnedSquares & (1UL << square)) != 0) ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY);

                // generate moves
                for (ulong destinations = combine & GenBishop(square, occupation); destinations != 0; destinations = ClearLSB(destinations))
                {
                    moves[index++].Move = NewMove(MoveType: finalMoveType, From: square, To: (byte)LSB(destinations));
                }
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GenerateQueenMoves(ref TBoard position, ulong combine, ulong occupation, ref int index, TMove[] moves, TPieceType moveType, ulong pinnedSquares)
        {
            for (ulong pieces = Queens(ref position) & SideToMove(ref position); pieces != 0; pieces = ClearLSB(pieces))
            {
                byte square = (byte)LSB(pieces);
                TPieceType finalMoveType = moveType | (((pinnedSquares & (1UL << square)) != 0) ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY);

                // generate moves
                for (ulong destinations = combine & (GenRook(square, occupation) | GenBishop(square, occupation)); destinations != 0; destinations = ClearLSB(destinations))
                {
                    moves[index++].Move = NewMove(MoveType: finalMoveType, From: square, To: (byte)LSB(destinations));
                }
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private static bool Illegal(TMove move, ulong checks, int kingsq)
        {
            ref TBoard position = ref Position;
            ulong From = 1UL << move.From;
            ulong king = 1UL << move.To;
            ulong newoccupation = (Occupation(ref position) ^ From) | king;
            ulong newopposing = Bmi1.X64.AndNot(king, Opposing(ref position));

            if ((move.MoveType & TPieceType.PIECE_MASK) == TPieceType.KING)
            {
                kingsq = move.To;
                checks = (KnightDest[kingsq] & Knights(ref position)) | ((((king << 9) & 0xFEFEFEFEFEFEFEFEUL) | ((king << 7) & 0x7F7F7F7F7F7F7F7FUL))) & Pawns(ref position) | (KingDest[kingsq] & Kings(ref position));
            }
            else if ((move.MoveType & TPieceType.EP) > 0)
            {
                // Handle cases where taking EP exposes the king to check
                ulong epTo = (king >> 8);
                newoccupation ^= epTo;
                newopposing ^= epTo;
            }

            return (checks & newopposing) > 0 ||
                   (GenBishop(kingsq, newoccupation) & QueenOrBishops(ref position) & newopposing) > 0 ||
                   (GenRook(kingsq, newoccupation) & QueenOrRooks(ref position) & newopposing) > 0;
        }


        [MethodImpl(MethodImplOptions.AggressiveOptimization)]
        private static void GenerateMoves(TMove[] moves, ref int index, TPieceType isInCheck, ulong pinnedSquares)
        {
            ref TBoard position = ref Position;
            ulong occupation = Occupation(ref position);
            ulong opposing = Opposing(ref position);
            ulong combine = ~occupation | opposing;

            // generate moves from king to knight
            GenerateKingMoves(ref position, combine, ref index, moves);
            GenerateQueenMoves(ref position, combine, occupation, ref index, moves, TPieceType.QUEEN | isInCheck, pinnedSquares);
            GenerateRookMoves(ref position, combine, occupation, ref index, moves, TPieceType.ROOK | isInCheck, pinnedSquares);
            GenerateBishopMoves(ref position, combine, occupation, ref index, moves, TPieceType.BISHOP | isInCheck, pinnedSquares);
            GenerateKnightMoves(ref position, combine, ref index, moves, TPieceType.KNIGHT | isInCheck, pinnedSquares);

            ulong pawns = Pawns(ref position) & SideToMove(ref position);
            GeneratePawnCaptures(moves, ref index, TPieceType.PAWN | isInCheck, pinnedSquares, opposing, pawns);
            GeneratePawnPromotions(moves, ref index, TPieceType.PAWN | TPieceType.PROMO | isInCheck, pinnedSquares, occupation, opposing, pawns);
            GenerateEnPassant(moves, ref index, position, pawns);
            GeneratePawnPush(moves, ref index, TPieceType.PAWN | isInCheck, pinnedSquares, position, occupation);
            GenerateCastling(moves, ref index, position, occupation, opposing);
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GenerateEnPassant(TMove[] moves, ref int index, TBoard position, ulong pawns)
        {
            byte enPass;
            if ((enPass = EnPass(ref position)) != 8)
            {
                for (ulong enpassant = pawns & EnPassant[enPass]; enpassant != 0; enpassant = ClearLSB((enpassant)))
                    moves[index++].Move = NewMove(
                        MoveType: TPieceType.PAWN | TPieceType.EP | TPieceType.PROMO | TPieceType.CHECKMOVEILLEGAL,
                        From: (byte)LSB(enpassant),
                        To: (byte)(40 + enPass)
                    );
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GenerateCastling(TMove[] moves, ref int index, TBoard position, ulong occupation, ulong opposing)
        {
            /* check if long castling is possible */
            if (CanCastleLM()
            && (occupation & 0x0EUL) == 0
            && (((((ExtractLSB(0x1010101010101000UL & occupation) /* column e */
                 | ExtractLSB(0x0808080808080800UL & occupation) /*column d */
                 | ExtractLSB(0x0404040404040400UL & occupation) /*column c */
                 | ExtractLSB(0x00000000000000E0UL & occupation)  /* row 1 */) & QueenOrRooks(ref position))
                 | ((ExtractLSB(0x0000000102040800UL & occupation) /*antidiag from e1/e8 */
                 | ExtractLSB(0x0000000001020400UL & occupation) /*antidiag from d1/d8 */
                 | ExtractLSB(0x0000000000010200UL & occupation) /*antidiag from c1/c8 */
                 | ExtractLSB(0x0000000080402000UL & occupation) /*diag from e1/e8 */
                 | ExtractLSB(0x0000008040201000UL & occupation) /*diag from d1/d8 */
                 | ExtractLSB(0x0000804020100800UL & occupation) /*diag from c1/c8 */) & QueenOrBishops(ref position))
                 | (0x00000000003E7700UL & Knights(ref position))
                 | (0x0000000000003E00UL & Pawns(ref position))
                 | (Kings(ref position) & 0x0000000000000600UL)) & opposing) == 0))
                /* check if c1/c8 d1/d8 e1/e8 are not attacked */
                moves[index++].Move = NewMove
            (
                MoveType: TPieceType.KING | TPieceType.CASTLE,
                From: 4,
                To: 2
            );

            /* check if short castling is possible */
            if (CanCastleSM()
            && (occupation & 0x60UL) == 0
            && (((((ExtractLSB(0x1010101010101000UL & occupation) /* column e */
                 | ExtractLSB(0x2020202020202000UL & occupation) /* column f */
                 | ExtractLSB(0x4040404040404000UL & occupation) /* column g */
                 | 1UL << (int)MSB(0x000000000000000FUL & (occupation | 0x1UL))/* row 1 */) & QueenOrRooks(ref position))
                 | ((ExtractLSB(0x0000000102040800UL & occupation) /* antidiag from e1/e8 */
                 | ExtractLSB(0x0000010204081000UL & occupation) /*antidiag from f1/f8 */
                 | ExtractLSB(0x0001020408102000UL & occupation) /*antidiag from g1/g8 */
                 | ExtractLSB(0x0000000080402000UL & occupation) /*diag from e1/e8 */
                 | ExtractLSB(0x0000000000804000UL & occupation) /*diag from f1/f8 */
                 | 0x0000000000008000UL /*diag from g1/g8 */) & QueenOrBishops(ref position))
                 | (0x0000000000F8DC00UL & Knights(ref position))
                 | (0x000000000000F800UL & Pawns(ref position))
                 | (Kings(ref position) & 0x0000000000004000UL)) & opposing) == 0))
                /* check if e1/e8 f1/f8 g1/g8 are not attacked */
                moves[index++].Move = NewMove
                    (
                        MoveType: TPieceType.KING | TPieceType.CASTLE,
                        From: 4,
                        To: 6
                    );
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GeneratePawnPush(TMove[] moves, ref int index, TPieceType isInCheck, ulong kingRay, TBoard position, ulong occupation)
        {
            ulong push1 = (((Pawns(ref position) & SideToMove(ref position)) << 8) & ~occupation) & 0x00FFFFFFFFFFFFFFUL;
            ulong push2 = (push1 << 8) & ~occupation & 0x00000000FF000000UL;
            byte square;
            /* one pawns push */
            for (; push1 != 0; push1 = ClearLSB(push1))
            {
                square = (byte)LSB(push1);
                moves[index++].Move = NewMove
                (
                    MoveType: isInCheck | ((kingRay & (1UL << (square - 8))) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY),
                    From: (byte)(square - 8),
                    To: square
                );
            }

            /* double pawns pushes */
            for (; push2 != 0; push2 = ClearLSB(push2))
            {
                square = (byte)LSB(push2);
                moves[index++].Move = NewMove
                (
                    MoveType: isInCheck | ((kingRay & (1UL << (square - 16))) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY),
                    From: (byte)(square - 16),
                    To: square
                );
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GeneratePawnCaptures(TMove[] moves, ref int index, TPieceType isInCheck, ulong kingRay, ulong opposing, ulong pawns)
        {
            byte square;
            TPieceType pinned;
            ulong lsb;
            /* Generate pawns right captures */
            for (lsb = (pawns << 9) & 0x00FEFEFEFEFEFEFEUL & opposing; lsb != 0; lsb = ClearLSB(lsb))
            {
                square = (byte)LSB(lsb);
                pinned = (kingRay & (1UL << (square - 9))) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY; ;
                moves[index++].Move = NewMove(
                    MoveType:  isInCheck | pinned,
                    From: (byte)(square - 9),
                    To: square
                );
            }

            /* Generate pawns left captures */
            for (lsb = (pawns << 7) & 0x007F7F7F7F7F7F7FUL & opposing; lsb != 0; lsb = ClearLSB(lsb))
            {
                square = (byte)LSB(lsb);
                pinned = (kingRay & (1UL << (square - 7))) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY; ;
                moves[index++].Move = NewMove(
                    MoveType:  isInCheck | pinned,
                    From: (byte)(square - 7),
                    To: (byte)square
                );
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void GeneratePawnPromotions(TMove[] moves, ref int index, TPieceType isInCheck, ulong kingRay, ulong occupation, ulong opposing, ulong pawns)
        {
            byte square;
            TPieceType pinned;
            TPieceType piece;
            ulong lsb;
            /* Generate pawns promotions */
            if ((pawns & 0x00FF000000000000UL) != 0)
            {
                /* promotions with left capture */
                for (lsb = (pawns << 9) & 0xFE00000000000000UL & opposing; lsb != 0; lsb = ClearLSB(lsb))
                {
                    square = (byte)LSB(lsb);
                    pinned = isInCheck | ((kingRay & (1UL << (square - 9))) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY);
                    for (piece = TPieceType.QUEEN; piece >= TPieceType.KNIGHT; piece--)
                    {
                        moves[index++].Move = NewMovePromo(
                            MoveType:  pinned,
                            From: (byte)(square - 9),
                            To: (byte)square,
                            Promotion: piece
                        );
                    }
                }

                /* promotions with right capture */
                for (lsb = (pawns << 7) & 0x7F00000000000000UL & opposing; lsb != 0; lsb = ClearLSB(lsb))
                {
                    square = (byte)LSB(lsb);
                    pinned = isInCheck | ((kingRay & (1UL << (square - 7))) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY);
                    for (piece = TPieceType.QUEEN; piece >= TPieceType.KNIGHT; piece--)
                        moves[index++].Move = NewMovePromo(
                            MoveType:  pinned,
                            From: (byte)(square - 7),
                            To: (byte)square,
                            Promotion: piece
                        );
                }
                /* no capture promotions */
                for (lsb = ((pawns << 8) & ~occupation) & 0xFF00000000000000UL; lsb != 0; lsb = ClearLSB(lsb))
                {
                    square = (byte)LSB(lsb);
                    pinned = isInCheck | ((kingRay & (1UL << (square - 8))) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY);
                    for (piece = TPieceType.QUEEN; piece >= TPieceType.KNIGHT; piece--)
                        moves[index++].Move = NewMovePromo(
                            MoveType: pinned,
                            From: (byte)(square - 8),
                            To: (byte)square,
                            Promotion: piece
                        );
                }
            }
        }

        [MethodImpl(MethodImplOptions.AggressiveOptimization | MethodImplOptions.AggressiveInlining)]
        private static void Make(TMove move)
        {
            ref TBoard position = ref Position;
            Game[iPosition++] = position;
            ulong part = 1UL << move.From;
            ulong dest = 1UL << move.To;
            position.EnPassant = 8;
            ulong opposing = Opposing(ref position);

            if ((opposing & dest) != 0)
            {
                /* Delete the captured piece */
                position.P0 &= ~dest;
                position.P1 &= ~dest;
                position.P2 &= ~dest;

                if (move.To == 63) ResetCastleSO(); /* captured the opponent king side rook */
                else if (move.To == 56) ResetCastleLO(); /* captured the opponent quuen side rook */
            }

            switch (move.MoveType & TPieceType.PIECE_MASK)
            {
                case TPieceType.PAWN:
                    if ((move.MoveType & (TPieceType.EP | TPieceType.PROMO)) != 0)
                    {
                        /* EnPassant */
                        if ((move.MoveType & TPieceType.EP) != 0)
                        {
                            position.PM ^= part | dest;
                            position.P0 ^= part | dest;
                            position.P0 ^= dest >> 8; //delete the captured pawn
                        }
                        else
                        {
                            position.PM ^= part | dest;
                            position.P0 ^= part;
                            position.P0 |= (ulong)((int)move.Promotion & 1) << move.To;
                            position.P1 |= (ulong)(((int)move.Promotion >> 1) & 1) << move.To;
                            position.P2 |= (ulong)((int)move.Promotion >> 2) << move.To;
                        }
                    }
                    else /* capture or push */
                    {
                        position.PM ^= part | dest;
                        position.P0 ^= part | dest;

                        if (move.To == move.From + 16 && (EnPassantM[move.To & 0x07] & Pawns(ref position) & opposing) != 0)
                            position.EnPassant = (byte)(move.To & 0x07); /* save enpassant column */
                    }

                    ChangeSide();
                    return;
                case TPieceType.KNIGHT:
                    position.PM ^= part | dest;
                    position.P1 ^= part | dest;
                    ChangeSide();
                    return;
                case TPieceType.BISHOP:
                    position.PM ^= part | dest;
                    position.P0 ^= part | dest;
                    position.P1 ^= part | dest;
                    ChangeSide();
                    return;
                case TPieceType.ROOK:
                    position.PM ^= part | dest;
                    position.P2 ^= part | dest;
                    if (move.From == 7)
                        ResetCastleSM(); //king side rook moved
                    else if (move.From == 0)
                        ResetCastleLM(); // queen side rook moved
                    ChangeSide();
                    return;
                case TPieceType.QUEEN:
                    position.PM ^= part | dest;
                    position.P0 ^= part | dest;
                    position.P2 ^= part | dest;

                    ChangeSide();
                    return;
                case TPieceType.KING:
                    position.PM ^= part | dest;
                    position.P1 ^= part | dest;
                    position.P2 ^= part | dest;
                    ResetCastleSM(); /* update the castle rights */
                    ResetCastleLM();

                    if ((move.MoveType & TPieceType.CASTLE) != 0)
                    {
                        if (move.To == 6)
                        {
                            position.PM ^= 0x00000000000000A0UL;
                            position.P2 ^= 0x00000000000000A0UL;
                        } /* short castling */
                        else
                        {
                            position.PM ^= 0x0000000000000009UL;
                            position.P2 ^= 0x0000000000000009UL;
                        } /* long castling */
                    }
                    ChangeSide();
                    return;
            }
        }

        private static void LoadPosition(string fen)
        {
            /* Clear the board */
            ref TBoard pos = ref Position;
            pos.P0 = pos.P1 = pos.P2 = pos.PM = 0;
            pos.EnPassant = 8;
            pos.STM = WHITE;
            pos.CastleFlags = 0;

            /* translate the fen to the relative position */
            byte pieceside = WHITE;
            ulong piece = (ulong)TPieceType.PAWN;
            byte sidetomove = WHITE;
            int square = 0;
            int cursor;
            for (cursor = 0; fen[cursor] != ' '; cursor++)
            {
                char cur = fen[cursor];
                if (cur >= '1' && cur <= '8')
                    square += cur - '0';
                else if (cur != '/')
                {
                    int bit = OppSq(square);
                    if (cur == 'p') { piece = (ulong)TPieceType.PAWN; pieceside = BLACK; }
                    else if (cur == 'n') { piece = (ulong)TPieceType.KNIGHT; pieceside = BLACK; }
                    else if (cur == 'b') { piece = (ulong)TPieceType.BISHOP; pieceside = BLACK; }
                    else if (cur == 'r') { piece = (ulong)TPieceType.ROOK; pieceside = BLACK; }
                    else if (cur == 'q') { piece = (ulong)TPieceType.QUEEN; pieceside = BLACK; }
                    else if (cur == 'k') { piece = (ulong)TPieceType.KING; pieceside = BLACK; }
                    else if (cur == 'P') { piece = (ulong)TPieceType.PAWN; pieceside = WHITE; }
                    else if (cur == 'N') { piece = (ulong)TPieceType.KNIGHT; pieceside = WHITE; }
                    else if (cur == 'B') { piece = (ulong)TPieceType.BISHOP; pieceside = WHITE; }
                    else if (cur == 'R') { piece = (ulong)TPieceType.ROOK; pieceside = WHITE; }
                    else if (cur == 'Q') { piece = (ulong)TPieceType.QUEEN; pieceside = WHITE; }
                    else if (cur == 'K') { piece = (ulong)TPieceType.KING; pieceside = WHITE; }
                    pos.P0 |= (piece & 1) << bit; //001
                    pos.P1 |= ((piece >> 1) & 1) << bit; //010
                    pos.P2 |= (piece >> 2) << bit; //100
                    if (pieceside == WHITE)
                    {
                        pos.PM |= (1UL << bit);
                        piece |= BLACK;
                    }
                    square++;
                }
            }

            cursor++; /* read the side to move  */
            if (fen[cursor] == 'w')
                sidetomove = WHITE;
            else if (fen[cursor] == 'b')
                sidetomove = BLACK;
            cursor += 2;
            if (fen[cursor] != '-') /* read the castle rights */
            {
                for (; fen[cursor] != ' '; cursor++)
                {
                    char cur = fen[cursor];
                    if (cur == 'K') pos.CastleFlags |= 0x02;
                    else if (cur == 'Q') pos.CastleFlags |= 0x01;
                    else if (cur == 'k') pos.CastleFlags |= 0x20;
                    else if (cur == 'q') pos.CastleFlags |= 0x10;
                }
                cursor++;
            }
            else cursor += 2;
            if (fen[cursor] != '-') /* read the enpassant column */
            {
                pos.EnPassant = (byte)(fen[cursor] - 'a');
                //cursor++;
            }
            if (sidetomove == BLACK)
                ChangeSide();
        }

        private static long Perft(int depth)
        {
            long total = 0;
            int index = 0;
            var moveList = MovesLists[depth];
            ref TBoard position = ref Position;
            ulong king = Kings(ref position) & SideToMove(ref position);
            byte kingsq = (byte)LSB(king);
            ulong occupation = Occupation(ref position);
            ulong opposing = Opposing(ref position);
            ulong kingRayDiag = GenBishop(kingsq, occupation);
            ulong kingRayLat = GenRook(kingsq, occupation);

            ulong checks = (KnightDest[kingsq] & Knights(ref position)) | ((((king << 9) & 0xFEFEFEFEFEFEFEFEUL) | ((king << 7) & 0x7F7F7F7F7F7F7F7FUL)) & Pawns(ref position)) | (KingDest[kingsq] & Kings(ref position));

            TPieceType inCheck = (kingRayDiag & QueenOrBishops(ref position) & opposing) != 0 ||
                (kingRayLat & QueenOrRooks(ref position) & opposing) != 0 ||
                (checks & opposing) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY;

            ulong opposingDiag = QueenOrBishops(ref position) & opposing;
            ulong opposingLat = QueenOrRooks(ref position) & opposing;

            ulong kingRayD = 0;
            ulong kingRayA = 0;
            ulong kingRayF = 0;
            ulong kingRayR = 0;

            if ((DiagonalMasksA[kingsq] & opposingDiag) != 0)
            {
                kingRayA = DiagonalMasksA[kingsq] & kingRayDiag;
            }

            if ((DiagonalMasksD[kingsq] & opposingDiag) != 0)
            {
                kingRayD = DiagonalMasksD[kingsq] & kingRayDiag;
            }

            if ((LateralMasksF[kingsq] & opposingLat) != 0)
            {
                kingRayF = LateralMasksF[kingsq] & kingRayLat;
            }

            if ((LateralMasksR[kingsq] & opposingLat) != 0)
            {
                kingRayR = LateralMasksR[kingsq] & kingRayLat;
            }

            ulong pinnedSquares = (kingRayD | kingRayA | kingRayF | kingRayR);

            GenerateMoves(moveList, ref index, inCheck, pinnedSquares);

            for (int i = 0; i < index; i++)
            {
                if ((moveList[i].MoveType & TPieceType.CHECKMOVEILLEGAL) != 0 && Illegal(moveList[i], checks, kingsq))
                    continue;
                if (depth > 1)
                {
                    Make(moveList[i]);

                    total += Perft(depth - 1);
                    Position = Game[--iPosition];
                }
                else
                    total++;
            }

            return total;
        }

        struct PerftResult
        {
            public double Duration;
            public long Nodes;

            public PerftResult(double t, long n)
            {
                Duration = t;
                Nodes = n;
            }

            public static PerftResult operator +(PerftResult a, PerftResult b) => new(a.Duration + b.Duration, a.Nodes + b.Nodes);
        }

        private static PerftResult TestPerft(string fen, int depth, int expectedResult)
        {
            LoadPosition(fen);

            long t0 = Stopwatch.GetTimestamp();
            long count = Perft(depth);
            long t1 = Stopwatch.GetTimestamp();
            double dt = (t1 - t0) / (double)Stopwatch.Frequency;
            double ms = (1000 * dt);
            if (expectedResult != count)
            {
                Console.WriteLine($"ERROR in Perft({fen}, {depth})");
                Console.WriteLine($"Computed result: {count}");
                Console.WriteLine($"Expected result: {expectedResult}");
            }
            else
                Console.WriteLine($"OK! {(int)ms}ms, {(int)(count / ms)}K NPS");
            return new PerftResult(dt, count);
        }

        public static void Main()
        {
            Console.WriteLine("QBB Perft in C#");
            Console.WriteLine("https://github.com/lithander/QBB-Perft/tree/v1.4");
            Console.WriteLine();

            PerftResult totalTime = new PerftResult(0, 0);
            totalTime += TestPerft("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", 6, 119060324); //Start Position
            totalTime += TestPerft("r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1", 5, 193690690);
            totalTime += TestPerft("8/2p5/3p4/KP5r/1R3p1k/8/4P1P1/8 w - - 0 1", 7, 178633661);
            totalTime += TestPerft("r3k2r/Pppp1ppp/1b3nbN/nP6/BBP1P3/q4N2/Pp1P2PP/R2Q1RK1 w kq - 0 1", 6, 706045033);
            totalTime += TestPerft("rnbqkb1r/pp1p1ppp/2p5/4P3/2B5/8/PPP1NnPP/RNBQK2R w KQkq - 0 6", 3, 53392);
            totalTime += TestPerft("r4rk1/1pp1qppp/p1np1n2/2b1p1B1/2B1P1b1/P1NP1N2/1PP1QPPP/R4RK1 w - - 0 10", 5, 164075551);
            Console.WriteLine();
            Console.WriteLine($"Total: {totalTime.Nodes} Nodes, {(int)(1000 * totalTime.Duration)}ms, {(int)(totalTime.Nodes / totalTime.Duration / 1000)}K NPS");
            Console.WriteLine("Press any key to quit");//stop command prompt from closing automatically on windows
            //Console.ReadKey();
        }

        //**** Debug Helpers not in the original C code

        private static void PrintPosition(TBoard pos)
        {
            PrintBB(pos.PM, "PM");
            PrintBB(pos.P0, "P0");
            PrintBB(pos.P1, "P1");
            PrintBB(pos.P2, "P2");
            Console.WriteLine("- - - -");
            PrintBB(Pawns(ref pos), "Pawns");
            PrintBB(Knights(ref pos), "Knights");
            PrintBB(Bishops(ref pos), "Bishops");
            PrintBB(Rooks(ref pos), "Roosk");
            PrintBB(Queens(ref pos), "Queens");
            PrintBB(Kings(ref pos), "Kings");

            Console.WriteLine($"CastleFlags: {pos.CastleFlags}");  /* ..sl..SL  short long opponent SHORT LONG side to move */
            Console.WriteLine($"EnPassant column: {pos.EnPassant} (8 if not set)");
            Console.WriteLine($"SideToMove: {pos.STM}"); /* side to move */
            Console.WriteLine();
        }

        static void PrintBB(ulong bb, string label)
        {
            if (label != null)
                Console.WriteLine(label);
            Console.WriteLine(Convert.ToString((long)bb, 16).PadLeft(16, '0'));
            Console.WriteLine("----------------");
            byte[] bbBytes = BitConverter.GetBytes(bb);
            Array.Reverse(bbBytes);
            foreach (byte bbByte in bbBytes)
            {
                string line = Convert.ToString(bbByte, 2).PadLeft(8, '0');
                line = line.Replace('1', 'X');
                line = line.Replace('0', '.');
                var chars = line.ToCharArray();
                Array.Reverse(chars);
                Console.WriteLine(string.Join(' ', chars));
            }
            Console.WriteLine();
        }

        private static long Divide(int depth)
        {
            long total = 0;
            int index = 0;
            var moveList = MovesLists[depth];
            ref TBoard position = ref Position;
            ulong king = Kings(ref position) & SideToMove(ref position);
            byte kingsq = (byte)LSB(king);
            ulong occupation = Occupation(ref position);
            ulong opposing = Opposing(ref position);
            ulong kingRayDiag = GenBishop(kingsq, occupation);
            ulong kingRayLat = GenRook(kingsq, occupation);

            ulong checks = (KnightDest[kingsq] & Knights(ref position)) | ((((king << 9) & 0xFEFEFEFEFEFEFEFEUL) | ((king << 7) & 0x7F7F7F7F7F7F7F7FUL)) & Pawns(ref position)) | (KingDest[kingsq] & Kings(ref position));

            TPieceType inCheck = (kingRayDiag & QueenOrBishops(ref position) & opposing) != 0 ||
                (kingRayLat & QueenOrRooks(ref position) & opposing) != 0 ||
                (checks & opposing) != 0 ? TPieceType.CHECKMOVEILLEGAL : TPieceType.EMPTY;

            ulong opposingDiag = QueenOrBishops(ref position) & opposing;
            ulong opposingLat = QueenOrRooks(ref position) & opposing;

            ulong kingRayD = 0;
            ulong kingRayA = 0;
            ulong kingRayF = 0;
            ulong kingRayR = 0;

            if ((DiagonalMasksA[kingsq] & opposingDiag) != 0)
            {
                kingRayA = DiagonalMasksA[kingsq] & kingRayDiag;
            }

            if ((DiagonalMasksD[kingsq] & opposingDiag) != 0)
            {
                kingRayD = DiagonalMasksD[kingsq] & kingRayDiag;
            }

            if ((LateralMasksF[kingsq] & opposingLat) != 0)
            {
                kingRayF = LateralMasksF[kingsq] & kingRayLat;
            }

            if ((LateralMasksR[kingsq] & opposingLat) != 0)
            {
                kingRayR = LateralMasksR[kingsq] & kingRayLat;
            }

            ulong pinnedSquares = (kingRayD | kingRayA | kingRayF | kingRayR);

            GenerateMoves(moveList, ref index, inCheck, pinnedSquares);

            for (int i = 0; i < index; i++)
            {
                if ((moveList[i].MoveType & TPieceType.CHECKMOVEILLEGAL) != 0 && Illegal(moveList[i], checks, kingsq))
                    continue;

                long nodes = 1;
                if (depth > 1)
                {
                    Make(moveList[i]);

                    nodes = Perft(depth - 1);
                    Position = Game[--iPosition];
                }

                total += nodes;

                Console.WriteLine($"  {MoveToStr(moveList[i], Position.STM)}:    {nodes:N0}");
            }

            return total;
        }

    }
}