Efficiently Generated Legal moves question

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Efficiently Generated Legal moves question

Post by Pio »

pedrojdm2021 wrote: Sat Jul 10, 2021 3:12 am
Pio wrote: Sat Jul 10, 2021 1:35 am
pedrojdm2021 wrote: Fri Jul 09, 2021 10:51 pm
Ras wrote: Fri Jul 09, 2021 9:26 pm
pedrojdm2021 wrote: Fri Jul 09, 2021 11:13 amMy perft is only about 8,000,000 leaves per second
On my hardware, my engine has roughly 20M NPS for perft in the initial position, and 1.8M NPS for regular search. The move generator logic is the same as yours, with make/unmake, and the "in check" detection is without bitboards. Instead, with dedicated offset checks or even loops for sliders.

If you get from 8M in perft down to 300k in search, I don't think it's the move generation. Did you profile your code to see where the time really gets spent? With a similar perft-to-search ratio as I have, your search should be around 700k NPS, not 300k.
sure i did profiling, with the following results:

Image

My evaluation implementation is this:

Code: Select all

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public int Evaluate()
        {
            mg_eval[ChessPiece.White] = 0;
            mg_eval[ChessPiece.Black] = 0;
            eg_eval[ChessPiece.White] = 0; 
            eg_eval[ChessPiece.Black] = 0;
            int gamePhase = 0;
            ulong piece_bitboard = 0ul;
            int bitCount_dummy = 0;
            int extra_score = 0;

            //byte square = 0;
            byte square_side = 0,player = 0, piece = 0, square = 0;

            for(player = 0; player <= 1; ++player)
            {
                for (piece = 1; piece <= 6; ++piece)
                {
                    piece_bitboard = board.bitboards[player][piece];
                    while(piece_bitboard > 0)
                    {
                        // update game-phase state
                        gamePhase += GAMEPHASE_WEIGHTS[piece];

                        square = BitUtility.GetLS1BIndex(piece_bitboard);
                        BitUtility.RemoveLS1B(ref piece_bitboard);
                        square_side = AISettings.square_table[player][square];

                        // Evaluate material
                        mg_eval[player] += AISettings.mg_material_score[piece];
                        eg_eval[player] += AISettings.eg_material_score[piece];

                        // Evaluate piece positional scores
                        switch(piece)
                        {
                            case ChessPiece.Pawn:
                            
                            extra_score = 0;

                            // ---------- [Pawn structure] -------------

                            // Double pawns penalty
                            
                            // Get the double pawns bit count
                            bitCount_dummy = BitUtility.CountBits(board.bitboards[player][piece] & ChessData.File_masks[square]);
                            // Verify existence of double pawns and apply penalty
                            if (bitCount_dummy > 0)
                            {
                                mg_eval[player] += bitCount_dummy * AIData.Double_pawn_penalty_opening;
                                eg_eval[player] += bitCount_dummy * AIData.Double_pawn_penalty_ending;
                            }

                            // Isolated pawns penalty

                            // verify existence of isolated pawns and apply penalty
                            if ((board.bitboards[player][piece] & ChessData.Isolated_masks[square]) == 0)
                            {
                                mg_eval[player] += AIData.Isolated_pawn_penalty_opening;
                                eg_eval[player] += AIData.Isolated_pawn_penalty_ending;
                            }
                            
                            // Passed pawns bonus

                            // Verify existence of passed pawns and apply bonus
                            if ((board.bitboards[player ^ 1][piece] & ChessData.Passed_masks[player][square]) == 0)
                            {
                                extra_score += AIData.PassedPawnBonus[ChessData.RanksIndices[square_side]];
                            }

                            // ---------------------------------------

                            // Pawn square tables
                            mg_eval[player] += AISettings.mg_pawn_table[square_side] + extra_score;
                            eg_eval[player] += AISettings.eg_pawn_table[square_side] + extra_score;

                            break;

                            case ChessPiece.Bishop:

                            // ---------- [Bishop Mobility] ------------

                            // Get mobility count
                            bitCount_dummy = BitUtility.CountBits(ChessData.GetBishopMoves(square , 
                                board.totalOccupancy));
                            
                            // apply mobility bonus
                            mg_eval[player] += (bitCount_dummy - 4) * 5;
                            eg_eval[player] += (bitCount_dummy - 4) * 5;

                            // ------------------------------------------

                            // Bishop square tables
                            mg_eval[player] += AISettings.mg_bishop_table[square_side];
                            eg_eval[player] += AISettings.eg_bishop_table[square_side];

                            break;

                            case ChessPiece.Rook:

                            // ---------- [Rook Structure] ------------------

                            extra_score = 0;

                            // Semi open file scoring
                            if ((board.bitboards[player][ChessPiece.Pawn] & ChessData.File_masks[square]) == 0)
                                extra_score += AIData.Semi_open_file_score;

                            // Open file scoreing
                            if (((board.bitboards[player][ChessPiece.Pawn] | board.bitboards[player ^ 1][ChessPiece.Pawn]) 
                                & ChessData.File_masks[square]) == 0)
                                extra_score += AIData.Open_file_score;
                            
                            // ----------------------------------------------
                            
                            // Rook square tables
                            mg_eval[player] += AISettings.mg_rook_table[square_side] + extra_score;
                            eg_eval[player] += AISettings.eg_rook_table[square_side] + extra_score;

                            break;

                            case ChessPiece.Knight:

                            // Knight square tables
                            mg_eval[player] += AISettings.mg_knight_table[square_side];
                            eg_eval[player] += AISettings.eg_knight_table[square_side];

                            break;

                            case ChessPiece.Queen:

                            // ------------ [ Queen mobility ] ------------

                            // Get mobility count
                            bitCount_dummy = BitUtility.CountBits(ChessData.GetQueenMoves(square , 
                                board.totalOccupancy));
                                                        
                            // Apply mobility bonus
                            mg_eval[player] += (bitCount_dummy - 9);
                            eg_eval[player] += (bitCount_dummy - 9) * 2;

                            // ----------------------------------------------

                            // Queen square tables
                            mg_eval[player] += AISettings.mg_queen_table[square_side];
                            eg_eval[player] += AISettings.eg_queen_table[square_side];

                            break;

                            case ChessPiece.King:

                            extra_score = 0;

                            // ------------ [ King Structure ] ------------

                            // Semi open file scoring (penalty)
                            if ((board.bitboards[player][ChessPiece.Pawn] & ChessData.File_masks[square]) == 0)
                                extra_score -= AIData.Semi_open_file_score;

                            // Open file scoreing (penalty)
                            if (((board.bitboards[player][ChessPiece.Pawn] | board.bitboards[player ^ 1][ChessPiece.Pawn]) 
                                & ChessData.File_masks[square]) == 0)
                                extra_score -= AIData.Open_file_score;
                            
                            // ------------ [ King Safety ] ----------------

                            extra_score += BitUtility.CountBits(ChessData.GetKingMoves(square) & 
                                board.occupancies[player]) * AIData.King_shield_bonus;

                            // ---------------------------------------------

                            // King square tables
                            mg_eval[player] += AISettings.mg_king_table[square_side] + extra_score;
                            eg_eval[player] += AISettings.eg_king_table[square_side] + extra_score;

                            break;
                        }

                    } 
                }
            }

            if (gamePhase > 24) gamePhase = 24;

            int mg_score = (mg_eval[ChessPiece.White] - mg_eval[ChessPiece.Black]) * GetPlayerSide(board.sideToMove);
            int eg_score = (eg_eval[ChessPiece.White] - eg_eval[ChessPiece.Black]) * GetPlayerSide(board.sideToMove);

            int eg_phase = 24 - gamePhase;
            
            return (mg_score * gamePhase + eg_score * eg_phase) / 24;
        }
but in visual studio it does not point to any relevant "hot point". In visual studio even doing an bitwise and operation (&) and performing a count bits lookup it represents a 5% in computation time? it was not supposed to be a really fast operation?

Image
I think you can speed up the evaluation with some small tricks:

1) You should be able to speed up the two dimensional arrays to one dimensional ones. I guess that is why the profiler said that your bitcount function was slow, not because it was so slow but because you did the two dimensional access within the function call.

2) You can reduce a couple of branches such as

if (bitCount_dummy > 0)
{
mg_eval[player] += bitCount_dummy * AIData.Double_pawn_penalty_opening;
eg_eval[player] += bitCount_dummy * AIData.Double_pawn_penalty_ending;
}
Just remove the if case since multiplying with 0 is zero. Test to see which is faster.

3) Remove the array access for example mg_eval[player] and replace it with mg_eval_white and mg_eval_black. Then you can remove unnecessary array accesses. If you want the code tidier I would probably just put a bool in the Evaluate function and written Evaluate(isWhite)-Evaluate(!isWhite)
Hey! thank you for the very useful info! i am going to try that, in C# when you have arr[X][Y] is usually an "Array inside array" aka "Jagged arrays" multi-dimensional arrays in C# are: arr[X,Y], but i do agree that a single-dimensional array will be better.

about the branches, yes, i am going to review it.

another question, do you know an efficient way to detect if the king is in check?

i use this technique:
https://www.chessprogramming.org/Square ... ck_by_Side

It works great, it is bug free
but i feel that is very slow

i did a benchmark in kiwipete position for that function call ( IsSquareAttacked(sq,player) ) and it reports nearly 40 cpu ticks for a single call :oops:
The best ways to speed up your code is to listen to hgm or Gerd Isenberg.

You can speed up IsSquaredAttack by something like this

Code: Select all


[MethodImpl(MethodImplOptions.AggressiveInlining)]
        public bool IsSquareAttacked(byte _square, byte _currentPlayer) 
        {
           
            // Attacked by bishops or queens
            if((ChessData.GetBishopMoves(_square, totalOccupancy) & (bitboards[_currentPlayer ^ 1][ChessPiece.Bishop] | bitboards[_currentPlayer ^ 1][ChessPiece.Queen]))> 0 )
                return true;
            // Attacked by rooks or queens
            if((ChessData.GetRookMoves(_square, totalOccupancy) & (bitboards[_currentPlayer ^ 1][ChessPiece.Rook] | bitboards[_currentPlayer ^ 1][ChessPiece.Queen])) > 0)
                return true; 
            // Attacked by knights
            if((ChessData.GetKnightMoves(_square) & bitboards[_currentPlayer ^ 1][ChessPiece.Knight]) > 0 )
                return true;
            // Attacked by kings
            if((ChessData.GetKingMoves(_square) & bitboards[_currentPlayer ^ 1][ChessPiece.King]) > 0)
                return true;
            // Attacked by pawns 
            if ((ChessData.GetPawnAttacks(_square, _currentPlayer) & bitboards[_currentPlayer ^ 1][ChessPiece.Pawn]) > 0)
                return true;

            return false;
        }
Test also if it is good or not to inline the method. I guess you shouldn’t since it probably is called from lots of places. I guess you could remove one dimension from the bitboards. Probably you just want pawns[player] instead of bitboards[player][piecetype].

Good luck!
/Pio
ZirconiumX
Posts: 1334
Joined: Sun Jul 17, 2011 11:14 am

Re: Efficiently Generated Legal moves question

Post by ZirconiumX »

pedrojdm2021 wrote: Fri Jul 09, 2021 10:51 pmbut in visual studio it does not point to any relevant "hot point". In visual studio even doing an bitwise and operation (&) and performing a count bits lookup it represents a 5% in computation time? it was not supposed to be a really fast operation?
Would you mind showing the implementation of BitUtility.CountBits? It seems like this is the intrinsic function you need to call, but I don't know C# very well.

For some more numbers in the pot, if I disable bulk counting at depth 1, I can perft at 11.56 million nodes per second, but search at 7.44 million nodes per second. So I don't think there's any meaningful relationship that implies a slow perft (where this is a very, very slow perft) means a slow search.
Some believe in the almighty dollar.

I believe in the almighty printf statement.
pedrojdm2021
Posts: 157
Joined: Fri Apr 30, 2021 7:19 am
Full name: Pedro Duran

Re: Efficiently Generated Legal moves question

Post by pedrojdm2021 »

ZirconiumX wrote: Sat Jul 10, 2021 11:03 pm
pedrojdm2021 wrote: Fri Jul 09, 2021 10:51 pmbut in visual studio it does not point to any relevant "hot point". In visual studio even doing an bitwise and operation (&) and performing a count bits lookup it represents a 5% in computation time? it was not supposed to be a really fast operation?
Would you mind showing the implementation of BitUtility.CountBits? It seems like this is the intrinsic function you need to call, but I don't know C# very well.

For some more numbers in the pot, if I disable bulk counting at depth 1, I can perft at 11.56 million nodes per second, but search at 7.44 million nodes per second. So I don't think there's any meaningful relationship that implies a slow perft (where this is a very, very slow perft) means a slow search.
sure is this one:

Code: Select all

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static unsafe byte CountBits(ulong _bitboard)
        {
            // forked from microsoft's implementation in Sytem.Numeric.BitOperations (software fallback)
            // it is much faster than the traditional method using loops

            const ulong c1 = 0x_55555555_55555555ul;
            const ulong c2 = 0x_33333333_33333333ul;
            const ulong c3 = 0x_0F0F0F0F_0F0F0F0Ful;
            const ulong c4 = 0x_01010101_01010101ul;

            _bitboard -= (_bitboard >> 1) & c1;
            _bitboard = (_bitboard & c2) + ((_bitboard >> 2) & c2);
            _bitboard = (((_bitboard + (_bitboard >> 4)) & c3) * c4) >> 56;
            return (byte)_bitboard;
        }
i copied it from microsoft github repository, i did that because i can't use the native popcount becasue i am limited to use . Net framework up to 4.x not . Net framework 5 (is the minimun required version to use that)
ZirconiumX
Posts: 1334
Joined: Sun Jul 17, 2011 11:14 am

Re: Efficiently Generated Legal moves question

Post by ZirconiumX »

pedrojdm2021 wrote: Sun Jul 11, 2021 2:18 am i can't use the native popcount becasue i am limited to use . Net framework up to 4.x not . Net framework 5 (is the minimun required version to use that)
Unfortunately an approach like the one that that code uses is going to be significantly slower than native instructions generated by the JIT, and as a result that might well be why that code takes up a lot more time than expected.
Some believe in the almighty dollar.

I believe in the almighty printf statement.
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Efficiently Generated Legal moves question

Post by hgm »

I guess the moral lesson is:
* Don't use a language that doesn't support the critical operations needed for the algorithm you want to implement
or
* If you are committed to a language, don't use an algorithm that relies on features the language cannot support.
pedrojdm2021
Posts: 157
Joined: Fri Apr 30, 2021 7:19 am
Full name: Pedro Duran

Re: Efficiently Generated Legal moves question

Post by pedrojdm2021 »

hgm wrote: Sun Jul 11, 2021 9:50 am I guess the moral lesson is:
* Don't use a language that doesn't support the critical operations needed for the algorithm you want to implement
or
* If you are committed to a language, don't use an algorithm that relies on features the language cannot support.
Yeah, you are totally right, my conclusion on this is that the possibly main cause is the slowness of C# in . net framework 4, i have seen similar implementations that does things similar as i do, and they have at least the double of speed that i actually have, i do not allocate on the heap now, i use pre calculated stuff as much as i can, the only thing is that i use bitwise operations very often as it is bitboard chess engine, but they should be very cheap operations, not heavy operations as Visual Studio marks in the profiler :/

so the only way to speed up this stuff that i see is to generate only legal moves, and that's it
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Efficiently Generated Legal moves question

Post by hgm »

Or refrain from using bitboards...
User avatar
emadsen
Posts: 434
Joined: Thu Apr 26, 2012 1:51 am
Location: Oak Park, IL, USA
Full name: Erik Madsen

Re: Efficiently Generated Legal moves question

Post by emadsen »

pedrojdm2021 wrote: Sun Jul 11, 2021 8:43 pmMy conclusion on this is that the possibly main cause is the slowness of C# in . net framework 4.
You are jumping to conclusions way too quickly. C# and .NET is not to blame for your engine performing at 1/10th the speed it should. My engine is written in C# using magic bitboards. It searches 3.7 million NPS in the WAC test suite, counting nodes only in the Board.PlayMove method.

In the source code of my C# engine I define #if POPCOUNT to a) fallback to loops (via Value &= Value - 1ul) to count set bits and b) use a De Bruijn sequence to find the first set bit for older processors that lack the popcount instruction. I haven't done extensive testing. But in a few positions I examined, the popcount version was 1.25x faster than the non-popcount version. A 20% slowdown due to missing support for popcount cannot explain why your engine is an order of magnitude (1/10) slower than it could be (you mentioned searching 1.4 million nodes in 3.6 sec = 388,888 NPS). It's not due to techniques used to count and locate bits.

In addition, I know Graham Banks uses the non-popcount version of MadChess when running tournaments. It's unlikely all CCRL testers use the non-popcount version of my engine (each tester has their own hardware), however, CCRL has rated MadChess at 2636 ELO. This aligns with my own testing on my home, popcount-capable, PC (2638 ELO). My point being, yes MadChess runs slower on non-popcount hardware, but so do most other engines. I haven't measure any detrimental effect of non-popcount hardware on my engine's playing strength.
Last edited by emadsen on Sun Jul 11, 2021 10:13 pm, edited 3 times in total.
My C# chess engine: https://www.madchess.net
pedrojdm2021
Posts: 157
Joined: Fri Apr 30, 2021 7:19 am
Full name: Pedro Duran

Re: Efficiently Generated Legal moves question

Post by pedrojdm2021 »

hgm wrote: Sun Jul 11, 2021 8:50 pm Or refrain from using bitboards...
yeah in C# seems to be a bad idea using bitboards, but anyways, i will leave this C# engine with the bitboards and the very last optimization that i've mentioned, and maybe in my next engine i'll do a proper implementation in C++ and i'll learn C++ on the road :D
pedrojdm2021
Posts: 157
Joined: Fri Apr 30, 2021 7:19 am
Full name: Pedro Duran

Re: Efficiently Generated Legal moves question

Post by pedrojdm2021 »

emadsen wrote: Sun Jul 11, 2021 9:59 pm
pedrojdm2021 wrote: Sun Jul 11, 2021 8:43 pmMy conclusion on this is that the possibly main cause is the slowness of C# in . net framework 4.
You are jumping to conclusions way too quickly. C# and .NET is not to blame for your engine performing at 1/10th the speed it should. My engine is written in C# using magic bitboards. It searches 3.7 million NPS in the WAC test suite, counting nodes only in the Board.PlayMove method.

In the source code of my C# engine I define #if POPCOUNT to a) fallback to loops (via Value &= Value - 1ul) to count set bits and b) use a De Bruijn sequence to find the first set bit for older processors that lack the popcount instruction. I haven't done extensive testing. But in a few positions I examined, the popcount version was 1.25x faster than the non-popcount version. A 20% slowdown due to missing support for popcount cannot explain why your engine is an order of magnitude (1/10) slower than it could be. It's not due to techniques used to count and locate bits.

In addition, I know Graham Banks uses the non-popcount version of MadChess when running tournaments. It's unlikely all CCRL testers use the non-popcount version of my engine (each tester has their own hardware), however, CCRL has rated MadChess at 2636 ELO. This aligns with my own testing on my home, popcount-capble, PC (2638 ELO). My point being, yes MadChess runs slower on non-popcount hardware, but so do most other engines. I haven't measure any detrimental effect of non-popcount hardware on my engine's playing strength.
Yeah but you use .net 5 that's the problem, any fast C# engine that i've seen is written in .net 5 in .net 4 we dont have access to the new "native" stuff in .net:

Just to name a few, in .net 4 we don't have:
- Span<T>
- System.Runtime.Intrinsics.X86;

i have to stick with normal arrays and without any "fancy" .net 5 feature that microsoft would have added in that version

In .net 4 bitwise operations seems to be extremly expensive:

a simple least significant index from some bitwise operations represents a 6%?

Image

i have tried many things and everything from visual studio points to the same: the bitwise operations and the "Square attacked by" (to detect checks) that i have implemented it as it is documented in the chess programming wiki :?

Sadly i can't use .net 5 because the framework that i am using to build the game itself (unity engine) does not support .net 5