How can i improve my C# engine speed?

pedrojdm2021 · Post by **pedrojdm2021** » Wed Jun 30, 2021 6:49 am

Hello, i am really happy with the results that i'm getting so far, but the speed is still not that great

For example in Kiwipete position:
[d]r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -

it does a search of depth 9 with this results:
Nodes searched: 1286811
Search time: 12596ms ( ~13 seconds ) in a AMD FX 8350 @4ghz CPU
so it means that my engine AI it's searching ~100k positions per second, i think it is pretty bad speed

Features that i have so far inside the search:
- Transposition Tables (~20mb hash table of 875000 entries)
- Bitboards with magic bitboard move generator
- Null move pruning
- PVS (principal variation search)
- LMR (Late move reductions)
- Extended Futility Pruning

The engine is developed in C#, because i am making a chess application in unity engine, so i have to keep that language at least for now, maybe in a future it can be a good idea develop a the engine in a .dll plugin and link it into the C# layer, but that is not the case right now.

In the other hand i have a pretty "good" speed in the perft results:
for example in the same Kiwipete position in depth 4 it does the 4085603 nodes in only ~700ms, so clearly the issue comes from the AI search itself.

i have executed the visual studio debugger and it points to this function that have the most heavy cpu use:

Code: Select all

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private int GuessMoveScore(int _move)
        {   
            int score = 0;

            if (MoveUtility.Get_flag_capture(_move) > 0)
            {   
                // apply lookup to the MVV/LVA Table
                byte attacker = board.GetPieceTypeInSquare((byte)MoveUtility.Get_square_from(_move));
                byte victim = board.GetPieceTypeInSquare((byte)MoveUtility.Get_square_to(_move), true);
                score += AISettings.mvv_lva[attacker][victim] + 10000;
            }else 
            {
                // score first killer move
                if (killer_moves[0][ply] == _move)
                    score += 9000;
                // score second killer move
                else if (killer_moves[1][ply] == _move)
                    score += 8000;
                else 
                // score history move   
                    score += history_moves[board.sideToMove][MoveUtility.Get_square_from(_move)][MoveUtility.Get_square_to(_move)];
            }

            // give a little bonus if this is a promotion
            if (MoveUtility.Get_promotion(_move) > 0)
            {
                score += 325;
            }

            return score;
        }

you can see is a pretty simple scoring for MVV/LVA move ordering, the "only" problem is that the moved piece in a move is not begin encoded into the int move, instead i need to search for the piece from the board using the square index.

the search piece in the board function looks like this:

Code: Select all

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public byte GetPieceTypeInSquare(byte _square , bool _adversary = false)
        {
            int player = _adversary ? sideToMove ^ 1 : sideToMove;
            if((1 & (bitboards[player][ChessPiece.Pawn] >> _square)) > 0) return ChessPiece.Pawn; 
            if((1 & (bitboards[player][ChessPiece.Bishop] >> _square)) > 0) return ChessPiece.Bishop; 
            if((1 & (bitboards[player][ChessPiece.Rook] >> _square)) > 0) return ChessPiece.Rook; 
            if((1 & (bitboards[player][ChessPiece.Knight] >> _square)) > 0) return ChessPiece.Knight; 
            if((1 & (bitboards[player][ChessPiece.Queen] >> _square)) > 0) return ChessPiece.Queen; 
            if((1 & (bitboards[player][ChessPiece.King] >> _square)) > 0) return ChessPiece.King; 
            return 0;
        }

in my point of view, i see it ok, i don't know if keeping an array of pieces in squares would be better, but it can slow the move generator as i have tested

the move ordering is done very similar as it's done by the VICE engine:
The move ordering is done inside the search loop in the negamnax, and it calls ' OrderMoves(Moveindex, ref move_list); '
and then that function calls the "GuessMoveScore" function to sort the moves on the list based on the current index.

so, anyone have any ideas to improve my nps? or another pruning technique that i should apply?

Cheers, Pedro.

mvanthoor · Post by **mvanthoor** » Wed Jun 30, 2021 11:12 am

pedrojdm2021 wrote: ↑Wed Jun 30, 2021 6:49 am so, anyone have any ideas to improve my nps? or another pruning technique that i should apply?

No... first remove the pruning and optimizations to make the engine search as many moves as possible:

- Null move pruning
- PVS (principal variation search)
- LMR (Late move reductions)
- Extended Futility Pruning

Basically, leave only alpha-beta. What you're going to see is the real speed of the move generator, makemove, and unmake move.

Then run the engine through a profiler with perft 6 or perft 7 (if you're using C# and Visual Studio, you should have one in the IDE), and fix the slowest part. See how fast the engine is now. Run the profiler again, and fix the slowest part. Test... rinse, repeat.

Make sure you do _not_ allocate any memory in the perft run, or in the move generator. Later, when testing the search, also don't allocate any memory in the search.

After you optimize the the move generator, make and unmake as much as possible, you can start adding in the pruning and other optimizations again.

Karlo Bala · Post by **Karlo Bala** » Wed Jun 30, 2021 11:58 am

pedrojdm2021 wrote: ↑Wed Jun 30, 2021 6:49 am Hello, i am really happy with the results that i'm getting so far, but the speed is still not that great

For example in Kiwipete position:
[d]r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -

it does a search of depth 9 with this results:
Nodes searched: 1286811
Search time: 12596ms ( ~13 seconds ) in a AMD FX 8350 @4ghz CPU
so it means that my engine AI it's searching ~100k positions per second, i think it is pretty bad speed

Features that i have so far inside the search:
- Transposition Tables (~20mb hash table of 875000 entries)
- Bitboards with magic bitboard move generator
- Null move pruning
- PVS (principal variation search)
- LMR (Late move reductions)
- Extended Futility Pruning

The engine is developed in C#, because i am making a chess application in unity engine, so i have to keep that language at least for now, maybe in a future it can be a good idea develop a the engine in a .dll plugin and link it into the C# layer, but that is not the case right now.

In the other hand i have a pretty "good" speed in the perft results:
for example in the same Kiwipete position in depth 4 it does the 4085603 nodes in only ~700ms, so clearly the issue comes from the AI search itself.

i have executed the visual studio debugger and it points to this function that have the most heavy cpu use:
Code: Select all
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private int GuessMoveScore(int _move)
        {   
            int score = 0;

            if (MoveUtility.Get_flag_capture(_move) > 0)
            {   
                // apply lookup to the MVV/LVA Table
                byte attacker = board.GetPieceTypeInSquare((byte)MoveUtility.Get_square_from(_move));
                byte victim = board.GetPieceTypeInSquare((byte)MoveUtility.Get_square_to(_move), true);
                score += AISettings.mvv_lva[attacker][victim] + 10000;
            }else 
            {
                // score first killer move
                if (killer_moves[0][ply] == _move)
                    score += 9000;
                // score second killer move
                else if (killer_moves[1][ply] == _move)
                    score += 8000;
                else 
                // score history move   
                    score += history_moves[board.sideToMove][MoveUtility.Get_square_from(_move)][MoveUtility.Get_square_to(_move)];
            }

            // give a little bonus if this is a promotion
            if (MoveUtility.Get_promotion(_move) > 0)
            {
                score += 325;
            }

            return score;
        }
you can see is a pretty simple scoring for MVV/LVA move ordering, the "only" problem is that the moved piece in a move is not begin encoded into the int move, instead i need to search for the piece from the board using the square index.

the search piece in the board function looks like this:
Code: Select all
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public byte GetPieceTypeInSquare(byte _square , bool _adversary = false)
        {
            int player = _adversary ? sideToMove ^ 1 : sideToMove;
            if((1 & (bitboards[player][ChessPiece.Pawn] >> _square)) > 0) return ChessPiece.Pawn; 
            if((1 & (bitboards[player][ChessPiece.Bishop] >> _square)) > 0) return ChessPiece.Bishop; 
            if((1 & (bitboards[player][ChessPiece.Rook] >> _square)) > 0) return ChessPiece.Rook; 
            if((1 & (bitboards[player][ChessPiece.Knight] >> _square)) > 0) return ChessPiece.Knight; 
            if((1 & (bitboards[player][ChessPiece.Queen] >> _square)) > 0) return ChessPiece.Queen; 
            if((1 & (bitboards[player][ChessPiece.King] >> _square)) > 0) return ChessPiece.King; 
            return 0;
        }
in my point of view, i see it ok, i don't know if keeping an array of pieces in squares would be better, but it can slow the move generator as i have tested

the move ordering is done very similar as it's done by the VICE engine:
The move ordering is done inside the search loop in the negamnax, and it calls ' OrderMoves(Moveindex, ref move_list); '
and then that function calls the "GuessMoveScore" function to sort the moves on the list based on the current index.

so, anyone have any ideas to improve my nps? or another pruning technique that i should apply?

Cheers, Pedro.

Your "GetPieceTypeInSquare" method is a mess.

It really should be just:

Code: Select all

pieceType = PieceType[Board[sq]]

or even simpler

Code: Select all

pieceType = PieceBoard[sq] >> 1

assyming your pieces are coded like WKing = 0, BKing = 1, WQueen = 2, BQueen = 3, ...

If you are using the "pure" bitboard engine, I suggest adding one classical mailbox board.

lithander · Post by **lithander** » Wed Jun 30, 2021 11:59 am

I have run your tests in my own C# engine.

pedrojdm2021 wrote: ↑Wed Jun 30, 2021 6:49 am For example in Kiwipete position:
[d]r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -

it does a search of depth 9 with this results:
Nodes searched: 1286811
Search time: 12596ms ( ~13 seconds ) in a AMD FX 8350 @4ghz CPU
so it means that my engine AI it's searching ~100k positions per second, i think it is pretty bad speed

Code: Select all

info depth 9 score cp -16 nodes 12194040 nps 1154519 time 10562 pv e2a6 b4c3 d2c3 e6d5 e5g4 h3g2 f3g2 f6g4 c3g7

It also takes me 10s to search to that depth but I'm searching 10x more nodes. My engine is not optimized for speed (I'm not using bitboards for example and heavily depend on the garbage collector... each movelist ist a new instance of List<Move> and I happily use Sort()) and my NPS are 10x better so I think you're right that there's something fishy...

You seem to have some heavy pruning going on here if you explore only so few nodes. How did your NPS develop from brute force search (which should be close to perft performance + the cost of eval) to the 100k NPS you see now?

pedrojdm2021 wrote: ↑Wed Jun 30, 2021 6:49 am In the other hand i have a pretty "good" speed in the perft results:
for example in the same Kiwipete position in depth 4 it does the 4085603 nodes in only ~700ms, so clearly the issue comes from the AI search itself.

Code: Select all

White >> perft 4
  Moves:    4,085,603
  Seconds:  0.7209
  Moves/s:  5,667,033
  Operation took 0.7229s

That's about the same speed I get. I'm using a TT table for perft (do you?) and a faster computer. So your perft speed is much better than the search speed!

pedrojdm2021 wrote: ↑Wed Jun 30, 2021 6:49 am in my point of view, i see it ok, i don't know if keeping an array of pieces in squares would be better, but it can slow the move generator as i have tested

This is how I sort MvvLva:

Code: Select all

        public void SortMvvLva(Board context)
        {
            int Score(Move move)
            {
                Piece victim = context[move.ToSquare];
                Piece attacker = context[move.FromSquare];
                return Pieces.MaxOrder * Pieces.Order(victim) - Pieces.Order(attacker) + Pieces.Order(move.Promotion);
            }
            Sort((a, b) => Score(b).CompareTo(Score(a)));
        }
        
        public const int MaxOrder = 6;

        //Pawn = 1, Knight = 2, Bishop = 3; Rook = 4, Queen = 5, King = 6
        public static int Order(Piece piece) => ((int)piece >> 2);

So as you see at least here a 8x8 array of pieces (the most naive board representation there is) comes in handy because you can just access the array 2 times, shift the values you find and compute the score with 1 multiplication and 1 subtraction. (and one addition to boost promotions but you can skip that)

I don't know how other bitboard users do this but maybe it makes sense to have both bitboards and an array of pieces? Because I can't see how you could extract the piece type on a specific square from bitboards as speedy as just looking it up in a a board-sized array.

P.S.: in my opinion you should not micromanage the Runtime by telling it what to inline specifically. It usually knows better than we do!

hgm · Post by **hgm** » Wed Jun 30, 2021 12:05 pm

I don't know C#, so I have no idea if the perceived slowness is simply a consequence of the language used.

You think there is a discrepancy between the nps in search and perft, but I don't think this is the case. 700ms for peft(4) on KiwiPete is also only 139knps. In interpreting the perft speed you seem to confuse nodes/sec with moves/sec. A perft(4) only does 97862 (= perft(3)) move generations.

lithander · Post by **lithander** » Wed Jun 30, 2021 12:23 pm

hgm wrote: ↑Wed Jun 30, 2021 12:05 pm I don't know C#, so I have no idea if the perceived slowness is simply a consequence of the language used.

I think the bad rep C# get's for being slow is unjustified. But I know that MMC is not a good engine to prove that!

hgm wrote: ↑Wed Jun 30, 2021 12:05 pm You think there is a discrepancy between the nps in search and perft, but I don't think this is the case. 700ms for peft(4) on KiwiPete is also only 139knps. In interpreting the perft speed you seem to confuse nodes/sec with moves/sec. A perft(4) only does 97862 (= perft(3)) move generations.

How can I count the 4,085,603 positions without generating the 4,085,603 moves to reach them? Is that the optimization people call bulk-counting?

Code: Select all

        private static long Perft(Board board, int depth)
        {
            if (depth == 0)
                return 1;

            //probe hash-tree
            if (PerftTable.Retrieve(board.ZobristHash, depth, out long childCount))
                return childCount;

            var moves = new LegalMoves(board);
            long sum = 0;
            foreach (var move in moves)
                sum += Perft(new Board(board, move), depth - 1);

            PerftTable.Store(board.ZobristHash, depth, sum);
            return sum;
        }

So in my implementation 4,085,603 board configurations have been visited and created by playing 4,085,603 legal moves. So move/s == nodes/s... and I just assumed Pedro did something equivalent.

I suppose I'm missing out on some clever optimization here?

mvanthoor · Post by **mvanthoor** » Wed Jun 30, 2021 12:30 pm

lithander wrote: ↑Wed Jun 30, 2021 11:59 am I don't know how other bitboard users do this but maybe it makes sense to have both bitboards and an array of pieces? Because I can't see how you could extract the piece type on a specific square from bitboards as speedy as just looking it up in a a board-sized array.

Yes, you can (and should) have an array of pieces even if you use bit-boards. The reason is that sometimes you want to know what piece type is on a square. For example, I don't simply use a capture flag in my move format. I store the piece type that was captured. The reason is that when I'm unmaking moves, I can immediately see the piece type that was captured and put it back onto the square it was before. (Among other reasons.)

The captured piece can be a Q, R, B, N, or P. I have a set of 6 bit-boards per side: one for each piece type. Now, if I capture a piece, I would need to loop through the 6 bit-boards (or at least 5; because the king doesn't count), and then see which one had a piece on the square I just landed on. When keeping a list of pieces (or in my case, just a mailbox board), I can see the piece type by:

Code: Select all

let capture = board.piece_list[to_square];

mvanthoor · Post by **mvanthoor** » Wed Jun 30, 2021 12:35 pm

lithander wrote: ↑Wed Jun 30, 2021 12:23 pm How can I count the 4,085,603 positions without generating the 4,085,603 moves to reach them? Is that the optimization people call bulk-counting?

Probably. When bulk-counting, you don't make/unmake the moves to reach the leaves. When doing perft(4), you can just go to perft(3), generate all the moves, and add perft(3) + move count. Note that this only works if you engine has a fully legal move generator, and that this is only a perft-trick which doesn't do anything for actual chess playing.

j.t. · Post by **j.t.** » Wed Jun 30, 2021 1:42 pm

pedrojdm2021 wrote: ↑Wed Jun 30, 2021 6:49 am so, anyone have any ideas to improve my nps?

Use an array bitAt[Square] that returns the bit at that square (can be computed at compile time). This way you don't need the bit shifts, just do (bitAt[_square] & bitboards[player][ChessPiece.King]). Then I would suggest encoding the moved piece into the move. Generally as much as possible without causing an unnecessary performance overhead at move generation. But the moved piece you definitely need to know at move generation anyway, and there won't be a big issue in storing them into the move struct/int.
Also I would store the captured piece type in the move, because you'll need it at least once later anyway, so doing it early doesn't hurt, but this way you don't need to do it mutliple times.

For example, I encode the following info into a move: sourceSquare, targetSquare, movedPiece, capturedPiece, promotedPiece, isCastled, capturedEnPassant, enPassantTarget (when i do a double push pawn move).

Generally, any time most of the search time is anywhere else but the evaluation, you have something fishy going on (at least in my experience).

Daniel Anulliero · Post by **Daniel Anulliero** » Wed Jun 30, 2021 2:48 pm

My engine Isa took 14 seconds to reach the d9 but I have an i5 1.8ghz , my engine is known as a slow one , but sure it'll take less time with your 4ghz CPU

How can i improve my C# engine speed?

How can i improve my C# engine speed?

Re: How can i improve my C# engine speed?

Re: How can i improve my C# engine speed?

Re: How can i improve my C# engine speed?

Re: How can i improve my C# engine speed?

Re: How can i improve my C# engine speed?

Re: How can i improve my C# engine speed?

Re: How can i improve my C# engine speed?

Re: How can i improve my C# engine speed?

Re: How can i improve my C# engine speed?