How can i improve my C# engine speed?

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
hgm
Posts: 28396
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: How can i improve my C# engine speed?

Post by hgm »

lithander wrote: Wed Jun 30, 2021 12:23 pmHow can I count the 4,085,603 positions without generating the 4,085,603 moves to reach them? Is that the optimization people call bulk-counting?
You cannot. But, like you say, they are moves, not nodes. What a chess engine counts as a node in search is running the move generator (which then generates some 30-40 moves), or the evaluation. For calculating perft(4) you only have to run the move generator perft(3) times.

The 'bulk counting' optimization is that after you generated all moves and put them in an array (the 'move list'), you won't run through that array to increment a counter for every move, like

Code: Select all

for(i=0; i<nrOfMoves; i++) count++;
but use addition instead of counting:

Code: Select all

count += nrOfMoves;
This isn't as straightforward as that when you have a pseudo-legal move generator, as the move list might still contain some illegal moves. Which must not be counted in perft. So you might have to test some or all moves for legality, before you can count them. Depending on the design such a test might be most easily done by first making the move.

In Qperft I use a move generator that refrains from moving pinned pieces away from the pin line (by identifying pinned pieces before generation, and then using a dedicated generator for the moves of those). So the only illegal moves it can generate is for a King stepping into check. So it has to test legality of the King moves one by one, while it uses addition for the other moves. Something like

Code: Select all

count += nrOfMoves;
for(i=0; i<nrOfKingMoves; i++) {
  MakeMove(move[i]);
  if(InCheck()) count--;
  UnMake();
}
User avatar
hgm
Posts: 28396
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: How can i improve my C# engine speed?

Post by hgm »

mvanthoor wrote: Wed Jun 30, 2021 12:35 pmProbably. When bulk-counting, you don't make/unmake the moves to reach the leaves. When doing perft(4), you can just go to perft(3), generate all the moves, and add perft(3) + move count. Note that this only works if you engine has a fully legal move generator, and that this is only a perft-trick which doesn't do anything for actual chess playing.
I would actually say the opposit:

Making/unmaking the moves of the final ply is a perft artifact, which has no parallel when using the engine to actually play chess. In search you only count nodes that do a move generation, and don't make/unmake the moves you don't want to search just for fun. The most common way for counting nodes in an engine is to count the number of times you run the move generator. One might argue that in QS nodes that suffer stand-pat cutoff you might not get to move generation, and that those are still counted. But there are very few such nodes, as normally nodes for which it is obvious that these would suffer a stand-pat cutoffs are already pruned in the parent ('futility pruning'). The number of nodes where the score is so close to alpha that you have to do a full evaluation (which cannot be done without making the move first) is really small, as you can see in the 'mailbox trials' thread. So in an engine you virtually always do a move generation once you enter Search(), and then in a leaf node don't make any of the generated nodes.

Futility pruning can be considered the search analog of bulk counting: when going through the move list in MVV order, you test the moves indifidually for futility (and if they are non-futile, you search them, and are thus not in a leaf node), while the first time you encounter a futile move, you discard the remainder of the list without looking at it (as these will be even more futile). So you 'bulk score' that bunch of moves below alpha.
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: How can i improve my C# engine speed?

Post by mvanthoor »

hgm wrote: Wed Jun 30, 2021 3:57 pm The most common way for counting nodes in an engine is to count the number of times you run the move generator.
Hm... we talked about this some time ago, and I changed it from "count the node when you're going to do work / generate moves" to "count the node as soon as you visit it", because in that discussion you said that the latter was the more common. At least, that was what I understood of it back then. In the end it doesn't really matter, but the latter method obviously creates the largest NPS count.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: How can i improve my C# engine speed?

Post by mvanthoor »

mvanthoor wrote: Wed Jun 30, 2021 4:19 pm
hgm wrote: Wed Jun 30, 2021 3:57 pm The most common way for counting nodes in an engine is to count the number of times you run the move generator.
Hm... we talked about this some time ago, and I changed it from "count the node when you're going to do work / generate moves" to "count the node as soon as you visit it", because in that discussion you said that the latter was the more common. At least, that was what I understood of it back then. In the end it doesn't really matter, but the latter method obviously creates the largest NPS count.
Futility pruning can be considered the search analog of bulk counting: when going through the move list in MVV order, you test the moves indifidually for futility (and if they are non-futile, you search them, and are thus not in a leaf node), while the first time you encounter a futile move, you discard the remainder of the list without looking at it (as these will be even more futile). So you 'bulk score' that bunch of moves below alpha.
Thanks for that ad-hoc explanation. One less concept I have to research :)
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
hgm
Posts: 28396
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: How can i improve my C# engine speed?

Post by hgm »

mvanthoor wrote: Wed Jun 30, 2021 4:19 pm
hgm wrote: Wed Jun 30, 2021 3:57 pm The most common way for counting nodes in an engine is to count the number of times you run the move generator.
Hm... we talked about this some time ago, and I changed it from "count the node when you're going to do work / generate moves" to "count the node as soon as you visit it", because in that discussion you said that the latter was the more common. At least, that was what I understood of it back then. In the end it doesn't really matter, but the latter method obviously creates the largest NPS count.
Well, for most engines this would hardly matter, because they would only enter the node when they have to do work there. I followed up the sentence you quote by an explanation of that.

The difference you allude to is especially large if you don't do futility pruning, in combination with 'lazy evaluation' (e.g. based on the incremental evaluation only). If this lazy evaluation, which requires zero work to obtain, is significantly above beta, you can return from the node immediately. (In QS. But in lowest approximation all nodes are QS nodes.) So you will count lots of zero-work nodes. But futility pruning in the parent would eliminate these nodes. Which still saves some time by eliminating the make/unmake leading to them, but decreases the node count even more.
pedrojdm2021
Posts: 157
Joined: Fri Apr 30, 2021 7:19 am
Full name: Pedro Duran

Re: How can i improve my C# engine speed?

Post by pedrojdm2021 »

Karlo Bala wrote: Wed Jun 30, 2021 11:58 am
pedrojdm2021 wrote: Wed Jun 30, 2021 6:49 am Hello, i am really happy with the results that i'm getting so far, but the speed is still not that great

For example in Kiwipete position:
[d]r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -

it does a search of depth 9 with this results:
Nodes searched: 1286811
Search time: 12596ms ( ~13 seconds ) in a AMD FX 8350 @4ghz CPU
so it means that my engine AI it's searching ~100k positions per second, i think it is pretty bad speed

Features that i have so far inside the search:
- Transposition Tables (~20mb hash table of 875000 entries)
- Bitboards with magic bitboard move generator
- Null move pruning
- PVS (principal variation search)
- LMR (Late move reductions)
- Extended Futility Pruning

The engine is developed in C#, because i am making a chess application in unity engine, so i have to keep that language at least for now, maybe in a future it can be a good idea develop a the engine in a .dll plugin and link it into the C# layer, but that is not the case right now.

In the other hand i have a pretty "good" speed in the perft results:
for example in the same Kiwipete position in depth 4 it does the 4085603 nodes in only ~700ms, so clearly the issue comes from the AI search itself.

i have executed the visual studio debugger and it points to this function that have the most heavy cpu use:

Code: Select all

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private int GuessMoveScore(int _move)
        {   
            int score = 0;

            if (MoveUtility.Get_flag_capture(_move) > 0)
            {   
                // apply lookup to the MVV/LVA Table
                byte attacker = board.GetPieceTypeInSquare((byte)MoveUtility.Get_square_from(_move));
                byte victim = board.GetPieceTypeInSquare((byte)MoveUtility.Get_square_to(_move), true);
                score += AISettings.mvv_lva[attacker][victim] + 10000;
            }else 
            {
                // score first killer move
                if (killer_moves[0][ply] == _move)
                    score += 9000;
                // score second killer move
                else if (killer_moves[1][ply] == _move)
                    score += 8000;
                else 
                // score history move   
                    score += history_moves[board.sideToMove][MoveUtility.Get_square_from(_move)][MoveUtility.Get_square_to(_move)];
            }

            // give a little bonus if this is a promotion
            if (MoveUtility.Get_promotion(_move) > 0)
            {
                score += 325;
            }

            return score;
        }
you can see is a pretty simple scoring for MVV/LVA move ordering, the "only" problem is that the moved piece in a move is not begin encoded into the int move, instead i need to search for the piece from the board using the square index.

the search piece in the board function looks like this:

Code: Select all

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public byte GetPieceTypeInSquare(byte _square , bool _adversary = false)
        {
            int player = _adversary ? sideToMove ^ 1 : sideToMove;
            if((1 & (bitboards[player][ChessPiece.Pawn] >> _square)) > 0) return ChessPiece.Pawn; 
            if((1 & (bitboards[player][ChessPiece.Bishop] >> _square)) > 0) return ChessPiece.Bishop; 
            if((1 & (bitboards[player][ChessPiece.Rook] >> _square)) > 0) return ChessPiece.Rook; 
            if((1 & (bitboards[player][ChessPiece.Knight] >> _square)) > 0) return ChessPiece.Knight; 
            if((1 & (bitboards[player][ChessPiece.Queen] >> _square)) > 0) return ChessPiece.Queen; 
            if((1 & (bitboards[player][ChessPiece.King] >> _square)) > 0) return ChessPiece.King; 
            return 0;
        }
in my point of view, i see it ok, i don't know if keeping an array of pieces in squares would be better, but it can slow the move generator as i have tested :(

the move ordering is done very similar as it's done by the VICE engine:
The move ordering is done inside the search loop in the negamnax, and it calls ' OrderMoves(Moveindex, ref move_list); '
and then that function calls the "GuessMoveScore" function to sort the moves on the list based on the current index.

so, anyone have any ideas to improve my nps? or another pruning technique that i should apply?

Cheers, Pedro.
Your "GetPieceTypeInSquare" method is a mess.

It really should be just:

Code: Select all

pieceType = PieceType[Board[sq]] 
or even simpler

Code: Select all

pieceType = PieceBoard[sq] >> 1 
assyming your pieces are coded like WKing = 0, BKing = 1, WQueen = 2, BQueen = 3, ...

If you are using the "pure" bitboard engine, I suggest adding one classical mailbox board.
So.. do you think having an array of pieces in board will be better? right?
pedrojdm2021
Posts: 157
Joined: Fri Apr 30, 2021 7:19 am
Full name: Pedro Duran

Re: How can i improve my C# engine speed?

Post by pedrojdm2021 »

lithander wrote: Wed Jun 30, 2021 11:59 am I have run your tests in my own C# engine.
pedrojdm2021 wrote: Wed Jun 30, 2021 6:49 am For example in Kiwipete position:
[d]r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq -

it does a search of depth 9 with this results:
Nodes searched: 1286811
Search time: 12596ms ( ~13 seconds ) in a AMD FX 8350 @4ghz CPU
so it means that my engine AI it's searching ~100k positions per second, i think it is pretty bad speed

Code: Select all

info depth 9 score cp -16 nodes 12194040 nps 1154519 time 10562 pv e2a6 b4c3 d2c3 e6d5 e5g4 h3g2 f3g2 f6g4 c3g7
It also takes me 10s to search to that depth but I'm searching 10x more nodes. My engine is not optimized for speed (I'm not using bitboards for example and heavily depend on the garbage collector... each movelist ist a new instance of List<Move> and I happily use Sort()) and my NPS are 10x better so I think you're right that there's something fishy...

You seem to have some heavy pruning going on here if you explore only so few nodes. How did your NPS develop from brute force search (which should be close to perft performance + the cost of eval) to the 100k NPS you see now?
pedrojdm2021 wrote: Wed Jun 30, 2021 6:49 am In the other hand i have a pretty "good" speed in the perft results:
for example in the same Kiwipete position in depth 4 it does the 4085603 nodes in only ~700ms, so clearly the issue comes from the AI search itself.

Code: Select all

White >> perft 4
  Moves:    4,085,603
  Seconds:  0.7209
  Moves/s:  5,667,033
  Operation took 0.7229s
That's about the same speed I get. I'm using a TT table for perft (do you?) and a faster computer. So your perft speed is much better than the search speed!
pedrojdm2021 wrote: Wed Jun 30, 2021 6:49 am in my point of view, i see it ok, i don't know if keeping an array of pieces in squares would be better, but it can slow the move generator as i have tested :(
This is how I sort MvvLva:

Code: Select all

        public void SortMvvLva(Board context)
        {
            int Score(Move move)
            {
                Piece victim = context[move.ToSquare];
                Piece attacker = context[move.FromSquare];
                return Pieces.MaxOrder * Pieces.Order(victim) - Pieces.Order(attacker) + Pieces.Order(move.Promotion);
            }
            Sort((a, b) => Score(b).CompareTo(Score(a)));
        }
        
        public const int MaxOrder = 6;

        //Pawn = 1, Knight = 2, Bishop = 3; Rook = 4, Queen = 5, King = 6
        public static int Order(Piece piece) => ((int)piece >> 2);
So as you see at least here a 8x8 array of pieces (the most naive board representation there is) comes in handy because you can just access the array 2 times, shift the values you find and compute the score with 1 multiplication and 1 subtraction. (and one addition to boost promotions but you can skip that)

I don't know how other bitboard users do this but maybe it makes sense to have both bitboards and an array of pieces? Because I can't see how you could extract the piece type on a specific square from bitboards as speedy as just looking it up in a a board-sized array.

P.S.: in my opinion you should not micromanage the Runtime by telling it what to inline specifically. It usually knows better than we do! ;)
That's about the same speed I get. I'm using a TT table for perft (do you?) and a faster computer. So your perft speed is much better than the search speed!
I don't use TT table for perft, it is pure raw perft.
I don't know how other bitboard users do this but maybe it makes sense to have both bitboards and an array of pieces? Because I can't see how you could extract the piece type on a specific square from bitboards as speedy as just looking it up in a a board-sized array.
Yeah i am going to have an array of pieces in board and look for results.

Do you use static class for your AI search? a sealed class? i have heard that it can make some difference in performance but i don't know if it's that much
User avatar
MartinBryant
Posts: 87
Joined: Thu Nov 21, 2013 12:37 am
Location: Manchester, UK
Full name: Martin Bryant

Re: How can i improve my C# engine speed?

Post by MartinBryant »

You can mark blocks of code as 'unsafe' to use pointers.
You can use snippets of assembly.
You can turn off various runtime checks.

However, when .Net first came out in 2002/3 I wrote some engine parts in C# just to see how fast it was compared to C. (I was hoping to do a whole engine at the time.)
Sadly, even when utilising all of the above it was still only a fraction of the speed of compiled C.
Things may have improved since then but it had a long way to catch up.

I guess you have to weigh up whether your desire to stick with C# is more important than speed?
pedrojdm2021
Posts: 157
Joined: Fri Apr 30, 2021 7:19 am
Full name: Pedro Duran

Re: How can i improve my C# engine speed?

Post by pedrojdm2021 »

MartinBryant wrote: Wed Jun 30, 2021 5:32 pm You can mark blocks of code as 'unsafe' to use pointers.
You can use snippets of assembly.
You can turn off various runtime checks.

However, when .Net first came out in 2002/3 I wrote some engine parts in C# just to see how fast it was compared to C. (I was hoping to do a whole engine at the time.)
Sadly, even when utilising all of the above it was still only a fraction of the speed of compiled C.
Things may have improved since then but it had a long way to catch up.

I guess you have to weigh up whether your desire to stick with C# is more important than speed?
i tried that, and the experience was horrible, pointers in C# is a real mess, you can't have a pointer to a class intsnace like you do in C++ , everything is IntPtr, and you have to convert everything and store everything to it, it is not useful at least to me, i am pretty sure that the garbage collector does a better job than i can do using C# pointers, for using pointers in a language i will prefer a native language like C or C++ but not inside a garbage collected language
User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: How can i improve my C# engine speed?

Post by lithander »

pedrojdm2021 wrote: Wed Jun 30, 2021 5:27 pm Do you use static class for your AI search? a sealed class? i have heard that it can make some difference in performance but i don't know if it's that much.
Compilers these days are very clever in turning code that doesn't look optimized into something that performs as good as possible. And the .Net Runtime does even more magic within it's blackbox. If I really care about the performance impact of such implementation details I don't assume anything but test it: Compile two versions that implement some kind of benchmark mode. For example exploring a set of FENs to a given depth. And then you run them both at the same time (so your computer load or the weather or whatever does impact both versions equally) and you see how much of a difference it makes.

I have also disabled "boost" clocks on my computer so that it runs more deterministically.

Be prepared to be surprised often in C#. For example the JIT compiler often needs a bit of a grace period within the first few seconds of your programs runtime to gather performance metrics and settle on the best machine code implementation so the first round of computing anything is often much slower than what follows.

Also... if you use Unity forget everything you know about the .NET runtime. For example runtime allocations aren't a big deal in the .Net Runtime and a huge problem in Unity.

Because this is a hobby project though I have so far only worried about algorithmic performance (if at all) but not implementation details like whether to use sealed classes.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess