pedrojdm2021 wrote: ↑Thu Jul 01, 2021 6:01 am
a) memory allocations your engine does in search
i don't know if i am doing that... each node i get a new list<int> instance from the moveGenerator,
Which is it?
You are in serious trouble if you don't know when or where you're allocating memory. If you can't say where your code allocates on the heap (reference types), you're unprepared to troubleshoot performance issues. If you "new up" class instances inside tight loops (like a chess engine's search method), then let those instances fall out of scope (or be overwritten by newer instances), you put extreme pressure on the garbage collector to sweep and compact memory.
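If you're not sure whether a code path allocates, you can measure it directly. A minimal sketch (requires .NET Core 3.0 or later for GC.GetAllocatedBytesForCurrentThread):

```csharp
using System;
using System.Collections.Generic;

class AllocCheck
{
    static void Main()
    {
        // Snapshot heap allocation before and after the suspect code path.
        long before = GC.GetAllocatedBytesForCurrentThread();
        var moves = new List<int>(64);   // the pattern from the quoted search code
        long after = GC.GetAllocatedBytesForCurrentThread();
        // True: the List object and its backing array both hit the heap.
        Console.WriteLine(after - before > 0);
    }
}
```

Run a check like this around your movegen call inside search; if the byte count grows on every node, you've found your GC pressure.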
Code: Select all
TTable.Store(new TTEntry(TTFlag.BETA, board.hash, _depth , _beta), ply);
moves = new List<int>(64);
Ugh. Memory allocation inside search.
Let's break it down: Assume a bitboard chess engine can generate moves at 50 million NPS (incremented at Board.PlayMove), and assume the average middlegame position contains 30 legal moves. Searching the tree of potential continuations then involves generating 50 million / 30 ≈ 1.67 million move lists per second. It is far more efficient to write into a pre-allocated array of moves (preferably encoded as a primitive type like uint or ulong) than to allocate and then garbage-collect (GC) 1.67 million List<ulong> instances every second.

The situation is worse than that: Chess engines are selective searchers. They don't search all 30 moves of the average middlegame position. The alpha / beta algorithm causes beta cutoffs much earlier than move 30, ideally by move 2 or less on average. So an engine generates (read: "pays a cost in decreased performance for") many moves it never searches.

Even worse: The .NET implementation of List<T> starts with an empty backing array and allocates a small one (capacity 4 in current .NET) on the first Add. Each time you add an item beyond the last index, it allocates a new array of 2x the size and copies the existing items into it. The GC will determine each abandoned array is no longer referenced and will eventually collect it and compact the heap.
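A tiny console sketch makes the doubling behavior visible (assuming .NET's current growth policy: empty backing array, capacity 4 on the first Add, doubling thereafter):

```csharp
using System;
using System.Collections.Generic;

class ListGrowthDemo
{
    static void Main()
    {
        var list = new List<int>();
        int lastCapacity = -1;
        for (int i = 0; i < 40; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                // On current .NET this reports capacities 4, 8, 16, 32, 64.
                Console.WriteLine($"Count={list.Count,2}  Capacity={list.Capacity}");
                lastCapacity = list.Capacity;
            }
        }
        // Every capacity change above abandoned the old backing array,
        // leaving garbage for the collector to sweep. Multiply that by
        // millions of move lists per second inside search.
    }
}
```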
In a business application with no hot path that repeats operations millions of times, this has little practical consequence. Business developers are more focused on flexible software designs and application lifecycle management (DI, SOA, unit testing, dynamic config, DevOps, and the like) than on performance.
In a chess engine, it's disastrous. All of this memory churn (allocate and GC) is happening inside a recursive, tight loop (the search method). Done inefficiently, it will kill performance. Search will always be slower than perft (counting legal moves) because of the "bookkeeping" it must do to order, reduce, and prune moves, plus the cost of statically evaluating inner and leaf positions. However, if search drops your move generation speed from 50 million NPS to 100,000 NPS, something is seriously wrong with your implementation. My guess is too much "new", causing too many expensive GC sweeps.
Pre-allocate at engine startup everything used by search. Banish the "new" keyword from search or any method called by search.
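Here's one way that can look in practice: a sketch (all names here are illustrative, not your engine's) where the engine owns a single flat ulong move buffer, sliced per ply, and movegen writes into it instead of returning a new List:

```csharp
using System;

// Illustrative stub; your Board holds the bitboards.
sealed class Board { }

sealed class Engine
{
    const int MaxPly = 64;
    const int MaxMovesPerPly = 256;

    // Allocated ONCE, at engine startup. Search indexes into these by ply;
    // no allocation ever happens on the hot path.
    readonly ulong[] _moves = new ulong[MaxPly * MaxMovesPerPly];
    readonly int[] _moveCounts = new int[MaxPly];

    // Movegen writes encoded moves into this ply's slice and returns the count.
    int GenerateMoves(Board board, int ply)
    {
        int offset = ply * MaxMovesPerPly;
        int count = 0;
        // Stand-in for real generation: each legal move encoded as a ulong.
        _moves[offset + count++] = 0xE2E4UL;
        _moves[offset + count++] = 0xD2D4UL;
        _moveCounts[ply] = count;
        return count;
    }

    public int Search(Board board, int ply)
    {
        if (ply >= 2) return 0; // toy depth limit for the demo

        int count = GenerateMoves(board, ply);
        int offset = ply * MaxMovesPerPly;
        for (int i = 0; i < count; i++)
        {
            ulong move = _moves[offset + i];
            // Real engine: PlayMove(move); score = -Search(...); UndoMove(move);
            Search(board, ply + 1);
        }
        return count;
    }

    static void Main()
    {
        var engine = new Engine();
        Console.WriteLine(engine.Search(new Board(), 0)); // no per-node allocation
    }
}
```

The per-ply slicing matters: each recursion depth gets its own region of the buffer, so a child node's move generation never overwrites the parent's move list.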
i don't know clearly how to avoid declaration of that stuff inside the methods that they are in use
It sounds like you need to break your habit of allocating locally and letting class instances fall out of scope: the "let the garbage collector handle it" mentality. To unlock improved performance, you need to solve the problem of how to pre-allocate everything at engine startup and pass references (and index positions) to the search and eval methods.
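The same idea applies to your TTable.Store line: instead of "new TTEntry(...)" per store, make the entry a struct and keep the whole table as one array of structs allocated at startup. A sketch, reusing the TTEntry / TTFlag names from your post (field names and signatures here are assumptions):

```csharp
using System;

enum TTFlag : byte { Exact, Alpha, Beta }

// A struct lives inline inside the array: zero heap allocation per entry.
struct TTEntry
{
    public ulong Hash;
    public int Depth;
    public int Score;
    public TTFlag Flag;
}

sealed class TTable
{
    readonly TTEntry[] _entries;
    readonly ulong _mask;

    public TTable(int sizePowerOfTwo)
    {
        _entries = new TTEntry[sizePowerOfTwo]; // one allocation, at startup
        _mask = (ulong)sizePowerOfTwo - 1;
    }

    // Overwrites the slot in place. No "new", no GC pressure, ever.
    public void Store(ulong hash, int depth, int score, TTFlag flag)
    {
        ref TTEntry slot = ref _entries[hash & _mask];
        slot.Hash = hash;
        slot.Depth = depth;
        slot.Score = score;
        slot.Flag = flag;
    }

    public bool TryProbe(ulong hash, out TTEntry entry)
    {
        entry = _entries[hash & _mask];
        return entry.Hash == hash;
    }

    static void Main()
    {
        var tt = new TTable(1 << 20);
        tt.Store(0xABCDUL, depth: 8, score: 25, flag: TTFlag.Beta);
        Console.WriteLine(tt.TryProbe(0xABCDUL, out var e) && e.Score == 25); // True
    }
}
```

Note the `ref` local: it writes straight into the array slot rather than copying the struct out and back. That's the pattern to aim for everywhere in search: fixed storage allocated once, indexed access on the hot path.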