How can i improve my C# engine speed?

spirch
Posts: 95
Joined: Fri Nov 09, 2012 12:36 am

Re: How can i improve my C# engine speed?

Post by spirch »

pedrojdm2021 wrote: Fri Jul 02, 2021 1:43 am Yeah, I found that the issue was too many instance creations inside the search. The thing is that Visual Studio never pointed at these, but JetBrains dotMemory does. I have some ideas on how to avoid that; I'll post the results when I finish the changes :D
Since you are using the trial version, try out dotTrace too.

I find it better than the built-in profiler of Visual Studio.
pedrojdm2021
Posts: 157
Joined: Fri Apr 30, 2021 7:19 am
Full name: Pedro Duran

Re: How can i improve my C# engine speed?

Post by pedrojdm2021 »

spirch wrote: Fri Jul 02, 2021 2:21 am
pedrojdm2021 wrote: Fri Jul 02, 2021 1:43 am Yeah, I found that the issue was too many instance creations inside the search. The thing is that Visual Studio never pointed at these, but JetBrains dotMemory does. I have some ideas on how to avoid that; I'll post the results when I finish the changes :D
Since you are using the trial version, try out dotTrace too.

I find it better than the built-in profiler of Visual Studio.
I think you misunderstood me: JetBrains dotMemory worked and pointed me to the issue, while Visual Studio didn't. :lol:
pedrojdm2021
Posts: 157
Joined: Fri Apr 30, 2021 7:19 am
Full name: Pedro Duran

Re: How can i improve my C# engine speed?

Post by pedrojdm2021 »

OK guys, I have removed all on-the-fly allocations from the search and the move generator, but I am still getting similar performance inside the AI :|

I even tried with just a plain alpha-beta:

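Code:

// Minimal fail-hard negamax with alpha-beta pruning.
private int NegamaxMinimal(int _alpha, int _beta, int _depth)
{
    // At the horizon, return the static evaluation.
    if (_depth <= 0) return Evaluate();

    searched_nodes++;

    // Fill the preallocated move list for this ply; the loop below treats
    // the return value as the index of the last generated move.
    sbyte targetIndex = board.GenerateMoves(moves_list[ply]);
    int score = 0;

    for (byte moveIndex = 0; moveIndex <= targetIndex; moveIndex++)
    {
        // Only search moves that MakeMove accepts (presumably the legal ones).
        if (board.MakeMove(moves_list[ply][moveIndex]))
        {
            ply++;
            score = -NegamaxMinimal(-_beta, -_alpha, _depth - 1);
            ply--;

            board.UnMakeMove(moves_list[ply][moveIndex]);

            // Beta cutoff: the opponent will avoid this line.
            if (score >= _beta) return _beta;

            // New best move: raise alpha and record it in the PV.
            if (score > _alpha)
            {
                pv[ply] = moves_list[ply][moveIndex];
                _alpha = score;
            }
        }
    }

    return _alpha;
}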
and the performance is very similar (300,000 positions in 3-4 seconds).

Visual Studio points to this function call inside the evaluation as having a heavy CPU impact:

[Image: Visual Studio profiler screenshot]

The function implementation:

Code:

// Requires: using System.Runtime.CompilerServices;

// Index of the least significant set bit: isolate that bit, subtract 1 to get
// a mask of all lower bits, then count them.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte GetLS1BIndex(ulong _bitboard)
{
    return CountBits((_bitboard & (0 - _bitboard)) - 1);
}

// SWAR population count (number of set bits).
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte CountBits(ulong _bitboard)
{
    const ulong c1 = 0x_55555555_55555555ul;
    const ulong c2 = 0x_33333333_33333333ul;
    const ulong c3 = 0x_0F0F0F0F_0F0F0F0Ful;
    const ulong c4 = 0x_01010101_01010101ul;

    _bitboard -= (_bitboard >> 1) & c1;                              // sums per 2 bits
    _bitboard = (_bitboard & c2) + ((_bitboard >> 2) & c2);          // sums per 4 bits
    _bitboard = (((_bitboard + (_bitboard >> 4)) & c3) * c4) >> 56;  // sums per byte, total in the top byte
    return (byte)_bitboard;
}
That is the fastest way of doing it that I've come up with so far... I know that .NET 5 has an API that uses CPU-specific instructions to do this natively, but I can't use .NET 5 because Unity does not support it; I have to stick with Unity 4.7.

Any ideas? :/
lithander
Posts: 881
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: How can i improve my C# engine speed?

Post by lithander »

In a modern version of Unity you could probably use the Burst compiler that provides hardware intrinsics for popcount. But in Unity 4.7 I have no idea...
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
pedrojdm2021
Posts: 157
Joined: Fri Apr 30, 2021 7:19 am
Full name: Pedro Duran

Re: How can i improve my C# engine speed?

Post by pedrojdm2021 »

lithander wrote: Fri Jul 02, 2021 6:21 pm In a modern version of Unity you could probably use the Burst compiler that provides hardware intrinsics for popcount. But in Unity 4.7 I have no idea...
Sorry, I wrote that wrong; I was talking about .NET Framework 4.7, not Unity 4.7 :lol:


I am talking about this:
https://docs.unity3d.com/2020.3/Documen ... pport.html

Unity currently only supports .NET Standard 2.0 and .NET Framework up to 4.x.
lithander
Posts: 881
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: How can i improve my C# engine speed?

Post by lithander »

Then try this:

Code:

bool isSupported = Unity.Burst.Intrinsics.X86.Popcnt.IsPopcntSupported;
long numberOfSetBits = Unity.Burst.Intrinsics.X86.Popcnt.popcnt_u64(1337); // should return 6
IsPopcntSupported may return false unless you use AOT and the Burst compiler. But there should be a fallback implemented, and maybe the fallback is still faster than your custom code executed by Mono. Good luck! :)
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: How can i improve my C# engine speed?

Post by Gerd Isenberg »

pedrojdm2021 wrote: Fri Jul 02, 2021 9:58 am Visual Studio points to this function call inside the evaluation as having a heavy CPU impact:

For GetLS1BIndex, it is better to use De Bruijn multiplication rather than the CountBits version.
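
For readers who want to try this, here is a minimal sketch of a De Bruijn bitscan in C#. It is not code from anyone in this thread: the class name `BitScan` is made up for illustration, and the constant is the 64-bit De Bruijn sequence commonly quoted for this purpose (for example on the Chess Programming Wiki). The lookup table is built at startup from the constant instead of being typed in by hand. Note one behavioral difference, assuming the caller cares: for an empty bitboard this version returns 0, whereas the CountBits-based version returns 64.

```csharp
using System.Runtime.CompilerServices;

public static class BitScan
{
    // 64-bit De Bruijn sequence: every 6-bit window of it is unique.
    private const ulong DeBruijn64 = 0x03f79d71b4cb0a89UL;

    // Maps the top 6 bits of (isolatedBit * DeBruijn64) back to the bit index.
    private static readonly byte[] IndexTable = BuildTable();

    private static byte[] BuildTable()
    {
        var table = new byte[64];
        for (int i = 0; i < 64; i++)
        {
            // Multiplying by a single set bit (1 << i) is the same as shifting left by i.
            table[(DeBruijn64 << i) >> 58] = (byte)i;
        }
        return table;
    }

    // Index (0..63) of the least significant set bit.
    // For bitboard == 0 the result is 0 and carries no meaning.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static byte GetLS1BIndex(ulong bitboard)
    {
        unchecked
        {
            ulong isolated = bitboard & (0UL - bitboard); // keep only the lowest set bit
            return IndexTable[(isolated * DeBruijn64) >> 58];
        }
    }
}
```

This replaces the popcount with one multiplication, one shift and one table lookup; whether it is actually faster under Mono would have to be measured.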
pedrojdm2021
Posts: 157
Joined: Fri Apr 30, 2021 7:19 am
Full name: Pedro Duran

Re: How can i improve my C# engine speed?

Post by pedrojdm2021 »

lithander wrote: Fri Jul 02, 2021 7:09 pm Then try this:

Code:

bool isSupported = Unity.Burst.Intrinsics.X86.Popcnt.IsPopcntSupported;
long numberOfSetBits = Unity.Burst.Intrinsics.X86.Popcnt.popcnt_u64(1337); // should return 6
IsPopcntSupported may return false unless you use AOT and the Burst compiler. But there should be a fallback implemented, and maybe the fallback is still faster than your custom code executed by Mono. Good luck! :)
OMG, thank you, I didn't realize that Unity had its own implementation of popcount :D Now I only have to reference the Unity Burst DLL from my custom engine DLL.
pedrojdm2021
Posts: 157
Joined: Fri Apr 30, 2021 7:19 am
Full name: Pedro Duran

Re: How can i improve my C# engine speed?

Post by pedrojdm2021 »

OK, I think I've found the problem.

The problem was mainly inside the quiescence search:

In the old implementation of the quiescence search I was generating ALL moves (and filtering for captures with a flag passed as an optional parameter to the make_move function), and inside the for loop I was calling the sort_moves function. In other words, it was sorting all moves, not only captures as expected, which wasted a lot of computation time.

So I changed the move generation function to generate only captures, and now my engine finds the best move in the Kiwipete position in only 3.6 seconds at depth 9! :D I compared the speed with maksimKorzh's BBC 1.2, and mine is only 0.6 seconds slower than his implementation, so it's great news :)

Thank you guys for all the help; it really helped me find the real problem and reach the solution :D
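
For anyone reading along, a capture-only quiescence search of the kind described above might look roughly like the sketch below. This is not the engine's actual code: the `ICapturePosition` interface, the `Quiescence` class and the `ToyExchange` demo position are all hypothetical names invented for illustration, and moves are plain `int` values. The two points it tries to show are that only captures are generated, and that they are ordered once per node instead of inside the move loop.

```csharp
// Hypothetical minimal interface; the engine in this thread has its own board class.
public interface ICapturePosition
{
    int Evaluate();                      // static score from the side to move's point of view
    int GenerateCaptures(int[] buffer);  // writes capture moves only, returns how many
    int CaptureValue(int move);          // ordering key, e.g. value of the captured piece
    void MakeMove(int move);
    void UnmakeMove(int move);
}

public static class Quiescence
{
    public const int Infinity = 1000000;
    private const int MaxPly = 64;
    private const int MaxCaptures = 128;

    // One preallocated buffer per ply, so the search itself allocates nothing.
    private static readonly int[][] Buffers = CreateBuffers();

    private static int[][] CreateBuffers()
    {
        var buffers = new int[MaxPly][];
        for (int i = 0; i < MaxPly; i++) buffers[i] = new int[MaxCaptures];
        return buffers;
    }

    public static int Search(ICapturePosition pos, int alpha, int beta, int ply)
    {
        // Stand pat: the side to move may decline every capture.
        int standPat = pos.Evaluate();
        if (standPat >= beta) return beta;
        if (standPat > alpha) alpha = standPat;
        if (ply >= MaxPly) return alpha;

        int[] moves = Buffers[ply];
        int count = pos.GenerateCaptures(moves);

        // Order the captures once per node (insertion sort, highest value first),
        // not once per loop iteration.
        for (int i = 1; i < count; i++)
        {
            int move = moves[i], key = pos.CaptureValue(move), j = i - 1;
            while (j >= 0 && pos.CaptureValue(moves[j]) < key) { moves[j + 1] = moves[j]; j--; }
            moves[j + 1] = move;
        }

        for (int i = 0; i < count; i++)
        {
            pos.MakeMove(moves[i]);
            int score = -Search(pos, -beta, -alpha, ply + 1);
            pos.UnmakeMove(moves[i]);

            if (score >= beta) return beta;   // fail-hard beta cutoff
            if (score > alpha) alpha = score;
        }
        return alpha;
    }
}

// Toy position for demonstration: a single exchange sequence on one square.
// gains[i] is the material won by the i-th capture in the sequence.
public sealed class ToyExchange : ICapturePosition
{
    private readonly int[] gains;
    private int next;      // index of the next capture in the sequence
    private int score;     // material balance from the first player's view
    private int side = 1;  // +1: first player to move, -1: second player

    public ToyExchange(int[] gains) { this.gains = gains; }

    public int Evaluate() { return side * score; }

    public int GenerateCaptures(int[] buffer)
    {
        if (next >= gains.Length) return 0;
        buffer[0] = next;
        return 1;
    }

    public int CaptureValue(int move) { return gains[move]; }
    public void MakeMove(int move) { score += side * gains[move]; next++; side = -side; }
    public void UnmakeMove(int move) { side = -side; next--; score -= side * gains[move]; }
}
```

With the toy position, `Quiescence.Search(new ToyExchange(new[] { 100, 300 }), -Quiescence.Infinity, Quiescence.Infinity, 0)` returns 0, because taking 100 and losing 300 is worse than standing pat, while the sequence `{ 300, 100 }` returns 200.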