pedrojdm2021 wrote: ↑Sun Jul 11, 2021 8:43 pm
> My conclusion on this is that the main cause is possibly the slowness of C# in .NET Framework 4.

emadsen wrote: ↑Sun Jul 11, 2021 9:59 pm
> You are jumping to conclusions way too quickly. C# and .NET are not to blame for your engine performing at 1/10th the speed it should. My engine is written in C# using magic bitboards. It searches 3.7 million NPS in the WAC test suite, counting nodes only in the Board.PlayMove method.
> In the source code of my C# engine I define #if POPCOUNT to a) fall back to loops (via Value &= Value - 1ul) to count set bits and b) use a De Bruijn sequence to find the first set bit on older processors that lack the popcount instruction. I haven't done extensive testing, but in a few positions I examined, the popcount version was 1.25x faster than the non-popcount version. A 20% slowdown due to missing popcount support cannot explain why your engine is an order of magnitude (1/10) slower than it could be. It's not due to the techniques used to count and locate bits.
> In addition, I know Graham Banks uses the non-popcount version of MadChess when running tournaments. It's unlikely all CCRL testers use the non-popcount version of my engine (each tester has their own hardware); even so, CCRL has rated MadChess at 2636 Elo. This aligns with my own testing on my popcount-capable home PC (2638 Elo). My point being: yes, MadChess runs slower on non-popcount hardware, but so do most other engines. I haven't measured any detrimental effect of non-popcount hardware on my engine's playing strength.

pedrojdm2021 wrote: ↑Sun Jul 11, 2021 10:16 pm
> Yeah, but you use .NET 5; that's the problem. Any fast C# engine that I've seen is written in .NET 5. In .NET 4 we don't have access to the new "native" stuff in .NET. Just to name a few, in .NET 4 we don't have:
> - Span<T>
> - System.Runtime.Intrinsics.X86
> I have to stick with normal arrays, without any "fancy" .NET 5 feature that Microsoft added in that version.
> In .NET 4, bitwise operations seem to be extremely expensive: a simple least-significant-bit index computed from a few bitwise operations accounts for 6%?
> I have tried many things, and everything in Visual Studio points to the same spots: the bitwise operations and the "square attacked by" check (used to detect checks), which I implemented as documented in the Chess Programming Wiki.
> Sadly I can't use .NET 5, because the framework I am using to build the game itself (the Unity engine) does not support .NET 5.

If you look at my last post, you have a solution that should speed up your isincheck function so it takes approximately 2/3 of the time, I guess.
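The two non-popcount fallbacks emadsen describes can be sketched roughly like this. This is a hedged reconstruction of the technique, not MadChess's actual code; the De Bruijn constant and lookup table are the standard published ones from the Chess Programming Wiki:

```csharp
public static class BitOps
{
    // Kernighan's loop: clears the lowest set bit each iteration,
    // so it runs once per set bit (the non-popcount fallback).
    public static int CountSetBits(ulong value)
    {
        var count = 0;
        while (value != 0)
        {
            value &= value - 1ul;
            count++;
        }
        return count;
    }

    // The classic 64-bit De Bruijn multiplication: isolate the lowest
    // set bit, multiply, and use the top 6 bits as a table index.
    private const ulong DeBruijn = 0x03f79d71b4cb0a89ul;

    private static readonly int[] Index64 =
    {
         0,  1, 48,  2, 57, 49, 28,  3,
        61, 58, 50, 42, 38, 29, 17,  4,
        62, 55, 59, 36, 53, 51, 43, 22,
        45, 39, 33, 30, 24, 18, 12,  5,
        63, 47, 56, 27, 60, 41, 37, 16,
        54, 35, 52, 21, 44, 32, 23, 11,
        46, 26, 40, 15, 34, 20, 31, 10,
        25, 14, 19,  9, 13,  8,  7,  6
    };

    public static int FirstSetBit(ulong value)
    {
        // value must be non-zero; value & (0 - value) isolates the LSB.
        return Index64[((value & (0ul - value)) * DeBruijn) >> 58];
    }
}
```

On popcount-capable hardware the loop and table lookup are each replaced by a single instruction, which is roughly where emadsen's measured 1.25x difference comes from.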
Efficiently Generated Legal moves question
Moderators: hgm, Rebel, chrisw
Pio (Posts: 334 · Joined: Sat Feb 25, 2012 10:42 pm · Location: Stockholm)
Re: Efficiently Generated Legal moves question
-
- Posts: 95
- Joined: Fri Nov 09, 2012 12:36 am
pedrojdm2021 (Posts: 157 · Joined: Fri Apr 30, 2021 7:19 am · Full name: Pedro Duran)
Re: Efficiently Generated Legal moves question
Pio wrote: ↑Sun Jul 11, 2021 10:20 pm
> If you look at my last post, you have a solution that should speed up your isincheck function so it will take approximately 2/3 of the time I guess. […]

Yeah, I am going to apply that, and only generate legal moves.
pedrojdm2021 (Posts: 157 · Joined: Fri Apr 30, 2021 7:19 am · Full name: Pedro Duran)
Re: Efficiently Generated Legal moves question
Now I wish I had implemented the engine as a 0x88 engine, but I wanted something fancy, and here I am.
emadsen (Posts: 434 · Joined: Thu Apr 26, 2012 1:51 am · Location: Oak Park, IL, USA · Full name: Erik Madsen)
Re: Efficiently Generated Legal moves question
pedrojdm2021 wrote: ↑Sun Jul 11, 2021 10:16 pm
> Yeah, but you use .NET 5; that's the problem. Any fast C# engine that I've seen is written in .NET 5. In .NET 4 we don't have access to the new "native" stuff in .NET. Just to name a few, in .NET 4 we don't have:
> - Span<T>
> - System.Runtime.Intrinsics.X86

OK Pedro, you know best. Yet you seem to have missed the entire point of my post. Span<T> and System.Runtime.Intrinsics are not magic bullets that get you a 10x speed boost. I assume you're new to chess programming. You've been asking questions on this forum for what, two weeks? Now you're tossing away, as too slow, a programming language and runtime (or at least version 4 of the .NET runtime) that's supported by 20+ years of engineering effort from a tech-pioneering company. Doesn't that strike you as a bit rash and out of proportion?
You're relying too much on profiler CPU metrics. Time In Method (TIM) percentages will not reveal fundamental flaws in data structures, memory allocation, or software algorithms. TIM metrics are much more useful when you have recorded a baseline performance profile, you refactor or enhance code, and then you observe a performance regression (slowdown). You can compare the TIM metrics from the previous version to the new version to determine whether the performance of specific methods has degraded. If so, those methods are the culprits: code changes within them (or the methods beneath them on the call stack) may have wrecked performance.
You can't just look at TIM percentages and declare that a method with a high CPU% indicates a flaw in the underlying programming language or runtime. For example, if you profile a strong chess engine, you're likely to find a high CPU% for the quiescence search and evaluation methods. Should we eliminate QSearch and Eval to speed up the engine? Of course not. So a high CPU% metric for those methods does not indicate a performance problem.
The absolute value of program run time (or time-to-depth in a chess engine search) is the metric that matters. Program run time is overwhelmingly influenced by proper choice of data structures, by light memory management (avoiding churn via excessive allocation and GC), by efficient algorithms, and by bug elimination (a never-ending quest). Even the run-time and time-to-depth metrics are fraught with subtlety and require context to understand: you can make a chess engine search very fast, but very stupidly, by carelessly reducing and pruning too many moves.
It is entirely possible that a code refactor that better aligns data structures to the data access patterns of a chess engine, or reduces memory allocations, or improves search efficiency (spending more time on critical positions and less time on unprofitable continuations) actually increases one of the method CPU percentages you're fixated on, yet significantly reduces the absolute value of program run time (or time-to-depth), thereby increasing the playing strength of the engine.
I'm repeating myself, but for emphasis: high CPU percentages are relevant when the same method's CPU% was low in a previous baseline performance profile and increased significantly in a new version. The metric provides a lead to follow when investigating performance degradations introduced by newly committed code. The metric rarely provides much value when interpreted a) in isolation or b) in comparison to the same metric from a vastly different codebase.
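The "absolute run time" advice above needs nothing fancier than a stopwatch around a fixed workload. A minimal sketch; the workload delegate is a stand-in for whatever fixed perft or fixed-depth search your engine exposes:

```csharp
using System;
using System.Diagnostics;

public static class SearchBenchmark
{
    // Times a fixed workload (e.g. perft from the start position, or a
    // fixed-depth search over a test suite) and reports nodes per second.
    // This is the number to compare across engine versions; a profiler's
    // Time In Method percentages only say where the time went, not how much.
    public static double MeasureNps(Func<long> runFixedWorkload)
    {
        var stopwatch = Stopwatch.StartNew();
        var nodes = runFixedWorkload();
        stopwatch.Stop();
        return nodes / stopwatch.Elapsed.TotalSeconds;
    }
}
```

Call it as something like `SearchBenchmark.MeasureNps(() => engine.Perft(startPosition, 5))` (where `engine.Perft` is hypothetical) before and after a change; a refactor that raises some method's CPU% while lowering the wall-clock number here is a win.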
My C# chess engine: https://www.madchess.net
spirch (Posts: 95 · Joined: Fri Nov 09, 2012 12:36 am)
Re: Efficiently Generated Legal moves question
emadsen wrote: ↑Mon Jul 12, 2021 12:05 am
> OK Pedro, you know best. Yet you seem to have missed the entire point of my post. Span<T> and System.Runtime.Intrinsics are not magic bullets that get you a 10x speed boost. […]

I totally agree with this. Look at the implementation, see if you can optimize it; refactor, refactor, refactor... OK, I hope you understand.
Also, in another thread I suggested that you try the trial version of dotTrace instead of the built-in Visual Studio profiler; if you didn't, try it. I personally don't like the built-in one. The trial is free, so go play with it. Try tracing profiling, not sampling; this could help you figure out whether some methods are called too often. Untick "allow inlining".
I'm going to link this post of mine from a few months ago: http://talkchess.com/forum3/viewtopic.p ... 10#p886029
Focus on the screenshot, and feel free to look at the whole thread.
In there I say that I went from 3.5M to 21M nodes per second in my perft because of an error in my code.
Today the same perft is doing between 35M and 36M. What did I do? Some pretty minor refactoring.
(I should stop spending time on 0x88 and start the damn bitboards, haha.)
spirch (Posts: 95 · Joined: Fri Nov 09, 2012 12:36 am)
Re: Efficiently Generated Legal moves question
Also, I hope you are running your actual tests in Release mode, not Debug mode.
pedrojdm2021 (Posts: 157 · Joined: Fri Apr 30, 2021 7:19 am · Full name: Pedro Duran)
Re: Efficiently Generated Legal moves question
spirch wrote: ↑Mon Jul 12, 2021 12:50 am
> i totally agree with this, look at the implementation, see if you can optimize it, refactor, refactor, refactor […]

Yes, I have downloaded dotTrace, and everything seems to be pointing, more or less, to the same code...
It points to the same IsSquareAttacked, which seems heavy; inside that, it points to:
5% on this GetBitboard. I thought methods as small as this, when inlined, should not cause any load on the CPU, but it seems they do. Or is the inlining not being applied at all?
Code:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public ulong GetBitboard(byte _player, byte _pieceType)
{
    return bitboards[_player] & bitboards[_pieceType];
}
Code:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static ulong GetQueenMoves(byte _square, ulong _occupancy)
{
    return GetBishopMoves(_square, _occupancy) | GetRookMoves(_square, _occupancy);
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static ulong GetBishopMoves(byte _square, ulong _occupancy)
{
    _occupancy &= BISHOP_MASKS[_square];
    _occupancy *= Magics.BISHOP_MAGICS[_square];
    _occupancy >>= (64 - BISHOP_MOVE_COUNT[_square]);
    // magic bitboard lookup: (square)(magic index)
    return BISHOP_MOVES[_square][_occupancy];
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static ulong GetRookMoves(byte _square, ulong _occupancy)
{
    _occupancy &= ROOK_MASKS[_square];
    ........
So all the profiling work I've put in here just points to the same thing, but I don't see anything wrong with it.
I am using bitboards; I am only returning a value from some arrays, and I apply some bitwise operations to get the attacks. I am using inlining, so combining the bishop and rook values to get the queen moves should not cause any issue.
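One way to check whether the 5% charged to GetBitboard is genuine call overhead, rather than the profiler attributing inlined work back to the method, is a micro-benchmark that pins inlining both ways and compares wall-clock time. A rough sketch, using an illustrative two-entry bitboard array rather than a real engine's:

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class InlineProbe
{
    // Illustrative two-board array; a real engine's combined bitboards
    // array would be indexed the same way.
    private static readonly ulong[] Bitboards = { 0x00FFUL, 0xF0F0UL };

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static ulong GetBitboardCall(byte a, byte b)
    {
        return Bitboards[a] & Bitboards[b];
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static ulong GetBitboardInline(byte a, byte b)
    {
        return Bitboards[a] & Bitboards[b];
    }

    // Runs both variants and returns their wall-clock times in ms.
    // If the two numbers are nearly identical, the call itself is not
    // the cost, and the profiler's 5% is just attribution of real work.
    public static (double withCall, double inlined) Compare(int iterations)
    {
        ulong sink = 0;
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++) sink ^= GetBitboardCall(0, 1);
        var withCall = sw.Elapsed.TotalMilliseconds;

        sw.Restart();
        for (var i = 0; i < iterations; i++) sink ^= GetBitboardInline(0, 1);
        var inlined = sw.Elapsed.TotalMilliseconds;

        // Use sink so the loops cannot be optimized away entirely.
        if (sink == ulong.MaxValue) Console.WriteLine(sink);
        return (withCall, inlined);
    }
}
```

Note that this only tells you about the runtime you run it on; Unity's Mono and IL2CPP backends may honor MethodImplOptions differently than the desktop CLR does.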
lithander (Posts: 881 · Joined: Sun Dec 27, 2020 2:40 am · Location: Bremen, Germany · Full name: Thomas Jahn)
Re: Efficiently Generated Legal moves question
Keep in mind that Pedro is using Unity. Unity has C# scripting, but that doesn't mean the performance characteristics of code written for Unity are really comparable to C# written for the CLR (the runtime everyone else uses). In Unity you have two choices. Either the C# code gets compiled to IL, the IL gets converted to C++, and that is finally compiled to a native, platform-specific binary. This is called IL2CPP, and it is potentially fast, but not exactly comparable to IL code being run and JIT-compiled in a high-performance managed runtime like today's CLR.
The alternative to IL2CPP is Mono. They call their fork MonoBleedingEdge, which is very funny considering that Mono in itself isn't exactly "bleeding edge"; it can almost be considered obsolete since Microsoft made .NET open source. And their Mono fork is also about two years behind the latest upstream code. So it's bleeding-edge in a very painful, sarcastic way, maybe. The biggest issue with the Mono scripting backend is that the garbage collector is just bad. A managed language relies on a runtime that does decent automatic memory management, and Unity struggles to provide that.
And Pedro, to be honest, I'm not even sure how you're using the Microsoft Visual Studio profiler effectively to profile a Unity application. Or is your chess code a standalone library that you're just planning to use within Unity, but you develop (and profile) it outside of Unity, running it in Microsoft's CLR when you profile it?
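Given lithander's point about Mono's weak garbage collector, the usual workaround under Unity's .NET 4.x profile is to preallocate all search buffers once and generate zero garbage per node. A hypothetical sketch; the Move struct and buffer size are illustrative, not from Pedro's engine:

```csharp
// Hypothetical compact move struct; a real engine might pack these
// fields into a single ushort or uint instead.
public struct Move
{
    public byte From;
    public byte To;
    public byte Flags;
}

// Fixed-capacity move list, allocated once (e.g. one per search ply).
// Structs stored in a preallocated array produce no garbage, so the
// Mono garbage collector has no reason to run during the search.
public sealed class MoveList
{
    private readonly Move[] moves = new Move[256];
    public int Count;

    public void Clear()
    {
        Count = 0;
    }

    public void Add(byte from, byte to, byte flags)
    {
        moves[Count].From = from;
        moves[Count].To = to;
        moves[Count].Flags = flags;
        Count++;
    }

    public Move Get(int index)
    {
        return moves[index];
    }
}
```

The same idea applies to any per-node scratch data: allocate it at engine startup, reuse it every node, and never call `new` inside the search loop.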
pedrojdm2021 (Posts: 157 · Joined: Fri Apr 30, 2021 7:19 am · Full name: Pedro Duran)
Re: Efficiently Generated Legal moves question
lithander wrote: ↑Mon Jul 12, 2021 1:22 am
> Keep in mind that Pedro is using Unity. Unity has C# scripting but that doesn't mean that the performance characteristics of code written for Unity is really comparable to the C# written for the CLR (the runtime everyone else uses). […]

I have 2 "projects".
Look at it as a frontend/backend structure, like a website:
- frontend: the Unity implementation
- backend: a C# chess engine .dll built on .NET Framework 4.8 (the highest version of .NET that Unity currently supports)
In the C# chess engine project I do have a console application to profile, debug, and test the engine before I import it into Unity as a .dll plugin.
In Unity things can get even slower due to Unity's implementation of the C# layer; it is weird, but C# in Unity always runs slower, whether you allocate garbage for the GC or not.