I mentioned that in my post, didn't I?
m8 Dev log and C# experiments
-
- Posts: 71
- Joined: Mon Apr 05, 2021 12:00 am
- Full name: Eduardo Caceres
Re: m8 Dev log and C# experiments
It's relatively recent: it was introduced in C# 10 (.NET 6), if I'm not mistaken, together with global usings.
https://learn.microsoft.com/en-us/dotne ... l-modifier
-
- Posts: 290
- Joined: Mon Mar 13, 2006 5:23 pm
- Location: Québec
Re: m8 Dev log and C# experiments
Thanks! But to be clear, I have developed multiple chess engines over the last 25 years. One of them, MatMoi VII, was released as free but closed-source software because I'm not particularly proud of the code quality and it's in French. I will open source my new C# engine.
Mathieu Pagé
mathieu@mathieupage.com
-
- Posts: 290
- Joined: Mon Mar 13, 2006 5:23 pm
- Location: Québec
Re: m8 Dev log and C# experiments
I have stopped development of my C# chess engines a couple of weeks ago.
I started by developing all the low-level structures, like Move, Square, Piece, etc., in a way that would allow me to write expressive and intuitive code. I was afraid this would incur some overhead, so, as I reported here before, I benchmarked this technique and found that, as long as all the methods and properties were inlined, it should not result in a performance penalty. This made my code, IMO, really easy to read and even to write.
I also learned a lot about how structs and classes are stored/copied in memory and how the different containers actually work. I learned a lot by reading Leorik's and Pedantic's code. This was a really enjoyable experience and it made me like C# even more than before: it is really easier to write clean code in C# than in C++.
Here is an example from my Make function:
Code:
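case MoveType.CastleKingSide:
    // The side to move must still hold the king-side castling right.
    Debug.Assert((_castlingOptions & CastlingOptionsHelpers.Create(_sideToMove, CastlingSide.KingSide)) != CastlingOptions.None);
    // The king always lands on the g-file when castling king-side.
    Debug.Assert(move.To.File == File.g);
    // The rook's starting file is looked up rather than hard-coded; it lands on the f-file.
    var rookFrom = new Square(GetCastlingFile(CastlingSide.KingSide), move.From.Rank);
    var rookTo = new Square(File.f, move.From.Rank);
    MovePiece(move.From, move.To);
    MovePiece(rookFrom, rookTo);
    // Castling never creates an en passant target.
    _enPassantFile = File.Invalid;
    // The mask indexed by the king's origin square clears the rights given up by this move.
    _castlingOptions &= _castlingMasks[move.From.Value];
    // Castling is neither a capture nor a pawn move, so the fifty-move counter advances.
    ++_halfMoveClock;
    break;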
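The snippet above leans on small wrapper types such as Square and File. As a rough sketch only, wrappers of that kind might look like the following; the integer layout (a1 = 0 ... h8 = 63) and the member names are assumptions for illustration, not the actual m8# definitions.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical wrapper types: each is a readonly struct around one int, so it is
// as small as the raw value, but the compiler rejects passing a Rank where a
// File is expected.
public readonly struct File
{
    public readonly int Value; // assumed: 0 = file a ... 7 = file h

    public File(int value) => Value = value;
}

public readonly struct Rank
{
    public readonly int Value; // assumed: 0 = rank 1 ... 7 = rank 8

    public Rank(int value) => Value = value;
}

public readonly struct Square
{
    // Assumed layout: index = rank * 8 + file, so a1 = 0 and h8 = 63.
    public readonly int Value;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public Square(File file, Rank rank) => Value = rank.Value * 8 + file.Value;

    // Tiny getters marked for aggressive inlining; if the JIT inlines them,
    // square.File.Value should boil down to a single AND instruction.
    public File File
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => new File(Value & 7);
    }

    public Rank Rank
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => new Rank(Value >> 3);
    }
}
```

With these types, `new Square(new Rank(3), new File(4))` is a compile-time error, which is the main appeal. Whether the getters really disappear after JIT compilation is exactly what the disassembly tip later in the thread can confirm.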
However... the performance was not great. I implemented the move generation (using black magic bitboards) and the make/unmake methods, then I wrote a Perft method. This was the first time I could compare this engine to my previous engine written in C++. It was about 4 times slower, even though my C++ engine also updates the Zobrist hash and the piece-square evaluation. At first I thought maybe something was wrong and particularly slow, so I searched for multiple days using the VS profiler, even looking at the generated IL code at times, and even though I found some things to improve, nothing gave me a significant improvement. It was still more than 3 times slower (12 Mnps vs 40 Mnps on a single thread on the same computer).
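For readers unfamiliar with Perft: it walks the legal move tree to a fixed depth and counts the leaf nodes, which makes it a convenient benchmark for move generation plus make/unmake. A minimal generic sketch follows; the IPerftPosition interface and its method names are hypothetical, since the thread does not show the actual m8# code.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical minimal position interface; a real engine would call its own
// concrete Board type directly (no interface dispatch, no List allocations).
public interface IPerftPosition<TMove>
{
    void GenerateMoves(List<TMove> moves); // appends the legal moves
    void Make(TMove move);
    void Unmake(TMove move);
}

public static class PerftRunner
{
    // Counts the leaf nodes of the legal move tree down to 'depth' plies.
    public static long Perft<TMove>(IPerftPosition<TMove> pos, int depth)
    {
        if (depth == 0) return 1;

        var moves = new List<TMove>();
        pos.GenerateMoves(moves);
        if (depth == 1) return moves.Count; // bulk counting at the last ply

        long nodes = 0;
        foreach (TMove move in moves)
        {
            pos.Make(move);
            nodes += Perft(pos, depth - 1);
            pos.Unmake(move);
        }
        return nodes;
    }

    // Millions of nodes per second, the unit used in the thread (12 vs 40 Mnps).
    public static double Mnps(long nodes, TimeSpan elapsed) =>
        nodes / elapsed.TotalSeconds / 1_000_000.0;

    // Runs Perft once and reports the node count together with its speed.
    public static (long Nodes, double Mnps) Measure<TMove>(IPerftPosition<TMove> pos, int depth)
    {
        var stopwatch = Stopwatch.StartNew();
        long nodes = Perft(pos, depth);
        stopwatch.Stop();
        return (nodes, Mnps(nodes, stopwatch.Elapsed));
    }
}
```

The interface and the per-node `List<T>` keep the sketch short; in a C# vs C++ speed comparison like the one above, they would themselves add overhead, so a tuned engine would typically use a concrete board type and a preallocated move buffer.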
My conclusion for now is that either C# is fundamentally slower than C++ (I know it is, but I did not think it would be more than 3 times slower) or my use of structs as integer wrappers has a significant overhead that did not show up in my benchmarks. Unfortunately I did not figure out how to see the assembly code generated by the JIT compiler, to check for myself whether these structures are "optimized out" as I think they should be.
For now I went back to my C++ engine (which is also named m8, I need to rename one of them), but I hope to come back to m8# (renamed!) someday, once I figure out what went wrong or decide to live with the performance penalty, because I was really happy with how the code looked compared to a C++ engine.
-
- Posts: 1784
- Joined: Wed Jul 03, 2019 4:42 pm
- Location: Netherlands
- Full name: Marcel Vanthoor
Re: m8 Dev log and C# experiments
It's called the newtype pattern, and it's idiomatic in Rust. The problem, though, is that you can't reach the inner type directly, so you'll have to implement passthrough functions to reach the functionality you need from the inner type. (In Rust you could also implement the Deref trait, which makes the outer type automatically expose all the methods of the inner type, but that is considered ugly.)
It's on my refactoring list for some time in the future, because right now it's possible to mix up things such as squares, pieces and sides.
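In C#, the same newtype idea is a readonly struct with a private field, and the cost Marcel describes shows up as one forwarding member per operation you want to keep. A hypothetical Bitboard sketch (square numbering a1 = 0 assumed):

```csharp
using System.Numerics;
using System.Runtime.CompilerServices;

// Hypothetical newtype: a Bitboard wrapping a ulong. The raw bits are private,
// so every ulong feature callers need must be re-exposed by a passthrough member.
public readonly struct Bitboard
{
    private readonly ulong _bits;

    public Bitboard(ulong bits) => _bits = bits;

    public static Bitboard Empty => new(0UL);

    // Single-bit board for square 0..63 (a1 = 0 assumed).
    public static Bitboard FromSquare(int square) => new(1UL << square);

    // Passthrough operators, each forwarding to the matching ulong operator.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static Bitboard operator &(Bitboard a, Bitboard b) => new(a._bits & b._bits);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static Bitboard operator |(Bitboard a, Bitboard b) => new(a._bits | b._bits);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static Bitboard operator ~(Bitboard a) => new(~a._bits);

    // Passthrough queries.
    public int Count => BitOperations.PopCount(_bits);
    public bool IsEmpty => _bits == 0;
}
```

An `implicit operator ulong(Bitboard b)` conversion would be the closest C# counterpart to implementing Deref: it lets a Bitboard be used wherever a ulong is expected, but it also reopens the door to mixing types up.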
-
- Posts: 10
- Joined: Sat Sep 18, 2021 9:36 pm
- Full name: Tony Schwebs
Re: m8 Dev log and C# experiments
You can use the DOTNET_JitDisasm=<Method> environment variable to dump assembly to the console. You should get assembly for a Tier0 unoptimized version and a Tier1 optimized version.
Code:
C:\Users\krunc\source\repos\chess\bin\Release\net8.0>SET DOTNET_JitDisasm=GenerateMoves
C:\Users\krunc\source\repos\chess\bin\Release\net8.0>chess.exe
; Assembly listing for method Chess.MoveGenerator.MoveGen:GenerateMoves(int,ulong,ulong,ulong):this (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; 0 inlinees with PGO data; 6 single block inlinees; 0 inlinees without PGO data
G_M000_IG01: ;; offset=0x0000
G_M000_IG02: ;; offset=0x0000
and r9, r8
je SHORT G_M000_IG04
align [11 bytes for IG03]
G_M000_IG03: ;; offset=0x0010
add bword ptr [rcx+0x10], -4
xor eax, eax
tzcnt rax, r9
xor eax, edx
mov r10, bword ptr [rcx+0x10]
mov dword ptr [r10], eax
blsr r9, r9
jne SHORT G_M000_IG03
G_M000_IG04: ;; offset=0x002C
and r8, qword ptr [rsp+0x28]
je SHORT G_M000_IG06
align [13 bytes for IG05]
G_M000_IG05: ;; offset=0x0040
xor eax, eax
tzcnt rax, r8
xor eax, edx
mov r10, bword ptr [rcx+0x08]
mov dword ptr [r10], eax
add bword ptr [rcx+0x08], 4
blsr r8, r8
jne SHORT G_M000_IG05
G_M000_IG06: ;; offset=0x005C
ret
; Total bytes of code 93
OK! 153ms, 774380K NPS
OK! 226ms, 853847K NPS
OK! 330ms, 539988K NPS
OK! 892ms, 791358K NPS
OK! 0ms, 576587K NPS
OK! 193ms, 846268K NPS
Total: 1361558651 Nodes, 1797ms, 757443K NPS
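The inner loops in that listing (tzcnt to find the lowest set bit, blsr to clear it) look like the classic bitboard serialization loop. A sketch of such a loop in C#, with a hypothetical class and method name; the AggressiveOptimization attribute is an optional tweak so the method is compiled fully optimized right away instead of starting at Tier0.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;

public static class BitboardSerializer
{
    // AggressiveOptimization asks the JIT to skip tiering and compile this
    // method with full optimizations immediately.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static int Serialize(ulong bitboard, Span<int> squares)
    {
        int count = 0;
        while (bitboard != 0)
        {
            // Index of the lowest set bit (tzcnt on hardware that supports it).
            squares[count++] = BitOperations.TrailingZeroCount(bitboard);
            // Clear the lowest set bit (can be emitted as blsr with BMI1).
            bitboard &= bitboard - 1;
        }
        return count;
    }
}
```

Setting `DOTNET_JitDisasm=Serialize` before running should then print a single optimized listing for it. Keep in mind that methods marked this way skip tiering and therefore dynamic PGO, so their code can differ from what a hot method normally gets at Tier1.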
-
- Posts: 290
- Joined: Mon Mar 13, 2006 5:23 pm
- Location: Québec
Re: m8 Dev log and C# experiments
Hi,
krunch wrote (Fri Feb 23, 2024 7:10 am):
You can use the DOTNET_JitDisasm=<Method> environment variable to dump assembly to the console. You should get assembly for a Tier0 unoptimized version and a Tier1 optimized version.
This looks much easier than what I found; I'll give it a try this weekend.
Thanks.
-
- Posts: 4
- Joined: Fri Oct 28, 2022 6:33 am
- Full name: Keith Downes
Re: m8 Dev log and C# experiments
This is user error. In C# it is easy to destroy the speed of software by accidentally copying objects instead of using references, for example, or by overusing object-oriented features such as interfaces. There are also several gotchas that only experience will teach you, such as writing expressions in a function call (the compiler won't optimize it or the function). I'm getting C# within 10% of the speed of C++ in identical ray tracing programs, and within 15% with my chess engine (106 Mnps vs 120 Mnps). I could probably get it faster in chess if I spent more time on it; most of my effort is in C++. The first time I ported my chess engine from C# to C++, I was at 70 Mnps in C# and 73 Mnps in C++, with identical programs producing the exact same number of nodes in a position.
mathmoi wrote (Wed Feb 21, 2024 7:27 pm):
I have stopped development of my C# chess engines a couple of weeks ago.
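To make Iketh's point about accidental copies concrete: a struct is copied on assignment and when passed by value, while `ref` and `in` pass a reference instead. A small illustration with a hypothetical Counter struct standing in for a larger board or search state:

```csharp
// Hypothetical mutable struct standing in for a board or search state.
public struct Counter
{
    public long Nodes;
}

public static class CopyDemo
{
    // By value: the callee works on a copy, so the caller's struct is unchanged,
    // and for a large struct the whole thing is copied on every call.
    public static void IncrementCopy(Counter counter) => counter.Nodes++;

    // By ref: no copy, and the change is visible to the caller.
    public static void IncrementRef(ref Counter counter) => counter.Nodes++;

    // By in: no copy either, but the parameter is read-only.
    public static long Read(in Counter counter) => counter.Nodes;

    // Array elements are variables, so they can be passed by ref in place;
    // a List<T> indexer returns a copy, so 'ref list[i]' does not compile.
    public static void IncrementFirst(Counter[] counters) => IncrementRef(ref counters[0]);
}
```

Calling a non-readonly method or property on an `in` parameter of a non-readonly struct can make the compiler create a defensive copy, which is one reason to declare small wrapper types as `readonly struct`.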
-
- Posts: 290
- Joined: Mon Mar 13, 2006 5:23 pm
- Location: Québec
Re: m8 Dev log and C# experiments
Hi Iketh,
Iketh wrote (Mon May 06, 2024 4:52 am):
This is user error. In C# it is easy to destroy the speed of software by accidentally copying objects instead of using references, for example, or by overusing object-oriented features such as interfaces. [...]
Thanks for your answer. You are probably right that part of the problem might be caused by my inexperience with optimizing C# code. That being said, I'm an experienced developer and C# is not a new language to me. I did a lot of research to understand how value types and reference types are created/copied/destroyed.
Since I started this thread I did more research and found the primary cause of the slowness I observed. The .NET JIT compiler has a concept called the "inliner budget": a limit on how much effort will be spent inlining methods. Once this budget is exhausted, no more methods are inlined, not even small one-liners decorated with the AggressiveInlining attribute. In my case, since I rely heavily on hundreds of these small methods, the inliner budget is eventually exhausted and simple expressions like "var x = move.From.Column.Value" become three non-inlined method calls, killing the performance. I confirmed this by examining the code generated by the JIT compiler: methods that should obviously have been inlined were not.
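To illustrate the expression mentioned above, here is a hypothetical model of `move.From.Column.Value` next to a flattened accessor that computes the same value in a single getter. This is not a fix for the budget itself; it only reduces the number of inlining decisions per use site, and whether that helps in a real engine is an assumption to check in the JIT output. The move packing is also assumed.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical three-layer model of move.From.Column.Value.
public readonly struct Column
{
    private readonly int _index;
    public Column(int index) => _index = index;

    public int Value
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => _index;
    }
}

public readonly struct Square
{
    private readonly int _index;
    public Square(int index) => _index = index;

    public Column Column
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => new Column(_index & 7);
    }
}

public readonly struct Move
{
    // Assumed packing: bits 0-5 = from square, bits 6-11 = to square.
    private readonly int _value;
    public Move(int from, int to) => _value = from | (to << 6);

    // Layered accessor: move.From.Column.Value goes through three getters
    // (From, Column, Value), i.e. three separate inlining decisions.
    public Square From
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => new Square(_value & 63);
    }

    // Flattened accessor: the same number from one getter, since the from
    // square occupies the low 6 bits and the column is its low 3 bits.
    public int FromColumn
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => _value & 7;
    }
}
```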
I did not find a solution to this, but maybe it will be improved in a future version of .NET. There is an open issue about it here: https://github.com/dotnet/runtime/issues/93069
In the meantime I'm working on my C++ engine (m8), but I would really like to go back to m8#.
Thanks.