Why C++ instead of C#?

mvanthoor · Post by **mvanthoor** » Mon Sep 20, 2021 6:05 pm

I looked into the C-version of QBB-Perft, and decided that I don't want to spend the time to port it to Rust. Rust can do everything QBB-Perft does, including the macros, using macro_rules!, procedural macros, or even const functions.

I don't want to spend the time converting all that, just to end up with a program that is effectively compiled by LLVM, just as when I'd compile the C-version with Clang. Rest assured that if someone ports this and does it properly, it'll be exactly as fast as the C-version, give or take 1-2% owing to compiler differences. I can do the conversion faster by just using inline functions instead of the macros, but in that case the C-version will have a big advantage, because it pre-computes lots of data during the compile step.

To be honest, I'd rather spend the free time I do have on refactoring my engine's code in preparation for the next version.

lithander · Post by **lithander** » Mon Sep 20, 2021 6:36 pm

R. Tomasi wrote: ↑Mon Sep 20, 2021 3:57 pm
klx wrote: ↑Mon Sep 20, 2021 3:37 pm
lithander wrote: ↑Mon Sep 20, 2021 8:45 am But I can't see where you think it does contradict the performance logs?
Because you wrote:

C# 28132ms, 48M NPS
C 16837ms, 70M NPS

This doesn't add up since 48M * 28.132 = 1350M nodes and 70M * 16.837 = 1179M nodes. So I'm assuming there's a typo or mis-estimate somewhere.
Accumulated there are 1 361 558 651 nodes total (assuming I did not mistype on my calculator). Divided by 28.132 s that yields 48.398M NPS, and dividing by 16.837 s yields 80.867M NPS. That means the C code is 1.671 times faster than the C# code, or, alternatively formulated, the C# code is 40.1% slower.

Okay, I finally understand where you got the 1.671x from, and I think 16837ms must be wrong. I never made the same changes to the console output in the C version, so I had to add up the ms of the individual tests manually on a calculator. I probably bungled one addition or made a typo when transferring the result to the post.

If you look on page six of this thread I printed the whole listing and it took a bit over 19s to complete (I just added the values printed there up again) and I did no optimizations for the C version so there's really no reason why it should run more than two seconds faster now. If we assume the 19.1s is correct then the speed fits my guesstimate.

I'll run it again later. And I think at least in the C# version I'll add a few lines of code to get rid of the need to "guess" the NPS. Sorry for the confusion!

R. Tomasi · Post by **R. Tomasi** » Mon Sep 20, 2021 6:48 pm

mvanthoor wrote: ↑Mon Sep 20, 2021 6:05 pm I don't want to spend the time converting all that, just to end up with a program that is effectively compiled by LLVM, just as when I'd compile the C-version with Clang.

I was thinking about that, too. Effectively, anything that goes through an LLVM backend (where most of the optimizations would happen) should end up with the same speed, plus/minus compiler differences (as you would also get compiling the C code using different versions of Clang).

R. Tomasi · Post by **R. Tomasi** » Mon Sep 20, 2021 6:49 pm

lithander wrote: ↑Mon Sep 20, 2021 6:36 pm
R. Tomasi wrote: ↑Mon Sep 20, 2021 3:57 pm
klx wrote: ↑Mon Sep 20, 2021 3:37 pm
lithander wrote: ↑Mon Sep 20, 2021 8:45 am But I can't see where you think it does contradict the performance logs?
Because you wrote:

C# 28132ms, 48M NPS
C 16837ms, 70M NPS

This doesn't add up since 48M * 28.132 = 1350M nodes and 70M * 16.837 = 1179M nodes. So I'm assuming there's a typo or mis-estimate somewhere.
Accumulated there are 1 361 558 651 nodes total (assuming I did not mistype on my calculator). Divided by 28.132 s that yields 48.398M NPS, and dividing by 16.837 s yields 80.867M NPS. That means the C code is 1.671 times faster than the C# code, or, alternatively formulated, the C# code is 40.1% slower.
Okay, I finally understand where you got the 1.671x from, and I think 16837ms must be wrong. I never made the same changes to the console output in the C version, so I had to add up the ms of the individual tests manually on a calculator. I probably bungled one addition or made a typo when transferring the result to the post.

If you look on page six of this thread I printed the whole listing and it took a bit over 19s to complete (I just added the values printed there up again) and I did no optimizations for the C version so there's really no reason why it should run more than two seconds faster now. If we assume the 19.1s is correct then the speed fits my guesstimate.

I'll run it again later. And I think at least in the C# version I'll add a few lines of code to get rid of the need to "guess" the NPS. Sorry for the confusion!

I think we should simply include one or two lines into the code that do the total NPS calculation automatically. Manually calculating stuff like that for every run we do feels very "last century" to me, in any case.

Edit: I see you suggested that already. My bad!

I'll add it to the C version, too, when I'm doing the algorithmic changes later tonight.

klx · Post by **klx** » Mon Sep 20, 2021 10:55 pm

R. Tomasi wrote: ↑Mon Sep 20, 2021 4:39 pm Using the source would, in this case, mean that you use the source, compile it, run it on your machine, and post your findings here

I actually would love to see runs on different machines. I really suspect that the performance difference might not be the same across all CPUs.

As I expected, it was a typo. Anyway on my machine I get the following results, using the optimized v1.4 C# version:

C: 20959 ms
C#: 33845 ms (1.61x slower)
Java: 37483 ms (1.79x slower)

mvanthoor wrote: ↑Mon Sep 20, 2021 6:05 pm I can do the conversion faster by just using inline functions instead of the macros, but in that case the C-version will have a big advantage, because it pre-computes lots of data during the compile step.

What do you refer to by this? Whether we use define or inline in the C version should not be relevant, assuming it actually inlines.

mvanthoor · Post by **mvanthoor** » Mon Sep 20, 2021 11:10 pm

klx wrote: ↑Mon Sep 20, 2021 10:55 pm
mvanthoor wrote: ↑Mon Sep 20, 2021 6:05 pm I can do the conversion faster by just using inline functions instead of the macros, but in that case the C-version will have a big advantage, because it pre-computes lots of data during the compile step.
What do you refer to by this? Whether we use define or inline in the C version should not be relevant, assuming it actually inlines.

There is still a difference. If you inline a function, the compiler pastes the code of the called function into the caller, but the code of the called function still gets executed at every run of the encompassing function. When you use a macro, the compiler not only inlines it, but also resolves it to a single value if at all possible.

Rust (or more correctly, LLVM) can do this with macros and const functions too; and sometimes even with inlined functions, IF those functions can be resolved to a simpler value. For example, if you do something like this:

function complicated_function(x) {
	y = (do super complex stuff with x here)
	z = (do more super complex stuff with x here)
	return y + z;
}

It could be that LLVM simplifies this, to something like:

function complicated_function(x) {
	return x + 35;
}

I just don't feel like rewriting the C macros into code that LLVM can optimize to the same extent that the C compiler can optimize macros, because a) it's a lot of work, and b) I'm not that good at Rust macros, because they're code that writes other code (to avoid having to write repetitive code yourself), and I just don't need that a lot. Therefore it'll take longer. If I just rewrite the macros into simple functions, the Rust version will be slower than the C version.

R. Tomasi · Post by **R. Tomasi** » Mon Sep 20, 2021 11:25 pm

mvanthoor wrote: ↑Mon Sep 20, 2021 11:10 pm There is still a difference. If you inline a function, the compiler pastes the code of the called function into the caller, but the code of the called function still gets executed at every run of the encompassing function. When you use a macro, the compiler not only inlines it, but also resolves it to a single value if at all possible.

Problem is, with modern compilers there is no guarantee an inline function will be inlined at all. The compiler will only consider it a hint and then do what it thinks is best. If the macro resolves to a compile-time constant, the corresponding function (in C++) would be a constexpr function, which may get evaluated at compile time.

tcusr · Post by **tcusr** » Mon Sep 20, 2021 11:44 pm

mvanthoor wrote: ↑Mon Sep 20, 2021 11:10 pm
klx wrote: ↑Mon Sep 20, 2021 10:55 pm
mvanthoor wrote: ↑Mon Sep 20, 2021 6:05 pm I can do the conversion faster by just using inline functions instead of the macros, but in that case the C-version will have a big advantage, because it pre-computes lots of data during the compile step.
What do you refer to by this? Whether we use define or inline in the C version should not be relevant, assuming it actually inlines.
There is still a difference. If you inline a function, the compiler pastes the code of the called function into the caller, but the code of the called function still gets executed at every run of the encompassing function. When you use a macro, the compiler not only inlines it, but also resolves it to a single value if at all possible.

Rust (or more correctly, LLVM) can do this with macros and const functions too; and sometimes even with inlined functions, IF those functions can be resolved to a simpler value. For example, if you do something like this:
function complicated_function(x) {
	y = (do super complex stuff with x here)
	z = (do more super complex stuff with x here)
	return y + z;
}
It could be that LLVM simplifies this, to something like:
function complicated_function(x) {
	return x + 35;
}
I just don't feel like rewriting the C macros into code that LLVM can optimize to the same extent that the C compiler can optimize macros, because a) it's a lot of work, and b) I'm not that good at Rust macros, because they're code that writes other code (to avoid having to write repetitive code yourself), and I just don't need that a lot. Therefore it'll take longer. If I just rewrite the macros into simple functions, the Rust version will be slower than the C version.

C macros are just copy-and-paste done by the preprocessor before compilation; the compiler only sees the final expression (I guess this is what you mean by 'inlining' the macro). Inline functions are as fast as macros.

klx · Post by **klx** » Mon Sep 20, 2021 11:48 pm

mvanthoor wrote: ↑Mon Sep 20, 2021 11:10 pm There is still a difference. If you inline a function, the compiler pastes the code of the called function into the caller, but the code of the called function still gets executed at every run of the encompassing function. When you use a macro, the compiler not only inlines it, but also resolves it to a single value if at all possible.

What specifically in this application would make a difference?

tcusr wrote: ↑Mon Sep 20, 2021 11:44 pm inline functions are as fast as macros

Exactly. I can't speak for Rust, but that's how it is in C / C++. I would expect Rust to do the same.

To prove this, I just replaced all the #defines with inline functions in the C file, and the performance is identical.

tcusr · Post by **tcusr** » Mon Sep 20, 2021 11:51 pm

R. Tomasi wrote: ↑Mon Sep 20, 2021 11:25 pm
mvanthoor wrote: ↑Mon Sep 20, 2021 11:10 pm There is still a difference. If you inline a function, the compiler pastes the code of the called function into the caller, but the code of the called function still gets executed at every run of the encompassing function. When you use a macro, the compiler not only inlines it, but also resolves it to a single value if at all possible.
Problem is, with modern compilers there is no guarantee an inline function will be inlined at all. The compiler will only consider it a hint and then do what it thinks is best. If the macro resolves to a compile-time constant, the corresponding function (in C++) would be a constexpr function, which may get evaluated at compile time.

The inline keyword has different meanings in C and C++.
In C++ it basically means 'don't complain if you see multiple definitions of ...', whereas in C it is a hint, like you said.
Usually only large functions are not inlined, but if you really want to force it you can use GCC's attributes (e.g. __attribute__((always_inline))). A rule of thumb is to declare a function inline if it's fewer than 10 LOC.
