Passing int64 (bitboard) by value or by reference?
Moderator: Ras
-
Gerd Isenberg
- Posts: 2251
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Passing int64 (bitboard) by value or by reference?
Ok, then pass __m128i and implement an __m128i getter in Test. Or pass a const reference with some leading and trailing memory accesses. Values are kept in xmm registers across inlined function boundaries if a function is inlined with a const reference and, for instance, returns *this.
Aleks Peshkov wrote: Well, it does understand __m128i arguments, but failed to compile
Code: Select all
class Test { __m128i n; };
void test(Test) {}
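For what it's worth, a minimal sketch of the getter workaround described above might look like the following. The shift routine and its name are only a placeholder of my own; the point is that the bare __m128i is passed, not the wrapper class.
Code: Select all
#include <emmintrin.h>

class Test {
    __m128i n;
public:
    explicit Test(__m128i v) : n(v) {}
    // Expose the raw __m128i so callers can pass the SSE value itself.
    __m128i value() const { return n; }
};

// MSVC of that era accepted a bare __m128i parameter even though it
// rejected a by-value class wrapping one.
inline __m128i shiftLeftOne(__m128i bb) {
    return _mm_slli_epi64(bb, 1);
}

// Usage: pass the unwrapped value rather than the wrapper object.
// __m128i shifted = shiftLeftOne(t.value());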
-
Aleks Peshkov
- Posts: 969
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
- Full name: Aleks Peshkov
Re: Passing int64 (bitboard) by value or by reference?
I decided to refactor my functions so that they do not take SSE parameters, but calculate them inside.
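If I read that right, the refactoring amounts to something like the sketch below: take a reference to plain 64-bit data as the parameter and build the __m128i inside the function. The Board layout and the function name here are purely hypothetical.
Code: Select all
#include <emmintrin.h>
#include <stdint.h>

struct Board { uint64_t occupied[2]; };  // hypothetical layout for illustration

// Rather than passing a __m128i parameter, load the SSE value inside
// the function from ordinary 64-bit data.
inline __m128i loadOccupied(const Board& b) {
    return _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.occupied));
}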
-
sje
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Moving into the 64 bit world
I'd say that passing single bitboards by value where possible is the way to go.
Except for specialty devices, computing is moving into a 64 bit world. Those who want speed and capacity are already there. Of course, where feasible the source should also compile for 32 bit machines, although there might be a performance penalty -- better to run slowly than not to run at all.
-
Tord Romstad
- Posts: 1808
- Joined: Wed Mar 08, 2006 9:19 pm
- Location: Oslo, Norway
Re: Moving into the 64 bit world
I am not sure I agree: The current trend seems to be towards mobile phones and tiny "netbooks" being the most popular computing devices, and these are still 32-bit more often than not. Moreover, modern 64-bit laptop and desktop CPUs are so fast that low-level optimization is mostly a waste of time: Perhaps you can squeeze out 50 Elo points or so by optimizing heavily for 64-bit, but even in 32-bit mode the programs have such stratospheric ratings that 50 Elo points more make no noticeable difference. For everyone except the top 100 players or so in the world, the difference between playing against a 2900 rated and a 2950 rated program is analogous to the difference between falling from an altitude of 2900 meters and from 2950 meters.
sje wrote: I'd say that passing single bitboards by value where possible is the way to go. Except for specialty devices, computing is moving into a 64 bit world.
In my opinion, optimizing for slow hardware makes much more sense than optimizing for fast hardware. On fast hardware, your program is probably fast enough no matter how little effort you spend optimizing.
Tord
-
wgarvin
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: Passing int64 (bitboard) by value or by reference?
For inline functions, it doesn't matter whether you pass it by reference or by value. The compiler can almost always optimize away the copy (except in cases where you modify it, in which case you would have had to copy it yourself anyway if it was passed by reference).
Arash wrote: For inline functions, passing by reference means there is no copying, so if you do not change the value it will be faster, and if you copy the value to change it the speed will be the same.
Actually this is kind of related to the point above. MSVC does support passing its native SSE types (__m128) by value; what it does not support is passing wrapper classes that contain a native SSE type (e.g. if you make your own class MyVector4 with a __m128 inside it and try to pass it around by value, I think MSVC on x86 will complain about it). However, most of the time the methods you want to do this for are small wrapper methods (like overloaded operators that call one SSE intrinsic, or something), so they are going to be inlined anyway. So in theory, it shouldn't matter if you pass them by reference: the compiler can optimize away the copy.
Aleks Peshkov wrote: I have a similar problem: the Microsoft C++ compiler does not support passing 128-bit SSE variables by value, but Intel and GCC do. It is not possible to write the best code without conditional preprocessor tricks.
And it seems to work: I've never seen a case where it obviously failed to optimize out the "by reference" copies when inlining methods. If you use its SSE intrinsics, you should keep in mind that MSVC 2005 kind of sucks at register allocation for MMX and SSE intrinsics. It is extremely stingy with the registers, and it seems unwilling to reorder the instructions much, if at all. I haven't tried it with MSVC 2008, but I'd be surprised if anything had changed in that regard. I recall hearing somewhere that Intel's compiler is a bit better at this, but I haven't tried it.
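A hedged sketch of the wrapper pattern being described, reusing the MyVector4 name from the post above (the constructor, member, and operator are only illustrative, not anyone's actual code):
Code: Select all
#include <xmmintrin.h>

class MyVector4 {
    __m128 v;
public:
    explicit MyVector4(__m128 x) : v(x) {}
    // Taking the argument by const reference keeps 32-bit MSVC happy;
    // once this one-intrinsic operator is inlined, the copy hiding
    // behind the reference is optimized away anyway.
    MyVector4 operator+(const MyVector4& rhs) const {
        return MyVector4(_mm_add_ps(v, rhs.v));
    }
};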
-
Arash
Re: Passing int64 (bitboard) by value or by reference?
Hi,
For inline functions, if it is passed by value and you change the value, how can a compiler optimize out the copying needed? Sometimes it can eliminate the variable altogether, but if it cannot, it also cannot eliminate the copying needed.
wgarvin wrote: For inline functions, it doesn't matter whether you pass it by reference or by value. The compiler can almost always optimize away the copy (except in cases where you modify it, in which case you would have had to copy it yourself anyway if it was passed by reference).
Arash wrote: For inline functions, passing by reference means there is no copying, so if you do not change the value it will be faster, and if you copy the value to change it the speed will be the same.
Arash
-
wgarvin
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: Passing int64 (bitboard) by value or by reference?
[Edit: Ah! It can't. We're saying the same thing. But I was also saying that in the case where it can't remove the copy (i.e. where you modified the parameter value inside the body of the inlined function), then yes, the compiler is copying it for you; but if you had passed it by reference instead, you would still have to copy it yourself anyway, so the two cases are basically equivalent. Long story short: there is no downside at all to inlining, except for possibly larger code size, and longer compile times if your codebase is quite large.]
Arash wrote: For inline functions, if it is passed by value and you change the value, how can a compiler optimize out the copying needed? Sometimes it can eliminate the variable altogether, but if it cannot, it also cannot eliminate the copying needed.
------
I'm not sure I understand your question exactly, but I suggest thinking of it like this: When the compiler inlines the function, it sort of copies and pastes the code from the inlined function into the caller at the place where the call occurred. But it still has to "evaluate" the parameters like it would for a normal function call, so temporary variables are created in the caller to hold each parameter (and one for the result too, if there is one). The parameters (from the caller) are evaluated into these temporaries, and then the temporaries are used in the inlined code, and then the result temporary is used in place of the return value.
Anyway, the reason all of this is useful is that now (from the compiler's point of view) the two functions have been merged into one function, and the entire thing can be optimized at the same time. (It also avoids the need for "call" and "return" instructions, but those are pretty cheap nowadays anyway.) While inlining, the compiler may *temporarily* create copies of the parameters (to simulate passing them by value during a regular call); however, if the copies are unnecessary, the optimizer will easily detect this and merge the two values back together.
Anyway, here's a small example of what I mean:
Code: Select all
inline int foo(int x, int y)
{
    x += 4;
    return (x + y);
}

int bar()
{
    int param0 = 0;
    int param1 = 1;
    return 10 + foo(param0, param1);
}

#if 0
// When the compiler inlines foo into bar, it can then optimize the
// combination of the two functions as if it were a single function
// containing something like this:
int bar()
{
    int param0 = 0;
    int param1 = 1;
    int _foo_result, _foo_param_0, _foo_param_1;
    _foo_param_0 = param0;
    _foo_param_1 = param1;
    {
        _foo_param_0 += 4;
        _foo_result = (_foo_param_0 + _foo_param_1);
    }
    return 10 + _foo_result;
}
#endif
NOTE: My example was kind of contrived, and any real compiler will easily optimize function bar() down to just { return 15; }
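Taken back to the thread's original question about bitboards, the same reasoning suggests the two forms below end up identical once inlined. This is only a sketch to make the equivalence concrete; the north-fill routine is an arbitrary example, not code from the thread.
Code: Select all
#include <stdint.h>

typedef uint64_t Bitboard;

// Pass by value: the compiler makes the working copy for us.
inline Bitboard fillNorthByValue(Bitboard bb) {
    bb |= bb << 8;
    bb |= bb << 16;
    bb |= bb << 32;
    return bb;
}

// Pass by const reference: we make the working copy ourselves,
// so after inlining there is nothing to choose between the two.
inline Bitboard fillNorthByRef(const Bitboard& bbRef) {
    Bitboard bb = bbRef;
    bb |= bb << 8;
    bb |= bb << 16;
    bb |= bb << 32;
    return bb;
}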