Passing int64 (bitboard) by value or by reference?

Discussion of chess software programming and technical issues.

Moderator: Ras

Gerd Isenberg
Posts: 2251
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Passing int64 (bitboard) by value or by reference?

Post by Gerd Isenberg »

Aleks Peshkov wrote:Well, it does understand __m128i arguments, but failed to compile

Code: Select all

class Test {
    __m128i n;
};

void test(Test) {}
Ok, then pass the raw __m128i and implement an __m128i getter in Test. Or pass a const ref, at the cost of some leading and trailing memory accesses. Values are kept in xmm registers across inlined function boundaries if you inline with const ref and, for instance, return *this.
Aleks Peshkov
Posts: 969
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia
Full name: Aleks Peshkov

Re: Passing int64 (bitboard) by value or by reference?

Post by Aleks Peshkov »

I decided to refactor my functions so that they do not take SSE parameters, but calculate them inside.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Moving into the 64 bit world

Post by sje »

I'd say that passing single bitboards by value where possible is the way to go.

Except for specialty devices, computing is moving into a 64 bit world. Those who want speed and capacity are already there. Of course, where feasible the source should also compile for 32 bit machines, although there might be a performance penalty -- better to run slowly than not to run at all.
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Moving into the 64 bit world

Post by Tord Romstad »

sje wrote:I'd say that passing single bitboards by value where possible is the way to go.

Except for specialty devices, computing is moving into a 64 bit world.
I am not sure I agree: The current trend seems to be towards mobile phones and tiny "netbooks" being the most popular computing devices, and these are still 32-bit more often than not. Moreover, modern 64-bit laptop and desktop CPUs are so fast that low-level optimization is mostly a waste of time: Perhaps you can squeeze out 50 Elo points or so by optimizing heavily for 64-bit, but even in 32-bit mode the programs have such stratospheric ratings that 50 Elo points more make no noticeable difference. For everyone except the top 100 players or so in the world, the difference between playing against a 2900 rated and a 2950 rated program is analogous to the difference between falling from an altitude of 2900 meters and from 2950 meters.

In my opinion, optimizing for slow hardware makes much more sense than optimizing for fast hardware. On fast hardware, your program is probably fast enough no matter how little effort you spend optimizing.

Tord
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Passing int64 (bitboard) by value or by reference?

Post by wgarvin »

Arash wrote:For inline functions passing by reference means there is no copying and so if you do not change the value it will be faster and if you copy the value to change it the speed will be the same.
For inline functions, it doesn't matter whether you pass by reference or by value. The compiler can almost always optimize away the copy (except in cases where you modify it, in which case you would have had to copy it yourself anyway if it was passed by reference).
Aleks Peshkov wrote: I have similar problem, Microsoft C++ compiler does not support passing 128-bit SSE variables by value, but Intel and GCC do. It is not possible to write best code without conditional preprocessor tricks.
Actually this is kind of related to the point above. MSVC does support passing its native SSE types (__m128) by value; what it does not support is passing wrapper classes that contain a native SSE type (e.g. if you make your own class MyVector4 with a __m128 inside it and try to pass it around by value, then I think MSVC on x86 will complain about it). However, most of the time the methods you want to do this for are small wrapper methods (like overloaded operators that call one SSE intrinsic, or something), so they are going to be inlined anyway. So in theory, it shouldn't matter if you pass them by reference--the compiler can optimize away the copy.

And it seems to work: I've never seen a case where it obviously failed to optimize out the "by reference" copies when inlining methods. If you use its SSE intrinsics, you should keep in mind that MSVC 2005 kind of sucks at register allocation for MMX and SSE intrinsics: it is extremely stingy with the registers and seems unwilling to reorder the instructions much, if at all. I haven't tried MSVC 2008, but I'd be surprised if anything had changed in that regard. I recall hearing somewhere that Intel's compiler is a bit better at this, but I haven't tried it.
Arash

Re: Passing int64 (bitboard) by value or by reference?

Post by Arash »

Hi,
wgarvin wrote:
Arash wrote:For inline functions passing by reference means there is no copying and so if you do not change the value it will be faster and if you copy the value to change it the speed will be the same.
For inline functions, it doesn't matter if you passed it by reference or value. The compiler can almost always optimize away the copy (except in cases where you modify it, in which case you would have had to copy it yourself anyways if it was passed by reference).
For inline functions, if it is passed by value and you change the value, how can the compiler optimize out the copying needed? Sometimes it can eliminate the variable altogether, but if it cannot, it also cannot eliminate the copy.

Arash
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Passing int64 (bitboard) by value or by reference?

Post by wgarvin »

Arash wrote:For inline functions if it is passed by value and you change the value how can a compiler optimize out the copying needed? Sometimes it can eliminate the variable at all, but if it can not, it also can not eliminate the copying needed.
[Edit: Ah! It can't. We're saying the same thing. But I was also saying that in that case where it can't remove the copy--i.e. where you modified the parameter value inside the body of the inlined function--then yes the compiler is copying it for you, but if you had passed it by reference instead then you would still have to copy it yourself anyways, so the two cases are basically equivalent. Long story short: there is no downside at all to inlining, except for possibly larger code size, and longer compile times if your codebase is quite large].

------

I'm not sure I understand your question exactly, but I suggest thinking of it like this: When the compiler inlines the function, it sort of copies and pastes the code from the inlined function into the caller at the place where the call occurred. But it still has to "evaluate" the parameters like it would for a normal function call, so temporary variables are created in the caller to hold each parameter (and one for the result too, if there is one). The parameters (from the caller) are evaluated into these temporaries, and then the temporaries are used in the inlined code, and then the result temporary is used in place of the return value.

Anyway, the reason all of this is useful, is that now (from the compiler's point of view), the two functions have been merged into one function, and the entire thing can be optimized at the same time. (It also avoids the need for "call" and "return" instructions, but those are pretty cheap nowadays anyway). While inlining, the compiler may *temporarily* create copies of the parameters (to simulate passing them by value during a regular call), however, if the copies were unnecessary then the optimizer will easily detect this and merge the two values back together.

Anyway, here's a small example of what I mean:

Code: Select all

inline int foo(int x, int y)
{
    x += 4;
    return (x + y);
}

int bar()
{
    int param0 = 0;
    int param1 = 1;
    return 10 + foo(param0, param1);
}

#if 0
// when the compiler inlines foo into bar, it can then optimize the combination of the two functions as if it were a single function containing something like this:
int bar()
{
    int param0 = 0;
    int param1 = 1;

    int _foo_result, _foo_param_0, _foo_param_1;
    _foo_param_0 = param0;
    _foo_param_1 = param1;
    {
        _foo_param_0 += 4;
        _foo_result = (_foo_param_0 + _foo_param_1);
    }
    return 10 + _foo_result;
}
#endif
So the rest of the compiler's arsenal of optimization passes is then applied to the combined function. Some kind of dataflow optimization (global value numbering, SSAPRE, etc.) will now discover that a temporary copy (_foo_param_1) of the parameter (param1) was made, but that the temporary was never changed. So it will optimize away that copy, and the result will be no worse than if you had written the combined function yourself. The run-time cost of an inline function should therefore be the same as defining a macro that expands into the inlined code, except that inlined functions/methods are cleaner, have better type safety, don't pollute the global namespace as much, etc.

Inlining is one of those great things that helps you write better-structured source code without paying any extra runtime cost. It also means that if you inline a function foo, then whether it was int foo(const int x) or int foo(const int& x) should make no difference in the generated code.

NOTE: My example was kind of contrived, and any real compiler will easily optimize function bar() down to just { return 15; }