couple of questions about stockfish code ?

BeyondCritics · Post by **BeyondCritics** » Fri Oct 28, 2016 1:29 am

syzygy wrote:...
Hmm, it is certainly perfectly legal in C99 and C11.

But not so legal in C89. Unfortunately, a lot of supposedly secure code is still written against this older standard. Furthermore, make one tiny mistake and you suddenly introduce forbidden pointer aliasing; see answer 2 here: http://stackoverflow.com/questions/2566 ... pe-punning

syzygy wrote: ...
In C++ it might be formally undefined, but at least g++ allows it as a language-extension. I'm sure Clang then does the same.

And what about Visual Studio, Comeau and Intel? This really gets complicated.

Problem upon problem for no good reason. Why not just use clean and simple shift instructions and be done with it?

Fulvio · Post by **Fulvio** » Fri Oct 28, 2016 1:43 am

syzygy wrote: Why do you think that using TWO registers to keep track of the aggregated score in evaluate() instead of just one incurs no performance penalty?

Please stay calm, I already posted the link that explains that:
"So for x86-based processors, the front-end does two main things - fetch instructions (from where program binaries are stored in memory or the caching system), and decode them into micro-operations."
"Front-end is capable of delivering 4 uops per cycle (or processor clock-tick) to the back"
And that's an old article; I believe Skylake has at least 6 ALUs per core.
So if you have:

Code: Select all

struct { int a; int b; } test;

test.a += 1;
is done in one clock cycle.
and both
test.a += 1;
test.b += 1;
are done in one clock cycle too;

I honestly do not know how you can assume how the registers will be used.
The only thing on which we agree is "you should test and measure", which contrasts pretty badly with your "I'm pretty sure using a struct of two 16-bit ints would be measurably slower".

syzygy · Post by **syzygy** » Fri Oct 28, 2016 1:46 am

Fulvio wrote:
syzygy wrote:Why do you think that using TWO registers to keep track of the aggregated score in evaluate() instead of just one incurs no performance penalty?
Please stay calm, I already posted the link that explains that:

Ehm... does register allocation ring a bell?

Sven · Post by **Sven** » Fri Oct 28, 2016 1:46 am

Fulvio wrote:
syzygy wrote: Why do you think that using TWO registers to keep track of the aggregated score in evaluate() instead of just one incurs no performance penalty?
Please stay calm, I already posted the link that explains that:
"So for x86-based processors, the front-end does two main things - fetch instructions (from where program binaries are stored in memory or the caching system), and decode them into micro-operations."
"Front-end is capable of delivering 4 uops per cycle (or processor clock-tick) to the back"
And that's an old article; I believe Skylake has at least 6 ALUs per core.
So if you have:
Code: Select all
struct { int a; int b; } test;
test.a += 1;
is done in one clock cycle.
and both
test.a += 1;
test.b += 1;
are done in one clock cycle too;

I honestly do not know how you can assume how the registers will be used.
The only thing on which we agree is "you should test and measure", which contrasts pretty badly with your "I'm pretty sure using a struct of two 16-bit ints would be measurably slower".

Would you see a difference between a struct of two 32-bit integers and a struct of two 16-bit integers?

syzygy · Post by **syzygy** » Fri Oct 28, 2016 1:48 am

Fulvio wrote: The only thing on which we agree is "you should test and measure", which contrasts pretty badly with your "I'm pretty sure using a struct of two 16-bit ints would be measurably slower".

"Pretty sure" allows for the possibility that I turn out to be dead wrong, in which case I will simply have to admit that and will do so. But I don't think you'll prove me wrong here. And I'm talking about Stockfish, not about a simple loop that does not suffer from register pressure.

The reason for being "pretty sure" is that a single register for holding 1 value is pretty certain to be more efficient than two registers for holding 2 values. And while it is true that modern CPUs can perform many operations in parallel, reducing the number of operations is not going to hurt and will leave execution units free for performing other operations.
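To make the single-register argument concrete, here is a minimal sketch. It is not Stockfish's actual code: `pack`, `unpack_mg`, `unpack_eg` and `packed_add_matches_pairwise` are hypothetical names chosen for illustration. It packs a midgame and an endgame value into one 32-bit word and shows that a single addition updates both halves, where a two-field struct would need two additions. It assumes every value, and every running sum, stays within the signed 16-bit range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only (not Stockfish's definitions).
// Pack two values in the int16_t range into one 32-bit word.
inline std::uint32_t pack(int mg, int eg) {
    return (static_cast<std::uint32_t>(eg) << 16) + static_cast<std::uint32_t>(mg);
}

// Low half: mask, then sign-extend with the xor/subtract trick.
inline int unpack_mg(std::uint32_t s) {
    return static_cast<int>((s & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

// High half: adding 0x8000 first undoes the borrow caused by a negative
// low half, then shift and sign-extend.
inline int unpack_eg(std::uint32_t s) {
    return static_cast<int>(((s + 0x8000u) >> 16) ^ 0x8000u) - 0x8000;
}

// One 32-bit addition accumulates both components at once.
inline bool packed_add_matches_pairwise() {
    const std::uint32_t sum = pack(10, -20) + pack(-3, 5);
    return unpack_mg(sum) == 10 + (-3) && unpack_eg(sum) == -20 + 5;
}
```

The sketch only shows that the packed form needs one arithmetic operation and one live value where the struct form needs two. Whether that is measurably faster in a real evaluation function still has to be tested.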

syzygy · Post by **syzygy** » Fri Oct 28, 2016 2:05 am

BeyondCritics wrote:Why not just use clean and simple shift instructions and you are done?

How cleanly can you do this? Could you post some code?

If the current SF code can be improved, that might be useful.

Fulvio · Post by **Fulvio** » Fri Oct 28, 2016 2:25 am

Sven Schüle wrote: Would you see a difference between a struct of two 32-bit integers and a struct of two 16-bit integers?

This is a wonderful tool:
https://godbolt.org/
I quickly tried this code:

Code: Select all

#include <stdint.h>

int main() {
  volatile struct { int a; int b; } test1;
  test1.a += 1;
  test1.b += 1;
  volatile struct { int16_t a; int16_t b; } test2;
  test2.a += 1;
  test2.b += 1;
}

and clang on x86-64 compiles to

Code: Select all

        inc     dword ptr [rsp - 4]
        inc     dword ptr [rsp - 8]
        inc     word ptr [rsp - 10]
        inc     word ptr [rsp - 12]
        xor     eax, eax
        ret

so the only difference here is the size of the object.
This makes sense considering the implicit integer promotions:

Code: Select all

int16_t a, b;
a + b;

in reality is:

Code: Select all

static_cast<int>(a) + static_cast<int>(b);
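The promotion rule can be checked at compile time. The following is a small sketch, with `add_promoted` as a hypothetical helper name; the run-time example additionally assumes a platform where `int` is at least 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Both int16_t operands are promoted to int before the addition,
// so the type of the expression is int, not int16_t.
static_assert(std::is_same<decltype(std::int16_t{} + std::int16_t{}), int>::value,
              "int16_t + int16_t is computed as int");

// Hypothetical helper: the sum is formed in int, so it does not wrap at
// 16 bits (assuming int is at least 32 bits wide).
inline int add_promoted(std::int16_t a, std::int16_t b) {
    return a + b;
}
```

This says nothing about how the values are stored: the 16-bit struct still occupies less memory, and the store back to an `int16_t` member narrows the result again.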

syzygy · Post by **syzygy** » Fri Oct 28, 2016 2:27 am

syzygy wrote:I do agree that SF's current approach is to be preferred as it does not rely on endianness.

And the "Scoreview" approach was in SF for exactly one month, it seems. Before and after that, make_score() was as it is now, but the extraction functions did some rather complicated arithmetic to get things right.

The last real change was this one:
https://github.com/official-stockfish/S ... 18ed0927fb

kbhearn · Post by **kbhearn** » Fri Oct 28, 2016 5:22 am

Well, if the goal was to remove the union and avoid umpteen bajillion casts, the tradeoff is doing a manual sign extension, and perhaps that would make it slower (though maybe the compiler would recognise what you're doing and it would come out the same). I'd argue it is more readable, though...

Code: Select all

typedef uint32_t Score;

inline Score make_score(unsigned int mg, unsigned int eg) { // implicitly converting signed inputs
    return (eg << 16) + mg;
}

// mask and manual sign extension
inline Value mg_Value(Score s) {
    static const uint32_t mask = 0xFFFFU;
    static const int sign = 0x8000;
    return (int)(s & mask) ^ sign - sign;
}

// if lower 16 bits are negative add 0x8000 to propagate up the borrowed 1 from upper 16 bits, then shift and sign extend
inline Value eg_Value(Score s) {
    static const uint32_t borrow = 0x8000U;
    static const int sign = 0x8000;

    return (int)((s + borrow) >> 16) ^ sign - sign;
}

kbhearn · Post by **kbhearn** » Fri Oct 28, 2016 6:43 am

Knew I shouldn't have tried removing extraneous brackets... one of them was needed. Corrected below:

Code: Select all

typedef uint32_t Score;

inline Score make_score(unsigned int mg, unsigned int eg) { // implicitly converting signed inputs
    return (eg << 16) + mg;
}

// mask and manual sign extension
inline Value mg_Value(Score s) {
    static const uint32_t mask = 0xFFFFU;
    static const int sign = 0x8000;
    return ((int)(s & mask) ^ sign) - sign;
}

// if lower 16 bits are negative add 0x8000 to propagate up the borrowed 1 from upper 16 bits, then shift and sign extend
inline Value eg_Value(Score s) {
    static const uint32_t borrow = 0x8000U;
    static const int sign = 0x8000;

    return ((int)((s + borrow) >> 16) ^ sign) - sign;
}
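Why the extra pair of brackets matters: in C and C++, binary `-` binds more tightly than `^`, so `x ^ sign - sign` is parsed as `x ^ (sign - sign)`, which is `x ^ 0` and leaves `x` unchanged. The sketch below spells out both groupings explicitly; `as_first_posted` and `as_corrected` are hypothetical names used only for this illustration.

```cpp
#include <cassert>

constexpr int sign = 0x8000;

// How "x ^ sign - sign" is actually grouped: the subtraction happens first,
// so this is x ^ 0 and no sign extension takes place.
constexpr int as_first_posted(int x) {
    return x ^ (sign - sign);
}

// Intended grouping: flip bit 15, then subtract 0x8000, which sign-extends
// a 16-bit two's-complement pattern held in the low bits of x.
constexpr int as_corrected(int x) {
    return (x ^ sign) - sign;
}

static_assert(as_first_posted(0xFFFF) == 0xFFFF, "no sign extension happens");
static_assert(as_corrected(0xFFFF) == -1, "0xFFFF is -1 as a 16-bit value");
static_assert(as_corrected(0x7FFF) == 32767, "positive values are unchanged");
```

Both inputs are assumed to lie in the range 0 to 0xFFFF, as they do after the masking or shifting step in the extraction functions above.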
