couple of questions about stockfish code ?


BeyondCritics
Posts: 396
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: couple of questions about stockfish code ?

Post by BeyondCritics »

syzygy wrote:...
Hmm, it is certainly perfectly legal in C99 and C11.
But not so legal in C89. Unfortunately, a lot of supposedly secure code is written to this older standard. Furthermore, make one tiny mistake and you suddenly introduce forbidden pointer aliasing; see answer 2 here: http://stackoverflow.com/questions/2566 ... pe-punning
syzygy wrote: ...
In C++ it might be formally undefined, but at least g++ allows it as a language-extension. I'm sure Clang then does the same.
And what about Visual Studio, Comeau and Intel? This really gets complicated.

Problems upon problems for no good reason. Why not just use clean and simple shift instructions and you are done?
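
To make the contrast concrete, here is a minimal sketch (not code from Stockfish; the `Halves` struct and all function names are made up for illustration). It assumes two `int16_t` fields occupy exactly four bytes with no padding, which the `static_assert` checks. Copying the object representation with `memcpy` avoids reading an inactive union member, but which field ends up holding which half still depends on byte order. The shift-based helpers give the same answer on any byte order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical two-field view of a packed 32-bit value (illustrative only).
struct Halves {
    std::int16_t first;   // which half lands here depends on endianness
    std::int16_t second;
};

static_assert(sizeof(Halves) == sizeof(std::uint32_t), "layout assumption");

// Copy the bytes instead of reading an inactive union member.
inline Halves split_by_memcpy(std::uint32_t packed) {
    Halves h;
    std::memcpy(&h, &packed, sizeof h);
    return h;
}

// Inverse of split_by_memcpy: copies the bytes back.
inline std::uint32_t join_by_memcpy(Halves h) {
    std::uint32_t packed;
    std::memcpy(&packed, &h, sizeof packed);
    return packed;
}

// Shift and mask: independent of byte order.
inline std::uint16_t low_half(std::uint32_t packed) {
    return static_cast<std::uint16_t>(packed & 0xFFFFu);
}

inline std::uint16_t high_half(std::uint32_t packed) {
    return static_cast<std::uint16_t>(packed >> 16);
}
```

The round trip through `split_by_memcpy` and `join_by_memcpy` always restores the original value; only the shift-based helpers guarantee which half is which.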
Fulvio
Posts: 395
Joined: Fri Aug 12, 2016 8:43 pm

Re: couple of questions about stockfish code ?

Post by Fulvio »

syzygy wrote: Why do you think that using TWO registers to keep track of the aggregated score in evaluate() instead of just one incurs no performance penalty?
Please stay calm, I already posted the link that explains that:
"So for x86-based processors, the front-end does two main things - fetch instructions (from where program binaries are stored in memory or the caching system), and decode them into micro-operations."
"Front-end is capable of delivering 4 uops per cycle (or processor clock-tick) to the back"
And that's an old article; I believe Skylake has at least 6 ALUs per core.
So if you have:

Code:

struct { int a; int b; } test;

// this is done in one clock cycle:
test.a += 1;

// and both of these are done in one clock cycle too:
test.a += 1;
test.b += 1;

I honestly do not know how you can assume how the registers will be used.
The only thing on which we agree is "you should test and measure", which contrasts pretty badly with your "I'm pretty sure using a struct of two 16-bit ints would be measurably slower".
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: couple of questions about stockfish code ?

Post by syzygy »

Fulvio wrote:
syzygy wrote:Why do you think that using TWO registers to keep track of the aggregated score in evaluate() instead of just one incurs no performance penalty?
Please stay calm, I already posted the link that explains that:
Ehm... does register allocation ring a bell?
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: couple of questions about stockfish code ?

Post by Sven »

Fulvio wrote:
syzygy wrote: Why do you think that using TWO registers to keep track of the aggregated score in evaluate() instead of just one incurs no performance penalty?
Please stay calm, I already posted the link that explains that:
"So for x86-based processors, the front-end does two main things - fetch instructions (from where program binaries are stored in memory or the caching system), and decode them into micro-operations."
"Front-end is capable of delivering 4 uops per cycle (or processor clock-tick) to the back"
And that's an old article; I believe Skylake has at least 6 ALUs per core.
So if you have:

Code:

struct { int a; int b; } test;

// this is done in one clock cycle:
test.a += 1;

// and both of these are done in one clock cycle too:
test.a += 1;
test.b += 1;

I honestly do not know how you can assume how the registers will be used.
The only thing on which we agree is "you should test and measure", which contrasts pretty badly with your "I'm pretty sure using a struct of two 16-bit ints would be measurably slower".
Would you see a difference between a struct of two 32-bit integers and a struct of two 16-bit integers?
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: couple of questions about stockfish code ?

Post by syzygy »

Fulvio wrote: The only thing on which we agree is "you should test and measure", which contrasts pretty badly with your "I'm pretty sure using a struct of two 16-bit ints would be measurably slower".
"Pretty sure" allows for the possibility that I turn out to be dead wrong, in which case I will simply have to admit that and will do so. But I don't think you'll prove me wrong here. And I'm talking about Stockfish, not about a simple loop that does not suffer from register pressure.

The reason for being "pretty sure" is that a single register for holding 1 value is pretty certain to be more efficient than two registers for holding 2 values. And while it is true that modern CPUs can perform many operations in parallel, reducing the number of operations is not going to hurt and will leave execution units free for performing other operations.
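
As a rough illustration of the "one value versus two values" point, here is a minimal sketch (not Stockfish's actual code; `PairScore`, `PackedScore` and the helper names are invented here). It assumes both components always stay within the signed 16-bit range and that `int` is at least 32 bits. It says nothing about actual timings, which still have to be measured.

```cpp
#include <cstdint>

// Two separate accumulators: every update needs two additions.
struct PairScore {
    int mg;
    int eg;
};

inline PairScore add(PairScore a, PairScore b) {
    return PairScore{a.mg + b.mg, a.eg + b.eg};
}

// One packed accumulator: midgame in the low 16 bits, endgame in the high 16.
using PackedScore = std::uint32_t;

inline PackedScore pack(int mg, int eg) {
    return (static_cast<std::uint32_t>(eg) << 16) + static_cast<std::uint32_t>(mg);
}

// A single unsigned addition updates both halves (arithmetic is modulo 2^32).
inline PackedScore add(PackedScore a, PackedScore b) {
    return a + b;
}

// Mask the low half, then sign-extend from bit 15.
inline int unpack_mg(PackedScore s) {
    return static_cast<int>((s & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

// Undo the borrow from a negative low half, shift, then sign-extend.
inline int unpack_eg(PackedScore s) {
    return static_cast<int>(((s + 0x8000u) >> 16) ^ 0x8000u) - 0x8000;
}
```

For example, adding `pack(10, -20)` and `pack(-3, 5)` and unpacking gives 7 and -15, the same as adding the two `PairScore` values field by field.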
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: couple of questions about stockfish code ?

Post by syzygy »

BeyondCritics wrote:Why not just use clean and simple shift instructions and you are done?
How cleanly can you do this? Could you post some code?

If the current SF code can be improved, that might be useful.
Fulvio
Posts: 395
Joined: Fri Aug 12, 2016 8:43 pm

Re: couple of questions about stockfish code ?

Post by Fulvio »

Sven Schüle wrote: Would you see a difference between a struct of two 32-bit integers and a struct of two 16-bit integers?
This is a wonderful tool:
https://godbolt.org/
I quickly tried this code:

Code:

#include <stdint.h>

int main() {
  volatile struct { int a; int b; } test1;
  test1.a += 1;
  test1.b += 1;
  volatile struct { int16_t a; int16_t b; } test2;
  test2.a += 1;
  test2.b += 1;
}
and clang on x86-64 compiles to

Code:

        inc     dword ptr [rsp - 4]
        inc     dword ptr [rsp - 8]
        inc     word ptr [rsp - 10]
        inc     word ptr [rsp - 12]
        xor     eax, eax
        ret
so the only difference here is the size of the object.
This makes sense considering the implicit integer promotions:

Code:

int16_t a, b;
a + b;
in reality is:

Code:

static_cast<int>(a) + static_cast<int>(b);
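
The promotion can be checked at compile time. A small sketch, assuming `int` is wider than 16 bits (as on the usual x86-64 targets); `promoted_sum` is just an illustrative name:

```cpp
#include <cstdint>
#include <type_traits>

// Both operands are promoted to int before the addition,
// so the type of the expression is int, not int16_t.
static_assert(std::is_same<decltype(std::int16_t{} + std::int16_t{}), int>::value,
              "int16_t + int16_t yields int");

// The addition is carried out in int, so a sum that does not fit
// in 16 bits is still represented exactly in the result.
inline int promoted_sum(std::int16_t a, std::int16_t b) {
    return a + b;
}
```

Truncation back to 16 bits only happens when the result is stored into an `int16_t` again, as `+=` on a 16-bit member does.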
syzygy
Posts: 5566
Joined: Tue Feb 28, 2012 11:56 pm

Re: couple of questions about stockfish code ?

Post by syzygy »

syzygy wrote:I do agree that SF's current approach is to be preferred as it does not rely on endianness.
And the "Scoreview" approach was in SF for exactly one month, it seems. Before and after that, make_score() was as it is now, but the extraction functions did some rather complicated arithmetic to get things right.

The last real change was this one:
https://github.com/official-stockfish/S ... 18ed0927fb
kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 4:48 am

Re: couple of questions about stockfish code ?

Post by kbhearn »

Well, if the goal was to remove the union and avoid umpteen bajillion casts, the tradeoff is doing a manual sign extension, and perhaps that would make it slower (though maybe the compiler would recognise what you're doing and it would come out the same). I'd argue it is more readable, though...

Code:

typedef uint32_t Score; 

inline Score make_score(unsigned int mg, unsigned int eg) { // implicitly converting signed inputs
    return (eg << 16) + mg;
}

// mask and manual sign extension
inline Value mg_Value(Score s) {
    static const uint32_t mask = 0xFFFFU;
    static const int sign = 0x8000;
    return (int)(s & mask) ^ sign - sign;
}

// if lower 16 bits are negative add 0x8000 to propagate up the borrowed 1 from upper 16 bits, then shift and sign extend
inline Value eg_Value(Score s) {
    static const uint32_t borrow = 0x8000U;
    static const int sign = 0x8000;

    return (int)((s + borrow) >> 16) ^ sign - sign;
}
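
A side note on operator precedence in the return statements above: in C and C++ binary `-` binds tighter than `^`, so `x ^ sign - sign` is parsed as `x ^ (sign - sign)`, which is `x ^ 0`. The sketch below spells out both groupings; the function names are made up for illustration.

```cpp
// How the compiler groups `x ^ sign - sign`: the subtraction happens first,
// so this is x ^ 0 and x comes back unchanged.
inline int sign_extend_unparenthesized(int x) {
    const int sign = 0x8000;
    return x ^ (sign - sign);
}

// Intended grouping: flip bit 15, then subtract it again.
// This sign-extends a value held in the low 16 bits of x.
inline int sign_extend_parenthesized(int x) {
    const int sign = 0x8000;
    return (x ^ sign) - sign;
}
```

With an input of `0xFFFF`, the first version returns `0xFFFF` unchanged, while the second returns -1.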
kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 4:48 am

Re: couple of questions about stockfish code ?

Post by kbhearn »

Knew I shouldn't have tried removing extraneous brackets... one of them was needed. Corrected below:

Code:

#include <stdint.h>

typedef uint32_t Score; 
// Value is Stockfish's evaluation type, defined elsewhere; if it is an enum,
// the int results below need an explicit cast to Value.

inline Score make_score(unsigned int mg, unsigned int eg) { // implicitly converting signed inputs
    return (eg << 16) + mg;
}

// mask and manual sign extension
inline Value mg_Value(Score s) {
    static const uint32_t mask = 0xFFFFU;
    static const int sign = 0x8000;
    return ((int)(s & mask) ^ sign) - sign;
}

// if lower 16 bits are negative add 0x8000 to propagate up the borrowed 1 from upper 16 bits, then shift and sign extend
inline Value eg_Value(Score s) {
    static const uint32_t borrow = 0x8000U;
    static const int sign = 0x8000;

    return ((int)((s + borrow) >> 16) ^ sign) - sign;
}
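
A quick way to gain confidence in the corrected version is a round-trip check. The sketch below is only an assumption-laden test harness: `Value` is replaced by a plain `int` typedef as a stand-in for Stockfish's type, the functions are renamed `mg_value`/`eg_value`, `unsigned int` is assumed to be 32 bits, and `round_trip_ok` is a made-up helper that samples the 16-bit range on a coarse grid rather than exhaustively.

```cpp
#include <cstdint>

typedef int Value;            // assumption: stand-in for Stockfish's Value type
typedef std::uint32_t Score;

inline Score make_score(unsigned int mg, unsigned int eg) {
    return (eg << 16) + mg;
}

// Mask the low 16 bits and sign-extend them.
inline Value mg_value(Score s) {
    const int sign = 0x8000;
    return (static_cast<int>(s & 0xFFFFu) ^ sign) - sign;
}

// Compensate the borrow from a negative low half, shift, sign-extend.
inline Value eg_value(Score s) {
    const int sign = 0x8000;
    return (static_cast<int>((s + 0x8000u) >> 16) ^ sign) - sign;
}

// Returns true if every sampled (mg, eg) pair survives pack + unpack.
// Step 257 divides 65535, so both -32768 and 32767 are included.
inline bool round_trip_ok() {
    for (int mg = -32768; mg <= 32767; mg += 257) {
        for (int eg = -32768; eg <= 32767; eg += 257) {
            Score s = make_score(static_cast<unsigned int>(mg),
                                 static_cast<unsigned int>(eg));
            if (mg_value(s) != mg || eg_value(s) != eg)
                return false;
        }
    }
    return true;
}
```

Calling `round_trip_ok()` once at startup or from a unit test is enough; it covers the extreme values where the borrow between the two halves matters.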