hashing unsigned versus's signed

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: hashing unsigned versus's signed

Post by bob »

sje wrote:
bob wrote:
adams161 wrote:why would it matter if it uses the same keys on each run on each host?
The Zobrist signature is used to access the book for most programs. Change any random number, and the book access will fail when that number is used since the Zobrist key will change.
Consistent hash keys are also needed for position based learning and for any other time key values are stored externally.

Note that externalized multibyte scalars should be stored in endian independent format as not every platform uses Intel style little endian number storage. Symbolic uses endian independent external scalars and so runs on both Intel and PowerPC hosts with the exact same binary opening book file.
Crafty has had an endian-independent book for a few years now. Because of the issues involved in moving from machines like the sun/cray/etc that use big-endian, to the machines that use little-endian. Not to forget other issues such as the size of values such as ints and floats from machine to machine.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: hashing unsigned versus's signed

Post by sje »

bob wrote:
sje wrote: Consistent hash keys are also needed for position based learning and for any other time key values are stored externally.

Note that externalized multibyte scalars should be stored in endian independent format as not every platform uses Intel style little endian number storage. Symbolic uses endian independent external scalars and so runs on both Intel and PowerPC hosts with the exact same binary opening book file.
Crafty has had an endian-independent book for a few years now. Because of the issues involved in moving from machines like the sun/cray/etc that use big-endian, to the machines that use little-endian. Not to forget other issues such as the size of values such as ints and floats from machine to machine.
Along the same lines, C/C++ bitfields should be avoided for any externalized data like an opening book. An old version of Symbolic got bitten (pun!) by this once; explicit shifting and masking is now used throughout the program.

Trivia: The old PDP-11 machines had a MIXED endian format for storing 32 bit values. I wonder if Ken considered this with the early software only Belle; I seem to recall that the program was portable to just about any Unix host.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Sample code: avoiding C/C++ bitfields

Post by sje »

Some sample code for Symbolic's Move class avoiding C/C++ bitfields:

Code: Select all

    Sq  GetFrSq(void)  const {return (Sq)  ((bits >> MBSfrsq ) & MBMfrsq );}
    Sq  GetToSq(void)  const {return (Sq)  ((bits >> MBStosq ) & MBMtosq );}
    Man GetFrMan(void) const {return (Man) ((bits >> MBSfrman) & MBMfrman);}
    Man GetToMan(void) const {return (Man) ((bits >> MBStoman) & MBMtoman);}
    Mc  GetMc(void)    const {return (Mc)  ((bits >> MBSmc   ) & MBMmc   );}
    ui  GetFlags(void) const {return (ui)  ((bits >> MBSflags) & MBMflags);}

    void PutFrSq&#40;const Sq sq&#41;    &#123;bits &= 0x03ffffff; bits |= &#40;sq  << MBSfrsq&#41; ;&#125;
    void PutToSq&#40;const Sq sq&#41;    &#123;bits &= 0xfc0fffff; bits |= &#40;sq  << MBStosq&#41; ;&#125;
    void PutFrMan&#40;const Man man&#41; &#123;bits &= 0xfff0ffff; bits |= &#40;man << MBSfrman&#41;;&#125;
    void PutToMan&#40;const Man man&#41; &#123;bits &= 0xffff0fff; bits |= &#40;man << MBStoman&#41;;&#125;
    void PutMc&#40;const Mc mc&#41;      &#123;bits &= 0xfffff1ff; bits |= &#40;mc  << MBSmc&#41;   ;&#125;
    void PutFlags&#40;const ui val&#41;  &#123;bits &= 0xfffffe00; bits |= &#40;val << MBSflags&#41;;&#125;

    void ResetMfb&#40;const Mfb mfb&#41; &#123;bits &= ~mfb;&#125;
    void SetMfb&#40;const Mfb mfb&#41;   &#123;bits |=  mfb;&#125;
    bool TestMfb&#40;const Mfb mfb&#41; const &#123;return bits & mfb;&#125;

    bool IsCapture&#40;void&#41;         const &#123;return IsSimpleCapture&#40;) || IsEnPassant&#40;);&#125;
    bool IsCastling&#40;void&#41;        const &#123;return 0x000c & BX&#40;GetMc&#40;));&#125;
    bool IsCertain&#40;void&#41;         const &#123;return TestMfb&#40;MfbCert&#41;;&#125;
    bool IsCheck&#40;void&#41;           const &#123;return TestMfb&#40;MfbChck&#41;;&#125;
    bool IsCheckmate&#40;void&#41;       const &#123;return TestMfb&#40;MfbMate&#41;;&#125;
    bool IsDraw&#40;void&#41;            const &#123;return TestMfb&#40;MfbDraw&#41;;&#125;
    bool IsEnPassant&#40;void&#41;       const &#123;return GetMc&#40;) == McEPC;&#125;
    bool IsGainer&#40;void&#41;          const &#123;return IsSimpleCapture&#40;) || &#40;0x00f2 & BX&#40;GetMc&#40;)));&#125;
    bool IsHolder&#40;void&#41;          const &#123;return IsRegularNonCapt&#40;) || IsCastling&#40;);&#125;
    bool IsMarked&#40;void&#41;          const &#123;return TestMfb&#40;MfbMark&#41;;&#125;
    bool IsNeedFileDisam&#40;void&#41;   const &#123;return TestMfb&#40;MfbAndf&#41;;&#125;
    bool IsNeedRankDisam&#40;void&#41;   const &#123;return TestMfb&#40;MfbAndr&#41;;&#125;
    bool IsNotCertain&#40;void&#41;      const &#123;return !TestMfb&#40;MfbCert&#41;;&#125;
    bool IsNotNull&#40;void&#41;         const &#123;return !TestMfb&#40;MfbNull&#41;;&#125;
    bool IsNotSearched&#40;void&#41;     const &#123;return !TestMfb&#40;MfbSrch&#41;;&#125;
    bool IsNotVoid&#40;void&#41;         const &#123;return bits != 0;&#125;
    bool IsNull&#40;void&#41;            const &#123;return TestMfb&#40;MfbNull&#41;;&#125;
    bool IsNullOrVoid&#40;void&#41;      const &#123;return IsNull&#40;) || IsVoid&#40;);&#125;
    bool IsPawnMove&#40;void&#41;        const &#123;return IsManPawn&#40;GetFrMan&#40;));&#125;
    bool IsPromotion&#40;void&#41;       const &#123;return 0x00f0 & BX&#40;GetMc&#40;));&#125;
    bool IsPromotionThreat&#40;void&#41; const &#123;return IsPawnMove&#40;) && IsRank2Or7&#40;MapSqToRank&#40;GetToSq&#40;)));&#125;
    bool IsRegular&#40;void&#41;         const &#123;return GetMc&#40;) == McReg;&#125;
    bool IsRegularNonCapt&#40;void&#41;  const &#123;return IsRegular&#40;) && IsManVacant&#40;GetToMan&#40;));&#125;
    bool IsResetsHmvc&#40;void&#41;      const &#123;return IsPawnMove&#40;) || IsSimpleCapture&#40;);&#125;
    bool IsSearched&#40;void&#41;        const &#123;return TestMfb&#40;MfbSrch&#41;;&#125;
    bool IsSimpleCapture&#40;void&#41;   const &#123;return IsManNotVacant&#40;GetToMan&#40;));&#125;
    bool IsSpecial&#40;void&#41;         const &#123;return GetMc&#40;) != McReg;&#125;
    bool IsUnderpromotion&#40;void&#41;  const &#123;return 0x0070 & BX&#40;GetMc&#40;));&#125;
    bool IsVoid&#40;void&#41;            const &#123;return bits == 0;&#125;
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: hashing unsigned versus's signed

Post by bob »

sje wrote:
bob wrote:
sje wrote: Consistent hash keys are also needed for position based learning and for any other time key values are stored externally.

Note that externalized multibyte scalars should be stored in endian independent format as not every platform uses Intel style little endian number storage. Symbolic uses endian independent external scalars and so runs on both Intel and PowerPC hosts with the exact same binary opening book file.
Crafty has had an endian-independent book for a few years now. Because of the issues involved in moving from machines like the sun/cray/etc that use big-endian, to the machines that use little-endian. Not to forget other issues such as the size of values such as ints and floats from machine to machine.
Along the same lines, C/C++ bitfields should be avoided for any externalized data like an opening book. An old version of Symbolic got bitten (pun!) by this once; explicit shifting and masking is now used throughout the program.

Trivia: The old PDP-11 machines had a MIXED endian format for storing 32 bit values. I wonder if Ken considered this with the early software only Belle; I seem to recall that the program was portable to just about any Unix host.
Or even better. Use an alpha where you can set a bit in the hardware to make it big endian or little endian. Something for everyone there. :)
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: hashing unsigned versus's signed

Post by sje »

bob wrote:
sje wrote:Trivia: The old PDP-11 machines had a MIXED endian format for storing 32 bit values. I wonder if Ken considered this with the early software only Belle; I seem to recall that the program was portable to just about any Unix host.
Or even better. Use an alpha where you can set a bit in the hardware to make it big endian or little endian. Something for everyone there. :)
The PowerPC has the same bit, but I don't know where it's used except for Intel/AMD emulation.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Other pseudorandom sources for hash keys

Post by sje »

There are other pseudorandom sources available on modern Unix-like hosts in the form of a pair of special device read-only files:

/dev/urandom Fast and good
/dev/random Very slow and extremely good

There will be different values on each invocation, so there's no repeatability unless the values are read once and stored.

For Symbolic, it will keep its well-tested prime-factor-count modulo two stream. It has never had a false positive match and it passed the long perft 11 test.
adams161
Posts: 626
Joined: Sun May 13, 2007 9:55 pm
Location: Bay Area, CA USA
Full name: Mike Adams

Re: hashing unsigned versus's signed

Post by adams161 »

I had a question on this code:

typedef unsigned __int64 U64;
U64 z = 0;
for (int i = 0; i<5; ++i)
{
z <<= 14; z ^= (U64)(rand() & 0x3FFF);


i'm no bit expert, i use what i have to in hashing, but if we are getting the lowest 15 bits 5 times, as your description read, are we not loading 75 bits into a 64 bit number? Is there an issue there? does it handle the overflow ?

if you have zz <<=14 is that the operator loading 15 bits? 0-14? should the final call be truncated? zz<<=3 for final call?

Mike
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: hashing unsigned versus's signed

Post by bob »

adams161 wrote:I had a question on this code:

typedef unsigned __int64 U64;
U64 z = 0;
for (int i = 0; i<5; ++i)
{
z <<= 14; z ^= (U64)(rand() & 0x3FFF);


i'm no bit expert, i use what i have to in hashing, but if we are getting the lowest 15 bits 5 times, as your description read, are we not loading 75 bits into a 64 bit number? Is there an issue there? does it handle the overflow ?

if you have zz <<=14 is that the operator loading 15 bits? 0-14? should the final call be truncated? zz<<=3 for final call?

Mike
Works fine. when you shift something to the left, with a logical shift, the upper bits are shifted out and lost.
adams161
Posts: 626
Joined: Sun May 13, 2007 9:55 pm
Location: Bay Area, CA USA
Full name: Mike Adams

Re: hashing unsigned versus's signed

Post by adams161 »

Any special rules for intializing the start hash? like i have a hash board populated with 64 bit numbers of board[64][12], i dont hash empty squares, so a white king on square 4 is like

currenthash=currenthash^board[3][5];

3 for square 4 ( 0-3) and 5 for the white king 0-5 for 6 white pieces.

I do this as i loop through board and find pieces on squares ( again i dont hash empty squares), but one thing i wonder is how to best start the hash. Teh curent method is to set intialhash=getHashNumber();

and hash the first piece on a square into intial hash. so if that king was the first piece it be,

currenthash = intialhash^board[3][5];

i'm only concerned about unqiequness and randomness of this method i suppose.

would it be best to start the hash with intialhash=0?

my old method was to start the hash with currenthash = board[3][5], if that was the first piece and then hash in each additional piece to it.

Mike
adams161
Posts: 626
Joined: Sun May 13, 2007 9:55 pm
Location: Bay Area, CA USA
Full name: Mike Adams

Re: hashing unsigned versus's signed

Post by adams161 »

I had a couple of posts in summer and fall of 08, about fail high and fail low, and i think i got that theory down. I've begun a review in recent weeks of the full hash code, the reason this thread started. I picked up how to hash in 1999 in channel 64 chat on icc, and as i've seen i've had to modify how i choose my random hash numbers and my hash replacement policy. this last post on a rather basic part of hashing, how to intialize the intial hash, is kind of this is how i've been doing it, based on ideas i learned in 99, but I no longer want to automaticly assume that any implementation constructed in 1999 has been correct. hence this post.

Mike