hashing unsigned versus's signed

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

adams161
Posts: 626
Joined: Sun May 13, 2007 9:55 pm
Location: Bay Area, CA USA
Full name: Mike Adams

hashing unsigned versus's signed

Post by adams161 »

I was reading the post about hashing, and i started to think, i'm not sure i'm using unsigned hash numbers in every case.

when i generate the initial hash number i do an _int64 = rand() * rand(), this is visual studio 6 , c code, though when i port to linux i use long long instead of int64. am i correct that rand() * rand() is generating 64 bit numbers? If its generating posiitve numbers would i be smarter to make sure all my _int64's are positive?

Hashing seems to work, but i'm just wondering if i deprecated it. Can _int64 be made unsigned?

Mike
adams161
Posts: 626
Joined: Sun May 13, 2007 9:55 pm
Location: Bay Area, CA USA
Full name: Mike Adams

Re: hashing unsigned versus's signed

Post by adams161 »

looks like i'm using a typedef to make int64 unsigned int64, so the main question now is would unsigned _int 64 x = rand() * rand(); be sufficient to populate a 64 bit numer.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: hashing unsigned versus's signed

Post by wgarvin »

Generally, no.

rand() gives you a number between 0 and RAND_MAX. Most compilers define it to 32767, the minimum required by the standard.

What this means is that you should not use more than the lowest 15 bits of each rand() call... So it will take at least 5 calls to rand() to get enough bits to make a 64-bit zobrist key.

Edit: I would not multiply them together, either... that will bias the results quite a bit. You want to take bits from rand() and put them straight into the result.

I suggest something like this:

Code: Select all

typedef unsigned __int64 U64;
U64 z = 0;
for &#40;int i = 0; i<5; ++i&#41;
&#123;
    z <<= 14;  z ^= &#40;U64&#41;&#40;rand&#40;) & 0x3FFF&#41;;
&#125;
mathmoi
Posts: 286
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Re: hashing unsigned versus's signed

Post by mathmoi »

adams161 wrote:looks like i'm using a typedef to make int64 unsigned int64, so the main question now is would unsigned _int 64 x = rand() * rand(); be sufficient to populate a 64 bit numer.
As I understand it, rand() is, most of the time, not sufficiently random for generating good zobrist keys. You should use a better algorithme like the Mersenne twister (http://en.wikipedia.org/wiki/Mersenne_Twister). Personnaly, I use the boost implementation of mersenne twister (http://www.boost.org/doc/libs/1_39_0/li ... index.html).

Also, rand() return a signed int (presumably 32 bits integer). In this case, the result will also be signed int, even if you cast it to 64 bits, you still get a value that would fit in a signed int. (edit : Wylie Garvin proposed a solution to this problem)
adams161
Posts: 626
Joined: Sun May 13, 2007 9:55 pm
Location: Bay Area, CA USA
Full name: Mike Adams

Re: hashing unsigned versus's signed

Post by adams161 »

I want to thank both who responded. So my hash keys appear to be a mess. Still got 2500 blitz on icc. These are the kind of issues i run into, my hashing code was written in 1999 when i first started pulsar. At the time i had no real schooling in programming, ( i would go back to school for programming classes in 2003) , and there is code in pulsar that has been sitting there for 10 years when i knew half as much about what i was doing and sometimes it just gets grandfathered in without me thinking hmm am i doing everything right in the first place. I've tightened up a lot of aspects of hashing, i''ve posted on how to handle the exact conditions for fail high and low, but this issue remained under the radar.

Perhaps pulsar isnt totaly screwed because of the limited deployment of hash, it only hashes in search, not qsearch, and i've been in the habit of sometimes erasing hash keys from move to move.

My hope is when i fix it tonight i enjoy some performance boost. but we wil have to see. Again you fix one thing and maybe you get a boost or maybe with something that was broken now working you realize somethign else is broken. the good news is i have some confidence in my hash code from hours of review, the only thing is the key generation was ignored and my review focused on what i did to generate hash keys afer the keys were populated.

Mike
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: hashing unsigned versus's signed

Post by sje »

In general, unless one is doing signed arithmetic, unsigned integers are preferred over signed integers. Therefore, unsigned integers should be used for hash keys.

----

Do not use rand()/srand() as they are crap. Much better is random()/srandom() although there may be platform dependencies. Remember that you want to have the same set of keys for each run on each possible host.

----

Symbolic uses a pseudorandom bit stream generator where the Nth bit is the count of prime factors of N, modulo two. The resulting sequence is highly aperiodic (none more so, I think) and after many years of use I have never seen a false positive match when using 64 bit keys.
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: hashing unsigned versus's signed

Post by hgm »

Micro-Max' Zobrist array is an array of characters, and I simply will it by

T[i++]=rand();
adams161
Posts: 626
Joined: Sun May 13, 2007 9:55 pm
Location: Bay Area, CA USA
Full name: Mike Adams

Re: hashing unsigned versus's signed

Post by adams161 »

why would it matter if it uses the same keys on each run on each host? If you have confidence that you are generating 64 bit keys, and you are confident that a false match chance with 64 bit keys is extremely low, if those 64 bit key numbers varies from run time to run time ( say you seeded random when you started wtih time in seconds) it seems you should still have conficence that even with different keys, teh essential algorithm is stable and with any given set of keys, as long as they are truly 64 bit, the chance of any false match is extremely low and not likely to happen.

Mike
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: hashing unsigned versus's signed

Post by bob »

adams161 wrote:why would it matter if it uses the same keys on each run on each host? If you have confidence that you are generating 64 bit keys, and you are confident that a false match chance with 64 bit keys is extremely low, if those 64 bit key numbers varies from run time to run time ( say you seeded random when you started wtih time in seconds) it seems you should still have conficence that even with different keys, teh essential algorithm is stable and with any given set of keys, as long as they are truly 64 bit, the chance of any false match is extremely low and not likely to happen.

Mike
The Zobrist signature is used to access the book for most programs. Change any random number, and the book access will fail when that number is used since the Zobrist key will change.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: hashing unsigned versus's signed

Post by sje »

bob wrote:
adams161 wrote:why would it matter if it uses the same keys on each run on each host?
The Zobrist signature is used to access the book for most programs. Change any random number, and the book access will fail when that number is used since the Zobrist key will change.
Consistent hash keys are also needed for position based learning and for any other time key values are stored externally.

Note that externalized multibyte scalars should be stored in endian independent format as not every platform uses Intel style little endian number storage. Symbolic uses endian independent external scalars and so runs on both Intel and PowerPC hosts with the exact same binary opening book file.