Zobrist Number Statistics and What to Look For

Discussion of chess software programming and technical issues.

Moderator: Ras

AlvaroBegue
Posts: 932
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Zobrist Number Statistics and What to Look For

Post by AlvaroBegue »

wgarvin wrote:Don't ever use the built-in C library rand() stuff unless you only use like 12 bits from each call... many crappy rand() implementations exist out there.
Yes, that's why I specified rand() from the *Linux* C library. If you want to use 12 bits from each call to a crappy rand(), don't use the lower 12 bits, because those are often not very random at all.

EDIT: Oh, and one more thing. Even if you make a mistake like using a bad implementation of rand(), using XOR of several sources will save you.
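
For illustration, a rough sketch of that idea in C (the xorshift below is only a stand-in for any second, unrelated source, not a recommendation of a particular generator):

    #include <stdint.h>
    #include <stdlib.h>
    #include <time.h>

    /* Rough sketch only: build one 64-bit key by XORing two independent-ish
       sources, so a weakness in any single source is masked. */
    static uint64_t sketch_rand64(void)
    {
        uint64_t a = 0;
        /* Source 1: eight rand() calls, taking 8 bits each from above the low bits. */
        for (int i = 0; i < 8; i++)
            a = (a << 8) | ((uint64_t)(rand() >> 4) & 0xFF);
        /* Source 2: a simple 64-bit xorshift seeded from the clock
           (stand-in for any second generator). */
        static uint64_t s = 0;
        if (s == 0)
            s = (uint64_t)time(NULL) | 1;
        s ^= s << 13;
        s ^= s >> 7;
        s ^= s << 17;
        return a ^ s;   /* XOR of the two sources */
    }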
syzygy
Posts: 5911
Joined: Tue Feb 28, 2012 11:56 pm

Re: Zobrist Number Statistics and What to Look For

Post by syzygy »

diep wrote:Empirically we can show that using 32-bit numbers in a sequential manner (two adjacent numbers forming a 64-bit number, with the top 32 bits generated just a few cycles after the low 32 bits) is not safe if the same SIMPLE generator generates them, if you are going to use the generated 64-bit number 'as is' and run it on a couple of dozen cores during a game; provided that you want 0 errors.
I take it that by SIMPLE you mean BAD.

There is no reason a BAD random generator generating 64-bit integers would do worse or better than a BAD random generator generating 32-bit integers which are then pairwise concatenated to 64-bit integers.

The same holds for a GOOD random generator. My advice is to use a GOOD one. As long as it is GOOD, it does not matter whether it generates 64 bits at a time, 32 bits at a time, or just a single bit at a time.

Btw, random bits can be downloaded from http://www.random.org/files/
Just read 64 bits at a time to produce your 64-bit ints.
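
A minimal sketch of loading keys that way (the file name, key count and function name here are just placeholders):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NUM_KEYS 781   /* placeholder: however many Zobrist keys your engine needs */

    uint64_t zobrist[NUM_KEYS];

    /* Read 8 raw bytes per key from a file of random bits (e.g. downloaded from random.org). */
    int load_zobrist_keys(const char *path)
    {
        FILE *f = fopen(path, "rb");
        if (!f) return -1;
        size_t n = fread(zobrist, sizeof(uint64_t), NUM_KEYS, f);
        fclose(f);
        return n == NUM_KEYS ? 0 : -1;
    }

Since the bytes are random anyway, the byte order in which they are read does not matter.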
syzygy
Posts: 5911
Joined: Tue Feb 28, 2012 11:56 pm

Re: Zobrist Number Statistics and What to Look For

Post by syzygy »

bob wrote:
syzygy wrote:
munchkin wrote:But what I did was create a list of 32-bit integers -- not 64-bits. I planned to just 'combine' them into 64-bit hash numbers. Will this affect the results of my hashing?
As long as the 32-bit integers are random (independent and uniformly distributed), this is absolutely fine. You could even just generate a long list of random bits, then form your Zobrist keys from blocks of 64 bits. As long as the bits are random, this is exactly as good as generating good random 64-bit integers directly.
Or will the mean and standard deviation be kept intact, more or less?
When you have a good set of random numbers, their mean and standard deviation will be close to the expected values as a result of them being random. Or rather, as a result of them being uniformly distributed. The other way around does not hold!! I can easily produce 32-bit numbers that have a mean of 0.5 (after dividing by 2^32) and the correct standard deviation, but that are absolutely not random. For example: 32768 + k * 65536, with k ranging from 0 to 65535. This set of numbers will give very bad hashing results.

Once you have generated your random numbers, the only reason to look at their mean and standard deviation is for a sanity check. If the mean is completely off, something is wrong with your random generator. But a more useful check is to verify that not all your numbers are even (I remember someone posting his Zobrist keys here that all turned out to be divisible by 1024 or so... that is a sign of a very bad random number generator).
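
A sketch of such a sanity check (illustrative only; keys and nkeys stand for your Zobrist table):

    #include <stdint.h>
    #include <stdio.h>

    /* Sanity check only: a decent mean and a sane low-bit distribution do not
       prove randomness, but a gross failure points at a broken generator. */
    void sanity_check(const uint64_t *keys, int nkeys)
    {
        double mean = 0.0;
        int even = 0;
        for (int i = 0; i < nkeys; i++) {
            mean += keys[i] / 18446744073709551616.0;   /* scale to [0,1) by 2^64 */
            if ((keys[i] & 1) == 0) even++;
        }
        mean /= nkeys;
        printf("mean (should be near 0.5): %f\n", mean);
        printf("even keys (should be near %d): %d\n", nkeys / 2, even);
    }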

A set of 100 numbers a1, a2, ...., a100 with a mean of exactly 1/2 is not random, because a100 is completely determined by a1, a2, ..., a99.
That's a beginner's flaw. The mean of 100 random numbers means absolutely nothing. It doesn't begin to prove the numbers are not random. Nor does it prove they are. In fact, out of a lot of generated sets of 100 numbers, you would expect some set to have a mean of exactly .5, just as you would expect some set to consist of sequential values. Out of a deck of 52 cards, you had BETTER be dealt a consecutive run of 5 every now and then, as the probability of a royal flush in poker is not 0.0...

Your logic fails badly, because that means the 100th number is NEVER random, since it will cause the rest of the group to average SOME value or another... If you pick a number out of thin air, such as .55, you can use the same argument, but it is still wrong... UNLESS you actually generate the numbers such that they average .5... Just because they do is meaningless, however.
You COMPLETELY missed the point.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Zobrist Number Statistics and What to Look For

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
munchkin wrote:But what I did was create a list of 32-bit integers -- not 64-bits. I planned to just 'combine' them into 64-bit hash numbers. Will this affect the results of my hashing?
As long as the 32-bit integers are random (independent and uniformly distributed), this is absolutely fine. You could even just generate a long list of random bits, then form your Zobrist keys from blocks of 64 bits. As long as the bits are random, this is exactly as good as generating good random 64-bit integers directly.
Or will the mean and standard deviation be kept intact, more or less?
When you have a good set of random numbers, their mean and standard deviation will be close to the expected values as a result of them being random. Or rather, as a result of them being uniformly distributed. The other way around does not hold!! I can easily produce 32-bit numbers that have a mean of 0.5 (after dividing by 2^32) and the correct standard deviation, but that are absolutely not random. For example: 32768 + k * 65536, with k ranging from 0 to 65535. This set of numbers will give very bad hashing results.

Once you have generated your random numbers, the only reason to look at their mean and standard deviation is for a sanity check. If the mean is completely off, something is wrong with your random generator. But a more useful check is to verify that not all your numbers are even (I remember someone posting his Zobrist keys here that all turned out to be divisible by 1024 or so... that is a sign of a very bad random number generator).

A set of 100 numbers a1, a2, ...., a100 with a mean of exactly 1/2 is not random, because a100 is completely determined by a1, a2, ..., a99.
That's a beginner's flaw. The mean of 100 random numbers means absolutely nothing. It doesn't begin to prove the numbers are not random. Nor does it prove they are. In fact, out of a lot of generated sets of 100 numbers, you would expect some set to have a mean of exactly .5, just as you would expect some set to consist of sequential values. Out of a deck of 52 cards, you had BETTER be dealt a consecutive run of 5 every now and then, as the probability of a royal flush in poker is not 0.0...

Your logic fails badly, because that means the 100th number is NEVER random, since it will cause the rest of the group to average SOME value or another... If you pick a number out of thin air, such as .55, you can use the same argument, but it is still wrong... UNLESS you actually generate the numbers such that they average .5... Just because they do is meaningless, however.
You COMPLETELY missed the point.
If so, I reacted to this, specifically: "A set of 100 numbers a1, a2, ...., a100 with a mean of exactly 1/2 is not random, because a100 is completely determined by a1, a2, ..., a99.". That is simply wrong. There are perfectly random ways of producing 100 numbers with a mean of .5... Might take a while, but it is certainly doable.

However, in HIS post, he obviously was not doing what you suggest, or he would not be trying to get even closer. He was doing it the "right way", it seems, which means your comment didn't fit.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Zobrist Number Statistics and What to Look For

Post by diep »

syzygy wrote:
diep wrote:Empirically we can show that using 32-bit numbers in a sequential manner (two adjacent numbers forming a 64-bit number, with the top 32 bits generated just a few cycles after the low 32 bits) is not safe if the same SIMPLE generator generates them, if you are going to use the generated 64-bit number 'as is' and run it on a couple of dozen cores during a game; provided that you want 0 errors.
I take it that by SIMPLE you mean BAD.

There is no reason a BAD random generator generating 64-bit integers would do worse or better than a BAD random generator generating 32-bit integers which are then pairwise concatenated to 64-bit integers.

The same holds for a GOOD random generator. My advice is to use a GOOD one. As long as it is GOOD, it does not matter whether it generates 64 bits at a time, 32 bits at a time, or just a single bit at a time.

Btw, random bits can be downloaded from http://www.random.org/files/
Just read 64 bits at a time to produce your 64-bit ints.
It does matter, because for Zobrist hashing in computer chess you really need every single bit of those 64 bits, as engines search so deep.

So the total search space they span is pretty big.

Any flaw will backfire.

If we have entries like:

{R(x),x},
{R(R(R(x))),R(R(x))}
etc.

then there will always be a multilinear connection.
That reduces the number of safety bits you have.

Numbers with a multilinear connection are of course easier to crack,
even for Zobrist - cracking in this case means a collision.

Murphy's law will haunt you.

You can easily test this by measuring the number of collisions per game.
Measuring can be done very quickly by using, for example, a 128-bit Zobrist key instead of 64 bits.

Then you measure during games like I did and you'll see that you get more collisions and errors.

Even in 5-minute blitz games you'll get collisions more easily in this manner.
Just play a bunch and compare.
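
One way to do that measurement (a sketch of the idea only; the entry layout and probe function are invented for illustration): store a second, independent 64-bit check key next to the normal one and count how often the normal key matches while the check key does not.

    #include <stdint.h>

    /* Sketch: a hash entry carrying a second, independent 64-bit check key.
       If hash64 matches but check64 does not, the 64-bit key had a collision. */
    struct tt_entry {
        uint64_t hash64;    /* the normal Zobrist key */
        uint64_t check64;   /* extra verification key (the "128-bit Zobrist" idea) */
        /* ... score, depth, move, etc. ... */
    };

    static long collisions = 0;

    int probe(const struct tt_entry *e, uint64_t hash64, uint64_t check64)
    {
        if (e->hash64 != hash64)
            return 0;               /* ordinary miss */
        if (e->check64 != check64) {
            collisions++;           /* 64-bit key collided: different position, same key */
            return 0;
        }
        return 1;                   /* genuine hit */
    }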

I did a bunch of extensive tests on this in 2003.

Nowadays everyone has a hashtable of this size and gets enough search depth to be able to measure that.

Also, we assume you overwrite in the hash table depth-based, not some idiotic manner of overwriting that only amateurs use.

Just measure.

Doing searches at a static position is not a good idea, because the total search space your search spans is then of course smaller than 2^64.

This is one of the big mistakes made in previously published collision research.

They didn't play games... just searched 8 plies... and the hash table didn't do normal depth-based overwriting.

Most programs already follow what I posted back in the 90s.

Have a few bits in the hash table entry, say 8 or so. Now just overwrite using depthleft + searchesdonefromopeningsposition.

You add those two up and overwrite the smallest entry you find within the 4 or 8 entries or so that you scan sequentially.
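
In code that scheme could look roughly like this (a sketch; the field and function names are invented for illustration):

    #include <stdint.h>

    #define BUCKET_SIZE 4   /* scan 4 (or 8) consecutive entries per probe */

    struct tt_entry {
        uint64_t key;
        int      depthleft;   /* remaining search depth stored with the entry */
        int      age;         /* search counter since the opening position */
        /* ... best move, score, bound type ... */
    };

    /* Overwrite the entry with the smallest depthleft + age in the bucket,
       so deep entries and entries from recent searches survive longest. */
    struct tt_entry *select_victim(struct tt_entry *bucket)
    {
        struct tt_entry *victim = &bucket[0];
        int best = bucket[0].depthleft + bucket[0].age;
        for (int i = 1; i < BUCKET_SIZE; i++) {
            int score = bucket[i].depthleft + bucket[i].age;
            if (score < best) {
                best = score;
                victim = &bucket[i];
            }
        }
        return victim;
    }

The new entry would then be stored with the current depth and the current search counter, so it in turn outlives older, shallower entries.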

That means that during games a lot of old info is still there from previous moves. That suddenly expands the search space to above 10^19, especially in the endgame.

Then suddenly the collisions start coming if you were stubborn and combined 32-bit numbers... to give one example...

I put months of my time into this. Those researchers are always busy for just 5 minutes and then publish another piece of crap research. If you just search for 5 minutes at one position you'll find nothing, of course.

Even 200 cores running for some hours with a 100+ GB hash table had just 1 collision when done at the opening position.

That's not interesting research, you know. If the search space your search spanned is not huge, you can write anything in any paper anyway, as there simply wasn't a chance in heaven of getting a single collision.

Gotta play games!

Most collisions are in the endgame - if you think about it you'll realize why.
syzygy
Posts: 5911
Joined: Tue Feb 28, 2012 11:56 pm

Re: Zobrist Number Statistics and What to Look For

Post by syzygy »

bob wrote:If so, I reacted to this, specifically: "A set of 100 numbers a1, a2, ...., a100 with a mean of exactly 1/2 is not random, because a100 is completely determined by a1, a2, ..., a99.". That is simply wrong. There are perfectly random ways of producing 100 numbers with a mean of .5... Might take a while, but it is certainly doable.
There is no way of producing 100 numbers with a mean of exactly 1/2 (why do you think I wrote 1/2 instead of .5) by drawing them independently from a uniform random distribution. The 100 numbers simply stop being independent.
However, in HIS post, he obviously was not doing what you suggest, or he would not be trying to get even closer. He was doing it the "right way", it seems, which means your comment didn't fit.
I never said he did. My point was it simply makes no sense to aim for a mean as close as possible to 0.5 if the goal is to produce good random numbers. Read my whole comment.

Whether the OP is generating random numbers the "right way" we do not know. For sure, any reported mean and standard deviation are not a good indication of good randomness.

What certainly does NOT help is regenerating new sets of random numbers over and over until one has a set with a mean of exactly 0.5. It is just a waste of effort. In addition, the requirement you set on the generation in fact lowers the randomness.
syzygy
Posts: 5911
Joined: Tue Feb 28, 2012 11:56 pm

Re: Zobrist Number Statistics and What to Look For

Post by syzygy »

diep wrote:If we have entries like:

{R(x),x},
{R(R(R(x))),R(R(x))}
etc.

then there always will be a multilineair connection.
That results in reducing number of safetybits you have.
I have no idea what you mean by { R(x), x }.

We are talking about taking a 32-bit random generator and using it to construct a 64-bit random generator by generating 2 32-bit numbers and concatenating the bits to produce a 64-bit number.
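
I.e. something like this (sketch; rng32 stands for whatever 32-bit generator is used):

    #include <stdint.h>

    extern uint32_t rng32(void);   /* placeholder for the underlying 32-bit generator */

    /* Concatenate two consecutive 32-bit outputs into one 64-bit number. */
    uint64_t rng64(void)
    {
        uint64_t hi = rng32();
        uint64_t lo = rng32();
        return (hi << 32) | lo;
    }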

If this construction results in a BAD 64-bit random generator, then the 32-bit random generator you started with is simply already BAD. Period.

If you start with a GOOD 32-bit random generator, the constructed 64-bit generator will also be GOOD.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Zobrist Number Statistics and What to Look For

Post by bob »

syzygy wrote:
bob wrote:If so, I reacted to this, specifically: "A set of 100 numbers a1, a2, ...., a100 with a mean of exactly 1/2 is not random, because a100 is completely determined by a1, a2, ..., a99.". That is simply wrong. There are perfectly random ways of producing 100 numbers with a mean of .5... Might take a while, but it is certainly doable.
There is no way of producing 100 numbers with a mean of exactly 1/2 (why do you think I wrote 1/2 instead of .5) by drawing them independently from a uniform random distribution. The 100 numbers simply stop being independent.
Sorry, but that is statistically wrong. If you produce enough samples of size 100, where the numbers are constrained to be between 0 and .99999..., then something is WRONG if one of the sets doesn't produce a mean of exactly .5. This is right along the same concept as the poker test, the runs test, and such, for validating PRNGs...

As I said, it might take a while to produce those numbers. But it might not.
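
For what it's worth, a runs test of that sort might be sketched like this (illustrative only; x[] holds samples scaled to [0,1)):

    #include <stdio.h>
    #include <math.h>

    /* Very small runs-test sketch: count runs of values above/below 0.5 and
       compare against the expected count (Wald-Wolfowitz style). This is
       only one of many PRNG tests, shown for illustration. */
    void runs_test(const double *x, int n)
    {
        int n1 = 0, n2 = 0, runs = 1;
        for (int i = 0; i < n; i++) {
            if (x[i] >= 0.5) n1++; else n2++;
        }
        if (n1 == 0 || n2 == 0) { printf("degenerate sample\n"); return; }
        for (int i = 1; i < n; i++)
            if ((x[i] >= 0.5) != (x[i - 1] >= 0.5)) runs++;
        double expected = 2.0 * n1 * n2 / n + 1.0;
        double var = (expected - 1.0) * (expected - 2.0) / (n - 1.0);
        printf("runs = %d, expected = %.1f, z = %.2f\n",
               runs, expected, (runs - expected) / sqrt(var));
    }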



However, in HIS post, he obviously was not doing what you suggest, or he would not be trying to get even closer. He was doing it the "right way", it seems, which means your comment didn't fit.
I never said he did. My point was it simply makes no sense to aim for a mean as close as possible to 0.5 if the goal is to produce good random numbers. Read my whole comment.

Whether the OP is generating random numbers the "right way" we do not know. For sure any reported mean and standard deviation are no good indication of good randomness.

agree there...


What certainly does NOT help is regenerating new sets of random numbers over and over until one has a set with a mean of exactly 0.5. It is just a waste of effort. In addition, the requirement you set on the generation in fact lowers the randomness.

Don't follow the last statement, however. Unless there is a basic flaw in the PRNG, any group of one hundred values should be just as random as any other group.

But for optimal numbers, Tony Warnock and Burt Wendroff (Los Alamos lab) wrote a paper many years ago addressing this specific thing. I believe the conclusion was that random is good enough.

One can get cute and try to make sure that a pair of numbers, or a triplet, or a quartet don't XOR to zero, for likely combinations. But that is a hard problem with no significant gain, since an occasional hash error is not a killer anyway.
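
The pairwise case at least is cheap to check (a sketch; anything beyond pairs blows up combinatorially):

    #include <stdint.h>

    /* Verify that no two Zobrist keys XOR to zero, i.e. all keys are distinct,
       and that none is the all-zero key (which would contribute nothing). */
    int check_pairs(const uint64_t *keys, int nkeys)
    {
        for (int i = 0; i < nkeys; i++) {
            if (keys[i] == 0) return 0;
            for (int j = i + 1; j < nkeys; j++)
                if ((keys[i] ^ keys[j]) == 0) return 0;   /* duplicate key */
        }
        return 1;
    }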
syzygy
Posts: 5911
Joined: Tue Feb 28, 2012 11:56 pm

Re: Zobrist Number Statistics and What to Look For

Post by syzygy »

bob wrote:
syzygy wrote:
bob wrote:If so, I reacted to this, specifically: "A set of 100 numbers a1, a2, ...., a100 with a mean of exactly 1/2 is not random, because a100 is completely determined by a1, a2, ..., a99.". That is simply wrong. There are perfectly random ways of producing 100 numbers with a mean of .5... Might take a while, but it is certainly doable.
There is no way of producing 100 numbers with a mean of exactly 1/2 (why do you think I wrote 1/2 instead of .5) by drawing them independently from a uniform random distribution. The 100 numbers simply stop being independent.
Sorry, but that is statistically wrong. If you produce enough samples of size 100, where the numbers are constrained to be between 0 and .99999..., then something is WRONG if one of the sets doesn't produce a mean of exactly .5.
The fact that you pick out that set is what causes the complete experiment to no longer correspond to the independent drawing of 100 numbers from a uniform distribution.

Throwing a red and a blue die and reading out the ordered result corresponds to the independent drawing of two numbers out of a uniform distribution on { 1, 2, 3, 4, 5, 6 }.

Throwing and rethrowing a red and a blue die until both show 6 and then reading out the ordered result does not correspond to the independent drawing of two numbers out of a uniform distribution on { 1, 2, 3, 4, 5, 6 }.

Throwing and rethrowing a red and a blue die until their mean is 3.5 and then reading out the ordered result does not correspond to the independent drawing of two numbers out of a uniform distribution on { 1, 2, 3, 4, 5, 6 }.
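
A quick simulation makes the point (sketch only): condition on the two dice having mean 3.5 (sum 7) and the outcome (6,6) can never occur, so the two dice are no longer independent.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Sketch: throw a red and a blue die, keep only throws whose mean is 3.5
       (sum 7), and tally the joint outcomes. Conditioned this way, blue is
       always 7 - red, so the two dice are completely dependent. */
    int main(void)
    {
        int counts[7][7] = {{0}};
        srand((unsigned)time(NULL));
        for (int n = 0; n < 1000000; n++) {
            int red  = rand() % 6 + 1;
            int blue = rand() % 6 + 1;
            if (red + blue == 7)            /* mean exactly 3.5 */
                counts[red][blue]++;
        }
        printf("count of (red=6, blue=6) given mean 3.5: %d\n", counts[6][6]);
        return 0;
    }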
What certainly does NOT help is regenerating new sets of random numbers over and over until one has a set with a mean of exactly 0.5. It is just a waste of effort. In addition, the requirement you set on the generation in fact lowers the randomness.
Don't follow the last statement, however. Unless there is a basic flaw in the PRNG, any group of one hundred values should be just as random as any other group.
Nope, see above. Making a selection based on a particular criterion out of many such groups will change the distribution. It is OK to say beforehand that you will take the 1001st generated set of 100 values. It is not OK to say beforehand that you will take the first set that satisfies the condition that the numbers have a particular mean. That will turn the experiment into something other than independently drawing 100 numbers out of a uniform distribution.
syzygy
Posts: 5911
Joined: Tue Feb 28, 2012 11:56 pm

Re: Zobrist Number Statistics and What to Look For

Post by syzygy »

bob wrote:But for optimal numbers, Tony Warnock and Burt Wendroff (Los Alamos lab) wrote a paper many years ago addressing this specific thing. I believe the conclusion was that random is good enough.

One can get cute and try to make sure that a pair of numbers, or a triplet, or a quartet don't XOR to zero, for likely combinations. But that is a hard problem with no significant gain, since an occasional hash error is not a killer anyway.
Warnock and Wendroff apparently used an error correcting code. In this post from over 15 years ago I explained the link between Zobrist hashing and error correcting codes.