How do I test the quality of the Zobrist keys?

hgm · Post by **hgm** » Wed Sep 05, 2007 2:58 pm

Well, to investigate if Hamming distance has any bearing whatsoever on this problem, I tested independence of Zobrist keys through the following simple program:

Code: Select all

#define SIZE &#40;1<<27&#41;

long long int Zob&#91;12*64&#93;;

int lock&#91;SIZE&#93;;

int cnt;

test&#40;int d, int n, long long int k&#41;
&#123;
    int i, j, m;

    if&#40;d==0&#41;
    &#123;
        i = k & SIZE-1;      // Ignore 5 bits of index part of key
        j = k >> 32 | 0x3FF; // Ignore 10 bits of signature part
                             // &#40;effective key length = 49 bits&#41;
        while&#40;lock&#91;i&#93;)
        &#123;
            if&#40;lock&#91;i&#93; == j&#41;
            &#123;   printf&#40;"%3d. collission %llx\n", cnt++, k&#41;;
                break;
            &#125; else i = i+1 & SIZE-1;
        &#125;
        lock&#91;i&#93; = j;
        return;
    &#125;

    for&#40;i=n; i<12*64; i++)
    &#123;
        test&#40;d-1, i+1, k^Zob&#91;i&#93;);
    &#125;
&#125;

main&#40;)
&#123;
        int i, j, k, cnt=0;

    // initialize Zobrist key table
        for&#40;i=0; i<64*12; i++)
                Zob&#91;i&#93; = (&#40;long long int&#41; rand&#40;)) << 33 ^
                         (&#40;long long int&#41; rand&#40;)) << 22 ^
                         (&#40;long long int&#41; rand&#40;)) << 11 ^
                         (&#40;long long int&#41; rand&#40;))  ^
                         (&#40;long long int&#41; rand&#40;)) >> 11;

        Zob&#91;32&#93; = Zob&#91;31&#93; ^ 0x100000; // try to make bad one

   // Screen all permutations of 3 keys for duplicates
   // &#40;equivalent to screening all permutations of 6 keys for zero&#41;
        test&#40;3, 0, 0&#41;;
&#125;

This should find any combination of 6 randoms from the Zobrist table that XOR to zero (and would thus lead to collisions of positions with as few as 6 differences).

To make it interesting, I shortened the key length to 49 bits. (With a 64-bit key one would expect only collisions with combinations of 9 or more keys, and to test that would take much too long).

I took a Zobrist table of 12*64 = 768 randoms. his give (768*767*766)/6 ~75 million combinations of 3. The probability that any two of these combinations are the same (in the 49 bits I consider) is 1/2^49 (=1.77e-15). There are 75M^2/2 (=2.8e15) pairs of combinations, though. S the number of expected collisions is 5.

In practice I find 6, which is in good ageement.

Now I try to spoil the set by intentionally including two keys (nr 31 and 32) with a very small Hamming distance, only 1 bit different. The result is 7 collisions, the old 6 (which apparently did not involve key #32), and a single new one. Not significantly different. If I would make #32 by XORing #31 by 0x200000 in stead of 0x100000, the number of collisions remains 6.

So indeed, Hamming distance seems irrelevant. Unless it is zero, of course, where it is just a complicated way of saying that two keys are equal.

bob · Post by **bob** » Wed Sep 05, 2007 5:29 pm

hgm wrote:Well, to investigate if Hamming distance has any bearing whatsoever on this problem, I tested independence of Zobrist keys through the following simple program:
Code: Select all
#define SIZE &#40;1<<27&#41;

long long int Zob&#91;12*64&#93;;

int lock&#91;SIZE&#93;;

int cnt;

test&#40;int d, int n, long long int k&#41;
&#123;
    int i, j, m;

    if&#40;d==0&#41;
    &#123;
        i = k & SIZE-1;      // Ignore 5 bits of index part of key
        j = k >> 32 | 0x3FF; // Ignore 10 bits of signature part
                             // &#40;effective key length = 49 bits&#41;
        while&#40;lock&#91;i&#93;)
        &#123;
            if&#40;lock&#91;i&#93; == j&#41;
            &#123;   printf&#40;"%3d. collission %llx\n", cnt++, k&#41;;
                break;
            &#125; else i = i+1 & SIZE-1;
        &#125;
        lock&#91;i&#93; = j;
        return;
    &#125;

    for&#40;i=n; i<12*64; i++)
    &#123;
        test&#40;d-1, i+1, k^Zob&#91;i&#93;);
    &#125;
&#125;

main&#40;)
&#123;
        int i, j, k, cnt=0;

    // initialize Zobrist key table
        for&#40;i=0; i<64*12; i++)
                Zob&#91;i&#93; = (&#40;long long int&#41; rand&#40;)) << 33 ^
                         (&#40;long long int&#41; rand&#40;)) << 22 ^
                         (&#40;long long int&#41; rand&#40;)) << 11 ^
                         (&#40;long long int&#41; rand&#40;))  ^
                         (&#40;long long int&#41; rand&#40;)) >> 11;

        Zob&#91;32&#93; = Zob&#91;31&#93; ^ 0x100000; // try to make bad one

   // Screen all permutations of 3 keys for duplicates
   // &#40;equivalent to screening all permutations of 6 keys for zero&#41;
        test&#40;3, 0, 0&#41;;
&#125;
This should find any combination of 6 randoms from the Zobrist table that XOR to zero (and would thus lead to collisions of positions with as few as 6 differences).

To make it interesting, I shortened the key length to 49 bits. (With a 64-bit key one would expect only collisions with combinations of 9 or more keys, and to test that would take much too long).

I took a Zobrist table of 12*64 = 768 randoms. his give (768*767*766)/6 ~75 million combinations of 3. The probability that any two of these combinations are the same (in the 49 bits I consider) is 1/2^49 (=1.77e-15). There are 75M^2/2 (=2.8e15) pairs of combinations, though. S the number of expected collisions is 5.

In practice I find 6, which is in good ageement.

Now I try to spoil the set by intentionally including two keys (nr 31 and 32) with a very small Hamming distance, only 1 bit different. The result is 7 collisions, the old 6 (which apparently did not involve key #32), and a single new one. Not significantly different. If I would make #32 by XORing #31 by 0x200000 in stead of 0x100000, the number of collisions remains 6.

So indeed, Hamming distance seems irrelevant. Unless it is zero, of course, where it is just a complicated way of saying that two keys are equal.

I am not sure I buy the result yet. Just having two keys with a hamming distance of 1 is not very interesting, if all the others are more normal. But it is certainly possible, using the wrong PRNG, to produce _lots_ of randoms with a small hamming distance, which tends to make this much worse.

There was an ICCA paper by Burton Wendroff 10-15 years ago that addressed this specific issue with some good math. This was in Vol 11, No. 1, "Search Tables in Computer Chess." It is worth reading for a discussion in this context.

hgm · Post by **hgm** » Wed Sep 05, 2007 6:46 pm

Well, if I include a lot of keys with only a single non-zero bit, by

Code: Select all

for&#40;i=8;  i<27; i++) Zob&#91;i&#93; = 1LL << i;
for&#40;i=42; i<64; i++) Zob&#91;i&#93; = 1LL << i;

(so 41 keys, of course I have to make sure that this single bit is not one of the unused ones), I indeed get a bit more collissions: 38.

But this might actually be partly due to rehashing, that can lead to some false collisions between keys that differ only in a few low-order bits.

Anyway, the effect does not seem very dramatic, you really have to have lots and lots of keys with distance 2 to get a dramatic increase in collision rate. The PRNG would really have to conspire against you to cause this. And of course it could always conspire to make dependent keys, even if Hamming distance is of no relevance at all.

bob · Post by **bob** » Wed Sep 05, 2007 7:12 pm

hgm wrote:Well, if I include a lot of keys with only a single non-zero bit, by
Code: Select all
for&#40;i=8;  i<27; i++) Zob&#91;i&#93; = 1LL << i;
for&#40;i=42; i<64; i++) Zob&#91;i&#93; = 1LL << i;
(so 41 keys, of course I have to make sure that this single bit is not one of the unused ones), I indeed get a bit more collissions: 38.

But this might actually be partly due to rehashing, that can lead to some false collisions between keys that differ only in a few low-order bits.

Anyway, the effect does not seem very dramatic, you really have to have lots and lots of keys with distance 2 to get a dramatic increase in collision rate. The PRNG would really have to conspire against you to cause this. And of course it could always conspire to make dependent keys, even if Hamming distance is of no relevance at all.

It seems to me that you are testing random positions. What we really care about are "related positions". IE positions that are created by actually moving pieces on the chess board. That's the case Wendroff wrote about in his paper. We really don't care about how random positions collide since we can't search that way, but we do care about what happens as you make consecutive moves, and that is an interesting issue.

Not that I think it matters whatsoever in the game based on the hashing tests I ran for the paper Cozzie and I wrote last year... It takes a huge number of collisions before the search results start to change... With 64 bit signatures, and even reasonably random Zobrist numbers, it just isn't an issue IMHO.

Theoretically it is an interesting exercise. In reality it turns out to be a big waste of time however.

hgm · Post by **hgm** » Wed Sep 05, 2007 7:56 pm

I don'ttest any positions at all. I am just determining the dependence of the keys by exhaustively searching all combinations of keys. The smallest number of dependent keys tells you how many differences two positions need to have before they can collide. For 64-bit keys in Chess this number cannot be larger than 9.

bob · Post by **bob** » Wed Sep 05, 2007 9:25 pm

hgm wrote:I don'ttest any positions at all. I am just determining the dependence of the keys by exhaustively searching all combinations of keys. The smallest number of dependent keys tells you how many differences two positions need to have before they can collide. For 64-bit keys in Chess this number cannot be larger than 9.

The problem is that when you have 30 pieces on the board, you have far more numbers that interact. And the probability of collisions goes up. That was the basic idea in Wendroff's paper... I'm personally not worried because I protect myself against a bad move from the hash table corrupting my board state. If I didn't do that, I might be worried...

hgm · Post by **hgm** » Wed Sep 05, 2007 10:16 pm

Well, I also protect against bad moves in so far as they could crash the engine.

But it still would be bad if you accidentally had three keys that XOR to zero, as it means that the presence of two of those is equivalent to the presence of the third, so you would have a collision between two positions that are only a capture removed from each other. And it would not be just one pair of positions, but you would have such a collision for every combination of the other pieces on the board. And that might be more than your search can work around, if that capture happens to be possible in the root.

So if you see inexplicable search errors, you use a key that is a bit on the short side, or you use dirty tricks tho shrink the Zobrist table to 1KB, it might still be a good idea to test the keys for independence.

And finally: If the number of key collisions is so small that you never have to worry about them, it simply means that you use too long a key. Your engine would do better with the same resources if you shrunk your hash entry by carving away a few key bits, as the ncreased number of entries would benefit you more than the increased collissions would hurt you.

hgm · Post by **hgm** » Tue Sep 11, 2007 6:23 pm

I also used this method to verify that the trick of overlapping the Zobrist randoms is not more collission prone than having completely independent keys. (I do this so I can use a table of char, rather than long long int, reducing the cache pollution associated with the Zobrist table by a factor 8 ). So the last 7 bytes of one key are the first 7 bytes of the next, etc.

char Keys[12*64+7];

// initialization
for(i=0; i<12*64+7; i++) Keys = rand()>>6;

//usage
HashKey ^= * (long long int *) (Keys + 64*PieceType + Square);

This of course leads to unaligned memory access, but this is not a crime on x86 machines. On the CPUs I tried it only has a very small penalty (it takes 3 clocks on Pentium IV and 2 on other Pentiums, in stead of 1, and on Pentium IV it was possible to do other, aligned memory accesses in parallel). The penalty for fetching something from L2 is far greater.

Testing such a set of overlapping keys of 49 bits only gave 4 collissions of order 6, in stead of the expected average of 5.

Gerd Isenberg · Post by **Gerd Isenberg** » Sat Sep 15, 2007 9:20 pm

a blast from the past...

from rec.games.chess 1994

Tony Warnock

The following paper descibes a method of generating the numbers for a hash table. By using error correcting codes, we ensure that positions that are close on the board are not close in the hash space. Some experiments showed that we got an improvement in collision rate compared to using a random set of numbers.

MacWilliams and Sloane's book on error correcting codes has
the gory details about the theory and programming.

Tony Warnock & Burton Wendroff:
"Search Tables in Computer Chess"
ICCA Journal (March 1988), pp. 10-13.

Jonathan Schaeffer

... I can speak from experience here. In the early versions of my chess program Phoenix, I generated my Zobrist hash numbers using my student id number as a seed, naively thinking the radom numbers generated by this seed would be good enough. A few years later I put code in to detect when my 32-bit hash key matched the wrong position. To my surprise, there were *lots* of errors. I changed my seed to another number and the error rate dropped dramatically. With this better seed, it became very, very rare to see a hash error.

All randomly generated numbers are not the same!

bob · Post by **bob** » Wed Sep 19, 2007 5:08 am

Gerd Isenberg wrote:a blast from the past...

from rec.games.chess 1994

Tony Warnock

The following paper descibes a method of generating the numbers for a hash table. By using error correcting codes, we ensure that positions that are close on the board are not close in the hash space. Some experiments showed that we got an improvement in collision rate compared to using a random set of numbers.

MacWilliams and Sloane's book on error correcting codes has
the gory details about the theory and programming.

Tony Warnock & Burton Wendroff:
"Search Tables in Computer Chess"
ICCA Journal (March 1988), pp. 10-13.
Jonathan Schaeffer
... I can speak from experience here. In the early versions of my chess program Phoenix, I generated my Zobrist hash numbers using my student id number as a seed, naively thinking the radom numbers generated by this seed would be good enough. A few years later I put code in to detect when my 32-bit hash key matched the wrong position. To my surprise, there were *lots* of errors. I changed my seed to another number and the error rate dropped dramatically. With this better seed, it became very, very rare to see a hash error.

All randomly generated numbers are not the same!

I had already cited the warnock paper somewhere in this thread.

It was interesting back then, still applies today, although if you try to start a discussion on this topic, the discussion goes off into the twilight zone rather than staying on topic...

How do I test the quality of the Zobrist keys?

Re: How do I test the quality of the Zobrist keys?

Re: How do I test the quality of the Zobrist keys?

Re: How do I test the quality of the Zobrist keys?

Re: How do I test the quality of the Zobrist keys?

Re: How do I test the quality of the Zobrist keys?

Re: How do I test the quality of the Zobrist keys?

Re: How do I test the quality of the Zobrist keys?

Re: How do I test the quality of the Zobrist keys?

Re: How do I test the quality of the Zobrist keys?

Re: How do I test the quality of the Zobrist keys?