Discussion of chess software programming and technical issues.
Moderators: hgm, Dann Corbit, Harvey Williamson
Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.

Aleks Peshkov
 Posts: 886
 Joined: Sun Nov 19, 2006 8:16 pm
 Location: Russia
Post
by Aleks Peshkov » Mon Sep 03, 2007 3:35 pm
hgm wrote:Aleks Peshkov wrote:
The only problem value is 0. If your zobrist numbers have a good Hamming Distance against 0, the above problem is gone. This is because 0 is the most common implicit member of any chessposition hash (all empty squares are encoded with implicit 0).
Nonsense. You would have the same problem with
Code: Select all
0xAAAAAAAA55555555
0xAAAAAAAAAAAAAAAA
0x55555555AAAAAAAA
0x5555555555555555
Ok, I see. It is possible to test Distance of some combinations of several keys against new candidate key. The more tests the better, but more tests will lead to less worstcase Distance.
There are two important kinds of collisions  hashcodes of a same square, but different pieces and hashcodes of all squares of a single piece. As for me it is intuitively better to maximize Distance of these combinations, then to test allwithall with with less Hamming Distance.

hgm
 Posts: 25941
 Joined: Fri Mar 10, 2006 9:06 am
 Location: Amsterdam
 Full name: H G Muller

Contact:
Post
by hgm » Mon Sep 03, 2007 4:15 pm
Well, as there can be at most one piece on a square you don't care so much about the randoms belonging to the same square (as long as they are all different). Even if 3 of those would XOR to zero, it could not cause any collisions.
OTOH if the randoms for 3 different squares could XOR to zero, (say Qe4, Ba6 and Ng2) you would have lots of collissions between positions differing only in the occupation of those three squares. (E.g. just a Q on e4 would be the same as e4 empty and a B on a6 and N on g2.) Now even if this would no increase the probability of collisions for randomly chosen positions, they would drive up the probability for collissions between positions in the same tree (and these are the only collisions we care about!).
What you suggest cannot be done in practice, as the number of possible combinations of already chosen randoms quickly becomes unmanagably large.

Aleks Peshkov
 Posts: 886
 Joined: Sun Nov 19, 2006 8:16 pm
 Location: Russia
Post
by Aleks Peshkov » Mon Sep 03, 2007 4:53 pm
hgm wrote:OTOH if the randoms for 3 different squares could XOR to zero, (say Qe4, Ba6 and Ng2) you would have lots of collissions between positions differing only in the occupation of those three squares.
Hamming Distance have transitive property of conditional probability. If Qe4 is 24 bit distant from Ba6 and 24 bit distant from Ng2, there is very small probability 2^(48) that (Ba6 XOR Ng2) will be exactly 0bit distant from Qe4.
It is much profitable to maximize Hamming Distance of some simple selected combinations, then to test against Hamming Distance != 0 with of various multikey combinations.

Uri Blass
 Posts: 8948
 Joined: Wed Mar 08, 2006 11:37 pm
 Location: TelAviv Israel
Post
by Uri Blass » Mon Sep 03, 2007 4:59 pm
Sorry but the probability here is 1 or 0 dependent on the zobrist numbers that you choose.
There cannot be probability that is different than 0 or 1 here unless you have non deterministic way to choose the zobrist numbers and I see no reason for non deterministic way.
The simplest solution here is to take code from other programs that work.
You can check that you calculate perft with no error with hash to see if you have a problem and after finding no problems forget about it.
Edit:I can add that I do not understand how you got your probability even assuming random numbers:
"If Qe4 is 24 bit distant from Ba6 and 24 bit distant from Ng2, there is very small probability 2^(48) that (Ba6 XOR Ng2) will be exactly 0bit distant from Qe4."
Based on the same logic if I replace 24 by 1 I get 2^2 and it is simply not correct because if I change one bit I can do it in 64 different ways(assuming zobrist key of 64 bits) and the probability to have the same bit twice is 1/64
Uri

Aleks Peshkov
 Posts: 886
 Joined: Sun Nov 19, 2006 8:16 pm
 Location: Russia
Post
by Aleks Peshkov » Mon Sep 03, 2007 5:08 pm
Uri Blass wrote:The simplest solution here is to take code from other programs that work.You can check that you calculate perft with no error with hash to see if you have a problem and after finding no problems forget about it.
Bad keys may have very few full 64bit hashkey collisions, but collisions inside 20..28bit hashslots may trash TT performance badly.
Edit:I can add that I do not understand how you got your probability even assuming random numbers:
I can be wrong with math, it is not the main point.

Aleks Peshkov
 Posts: 886
 Joined: Sun Nov 19, 2006 8:16 pm
 Location: Russia
Post
by Aleks Peshkov » Mon Sep 03, 2007 5:23 pm
Uri Blass wrote:Based on the same logic if I replace 24 by 1 I get 2^2 and it is simply not correct because if I change one bit I can do it in 64 different ways(assuming zobrist key of 64 bits) and the probability to have the same bit twice is 1/64
Distance of 1 means that Ca6 and Ng2 is different with probability of 63/64 in exactly 2 bits and with probability of 1/64 in a single bit.
So, 2 bits can be similar to a given mask in 2^2 cases (plus extra small chance of 1 bit collision).

jswaff
Post
by jswaff » Mon Sep 03, 2007 5:27 pm
bob wrote:One simple test is to measure the hamming distance between any two Zobrist numbers (hamming distance is the number of bits that are different between two binary values). The minimum hamming distance between pairs of numbers is an interesting value. If you have several pairs with very small hamming distances, you have a higher probability of a collision when you use those numbers.
For any single value, you can have a Zobrist number with a hamming distance of 64. (a and the binary complement of a, for example). I don't know that anyone has ever computed the minimum hamming distance between any two values in a set of N values. But if you have pairs with a distance of 1 or 2 or 3 (or even 0 heaven forbid) you need to replace one of those values.
I've performed that test in an earlier version of Prophet. I don't remember the value I came up with  I think it was 14. I use 851 keys in all, and the Random() (or Random32()?) function from Crafty. Unfortunately I don't have the code laying around anymore to confirm that number.

James

Uri Blass
 Posts: 8948
 Joined: Wed Mar 08, 2006 11:37 pm
 Location: TelAviv Israel
Post
by Uri Blass » Mon Sep 03, 2007 5:31 pm
Aleks Peshkov wrote:Uri Blass wrote:The simplest solution here is to take code from other programs that work.You can check that you calculate perft with no error with hash to see if you have a problem and after finding no problems forget about it.
Bad keys may have very few full 64bit hashkey collisions, but collisions inside 20..28bit hashslots may trash TT performance badly.
Edit:I can add that I do not understand how you got your probability even assuming random numbers:
I can be wrong with math, it is not the main point.
I think that
collisions that are not 64 bit collision are not very important because you can save more than one position in the same hash entry.
I agree that it is possible that better zobrist key than the zobrist key that top free program use may improve performance slightly(I did not check it
and I do not know if it is possible to improve performance of fruit or glaurung by better zobrist numbers) but I guess that we are talking about difference of not more than 1 elo.
Uri

bob
 Posts: 20923
 Joined: Mon Feb 27, 2006 6:30 pm
 Location: Birmingham, AL
Post
by bob » Mon Sep 03, 2007 6:04 pm
jswaff wrote:bob wrote:One simple test is to measure the hamming distance between any two Zobrist numbers (hamming distance is the number of bits that are different between two binary values). The minimum hamming distance between pairs of numbers is an interesting value. If you have several pairs with very small hamming distances, you have a higher probability of a collision when you use those numbers.
For any single value, you can have a Zobrist number with a hamming distance of 64. (a and the binary complement of a, for example). I don't know that anyone has ever computed the minimum hamming distance between any two values in a set of N values. But if you have pairs with a distance of 1 or 2 or 3 (or even 0 heaven forbid) you need to replace one of those values.
I've performed that test in an earlier version of Prophet. I don't remember the value I came up with  I think it was 14. I use 851 keys in all, and the Random() (or Random32()?) function from Crafty. Unfortunately I don't have the code laying around anymore to confirm that number.

James
Here's a thing to try:
compute the first N numbers (N = number you need in total) and compute the sum of squares of 64  the hamming distance between each pair of numbers. Now generate number N+1, and try replacing each original number, one at a time, with the new number, and see if the sum of squares is smaller. Pick the set with the smallest sum of squares. Repeat until you can't stand it any longer. There are other ways to do the same sort of algorithmic approach to finding the best set.
The 64N is so that smaller hamming distances produce larger numbers, and you want to minimize that. I might just do this myself to see how the optimized set compares to the first N which I now use...
I think I will do this on our cluster, which can eat this kind of problem up by letting each node use a different random number stream.
One note, if you do optimize the set of numbers, don't expect any better play. The tests Anthony and I ran on the ICGA hashing collision paper shows that this is not going to have any significant effect on the search at all. But it is an interesting exercise...

jesper_nielsen
Post
by jesper_nielsen » Mon Sep 03, 2007 6:34 pm
bob wrote:jswaff wrote:bob wrote:One simple test is to measure the hamming distance between any two Zobrist numbers (hamming distance is the number of bits that are different between two binary values). The minimum hamming distance between pairs of numbers is an interesting value. If you have several pairs with very small hamming distances, you have a higher probability of a collision when you use those numbers.
For any single value, you can have a Zobrist number with a hamming distance of 64. (a and the binary complement of a, for example). I don't know that anyone has ever computed the minimum hamming distance between any two values in a set of N values. But if you have pairs with a distance of 1 or 2 or 3 (or even 0 heaven forbid) you need to replace one of those values.
I've performed that test in an earlier version of Prophet. I don't remember the value I came up with  I think it was 14. I use 851 keys in all, and the Random() (or Random32()?) function from Crafty. Unfortunately I don't have the code laying around anymore to confirm that number.

James
Here's a thing to try:
compute the first N numbers (N = number you need in total) and compute the sum of squares of 64  the hamming distance between each pair of numbers. Now generate number N+1, and try replacing each original number, one at a time, with the new number, and see if the sum of squares is smaller. Pick the set with the smallest sum of squares. Repeat until you can't stand it any longer. There are other ways to do the same sort of algorithmic approach to finding the best set.
The 64N is so that smaller hamming distances produce larger numbers, and you want to minimize that. I might just do this myself to see how the optimized set compares to the first N which I now use...
I think I will do this on our cluster, which can eat this kind of problem up by letting each node use a different random number stream.
One note, if you do optimize the set of numbers, don't expect any better play. The tests Anthony and I ran on the ICGA hashing collision paper shows that this is not going to have any significant effect on the search at all. But it is an interesting exercise...
I think i will try your algorithm.
I have been using my set of pseudorandom numbers for a year or so, and never seen anything to make me suspect any problems.
But then when i was testing a KRKN endgame bitbase, my engine would chrash, because of an illegal move from the hash table.
For the zobrist keys to be a problem, i suppose there would not only have to be a collision between positions, but those positions would have to be a part of the same search as well.
Thanks,
Jesper