most similar hashes of two positions

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: most similar hashes of two positions

Post by Dann Corbit »

It does not matter how close they are as long as one bit is different.
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: most similar hashes of two positions

Post by brtzsnr »

Here is a pair with distance 4.

First has polyglot key 7837007157572575988

[d] r1bq1knr/pp2bppp/2n1p3/3p4/6P1/3Q1N2/PPP1PP1P/RNB1KB1R w KQ - 0 1

Second has polyglot key 7908924022443647732

[d] rn2kbr1/pp3ppp/4p2n/2pq1b2/3P4/2N1PN2/PP1B1PPP/R2QKB1R w KQq - 0 1
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: most similar hashes of two positions

Post by syzygy »

brtzsnr wrote:In case of a tie we count the lowest 32 bits. The competition runs until 31st of August (including).
If some people really want to try this, then it would seem better to ask for the first person to find two legal positions (i.e. reachable from the root) having identical Polyglot hash keys. Another option is to ask for two positions with identical Polyglot hash keys that "differ" in the fewest number of moves and unmoves.

It should really not be very difficult to find two different positions mapping to the same polyglot key.

Just do this:
- in addition to the 8-byte polyglot key, add a 1-byte hash key;
- do a big perft while storing all positions (8-byte key plus 1-byte key) in a huge hash table with not too small buckets;
- if the 8-byte polyglot keys collide, check the 1-byte hash key to see whether the positions are identical or whether you have two positions mapping to the same polyglot hash key;
- use the hashtable to skip traversing subtrees you've traversed before.

To find the actual fens once you find two positions mapping to the same polyglot hash key, print out the fen of the current position and the hash key. Redo the perft to find back the first position mapping to that key and print its fen.

Regarding the hash table. One could take 64-byte buckets. If 32GB is available for the hash table, that gives 2^35 / 64 = 2^29 buckets. That means 29 bits of the 64+8=72 bits of hash key do not need to be stored (as they are already encoded in the index number of the bucket). That leaves 43 bits per position. That means about 11.9 positions per bucket, which unfortunately has to be rounded down to 11 positions. Maybe add a 7-bit hash key instead of an 8-bit hash key, then you can store 12 positions in a bucket.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: most similar hashes of two positions

Post by Joost Buijs »

I wonder if you will find Hamming distance zero. With reasonable 64 bit Zobrist keys (which the Polyglot keys probably are) this seems not to happen very often.
User avatar
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: most similar hashes of two positions

Post by hgm »

IIRC Joerg Oster already solved this problem a long time ago, by presenting a position that had key 0. I believe this was indeed with Polyglot keys. The position looked comparatively normal.

He did this by normal techniques from linear algebra, as the keys can be considered 768 (or so) vectors in a 64-dimensional space.

There should already be two positions with each 5 pieces with equal key: QRBN together have 512 keys = 2^9. So there are 2^45/5! ~ 2^38 possible combinations of 5 of those. According to the birthday paradox this gives 2^75 pairs of combinations, each with a 1/2^64 probability of being equal. So we would expect 2048 of those pairs to have equal keys. These would then represent two 5-men positions (without Kings and Pawns) that have the same key. You can then add Kings and Pawns to some of the empty squares they have in common to make a realistic positon.
Last edited by hgm on Wed Aug 12, 2015 11:34 pm, edited 1 time in total.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: most similar hashes of two positions

Post by Joost Buijs »

hgm wrote:IIRC Joerg Oster already solved this problem a long time ago, by presenting a position that had key 0. I believe this was indeed with Polyglot keys. The position looked comparatively normal.

He did this by normal techniques from linear algebra, as the keys can be considered 768 (or so) vectors in a 64-dimensional space.

There should already be two positions with each 4 pieces with equal key: QRBN together have 512 keys = 2^9. So there are 2^36 possible combinations of 4 of those. According to the birthday paradox this gives 2^71 pairs of combinations, each with a 1/2^64 probability of being equal. So we would expect 128 of those pairs to have equal keys. These would then represent two 4-men positions (without Kings and Pawns) that have the same key. You canthen add Kings and Pawns to some of the empty squares they have in common to make a realistic positon.
According to the M-theory we live in a 11 dimensional space, 64 dimensions seem a bit overwhelming, at least to me this is.
Henk
Posts: 7216
Joined: Mon May 27, 2013 10:31 am

Re: most similar hashes of two positions

Post by Henk »

bob wrote:
brtzsnr wrote:Based on the thread "Worst Advice" (http://www.talkchess.com/forum/viewtopic.php?t=57235) I propose the following challenge:

Find two distinct positions whose Polyglot keys (http://hgm.nubati.net/book_format.html) have the lowest Hamming distance (fewer distinct bits).

The challenge has two categories:

1. The positions can be crafted.
2. The positions must be reached from the starting position (standard chess rules) by a series of legal moves.

In case of a tie we count the lowest 32 bits. The competition runs until 31st of August (including).

The goal of the challenge is to decide whether the hash move validity checking is necessary. Polyglot keys are used in order to be able to verify the result and to eliminate poor choices of the Zobrist hash values.

Please post your FENs.
I am not sure what there is to decide. You can't represent all chess positions with just 64 bits. So far the best has been around 160 bits or so. mapping 2^160 positions into 2^64 bits clearly will have literally gazillions of positions with a hamming distance of zero. Something like 2^96 such positions roughly...

Collisions (false matches with 64 bit signatures) absolutely happen. If an illegal move will crash your engine, you'd better check 'em or suffer the occasional crash since these are just like death, taxes, and such, there is no escaping them.
Skipper is collision free and uses 8 or 9 x 64 bits. Only one advantage: it is simple for me to understand and work with.
brtzsnr
Posts: 433
Joined: Fri Jan 16, 2015 4:02 pm

Re: most similar hashes of two positions

Post by brtzsnr »

With 2

12090907956623999240 [d]rn1qk2r/ppp1ppbn/8/3P1b2/3P4/8/PP2PPPP/RNBQKB1R w KQkq - 0 1

16414363598899675400 [d]r2qkbnr/ppQb1ppp/2n1p3/8/P7/2Pp4/1P3PPP/RNB1KBNR b KQkq - 0 1
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: most similar hashes of two positions

Post by syzygy »

mar wrote:
brtzsnr wrote:Based on the thread "Worst Advice" (http://www.talkchess.com/forum/viewtopic.php?t=57235) I propose the following challenge:

Find two distinct positions whose Polyglot keys (http://hgm.nubati.net/book_format.html) have the lowest Hamming distance (fewer distinct bits).
Hamming distance is not a good measure of hash key quality (this has been shown to me by hgm some time ago).
Actually you can craft Zobrist hashes to maximize Hamming distance but such will perform very very badly.
Hamming distance of the Zobrist keys is not a good measure, but what does make sense is to look at all combinations of Zobrist keys that add up to 0. Those combinations form a subspace of the 768-dimensional vector space of all linear combinations of Zobrist keys. An element of this subspace with small Hamming distance corresponds to a small number of Zobrist keys that add up to 0. This combination might correspond to a short series of chess moves and unmoves that convert one position into another position with the same key (whenever that series of moves and unmoves is legal). Such "close" positions are more likely to occur in the same search tree than two very remote positions.
Zenmastur
Posts: 919
Joined: Sat May 31, 2014 8:28 am

Re: most similar hashes of two positions

Post by Zenmastur »

mar wrote:
brtzsnr wrote:Based on the thread "Worst Advice" (http://www.talkchess.com/forum/viewtopic.php?t=57235) I propose the following challenge:

Find two distinct positions whose Polyglot keys (http://hgm.nubati.net/book_format.html) have the lowest Hamming distance (fewer distinct bits).
Hamming distance is not a good measure of hash key quality (this has been shown to me by hgm some time ago).
Actually you can craft Zobrist hashes to maximize Hamming distance but such will perform very very badly.
About a year ago I went through the posts looking for hash collision info. I thought I did a reasonably good search, but I don't remember seeing HGM giving an expose on the subject. Can you point out the discussion?

bob wrote:I am not sure what there is to decide. You can't represent all chess positions with just 64 bits. So far the best has been around 160 bits or so. mapping 2^160 positions into 2^64 bits clearly will have literally gazillions of positions with a hamming distance of zero. Something like 2^96 such positions roughly...

Collisions (false matches with 64 bit signatures) absolutely happen. If an illegal move will crash your engine, you'd better check 'em or suffer the occasional crash since these are just like death, taxes, and such, there is no escaping them.
As I recall, a few years back, you initiated a rather long discussion on this topic. You were trying to produce keys with the maximum possible Hamming distance until someone (offline) demonstrated that this would produce horrible collision problems. IE. XOR'ing any two keys would produce a key already in the set. Unfortunately this conversation degenerated and ended prior to determining what the optimal hamming distance was. I remember thinking to myself that the optimal distance was (hash_key_size_in_bits minus log2(number_of_Zorbist_keys))/4 e.g. (for 784 zorbist keys and a 64-bit hash it would be (64-log2(784))/4 = 13.6). If this is true, then it would also seem that it's impossible to create a set of 64-bit Zorbist keys that can guarantee that no collisions will occur when two positions are separated by 4 or more plies. IE. 13.6 / 2^4 < 1.

It does seem possible, at least in theory, to be able to create a set of 64-bit Zorbist keys which could guarantee that no collisions would occur between 2 positions separated by 3-plies or less. IE. 13.6 / 2^3 > 1 . Although, I have no clue as to how to go about constructing such a set.

Do these sound like reasonable conclusions?


On a somewhat related subject...

You did a collision rate test a few years back in which you showed that very high collision rates (on the order 10^-6) had a relatively minor if any effect on engine move selection. I tried a few months ago to calculate what the collision rate for a given TT size and Zorbist key size but I couldn't find anyway to do it. A guess would be 1 / (2^(key_size_in_bits minus log2(#_of_table_entries)). This would work if all table entries were IID. There not, of course, and you can't have more collisions than table hits. Since the number of table hits in the middle game run between 15% and 20% this would seem to put a upper limit on possible collisions rates during this part of the game. So it's likely the only way to find out is to run some tests. The piece of information that would be nice to know is: With a 64-bit key size what is the number of table entries that will cause a measurable decline in engine ELO due to excess collisions.

If it takes a collision rate > ~10^-6 to cause a noticable change in engine ELO and 20 excess bit in the key vice the number of table entries will produce an error rate of about 10^-6 then it would appear that table sizes as large as 16 trillion entries (2^44 entries) would be possible.

I know what a lot of you are thinking, you're thinking that when the number of table entries exceeds the square root of the possible number of zorbist keys collisions become almost gauranteed. While this may be true, It doesn't tell you how many collisions you will have, just that the probability of having at least one becomes very large. In a table of 2^44 entries you have to have greater than 2^24 probes that return invalid data due to a false positive per 2^44 probes to have an error rate of ~10^-6.

In any case, I don't think this can be tested on a full 64-bit key because of the amount of TT memory needed and the amount of time required to generate the required numbe of cache probes. i.e. a machine probing at 100,000,000 nodes per second would require over 2.5 months to completely fill the table.

Thoughts?

Regards,

Zen
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.