Comparing results

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Comparing results

Post by hgm »

I am not sure I understand what you explain here, but indeed I mean to use signature bits that are not used for determining the hash address. (In my engines the signature is one 32-bit key, while the address is derived from another 32-bit key; the two could be considered the high and low halves of a 64-bit key.)

I think that would be the most relevant test, as this is what would actually happen when people would try to save bytes in their hash entries by going to a shorter signature.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Comparing results

Post by bob »

hgm wrote:I am not sure I understand what you explain here, but indeed I mean to use signature bits that are not used for determining the hash address. (In my engines the signature is one 32-bit key, while the address is derived from another 32-bit key; the two could be considered the high and low halves of a 64-bit key.)

I think that would be the most relevant test, as this is what would actually happen when people would try to save bytes in their hash entries by going to a shorter signature.
I probably explained it poorly. Here's the issue. Cozzie and I used normal hash signatures to probe the table, but then we only used N of the bits to match signatures to see if we got a hit. I think that is what you were suggesting. But what I discovered, quickly, was that when we used the real hash signature to probe the table, but only a small number of bits (through an AND operation) to compare for a match, we matched correctly anyway, because the original signature took us to the right table entry. It didn't matter how many bits we were using to test, since we were using the rightmost N bits to choose the entry. We found the hashing was good enough that if we went to a specific entry, we found the correct signature there most of the time; enough of the time that we could not produce high error rates.

What we decided to do was to set a number of probes per error. If we wanted one error every 1000 probes, we did 999 normal probes, but on number 1000 we would mangle the hash signature so that it would take us to a wrong entry in the table, and we considered it a match no matter what was there... In Fine 70, for example, it was nearly impossible to get an error without doing this... the total number of different positions is pretty low with the locked-up pawns...
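The masked-compare probe and the deliberate one-in-N error injection described above could be sketched as follows. This is a minimal illustration, not code from the paper: the entry layout, table size, and the mangling constant are all my assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical entry layout: a full key stored for verification plus a score. */
typedef struct {
    uint64_t signature;
    int      score;
} Entry;

#define TABLE_BITS 16
#define TABLE_SIZE (1ULL << TABLE_BITS)

static Entry table[TABLE_SIZE];
static uint64_t probes = 0;

/* The low TABLE_BITS of the real key pick the slot; only sig_bits of the
 * stored signature are compared, via an AND mask.  Every error_interval-th
 * probe mangles the key so it lands on a wrong slot, which is then
 * accepted as a hit no matter what it contains (error_interval == 0
 * disables the injection). */
Entry *probe(uint64_t key, int sig_bits, uint64_t error_interval)
{
    uint64_t mask = (sig_bits >= 64) ? ~0ULL : ((1ULL << sig_bits) - 1);

    if (error_interval && ++probes % error_interval == 0) {
        key ^= 0x9E3779B97F4A7C15ULL;           /* forced collision */
        return &table[key & (TABLE_SIZE - 1)];  /* accepted unconditionally */
    }
    Entry *e = &table[key & (TABLE_SIZE - 1)];
    return ((e->signature & mask) == (key & mask)) ? e : NULL;
}
```

The point Bob makes falls out of the first return path: because the slot is chosen from the same key being verified, shrinking `sig_bits` rarely produces a false hit, which is why the forced-mangle path is needed at all.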
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Comparing results

Post by Zach Wegner »

bob wrote:I probably explained it poorly. Here's the issue. Cozzie and I used normal hash signatures to probe the table, but then we only used N of the bits to match signatures to see if we got a hit. I think that is what you were suggesting. But what I discovered, quickly, was that when we used the real hash signature to probe the table, but only a small number of bits (through an AND operation) to compare for a match, we matched correctly anyway, because the original signature took us to the right table entry. It didn't matter how many bits we were using to test, since we were using the rightmost N bits to choose the entry. We found the hashing was good enough that if we went to a specific entry, we found the correct signature there most of the time; enough of the time that we could not produce high error rates.

What we decided to do was to set a number of probes per error. If we wanted one error every 1000 probes, we did 999 normal probes, but on number 1000 we would mangle the hash signature so that it would take us to a wrong entry in the table, and we considered it a match no matter what was there... In Fine 70, for example, it was nearly impossible to get an error without doing this... the total number of different positions is pretty low with the locked-up pawns...
HG has a good point though. In practice, nobody would purposely mangle the hash keys. It would be just as good to simply not store any signature info in the hash entry, and rely on the index to verify the hit. This is of course dependent on the size of the hash, but if it works without a strength decrease for some basic hash size (say 4 MB, which is 18 bits of index for 16-byte entries), then it might be safe to simply get rid of half of the hash entry, doubling the hash size and simultaneously gaining another bit of index. :) You also gain a few cycles by not checking the signature. :)
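Zach's arithmetic can be checked mechanically. The helper below is purely illustrative (the name `index_bits` is mine): it counts how many key bits the slot index itself supplies for a given table and entry size.

```c
/* How many bits of the hash key are implied by the slot index alone:
 * log2(table_bytes / entry_bytes), assuming both are powers of two. */
unsigned index_bits(unsigned long table_bytes, unsigned entry_bytes)
{
    unsigned long entries = table_bytes / entry_bytes;
    unsigned bits = 0;
    while (entries > 1) {
        entries >>= 1;
        bits++;
    }
    return bits;
}
```

For a 4 MB table, 16-byte entries give 2^18 slots (18 index bits); halving the entry to 8 bytes doubles the slot count and yields the extra 19th bit Zach mentions.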
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Comparing results

Post by bob »

Zach Wegner wrote:
bob wrote:I probably explained it poorly. Here's the issue. Cozzie and I used normal hash signatures to probe the table, but then we only used N of the bits to match signatures to see if we got a hit. I think that is what you were suggesting. But what I discovered, quickly, was that when we used the real hash signature to probe the table, but only a small number of bits (through an AND operation) to compare for a match, we matched correctly anyway, because the original signature took us to the right table entry. It didn't matter how many bits we were using to test, since we were using the rightmost N bits to choose the entry. We found the hashing was good enough that if we went to a specific entry, we found the correct signature there most of the time; enough of the time that we could not produce high error rates.

What we decided to do was to set a number of probes per error. If we wanted one error every 1000 probes, we did 999 normal probes, but on number 1000 we would mangle the hash signature so that it would take us to a wrong entry in the table, and we considered it a match no matter what was there... In Fine 70, for example, it was nearly impossible to get an error without doing this... the total number of different positions is pretty low with the locked-up pawns...
HG has a good point though. In practice, nobody would purposely mangle the hash keys. It would be just as good to simply not store any signature info in the hash entry, and rely on the index to verify the hit. This is of course dependent on the size of the hash, but if it works without a strength decrease for some basic hash size (say 4 MB, which is 18 bits of index for 16-byte entries), then it might be safe to simply get rid of half of the hash entry, doubling the hash size and simultaneously gaining another bit of index. :) You also gain a few cycles by not checking the signature. :)
I understand the point. But in endgame positions, which need testing like all others, it is _very_ hard to create false matches unless you intentionally do so. Your idea about not storing the "lock" part (the original signature) will actually work pretty well for Fine 70 given a reasonable table size...
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Comparing results

Post by Dann Corbit »

bob wrote:
Zach Wegner wrote:
bob wrote:I probably explained it poorly. Here's the issue. Cozzie and I used normal hash signatures to probe the table, but then we only used N of the bits to match signatures to see if we got a hit. I think that is what you were suggesting. But what I discovered, quickly, was that when we used the real hash signature to probe the table, but only a small number of bits (through an AND operation) to compare for a match, we matched correctly anyway, because the original signature took us to the right table entry. It didn't matter how many bits we were using to test, since we were using the rightmost N bits to choose the entry. We found the hashing was good enough that if we went to a specific entry, we found the correct signature there most of the time; enough of the time that we could not produce high error rates.

What we decided to do was to set a number of probes per error. If we wanted one error every 1000 probes, we did 999 normal probes, but on number 1000 we would mangle the hash signature so that it would take us to a wrong entry in the table, and we considered it a match no matter what was there... In Fine 70, for example, it was nearly impossible to get an error without doing this... the total number of different positions is pretty low with the locked-up pawns...
HG has a good point though. In practice, nobody would purposely mangle the hash keys. It would be just as good to simply not store any signature info in the hash entry, and rely on the index to verify the hit. This is of course dependent on the size of the hash, but if it works without a strength decrease for some basic hash size (say 4 MB, which is 18 bits of index for 16-byte entries), then it might be safe to simply get rid of half of the hash entry, doubling the hash size and simultaneously gaining another bit of index. :) You also gain a few cycles by not checking the signature. :)
I understand the point. But in endgame positions, which need testing like all others, it is _very_ hard to create false matches unless you intentionally do so. Your idea about not storing the "lock" part (the original signature) will actually work pretty well for Fine 70 given a reasonable table size...
It seems to me that using the index only will be fine as long as the hash table is large enough.

For instance, we clearly have a deadly serious problem if we store only 1 entry because we will have nearly 100% collisions.
The same is true for 10 entries, 100 entries, and to a lesser degree for 1000 entries, because we will have 10%, 1%, and 0.1% collisions respectively. But if the table has millions of entries, then the collision rate will be small.

So the larger the table is, the safer the approach is.

A hybrid approach might be to use:
1. Unsigned long for a lock if the table is very tiny (e.g. a 1K-entry table for super-fast lightning games).
2. Unsigned short for a lock if the table is small.
3. Unsigned char for a lock if the hash size is odd and the table is medium sized (so that the char really has no cost anyway other than the xor when we use it).
4. No lock at all for large tables.

To accomplish that, we could use the skiplist's chummy approach to allocation.
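Dann's hybrid idea could be sketched as a lookup that widens the stored lock as the table shrinks. The thresholds and names below are illustrative guesses, not values from any engine.

```c
#include <stddef.h>

/* Width of the stored lock, in bytes, for a given table size.  The fewer
 * index bits the table supplies, the wider the lock needs to be.  The
 * cutoffs here are assumptions chosen only to illustrate the scheme. */
typedef enum {
    LOCK_NONE  = 0,  /* large table: the index alone verifies the hit */
    LOCK_CHAR  = 1,  /* medium table: a 1-byte lock is nearly free     */
    LOCK_SHORT = 2,  /* small table                                    */
    LOCK_LONG  = 4   /* tiny table, e.g. 1K entries for lightning games */
} LockWidth;

LockWidth lock_width(size_t entries)
{
    if (entries <= 1024)              return LOCK_LONG;
    if (entries <= (1UL << 16))       return LOCK_SHORT;
    if (entries <= (1UL << 22))       return LOCK_CHAR;
    return LOCK_NONE;
}
```

The design point is that lock bits and index bits are interchangeable: each doubling of the table adds one index bit, so a byte of lock can be shed roughly every eight doublings without raising the false-match rate.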
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: Comparing results

Post by jwes »

bob wrote:
Zach Wegner wrote:
bob wrote:I probably explained it poorly. Here's the issue. Cozzie and I used normal hash signatures to probe the table, but then we only used N of the bits to match signatures to see if we got a hit. I think that is what you were suggesting. But what I discovered, quickly, was that when we used the real hash signature to probe the table, but only a small number of bits (through an AND operation) to compare for a match, we matched correctly anyway, because the original signature took us to the right table entry. It didn't matter how many bits we were using to test, since we were using the rightmost N bits to choose the entry. We found the hashing was good enough that if we went to a specific entry, we found the correct signature there most of the time; enough of the time that we could not produce high error rates.

What we decided to do was to set a number of probes per error. If we wanted one error every 1000 probes, we did 999 normal probes, but on number 1000 we would mangle the hash signature so that it would take us to a wrong entry in the table, and we considered it a match no matter what was there... In Fine 70, for example, it was nearly impossible to get an error without doing this... the total number of different positions is pretty low with the locked-up pawns...
HG has a good point though. In practice, nobody would purposely mangle the hash keys. It would be just as good to simply not store any signature info in the hash entry, and rely on the index to verify the hit. This is of course dependent on the size of the hash, but if it works without a strength decrease for some basic hash size (say 4 MB, which is 18 bits of index for 16-byte entries), then it might be safe to simply get rid of half of the hash entry, doubling the hash size and simultaneously gaining another bit of index. :) You also gain a few cycles by not checking the signature. :)
I understand the point. But in endgame positions, which need testing like all others, it is _very_ hard to create false matches unless you intentionally do so. Your idea about not storing the "lock" part (the original signature) will actually work pretty well for Fine 70 given a reasonable table size...
It comes down to what question you want to answer.

1. How does the percentage of false matches affect performance?

or

2. How does the number of bits of the hash key stored in the hash table affect performance?

I find the second question more interesting.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Comparing results

Post by bob »

jwes wrote:
bob wrote:
Zach Wegner wrote:
bob wrote:I probably explained it poorly. Here's the issue. Cozzie and I used normal hash signatures to probe the table, but then we only used N of the bits to match signatures to see if we got a hit. I think that is what you were suggesting. But what I discovered, quickly, was that when we used the real hash signature to probe the table, but only a small number of bits (through an AND operation) to compare for a match, we matched correctly anyway, because the original signature took us to the right table entry. It didn't matter how many bits we were using to test, since we were using the rightmost N bits to choose the entry. We found the hashing was good enough that if we went to a specific entry, we found the correct signature there most of the time; enough of the time that we could not produce high error rates.

What we decided to do was to set a number of probes per error. If we wanted one error every 1000 probes, we did 999 normal probes, but on number 1000 we would mangle the hash signature so that it would take us to a wrong entry in the table, and we considered it a match no matter what was there... In Fine 70, for example, it was nearly impossible to get an error without doing this... the total number of different positions is pretty low with the locked-up pawns...
HG has a good point though. In practice, nobody would purposely mangle the hash keys. It would be just as good to simply not store any signature info in the hash entry, and rely on the index to verify the hit. This is of course dependent on the size of the hash, but if it works without a strength decrease for some basic hash size (say 4 MB, which is 18 bits of index for 16-byte entries), then it might be safe to simply get rid of half of the hash entry, doubling the hash size and simultaneously gaining another bit of index. :) You also gain a few cycles by not checking the signature. :)
I understand the point. But in endgame positions, which need testing like all others, it is _very_ hard to create false matches unless you intentionally do so. Your idea about not storing the "lock" part (the original signature) will actually work pretty well for Fine 70 given a reasonable table size...
It comes down to what question you want to answer.

1. How does the percentage of false matches affect performance?

or

2. How does the number of bits of the hash key stored in the hash table affect performance?

I find the second question more interesting.
The problem is that the second is far more dynamic in nature. It depends on the hash table size, whether you store everywhere or not in q-search, and how big a hash entry is. Probably no two programs are the same there, which means the number of bits is highly variable. I was interested, when Cozzie and I wrote the paper, in the first question, which is more simply stated as "How many collisions per search can one stand before things start going wrong?" The answer was surprisingly large. How many bits it will take to create that error rate is more complex to answer, since hash size depends on hardware speed, memory size, time per move, etc.
MattieShoes
Posts: 718
Joined: Fri Mar 20, 2009 8:59 pm

Re: Comparing results

Post by MattieShoes »

I hope you do find time to test this -- I'd find the results very interesting.

I'd think that if one actually used the move number to encode the best move in 8 bits, collisions could hurt the branching factor even if they don't affect search results: you'd end up ordering the wrong move first. But one might create an 8-bit hash of the from and to squares; that'd probably cut down on that problem.
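An 8-bit fold of the from and to squares, as suggested above, might look like the following. The particular mixing formula is an arbitrary choice of mine; any cheap fold works, because a collision here only costs move ordering, not correctness, as long as the hashed move is still matched against the generated move list.

```c
#include <stdint.h>

/* Fold a from-square and to-square (each 0..63) into one byte.
 * The exact mixing is an illustrative assumption, not a standard scheme. */
uint8_t move_hash8(int from, int to)
{
    return (uint8_t)((from ^ (to << 2) ^ (to >> 4)) & 0xFF);
}
```

Unlike a raw move number, this value is independent of move-generation order, so a stale or colliding entry at worst fails to match any move rather than silently promoting the wrong one to the front.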
verminox

Re: Comparing results

Post by verminox »

MattieShoes wrote:I hope you do find time to test this -- I'd find the results very interesting.

I'd think that if one actually used the move number to encode the best move in 8 bits, collisions could hurt the branching factor even if they don't affect search results: you'd end up ordering the wrong move first. But one might create an 8-bit hash of the from and to squares; that'd probably cut down on that problem.
I use the second method: a 'hash' of the move containing the coordinates. Unfortunately this requires 2 bytes (6 bits each for the from and to squares, 3 bits for promotion). The advantage is that in case of a collision, if I get an illegal move from the hashtable it will simply not match any move in the move list, so the effect is the same as if there were no corresponding entry for that node. Not so bad, I must say.
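That 15-bit layout in a 16-bit word could be packed as below. The field order is my assumption; the self-verifying property verminox describes holds regardless, since a corrupted word simply decodes to a move that fails the move-list check.

```c
#include <stdint.h>

/* Pack a move as 6 bits from-square, 6 bits to-square, 3 bits promotion
 * piece (15 bits total) into a uint16_t.  Field order is illustrative. */
uint16_t encode_move(int from, int to, int promo)
{
    return (uint16_t)((from & 63) | ((to & 63) << 6) | ((promo & 7) << 12));
}

int move_from(uint16_t m)  { return m & 63; }
int move_to(uint16_t m)    { return (m >> 6) & 63; }
int move_promo(uint16_t m) { return (m >> 12) & 7; }
```

A probe would decode the stored word and accept it only if the resulting move appears in the legal move list for the current position, which is exactly the "same as no entry" behavior described above.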

Dann Corbit wrote:It seems to me that using the index only will be fine as long as the hash table is large enough.
I'd like to try storing NO signature at all in the hashtable and relying on the index, but I think that even with about 4.2 million entries (22-bit addressing) the error rate will start getting high after some time.

Hashtable entries are only replaced if:
  1. There was a collision on the index, but the stored hash is different. This will never happen if we use ONLY the index or a small hash key.
  2. The depth of the node searched was less than what we want, or the score was a bound and is not causing a cut-off at that particular time. In this case the best move is noted down, the node is searched properly, and THEN the entry is replaced.
But if a particular entry was stored as "PV-node, score=XX, depth=12", then every other node that collides with it and is being searched at depth <= 12 will immediately return with score XX. Also, since no replacement scheme is in place, many entries with exact scores and large depths will soon start polluting the table without ever being replaced. Many nodes will prematurely return the wrong score without even searching forward.

Of course I have no test data for this; it's just a conjecture. Any comments?
Aleks Peshkov
Posts: 892
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia

Re: Comparing results

Post by Aleks Peshkov »

An 8-byte hash entry can easily hold quiescence-search nodes, because they all have the same depth of 0.
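Since depth is implicitly 0 for every quiescence entry, the depth field can be dropped entirely and the rest packed into 8 bytes. The field widths below are one illustrative split, not Aleks' actual layout.

```c
#include <stdint.h>

/* A hypothetical 8-byte quiescence-search entry: no depth field is
 * stored because all q-search entries share depth 0. */
typedef struct {
    uint32_t signature;  /* 32-bit verification key                 */
    int16_t  score;      /* evaluation or bound                     */
    uint8_t  flags;      /* bound type: exact / lower / upper       */
    uint8_t  move8;      /* e.g. an 8-bit hash of the best move     */
} QEntry;

/* 4 + 2 + 1 + 1 bytes, 4-byte alignment: no padding is introduced. */
_Static_assert(sizeof(QEntry) == 8, "QEntry must pack into 8 bytes");
```

Halving the entry relative to a 16-byte main-table slot doubles the number of q-search positions a given amount of memory can hold, which is the same trade-off discussed earlier for dropping the signature.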