Warning: no wildly off-topic postings will be tolerated in this thread; those will be deleted without further notice.
To recapitulate what was discussed so far:
A 64-byte hash bucket (aligned with a cache line for efficient memory access) could be organized such that part of the hash signature of each of the entries it contains is packed in one 64-bit word of the bucket. E.g. if there are 7 entries in the bucket, 9 signature bits of each of those could be packed into a 'signature extensions' word (with 1 bit to spare). One could then test for a hit by creating a key that contains 7 similarly packed copies of the corresponding bits of the hash key for the sought entry, and thus compare all 7 signature extensions at once for a match.
In code:
Code:
#include <stdint.h>

typedef uint64_t uint64;

typedef struct {
    uint64 data; // whatever you want to store in the TT, amongst which a hash signature of, say, 24 bits
} HashEntry;

typedef struct {
    uint64 extensions; // 7 x 9 bits of signature extension, 1 bit to spare
    HashEntry e[7];
} HashBucket;

#define SIZE (1 << 20) // number of buckets; any power of two
#define PACKED(X) ( (X) | (X)<<9 | (X)<<2*9 | (X)<<3*9 | (X)<<4*9 | (X)<<5*9 | (X)<<6*9 )

HashBucket hashTable[SIZE];
uint64 hashMask = SIZE - 1;
uint64 hashKey;

// probe code
uint64 index = hashKey & hashMask;
uint64 sought = (hashKey >> (64-9)) * PACKED(1);     // 7 packed copies of the top 9 key bits
uint64 diffs = sought ^ hashTable[index].extensions; // per-field XOR; a field is all zero on a match
uint64 tmp = diffs | PACKED(1<<8);                   // force the top bit of each 9-bit field
diffs |= ~PACKED(1<<8);                              // set all bits except each field's own top bit
tmp -= PACKED(1);                                    // a field's top bit survives iff its low 8 bits were nonzero
diffs |= tmp;                                        // now a field's top bit is clear iff the whole field was zero
if(~diffs) { // at least one of the signature extensions did match the hash key
    // use standard bit-extraction technique on ~diffs to see which entries
    // that were, and check their signature to see if they are true hits
    ...
}
This makes the treatment of the common case of a hash miss very simple; the chance that a 9-bit key extension matches by accident is only about 0.2%, so with 7 entries 98.6% of the misses would not have any accidental match, and would never have to touch the entries themselves. This greatly reduces the time the CPU would stall while waiting for those entries to be retrieved from memory, provided we put the packed signature extensions in the first word that would be fetched during a cache-line-fill memory burst access (as we did).
For the 1.4% accidental matches we would examine the single matching entry for the remaining 24 key bits, and conclude we did not have a match after all. In general we would only loop over the entries for which the signature extension matched, to compare the remainder of the signature against the corresponding hashKey bits. So it wouldn't cause much overhead if those entries have to be unpacked first. In the case of a true hit it would be the only matching signature extension in 98.8% of the cases, and in half of the other 1.2% we would happen to try the truly matching one first.
In this example the entries themselves were packed in 64-bit words, which is made easier by 'outsourcing' a number of signature bits to the signature-extension word. Such words are accessed atomically, so in an SMP environment there is no chance a memory race could separate the data in an entry from its signature.

There can still be a race between the entry write and the signature-extension write, leading to a corruption of the kind where a signature gets combined with the signature extension that belonged to another entry. But this would then merely invalidate the total signature. Even if other data from the entry had been stored in the signature-extension word (which we did not do here), it would be moderately protected from corruption by a memory race: such data could only be mistaken for genuine if the other entry it came from happened by accident to have the same signature extension. For things like aging bits that seems entirely acceptable.
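A store along these lines (a sketch, not from the original post; `store_entry` is a hypothetical helper name) would update the chosen entry and its 9-bit extension field as two separate, naturally aligned 64-bit writes, each atomic on its own:

```c
#include <stdint.h>

typedef struct {
    uint64_t extensions; /* 7 packed 9-bit signature extensions, 1 bit to spare */
    uint64_t e[7];       /* entries packed into single 64-bit words */
} HashBucket;

/* Sketch: store an entry (already packed into one 64-bit word) in slot i of a
   bucket, and update its 9-bit signature extension. Each write is atomic by
   itself, but a reader can interleave between the two; as discussed above,
   the worst case is that an entry gets paired with another entry's extension,
   which merely invalidates the combined signature. */
static void store_entry(HashBucket *b, int i, uint64_t packedEntry, unsigned ext9)
{
    uint64_t mask = 0x1FFULL << (9 * i);
    b->e[i] = packedEntry;
    b->extensions = (b->extensions & ~mask) | ((uint64_t)ext9 << (9 * i));
}
```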