Well, this seems just standard C idiom. I often see code (from others) that looks like
Code: Select all
if( condition1 && (kind=1) ||
    condition2 && (kind=2) ||
    condition3 && (kind=3) ) {
  /* handle the common part; 'kind' now records which condition fired */
}
where you need to do very similar things for all 3 conditions (assumed to be complex expressions), but still need to know, for some details, which condition triggered it. The probing code I showed is just a streamlined version of that, where the side effect required to identify the condition can be done inside the [] of the array index, so that it doesn't need an extra && operator.
Requiring maintainability is one thing, but requiring maintainability by someone not fluent in the language in question is quite another. The latter path would lead you to allowing only the use of assignment statements and (possibly conditional) gotos, because using 'for' and 'while' constructs might make the code too difficult to maintain for people who only know BASIC...
It is a sad fact of life that advanced algorithms are usually more complex than inefficient straightforward ones. Qsort is more difficult to understand than bubble sort. In Chess programming speed is one of the important goals; if writing the most beautiful, elegant and maintainable code for a Chess engine forces you to use the most inefficient and simplistic algorithms, so that it will never do more than 10 knps, it will be of zero use to anyone.
Hash probing is an important performance bottleneck. (E.g. micro-Max 1.6, without hash table, does 6 Mnps, micro-Max 4.8 only 1 Mnps on the same machine.) So it doesn't seem entirely crazy to pay at least some minimal attention to avoid handling it in a needlessly inefficient way. To the uninitiated it might seem that looping through the 4 entries of a cache line is as good a method as any other, but in fact it isn't: cache lines of 64 bytes are loaded from memory in bursts of eight 8-byte words, with a significant delay between them in terms of current CPU clock speeds. So each iteration of the loop will likely be waiting for the next memory word to arrive before it can test for a key match. There is thus an incentive to minimize the number of comparisons you need to find the hash hit on probing. And for the case of a miss, where you cannot avoid going through all possible storage locations, to minimize the number of possible storage locations. You gain significant speed by not storing a position in an arbitrary location in a large bucket, which would force you to search through the bucket without a clue to find it again on a later probe.
The XOR method of stepping through the entries is not just a quirk; it reflects the way the memory hardware fetches the words from memory in a burst access. If you read from address 64*N + 5 (say), you get the words in the order 5, 4, 7, 6, 1, 0, 3, 2 (i.e. a normal counter that counts 0-7, XORed with the low bits of the originally requested address). So suppose you have a replacement scheme that can arrange it such that you can guess, with (say) 80% probability, where in the bucket the entry you are looking for will be (and it is then not possible that this would always be the first entry of the bucket!), so that you want to start probing somewhere in the middle of the bucket. Then you have to deal with the funny order in which the words arrive at the CPU: when your first guess (5) was wrong, but the sought position happens to be in the entry formed by words 6 & 7, you won't have to needlessly wait for words 2 & 3 to arrive, which would happen if you tested for a hit on (2 & 3) before you tested for a hit on (6 & 7). This is why my sample code tested the entries in the order it did.
So it seems to me that you blame complexity due to use of a more performant algorithm on the style of the code. This is like looking at qsort, and complaining that bubble sort is much more readable and easy to understand. Which no doubt is true, but misses the point. The value of qsort can only be understood on an entirely different level.
The code Alvaro gives indeed accesses the elements in the same order as I used. To me it doesn't seem more readable than the unrolled version; it definitely requires more code lines. And it still doesn't fit Sven's original requirement that it should also work with buckets of 3. (Which mine did not either.) OTOH, a probe order like 0, 2, 3 or 1, 3, 2 (with always-replace entries in 0 and 1, and depth-preferred in 2 and 3), depending on the 'primary hit' location being 0 or 1, and limiting the number of probes to 3, is still easiest done with an unrolled loop. (This would use a mask = hashSize - 2.)