TT size is given in MB, and each element is 16 bytes. Evidently having multiple slots allows you to get by with a substantially smaller table, perhaps 1/8 the size.
It's common knowledge that multiple slots are better than one. My question is: why are they so much better?
Robert P.
Simple... better replacement so that you don't overwrite things that save more effort with things that barely save anything. Whenever you can exercise a choice, things are better.
OK, this might be a silly question, but here goes:
For a given TT size in your table above, say 1024, does the 4-slot version have only one quarter as many entries as the 1-slot version? The time differences seem a bit too large to me, so I thought the 4-slot table might be 4x larger by mistake.
brianr wrote:For a given TT size in your table above, say 1024, does the 4-slot version have only one quarter as many entries as the 1-slot version? The time differences seem a bit too large to me, so I thought the 4-slot table might be 4x larger by mistake.
Also, what replacement algorithm did you use?
The table sizes are accurate. My slot-specific code is entirely in SaveInHashTable() and LookupInHashTable().
SaveInHashTable() does this:
Search up to 4 consecutive slots for a hit. If no hit, use the slot with the smallest draft (a never-written-to slot has draft 0).
Then overwrite the slot unless (incoming_score_is_inexact && we_have_a_hit && draft_of_slot > depth_of_incoming_node).
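In rough C, that save logic looks something like this (a minimal sketch only; the struct layout, bucket count, and constant names are illustrative, not the actual engine code):

Code:
#include <stdint.h>

enum { BUCKET_COUNT = 1 << 20, EXACT = 1 };   /* illustrative constants */

typedef struct {
    uint64_t key;            /* Zobrist signature */
    int16_t  score;
    uint8_t  draft;          /* 0 = slot never written to */
    uint8_t  flags;          /* EXACT or a bound type */
    uint16_t move;
    uint16_t pad;            /* pad entry to 16 bytes */
} HashEntry;

static HashEntry hash_table[BUCKET_COUNT][4];

void SaveInHashTable(uint64_t key, int depth, int score, int flags, int move)
{
    HashEntry *bucket = hash_table[key & (BUCKET_COUNT - 1)];
    HashEntry *slot = &bucket[0];
    int i, hit = 0;

    /* Search up to 4 consecutive slots for a hit; if no hit,
       remember the slot with the smallest draft. */
    for (i = 0; i < 4; i++) {
        if (bucket[i].key == key) { slot = &bucket[i]; hit = 1; break; }
        if (bucket[i].draft < slot->draft) slot = &bucket[i];
    }

    /* Overwrite unless the incoming score is inexact, we have a hit,
       and the stored draft exceeds the depth of the incoming node. */
    if (!(flags != EXACT && hit && slot->draft > depth)) {
        slot->key   = key;
        slot->score = (int16_t)score;
        slot->draft = (uint8_t)depth;
        slot->flags = (uint8_t)flags;
        slot->move  = (uint16_t)move;
    }
}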
That's very interesting; I've never heard of the equidistributed draft method. I assume it tries to keep a mix of deep and shallow entries. What's the actual replacement rule it uses, if you don't mind my asking? Is this the method you use in Joker and your new engine?
I use it in Joker. It keeps a histogram of the drafts present in the depth-preferred part of the table (which in Joker holds the d>=1 entries; d=0 entries only go to the always-replace part), and when a certain draft is 'over-represented', i.e. more frequent than the lowest draft that can be stored in the depth-preferred part, entries of that draft are replaced irrespective of the draft of the new entry.
This tends to allocate equal numbers of entries to all drafts, as the larger drafts have a tendency to accumulate in the depth-preferred part and quickly push out the lowest draft. But as soon as the population of the latter gets too low, this provides negative feedback to the process, as it means the higher drafts suddenly become fair game for replacement. And in general they are then replaced by lower drafts, as those are more abundant in the tree.
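A rough sketch of that test, reusing the HashEntry layout from the sketch earlier in the thread (the histogram and names are made up for illustration, not Joker's actual code):

Code:
/* Assumes draft_count[d] is incremented when a draft-d entry is written
   into the depth-preferred part and decremented when one is evicted. */
enum { MAX_DRAFT = 63, MIN_DP_DRAFT = 1 };  /* d=0 goes to always-replace */

static int draft_count[MAX_DRAFT + 1];

/* A draft is over-represented when it occurs more often in the
   depth-preferred part than the lowest draft allowed there. */
static int over_represented(int draft)
{
    return draft_count[draft] > draft_count[MIN_DP_DRAFT];
}

/* An over-represented victim is replaced irrespective of the draft of
   the new position; the histogram is kept up to date on every write. */
static void replace_draft(HashEntry *victim, int new_draft)
{
    draft_count[victim->draft]--;
    draft_count[new_draft]++;
    victim->draft = (uint8_t)new_draft;
    /* ...key, score, bound type and move are filled in as well... */
}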
The idea behind this is that transpositions occur with ever larger time intervals in between. If I have 4 moves that can be transposed, there are 24 permutations. If you start with ABCD, you would quite quickly encounter ABDC (when the first level below you with the same stm changes from C to D), then you would have to wait very much longer for ACDB (ACBD would have been intercepted by a TT hit after ACB), and still much longer for the root move to change and give you BCDA. So if space is really cramped, and you have many more positions of a certain level in your tree than there are entries, you make the best use of the entries by keeping the most-recently written ones, as they have the largest chance per unit of time to be accessed again through a transposition of moves close to the leaves.
On the other hand, deeper entries save you more work on a hit. But the work saved is inversely proportional to the frequency with which new nodes of that draft are generated: if d=1 nodes are 5 times more abundant than d=2 nodes, it must mean that the average d=2 search tree contains 5 d=1 nodes, and thus 5 d=1 sub-trees. You can afford to wait N times longer for a hit on a TT entry that saves you N times as much work before the entry starts to be used less efficiently. If you keep things around for an amount of time inversely proportional to their rate of generation, you end up with equal populations.
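Put as a formula (just a restatement of the above, with made-up symbols): if new draft-$d$ positions are generated at rate $r_d$ and an entry of that draft survives in the table for a time $T_d$, the steady-state population is $N_d = r_d \, T_d$; choosing $T_d \propto 1/r_d$ then makes $N_d$ the same for every draft $d$.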
One can combine the idea of replacing over-represented drafts with any other replacement scheme. As over-represented entries will be replaced by the first thing that comes along, which is usually a low draft, you will have a fair number of low drafts even in the depth-preferred part, and thus you won't need a very large always-replace part. In Joker I happen to have 7 entries per cache line, and I use one of those as the always-replace entry, for anything that is rejected from the depth-preferred part. I probe only two depth-preferred entries, and if I cannot replace there (the primary one was not over-represented, and both had higher draft than the new position) it goes to the overflow slot.
It could also be combined with shallowest-of-two or shallowest-of-four replacement, however. Just first check whether the primary entry (where you would look first) is over-represented. If so, replace it; if not, do the usual thing. But I switched to considering only 2 entries, because once you no longer unconditionally keep the higher drafts around anyway, there is little payoff in going through a lot of trouble to avoid losing them occasionally on a collision between two high-draft positions. So I just save some time on comparisons by comparing only two (plus a third for the always-replace slot). Earlier tests had shown that shallowest-of-seven (the most I could do in the given cache line) was not competitive with shallowest-of-four anyway, so I would only compare with part of the entries in the cache line in any case, and the step to making that only two was a small one.
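The probe-and-replace order then comes down to something like this (again only a sketch building on the earlier snippets; which two depth-preferred slots get probed, and the real cache-line layout, are illustrative):

Code:
/* 7-entry cache line: slots 0-5 depth-preferred, slot 6 always-replace. */
static HashEntry *ChooseVictim(HashEntry *line, uint64_t key, int new_draft)
{
    int p = (int)(key % 6);
    HashEntry *primary   = &line[p];            /* probed first  */
    HashEntry *secondary = &line[(p + 1) % 6];  /* probed second */

    /* If the primary entry's draft is over-represented, replace it
       irrespective of the draft of the new position. */
    if (over_represented(primary->draft))
        return primary;

    /* Otherwise shallowest-of-two among the depth-preferred probes... */
    HashEntry *shallow =
        (primary->draft <= secondary->draft) ? primary : secondary;
    if (new_draft >= shallow->draft)
        return shallow;

    /* ...and if both have a higher draft than the new position,
       it goes to the always-replace (overflow) slot. */
    return &line[6];
}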
Just another thought... when looking at hash-table changes in Tinker, the search tree often changes enough that the move and score change as well,
even for fixed-depth searches with the same replacement scheme and number of slots, changing only the table size.
Changing the PV move can of course dramatically change the search time.
So I'm just wondering whether the data above might reflect different PVs depending on the hash size.
Incidentally, I'm not sure why this happens, but Crafty does it also.
Indeed, search-time data is extremely noisy, and even bigger tables or better replacement algorithms can occasionally double the search time. For evaluating a replacement scheme, a good way to average out the noise is to search many positions, or to re-search the same position many times with different Zobrist randoms.