The unreasonable effectiveness of TT slots

Discussion of chess software programming and technical issues.

Moderator: Ras

micron
Posts: 155
Joined: Mon Feb 15, 2010 9:33 am
Location: New Zealand

The unreasonable effectiveness of TT slots

Post by micron »

I just got around to implementing multiple slots in Spandrel's hash table. Here's a comparison of 1-slot with 4-slots.

Code:

   Times in seconds for search from the starting position
          11-ply search           12-ply search
 MB     1 slot    4 slots       1 slot    4 slots 
  2      21.1       10.2        115.3      114.4
  4      18.1        6.0        107.3       86.3
  8      14.0        6.3        108.0       75.1
 16      13.7        5.4         96.4       27.6
 32       7.5        5.4         84.0       18.2
 64       9.2        5.4         79.7       17.8
128       6.8        5.5         60.3       17.7
256       6.5        5.5         54.9       17.7
512       5.6        5.6         35.3       18.0  
1024      5.7        5.7         34.0       17.9  
2048      5.8        5.9         18.5       18.2
TT size is given in MB, and each element is 16 bytes. Evidently having multiple slots allows you to get by with a substantially smaller table, perhaps 1/8 the size.
It's common knowledge that multiple slots are better than one. My question is: why are they so much better?

Robert P.
Gian-Carlo Pascutto
Posts: 1260
Joined: Sat Dec 13, 2008 7:00 pm

Re: The unreasonable effectiveness of TT slots

Post by Gian-Carlo Pascutto »

You won't end up overwriting (the few) very deep entries with shallow ones (of which there are shitloads).
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The unreasonable effectiveness of TT slots

Post by bob »

micron wrote:I just got around to implementing multiple slots in Spandrel's hash table. Here's a comparison of 1-slot with 4-slots.

Code:

   Times in seconds for search from the starting position
          11-ply search           12-ply search
 MB     1 slot    4 slots       1 slot    4 slots 
  2      21.1       10.2        115.3      114.4
  4      18.1        6.0        107.3       86.3
  8      14.0        6.3        108.0       75.1
 16      13.7        5.4         96.4       27.6
 32       7.5        5.4         84.0       18.2
 64       9.2        5.4         79.7       17.8
128       6.8        5.5         60.3       17.7
256       6.5        5.5         54.9       17.7
512       5.6        5.6         35.3       18.0  
1024      5.7        5.7         34.0       17.9  
2048      5.8        5.9         18.5       18.2
TT size is given in MB, and each element is 16 bytes. Evidently having multiple slots allows you to get by with a substantially smaller table, perhaps 1/8 the size.
It's common knowledge that multiple slots are better than one. My question is: why are they so much better?

Robert P.
Simple... better replacement so that you don't overwrite things that save more effort with things that barely save anything. Whenever you can exercise a choice, things are better.
brianr
Posts: 540
Joined: Thu Mar 09, 2006 3:01 pm
Full name: Brian Richardson

Re: The unreasonable effectiveness of TT slots

Post by brianr »

OK, this might be a silly question, but here goes:

For a given TT size in your table above, say 1024, does the 4-slot version have only one quarter as many entries as the 1-slot version? The time differences seem a bit too large to me, so I thought the 4-slot table might be 4x larger by mistake.

Also, what replacement algorithm did you use?

Thanks,
Brian
micron
Posts: 155
Joined: Mon Feb 15, 2010 9:33 am
Location: New Zealand

Re: The unreasonable effectiveness of TT slots

Post by micron »

brianr wrote:For a given TT size in your table above, say 1024, does the 4-slot version have only one quarter as many entries as the 1-slot version? The time differences seem a bit too large to me, so I thought the 4-slot table might be 4x larger by mistake.

Also, what replacement algorithm did you use?
The table sizes are accurate. My slot-specific code is entirely in SaveInHashTable() and LookupInHashTable().

SaveInHashTable() does this:
Search up to 4 consecutive slots for a hit. If no hit, use the slot with the smallest draft (a never-written-to slot has draft 0).
Then overwrite the slot unless (incoming_score_is_inexact && we_have_a_hit && draft_of_slot > depth_of_incoming_node).
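A minimal C sketch of the store routine described above (the entry layout and names like `TT_ENTRIES` are my own assumptions for illustration, not Spandrel's actual code):

```c
#include <stdint.h>

#define TT_SLOTS   4     /* consecutive slots probed per position */
#define TT_ENTRIES 4096  /* toy size; must be a power of two */

enum { TT_EXACT, TT_LOWER, TT_UPPER };  /* bound type of a stored score */

typedef struct {
    uint64_t key;   /* full Zobrist key, for hit detection */
    int16_t  score;
    uint8_t  draft; /* depth searched; 0 means never written to */
    uint8_t  flags;
} TTEntry;

static TTEntry tt[TT_ENTRIES];  /* zero-initialized: all drafts are 0 */

void SaveInHashTable(uint64_t key, int16_t score, uint8_t depth, uint8_t flags)
{
    /* first slot of the 4-slot group this key maps to */
    TTEntry *base = &tt[key & (TT_ENTRIES - 1) & ~(uint64_t)(TT_SLOTS - 1)];
    TTEntry *victim = &base[0];
    int hit = 0;

    /* search up to 4 consecutive slots for a hit; failing that,
       remember the slot with the smallest draft */
    for (int i = 0; i < TT_SLOTS; i++) {
        if (base[i].key == key) { victim = &base[i]; hit = 1; break; }
        if (base[i].draft < victim->draft) victim = &base[i];
    }

    /* overwrite unless an inexact score would clobber a deeper
       result already stored for this same position */
    if (hit && flags != TT_EXACT && victim->draft > depth)
        return;

    victim->key = key;      victim->score = score;
    victim->draft = depth;  victim->flags = flags;
}
```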
hgm
Posts: 28451
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: The unreasonable effectiveness of TT slots

Post by hgm »

Note that for very heavily overloaded tables I found some algorithms that seemed to perform better than that:

[image: chart comparing replacement schemes for heavily overloaded tables, including an equidistributed-draft method]
Aaron Becker
Posts: 292
Joined: Tue Jul 07, 2009 4:56 am

Re: The unreasonable effectiveness of TT slots

Post by Aaron Becker »

That's very interesting; I've never heard of the equidistributed draft method. I assume it tries to keep a mix of deep and shallow entries. What's the actual replacement rule it uses, if you don't mind my asking? Is this the method you use in Joker and your new engine?
hgm
Posts: 28451
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: The unreasonable effectiveness of TT slots

Post by hgm »

I use it in Joker. It keeps a histogram of the drafts present in the depth-preferred part of the table (which in Joker holds d ≥ 1; d = 0 entries go only to the always-replace part), and when a certain draft is 'over-represented', i.e. more frequent than the lowest draft that can be stored in the depth-preferred part, entries of that draft are replaced irrespective of the draft of the new entry.

This tends to allocate equal numbers of entries for all drafts, as the larger drafts have the tendency to accumulate in the depth-preferred part, and quickly push out the lowest draft. But as soon as the latter gets too low, it provides negative feedback to this process, as it means that the higher drafts suddenly become fair game for replacement. And in general, these are then replaced by lower drafts, as these are more abundant in the tree.

The idea behind this is that transpositions occur with ever larger time intervals in between. If I have 4 moves that can be transposed, there would be 24 permutations. If you start with ABCD, you would quite quickly encounter ABDC (when the first level below you with the same stm changes from C to D), then you would have to wait very much longer for ACDB (ACBD would have been intercepted by a TT hit after ACB), and still much longer for the root move to change and have BCDA. So if space is really cramped, and you have many more positions of a certain level in your tree than there are entries, you make best use of the entries by keeping the most recently written, as they have the largest chance per unit of time to be accessed again through a transposition with a move close to the leaves.

On the other hand, deeper entries save you more work on a hit. But the ratio of the work saved is inversely proportional to the frequency with which new nodes of that draft are generated: if d=1 nodes are 5 times more abundant than d=2 nodes, it must mean that the average d=2 search tree contains 5 d=1 nodes, and thus 5 d=1 sub-trees. You can afford to wait N times longer for a hit on a TT entry that saves you N times as much work before the entry starts to be used less efficiently. If you keep things around for an amount of time inversely proportional to their rate of generation, you end up with equal populations.

One can combine the idea of replacing over-represented drafts with any other replacement scheme. As over-represented entries will be replaced by the first thing that comes along, which is usually a low draft, you will have a fair amount of low drafts even in the depth-preferred part, and thus you won't need a very large always-replace part. In Joker I happen to have 7 entries per cache line, and I use one of those as always-replace entry, for anything that is rejected from the depth-preferred part. I probe only two depth-preferred entries, and if I cannot replace there (the primary one was not over-represented, and both had higher draft than the new position) it goes to the overflow slot.

It could also be combined with shallowest-of-two or shallowest-of-four replacement, however. Just first look if the primary entry (where you would look first) is over-represented. If so, replace it; if not, do the usual thing. But I switched to considering only 2 entries, because when you do not unconditionally keep the higher drafts around anyway, there was little payoff in going through a lot of trouble to not lose them occasionally on a collision between two high-draft positions. So I just save some time on comparisons by comparing only two (plus a third for the always-replace slot). Earlier tests had shown that shallowest-of-seven (the most I could do in the given cache line) was not competitive with shallowest-of-four anyway, so I would compare with only part of the entries in the cache line anyway, and the step to making that only two was a small one.
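A rough C sketch of the over-represented-draft test combined with shallowest-of-two, as described above (the histogram bookkeeping and all names here are my own illustration, not Joker's actual code):

```c
#define MAX_DRAFT    64
#define MIN_DP_DRAFT 1  /* lowest draft stored in the depth-preferred part */

static unsigned histo[MAX_DRAFT + 1];  /* entry count per draft in the
                                          depth-preferred part of the table */

/* a draft is over-represented when it occurs more often than the
   lowest draft that can live in the depth-preferred part */
static int over_represented(int draft)
{
    return histo[draft] > histo[MIN_DP_DRAFT];
}

/* pick a victim: slots 0 and 1 are the two depth-preferred entries
   probed, slot 2 is the always-replace overflow entry */
int choose_victim(int draft0, int draft1, int new_draft)
{
    if (over_represented(draft0))
        return 0;  /* replace regardless of the new entry's draft */
    if (new_draft >= draft0 || new_draft >= draft1)
        return (draft0 <= draft1) ? 0 : 1;  /* shallowest-of-two */
    return 2;  /* both deeper than the new position: overflow slot */
}
```

Whenever an entry is actually stored or replaced, the histogram has to be kept in sync (`histo[old_draft]--; histo[new_draft]++;`), otherwise the over-representation test drifts away from the real table contents.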
brianr
Posts: 540
Joined: Thu Mar 09, 2006 3:01 pm
Full name: Brian Richardson

Re: The unreasonable effectiveness of TT slots

Post by brianr »

Just another thought: when looking at hash-table changes in Tinker, the search tree often changes enough that the move and score change as well, even for fixed-depth searches with the same replacement scheme and number of slots, changing only the table size.

Changing the PV move of course can dramatically change the search time.
So I'm just wondering if the data above might reflect different PVs depending on the hash size.

Incidentally, I'm not sure why this happens, but Crafty does it also.

Brian
hgm
Posts: 28451
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: The unreasonable effectiveness of TT slots

Post by hgm »

Indeed, search-time data is extremely noisy; occasionally even bigger tables or better replacement algorithms can double the search time. For evaluating a replacement scheme, a good way to average out the noise is to search many positions, or to retry searching the same position many times with different Zobrist randoms.
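One way to implement the different-Zobrist-randoms idea: refill the key table from a seeded PRNG before each repeat search. A sketch in C (the xorshift64 generator and the 12x64 table shape are arbitrary choices for illustration, not from any engine mentioned here):

```c
#include <stdint.h>

static uint64_t zobrist[12][64];  /* one key per (piece, square) pair */

/* refill the Zobrist table deterministically from a seed, so that
   re-searching the same position with a new seed samples different
   hashing (collision/replacement) noise; seed must be nonzero */
void init_zobrist(uint64_t seed)
{
    for (int p = 0; p < 12; p++)
        for (int sq = 0; sq < 64; sq++) {
            seed ^= seed << 13;  /* xorshift64 step */
            seed ^= seed >> 7;
            seed ^= seed << 17;
            zobrist[p][sq] = seed;
        }
}
```

Since xorshift64 is a bijection on nonzero states, distinct seeds are guaranteed to produce distinct key tables, and the same seed always reproduces the same table.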