average hit rate of pawn hash tables?

bob · Post by **bob** » Fri Feb 15, 2008 3:06 am

Zach Wegner wrote:Well, I don't use any bucket system for the pawn hash. And anyways, not even one of my entries will fit in a cache line, and I don't expect many people's to fit either.

Perhaps if you did make a bucket system, and keep track of the "age", it would be marginally better. I don't think it would make much difference, especially considering the overhead in cache thrashing.

What would you use for choosing a replacement? There is no "draft", since the position has nothing to do with depth. I've always used always-store and do not do multiple probes. Since I get 99% all the time, anything more complex is going to be a losing proposition since I can at most gain 1% or less of the savings of doing a full pawn evaluation, and the cost for doing multiple probes will likely make the idea more expensive with no advantage to offset the cost.

Aleks Peshkov · Post by **Aleks Peshkov** » Fri Feb 15, 2008 11:06 am

IMHO 99% hit rate is too much. Making pawn hash size significantly smaller we can hope to gain more overall speed up due more level 2 CPU cache hits.

hgm · Post by **hgm** » Fri Feb 15, 2008 12:02 pm

bob wrote:Since I get 99% all the time, anything more complex is going to be a losing proposition since I can at most gain 1% or less of the savings of doing a full pawn evaluation, and the cost for doing multiple probes will likely make the idea more expensive with no advantage to offset the cost.

Aleks correctly refutes this argument: with a clever replacement scheme you might be able to achieve 99% hit rate with a far smaller table, making the probes faster through more efficient L2 cache use. Despite the larger number of probes. One cache miss costs as much as >100 hits. And the secondary probes are guaranteed hits. So an occasional extra hit on a primary probe because of the smaller table will pay for more than 100 extra secondary probes. If you have a 1MB cache and a 100MB hash table, the cache hit rate on totally random accesses is already 1%.

wgarvin · Post by **wgarvin** » Fri Feb 15, 2008 3:48 pm

hgm wrote:
bob wrote:Since I get 99% all the time, anything more complex is going to be a losing proposition since I can at most gain 1% or less of the savings of doing a full pawn evaluation, and the cost for doing multiple probes will likely make the idea more expensive with no advantage to offset the cost.
Aleks correctly refutes this argument: with a clever replacement scheme you might be able to achieve 99% hit rate with a far smaller table, making the probes faster through more efficient L2 cache use. Despite the larger number of probes. One cache miss costs as much as >100 hits. And the secondary probes are guaranteed hits. So an occasional extra hit on a primary probe because of the smaller table will pay for more than 100 extra secondary probes. If you have a 1MB cache and a 100MB hash table, the cache hit rate on totally random accesses is already 1%.

How big is a typical pawn hash table? Are they really as big as 100MB ? I thought it would be more like 1MB.

bob · Post by **bob** » Fri Feb 15, 2008 4:14 pm

Aleks Peshkov wrote:IMHO 99% hit rate is too much. Making pawn hash size significantly smaller we can hope to gain more overall speed up due more level 2 CPU cache hits.

Here's a test of your hypothesis, made by just varying the phash size and measuring the time required to reach a specific depth:

Code: Select all


time=49.15  mat=0  n=95985335  fh=94%  nps=2.0M hash hit rate &#40;256&#41; = 85.10
time=47.37  mat=0  n=95985335  fh=94%  nps=2.0M hash hit rate &#40;512&#41; = 90.81
time=47.28  mat=0  n=95985335  fh=94%  nps=2.0M hash hit rate &#40;1K&#41; = 92.49
time=46.84  mat=0  n=95985335  fh=94%  nps=2.0M hash hit rate &#40;2K&#41; = 93.97
time=46.08  mat=0  n=95985335  fh=94%  nps=2.1M hash hit rate &#40;4K&#41; = 95.52
time=46.05  mat=0  n=95985335  fh=94%  nps=2.1M hash hit rate &#40;8K&#41; = 96.67
time=45.48  mat=0  n=95985335  fh=94%  nps=2.1M hash hit rate &#40;16K&#41; = 97.43
time=45.29  mat=0  n=95985335  fh=94%  nps=2.1M hash hit rate &#40;32K&#41; = 98.00
time=45.48  mat=0  n=95985335  fh=94%  nps=2.1M hash hit rate &#40;64K&#41; = 98.46
time=45.35  mat=0  n=95985335  fh=94%  nps=2.1M hash hit rate &#40;128K&#41; = 98.82
time=44.78  mat=0  n=95985335  fh=94%  nps=2.1M hash hit rate &#40;256K&#41; = 99.03
time=44.64  mat=0  n=95985335  fh=94%  nps=2.2M hash hit rate &#40;512K&#41; = 99.18
time=44.93  mat=0  n=95985335  fh=94%  nps=2.1M hash hit rate &#40;1.0M&#41; = 99.26
time=45.14  mat=0  n=95985335  fh=94%  nps=2.1M hash hit rate &#40;2.0M&#41; = 99.29
time=45.42  mat=0  n=95985335  fh=94%  nps=2.1M hash hit rate &#40;4.0M&#41; = 99.31

So a hit rate of 99% is not hurting the overall NPS or search time. This was run on a position searched to depth=18 (middlegame position). The number in parentheses after hit rate is the number of paw n hash entries, not the size of the table in bytes. 512K seems to be optimal, producing a 99.2% hit rate and the shortest search time as well...

bob · Post by **bob** » Fri Feb 15, 2008 4:22 pm

hgm wrote:
bob wrote:Since I get 99% all the time, anything more complex is going to be a losing proposition since I can at most gain 1% or less of the savings of doing a full pawn evaluation, and the cost for doing multiple probes will likely make the idea more expensive with no advantage to offset the cost.
Aleks correctly refutes this argument: with a clever replacement scheme you might be able to achieve 99% hit rate with a far smaller table, making the probes faster through more efficient L2 cache use. Despite the larger number of probes. One cache miss costs as much as >100 hits. And the secondary probes are guaranteed hits. So an occasional extra hit on a primary probe because of the smaller table will pay for more than 100 extra secondary probes. If you have a 1MB cache and a 100MB hash table, the cache hit rate on totally random accesses is already 1%.

Please define "with a clever replacement scheme". He hasn't "refuted" the argument at all. If you look at the results I just posted, running the table size up to whatever it takes for max hit rate speeds the search until you pass the point where more entries does not improve the hit rate any more, then it just hurts L1/L2 cache a bit, although the effect is very small.

Nothing at all was "refuted" however...

BTW secondary probes are _not_ guaranteed hits. Hash entries are, by their nature, long. Crafty's pawn hash entry is 36 bytes. So probing two entries at once will result in _twice_ the number of cache line fills, since two won't fit in a single line. That is guaranteed to hurt, not help. And that still leaves the question of how to do an "intelligent replacement strategy" when there is nothing that says entry X will be more valuable than entry Y, except for a crystal ball or palm reader...

Feel free to refute this, but please use actual results, not guesswork based on more guesswork. It is easy enough to test as I did. Took all of 10 minutes or so...

bob · Post by **bob** » Fri Feb 15, 2008 4:37 pm

wgarvin wrote:
hgm wrote:
bob wrote:Since I get 99% all the time, anything more complex is going to be a losing proposition since I can at most gain 1% or less of the savings of doing a full pawn evaluation, and the cost for doing multiple probes will likely make the idea more expensive with no advantage to offset the cost.
Aleks correctly refutes this argument: with a clever replacement scheme you might be able to achieve 99% hit rate with a far smaller table, making the probes faster through more efficient L2 cache use. Despite the larger number of probes. One cache miss costs as much as >100 hits. And the secondary probes are guaranteed hits. So an occasional extra hit on a primary probe because of the smaller table will pay for more than 100 extra secondary probes. If you have a 1MB cache and a 100MB hash table, the cache hit rate on totally random accesses is already 1%.
How big is a typical pawn hash table? Are they really as big as 100MB ? I thought it would be more like 1MB.

I run with 100mb all the time.

But if we are going to discuss this, we ought to use real numbers, not the numbers HGM is quoting which are wrong.

Pick your processor of choice, and you can find the L1 cache line fill time and the L2 cache line fill time. Memory access time is more complex and the numbers that were given are wrong. A typical _random_ memory access is going to take around 450ns to pull off on a 32 bit processor, longer on a 64 bit processor, because the first given is a TLB miss when converting the virtual address to a physical address. That is two memory references to translate the address, plus a third reference to actually fetch the data.

So the latency for a miss is significant. But what everyone is missing is this: The same entry is accessed frequently. How do I know? Just look at how Crafty hashes. If the current hash signature == last hash signature, I don't have anything to do, and it happens often:

Code: Select all


                                    Bxa5
              time=44.88  mat=0  n=95985335  fh=94%  nps=2.1M
              ext-> check=1.3M 1rep=63K mate=3K pp=0 reduce=52.2M/8.2M
              predicted=0  evals=81.5M  50move=0  EGTBprobes=0  hits=0
              SMP->  splits=0  aborts=0  data=0/128  elap=44.88
hit rate = 69.40

What that says is that 69.40% of _ALL_ pawn positions evaluated were the same as the previous position. Or, in other words, 30% of pawn evaluations require a random probe into the pawn hash table, 70% re-use the previous entry without a probe of any kind...

Again, real numbers, not guesswork.

So the cache miss penalty is far higher than HGM's number. But the number of those misses is far lower than the speculated value as well.

I sill don't see why everyone doesn't just do as I do and run a test, rather than resorting to guesswork. This change to measure this in crafty took 30 seconds including the compile time...

The only rule I would use, and this one is based on actual testing and not on guesswork, is to make pawn hash big enough to give you 99% hit rates, going beyond this is not particularly helpful although it also does not hurt very much at all. The drawback is that beyond some limit, you have to start shrinking something else (egtb cache, regular transposition/refutation table size, etc) which might have a negative effect...

Just remember the above test, 70% of the time, the pawn structure is _identical_ with the previously evaluated pawn structure, so that not even a pawn hash probe is needed. So all we can improve (or hurt) is the remaining 30%, which greatly limits the effect all the speculation is addressing...

hgm · Post by **hgm** » Sat Feb 16, 2008 7:37 pm

As for 'real number': a random DRAM access takes 385 clocks on my 1.3GHz Pentium M. An L2 access take 6 clocks. That is even smaller than 100. But most people nowadays have clock speeds far larger than 1.3GHz, which speeds up the L2, but not the DRAM. So 100 seems a more reasonable factor.

So I don't know where you get the notion that the penalty is 'far greater'...

How often the same entry is accessed in a row doesn't seem relevant. If it is just accessed it wil be in L1, and the acces is practically free. The issue is how to most efficiently bring the data in L2. The reasoning "I have 99% hit rate, so I can gain at most 1% from smarter replacement" simply is not valid. If, because of the smarter replacement, and resulting smaller table size for the same hit rate, your average probe time is cut in half, you gain 50% on a hash hit-rate of 99% by usibng that smarter replacement!

Note that my statement about the secondary probes being guaranteed hit pertained to the case we were discussing:

Zach Wegner wrote:not even one of my entries will fit in a cache line, and I don't expect many people's to fit either.

So your 'counter-example' of 36-byte entries with a 64-byte cache line is simply not relevant.

It makes me curious, though, how you implement that. Do you align them with cache-line boundaries, sacrificing 44% of the space? Or are you better off packing them as dense as possible, cramming in more entries, but having half of them span a line boundary and need two line fills?

average hit rate of pawn hash tables?

Re: average hit rate of pawn hash tables?

Re: average hit rate of pawn hash tables?

Re: average hit rate of pawn hash tables?

Re: average hit rate of pawn hash tables?

Re: average hit rate of pawn hash tables?

Re: average hit rate of pawn hash tables?

Re: average hit rate of pawn hash tables?

Re: average hit rate of pawn hash tables?