multi thread

MDiaz · Post by **MDiaz** » Wed Jun 03, 2009 7:43 pm

Im doing first atempts to prepar my engine to multi thread

I asume that i ned:

list of moes (local for the thread)
list of killers local

hash table global

but I doubt it if history table and pawn hash / eval hash are beter local or global

Can you help me?

bob · Post by **bob** » Wed Jun 03, 2009 8:16 pm

MDiaz wrote:Im doing first atempts to prepar my engine to multi thread

I asume that i ned:

list of moes (local for the thread)
list of killers local

hash table global

but I doubt it if history table and pawn hash / eval hash are beter local or global

Can you help me?

All of that should be global. But for any tree state stuff, such as piece locations, board position, killer moves, PV, move lists, etc, you have to maintain that in a separate memory region for each thread...

Zach Wegner · Post by **Zach Wegner** » Wed Jun 03, 2009 8:51 pm

bob wrote:All of that should be global. But for any tree state stuff, such as piece locations, board position, killer moves, PV, move lists, etc, you have to maintain that in a separate memory region for each thread...

Really? I think for at least evaluation hash and pawn hash, it would be better to be local. Their only point is a linear time savings, it can't affect the search tree (which would be exponential). And a relatively small one at that. They are also both localized. So if you have many processors writing constantly to them, not only are they going to fill it with a bunch of junk that wouldn't be useful to the other processors, you ALSO reduce the cache hit rate substantially.

For a qsearch hash, it's debatable. Mine is local, and should be small enough to fit entirely in cache. But I could see people getting use out of a global one. IIRC Vincent said his is global.

bob · Post by **bob** » Wed Jun 03, 2009 8:54 pm

Zach Wegner wrote:
bob wrote:All of that should be global. But for any tree state stuff, such as piece locations, board position, killer moves, PV, move lists, etc, you have to maintain that in a separate memory region for each thread...
Really? I think for at least evaluation hash and pawn hash, it would be better to be local. Their only point is a linear time savings, it can't affect the search tree (which would be exponential). And a relatively small one at that. They are also both localized. So if you have many processors writing constantly to them, not only are they going to fill it with a bunch of junk that wouldn't be useful to the other processors, you ALSO reduce the cache hit rate substantially.

For a qsearch hash, it's debatable. Mine is local, and should be small enough to fit entirely in cache. But I could see people getting use out of a global one. IIRC Vincent said his is global.

My pawn hash has a hit rate of over 99%. That says "global" and not "local". My regular hash is also global for the same reason, it is a "global" deal that is used throughout the tree.

Zach Wegner · Post by **Zach Wegner** » Wed Jun 03, 2009 9:00 pm

bob wrote:My pawn hash has a hit rate of over 99%. That says "global" and not "local". My regular hash is also global for the same reason, it is a "global" deal that is used throughout the tree.

OK, but you're going to get the same hit rate with one processor. So the locality for pawn hash could probably go one way or the other, but the cache coherency still matters. My pawn hash is pretty small anyways, which makes both factors lean towards per-cpu (I would think at least).

The locality issue was primarily in reference to the evaluation hash. Hit rates there are 20-40%.

Gian-Carlo Pascutto · Post by **Gian-Carlo Pascutto** » Wed Jun 03, 2009 9:08 pm

So the locality for pawn hash could probably go one way or the other, but the cache coherency still matters.

If you think about this more, it should be obvious that the case in which the cache coherency is going to penalize you, is exactly the case where you really want a shared table. (A miss followed by a store)

However, I would be worried about lock contention. You can ignore locking in the ttable, but for a pawnhash, a collision would have a much higher chance of hurting you, exactly because of the high hitrates.

Does Crafty lock the pawnhash?

bob · Post by **bob** » Wed Jun 03, 2009 9:10 pm

Zach Wegner wrote:
bob wrote:My pawn hash has a hit rate of over 99%. That says "global" and not "local". My regular hash is also global for the same reason, it is a "global" deal that is used throughout the tree.
OK, but you're going to get the same hit rate with one processor. So the locality for pawn hash could probably go one way or the other, but the cache coherency still matters. My pawn hash is pretty small anyways, which makes both factors lean towards per-cpu (I would think at least).

The locality issue was primarily in reference to the evaluation hash. Hit rates there are 20-40%.

I don't get burned by the coherency issue much. First, either the entire entry gets overwritten, or nothing is modified. An entry is pretty big at 32 bytes, so I get 2 per cache block, which is minimal interference. Also, I am a bit tricky here as well, in that when I do a probe and get a hit, I move that entry to local memory and so long as the pawn positions don't change, I continue to use that local entry and never go back and probe again.

On an 8 core system, the hit rate would be 8x worse since each core would calculate things that might have already been calculated on other cores, which doesn't seem like a good idea to me. Even worse, on a nehalem, with one L3 cache, you have 8 copies of the same information on an 8-core system, since each thread would be using local data rather than shared. Again that influences cache footprint.

My thinking is to try to minimize sharding read/write data across all threads, at least where you have a single piece of read/write data in the same block with a bunch of read-only variables. But beyond that, I leave it to the cache controllers to work things out, while I try to minimize duplicated data. That's a reason I prefer threads over processes, as with EGTBs, eugene's code is threaded, but with processes you get egtb cache duplicated as well, which greatly increases the total I/O rate and memory usage instead of using his LRU cache scheme which tries to reduce that...

bob · Post by **bob** » Wed Jun 03, 2009 9:11 pm

Gian-Carlo Pascutto wrote:
So the locality for pawn hash could probably go one way or the other, but the cache coherency still matters.
If you think about this more, it should be obvious that the case in which the cache coherency is going to penalize you, is exactly the case where you really want a shared table. (A miss followed by a store)

However, I would be worried about lock contention. You can ignore locking in the ttable, but for a pawnhash, a collision would have a much higher chance of hurting you, exactly because of the high hitrates.

Does Crafty lock the pawnhash?

It uses the "lockless" (XOR) approach, yes...

frankp · Post by **frankp** » Wed Jun 03, 2009 9:36 pm

bob wrote:
It uses the "lockless" (XOR) approach, yes...

Is there a reference to this approach? Hash locks are killing me.

bob · Post by **bob** » Wed Jun 03, 2009 9:43 pm

frankp wrote:
bob wrote:
It uses the "lockless" (XOR) approach, yes...
Is there a reference to this approach? Hash locks are killing me.

Yes. Same idea used in Crafty's main hash table. I believe there is an electronic copy on my web page at www.cis.uab.edu/faculty

multi thread

multi thread

Re: multi thread

Re: multi thread

Re: multi thread

Re: multi thread

Re: multi thread

Re: multi thread

Re: multi thread

Re: multi thread

Re: multi thread