multi thread

Discussion of chess software programming and technical issues.

Moderator: Ras

MDiaz

multi thread

Post by MDiaz »

Im doing first atempts to prepar my engine to multi thread

I asume that i ned:

list of moes (local for the thread)
list of killers local

hash table global

but I doubt it if history table and pawn hash / eval hash are beter local or global

Can you help me?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: multi thread

Post by bob »

MDiaz wrote:Im doing first atempts to prepar my engine to multi thread

I asume that i ned:

list of moes (local for the thread)
list of killers local

hash table global

but I doubt it if history table and pawn hash / eval hash are beter local or global

Can you help me?
All of that should be global. But for any tree state stuff, such as piece locations, board position, killer moves, PV, move lists, etc, you have to maintain that in a separate memory region for each thread...
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: multi thread

Post by Zach Wegner »

bob wrote:All of that should be global. But for any tree state stuff, such as piece locations, board position, killer moves, PV, move lists, etc, you have to maintain that in a separate memory region for each thread...
Really? I think for at least evaluation hash and pawn hash, it would be better to be local. Their only point is a linear time savings, it can't affect the search tree (which would be exponential). And a relatively small one at that. They are also both localized. So if you have many processors writing constantly to them, not only are they going to fill it with a bunch of junk that wouldn't be useful to the other processors, you ALSO reduce the cache hit rate substantially.

For a qsearch hash, it's debatable. Mine is local, and should be small enough to fit entirely in cache. But I could see people getting use out of a global one. IIRC Vincent said his is global.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: multi thread

Post by bob »

Zach Wegner wrote:
bob wrote:All of that should be global. But for any tree state stuff, such as piece locations, board position, killer moves, PV, move lists, etc, you have to maintain that in a separate memory region for each thread...
Really? I think for at least evaluation hash and pawn hash, it would be better to be local. Their only point is a linear time savings, it can't affect the search tree (which would be exponential). And a relatively small one at that. They are also both localized. So if you have many processors writing constantly to them, not only are they going to fill it with a bunch of junk that wouldn't be useful to the other processors, you ALSO reduce the cache hit rate substantially.

For a qsearch hash, it's debatable. Mine is local, and should be small enough to fit entirely in cache. But I could see people getting use out of a global one. IIRC Vincent said his is global.
My pawn hash has a hit rate of over 99%. That says "global" and not "local". My regular hash is also global for the same reason, it is a "global" deal that is used throughout the tree.
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: multi thread

Post by Zach Wegner »

bob wrote:My pawn hash has a hit rate of over 99%. That says "global" and not "local". My regular hash is also global for the same reason, it is a "global" deal that is used throughout the tree.
OK, but you're going to get the same hit rate with one processor. So the locality for pawn hash could probably go one way or the other, but the cache coherency still matters. My pawn hash is pretty small anyways, which makes both factors lean towards per-cpu (I would think at least).

The locality issue was primarily in reference to the evaluation hash. Hit rates there are 20-40%.
Gian-Carlo Pascutto
Posts: 1260
Joined: Sat Dec 13, 2008 7:00 pm

Re: multi thread

Post by Gian-Carlo Pascutto »

So the locality for pawn hash could probably go one way or the other, but the cache coherency still matters.
If you think about this more, it should be obvious that the case in which the cache coherency is going to penalize you, is exactly the case where you really want a shared table. (A miss followed by a store)

However, I would be worried about lock contention. You can ignore locking in the ttable, but for a pawnhash, a collision would have a much higher chance of hurting you, exactly because of the high hitrates.

Does Crafty lock the pawnhash?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: multi thread

Post by bob »

Zach Wegner wrote:
bob wrote:My pawn hash has a hit rate of over 99%. That says "global" and not "local". My regular hash is also global for the same reason, it is a "global" deal that is used throughout the tree.
OK, but you're going to get the same hit rate with one processor. So the locality for pawn hash could probably go one way or the other, but the cache coherency still matters. My pawn hash is pretty small anyways, which makes both factors lean towards per-cpu (I would think at least).

The locality issue was primarily in reference to the evaluation hash. Hit rates there are 20-40%.
I don't get burned by the coherency issue much. First, either the entire entry gets overwritten, or nothing is modified. An entry is pretty big at 32 bytes, so I get 2 per cache block, which is minimal interference. Also, I am a bit tricky here as well, in that when I do a probe and get a hit, I move that entry to local memory and so long as the pawn positions don't change, I continue to use that local entry and never go back and probe again.

On an 8 core system, the hit rate would be 8x worse since each core would calculate things that might have already been calculated on other cores, which doesn't seem like a good idea to me. Even worse, on a nehalem, with one L3 cache, you have 8 copies of the same information on an 8-core system, since each thread would be using local data rather than shared. Again that influences cache footprint.

My thinking is to try to minimize sharding read/write data across all threads, at least where you have a single piece of read/write data in the same block with a bunch of read-only variables. But beyond that, I leave it to the cache controllers to work things out, while I try to minimize duplicated data. That's a reason I prefer threads over processes, as with EGTBs, eugene's code is threaded, but with processes you get egtb cache duplicated as well, which greatly increases the total I/O rate and memory usage instead of using his LRU cache scheme which tries to reduce that...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: multi thread

Post by bob »

Gian-Carlo Pascutto wrote:
So the locality for pawn hash could probably go one way or the other, but the cache coherency still matters.
If you think about this more, it should be obvious that the case in which the cache coherency is going to penalize you, is exactly the case where you really want a shared table. (A miss followed by a store)

However, I would be worried about lock contention. You can ignore locking in the ttable, but for a pawnhash, a collision would have a much higher chance of hurting you, exactly because of the high hitrates.

Does Crafty lock the pawnhash?
It uses the "lockless" (XOR) approach, yes...
frankp
Posts: 233
Joined: Sun Mar 12, 2006 3:11 pm

Re: multi thread

Post by frankp »

bob wrote:
It uses the "lockless" (XOR) approach, yes...
Is there a reference to this approach? Hash locks are killing me.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: multi thread

Post by bob »

frankp wrote:
bob wrote:
It uses the "lockless" (XOR) approach, yes...
Is there a reference to this approach? Hash locks are killing me.
Yes. Same idea used in Crafty's main hash table. I believe there is an electronic copy on my web page at www.cis.uab.edu/faculty