multi thread
Moderator: Ras
-
- Posts: 1922
- Joined: Thu Mar 09, 2006 12:51 am
- Location: Earth
Re: multi thread
Gian-Carlo Pascutto wrote:
If you think about this more, it should be obvious that the case in which the cache coherency is going to penalize you is exactly the case where you really want a shared table (a miss followed by a store). So the locality for pawn hash could probably go one way or the other, but the cache coherency still matters.

Is it? I don't think so. Just for simplicity, assume 1 entry per cache line (having more entries per line could only hurt anyway). So we probe, and there's a cache miss due to the cache line being invalid. Since it had to be in the cache in the first place for this to happen, that means that the last thing we did was store an entry into it (or zero it, which is irrelevant here). Now there are two cases:
1. We are looking up the same position that we stored last time. The other CPU that overwrote the entry read the entry that we stored, failed the hash key check, and then stored the results for a different position once it was done. Technically the other processor could be storing the same position if we stored the entry after it probed, but this case will be pretty rare, and it hurts anyway. So the entry we needed got overwritten with a useless one.
2. We are looking up a different position. There are two sub-cases:
2a. The other CPU overwrote the entry with the position we want
2b. The other CPU overwrote the entry with a third position (or possibly the first, due to a race condition)
1 is obviously bad. 2b is bad too, but probably 2a happens more, and that is the only case that helps. I'd say that 1 is much more common than 2, though.
Bob does have a good point about shared L3, where each CPU increases the cache footprint. But depending on the size of the table, it's not clear to me that dividing the table up (or even making it shared between a subset of the processors) wouldn't be better.
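[Editor's note: to make the probe-then-store pattern under discussion concrete, here is a minimal C sketch using the same 1-entry-per-cache-line assumption. Every name in it (PawnHashEntry, pawn_table, pawn_probe, pawn_store) is hypothetical, not taken from any engine.]

```c
#include <stdint.h>

typedef struct {
    uint64_t key;            /* pawn hash signature */
    int16_t  score_mg;       /* cached middlegame pawn score */
    int16_t  score_eg;       /* cached endgame pawn score */
    uint8_t  pad[64 - 12];   /* pad the entry to one 64-byte cache line */
} PawnHashEntry;

#define PAWN_TABLE_ENTRIES (1u << 16)                  /* must be a power of two */
static PawnHashEntry pawn_table[PAWN_TABLE_ENTRIES];   /* shared by all threads */

/* Probe: on a hit, copy the entry out. On a miss the caller evaluates the
   pawn structure and calls pawn_store; that store is exactly the write that
   invalidates the line in every other core's cache, which is the coherency
   cost being debated above. Races are ignored in this sketch; a real engine
   would add a lock, a lockless XOR trick, or at least a key sanity check. */
static int pawn_probe(uint64_t key, PawnHashEntry *out)
{
    PawnHashEntry *e = &pawn_table[key & (PAWN_TABLE_ENTRIES - 1)];
    if (e->key == key) { *out = *e; return 1; }
    return 0;
}

static void pawn_store(const PawnHashEntry *fresh)
{
    pawn_table[fresh->key & (PAWN_TABLE_ENTRIES - 1)] = *fresh;  /* miss + store */
}
```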
-
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: multi thread
I have used globally shared pawn hash and eval hashtables for some time. When I switched to allocating separate tables, it made a big difference in NPS scaling. I always wondered why I couldn't get NPS scaling close to 4 on quads (it was 3.2 max). After that I tested again by allocating 2x bigger pawn/eval cache sizes, but still the NPS scaling was not as good. Also, if you share them, you can have collisions which could break things if you don't care to lock them or do a sanity check. Lesson learned: don't share anything except the main transposition table.
Daniel
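[Editor's note: a hedged sketch of what "allocate separate tables" can look like in practice. The types and the thread_alloc_tables helper are invented for illustration; this is not Scorpio's actual code.]

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint64_t key; int32_t score; } PawnHashEntry;  /* simplified */
typedef struct { uint64_t key; int32_t score; } EvalHashEntry;  /* simplified */

typedef struct {
    PawnHashEntry *pawn_table;   /* thread-private: no coherency traffic, no races */
    EvalHashEntry *eval_table;   /* thread-private */
    size_t pawn_mask, eval_mask;
    /* ... other per-thread search state ... */
} SearchThread;

/* Entry counts must be powers of two so (key & mask) indexes the table. */
static int thread_alloc_tables(SearchThread *t, size_t pawn_entries, size_t eval_entries)
{
    t->pawn_table = calloc(pawn_entries, sizeof *t->pawn_table);
    t->eval_table = calloc(eval_entries, sizeof *t->eval_table);
    t->pawn_mask  = pawn_entries - 1;
    t->eval_mask  = eval_entries - 1;
    return t->pawn_table && t->eval_table;
}
```

The trade-off, as discussed above, is that N private copies enlarge the total cache footprint (relevant with a shared L3) and no thread benefits from work another thread already did.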
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: multi thread
bah. didn't see that page 2 at all
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: multi thread
Zach Wegner wrote:
Gian-Carlo Pascutto wrote:
If you think about this more, it should be obvious that the case in which the cache coherency is going to penalize you is exactly the case where you really want a shared table (a miss followed by a store). So the locality for pawn hash could probably go one way or the other, but the cache coherency still matters.
Is it? I don't think so. Just for simplicity, assume 1 entry per cache line (having more entries per line could only hurt anyway). So we probe, and there's a cache miss due to the cache line being invalid. Since it had to be in the cache in the first place for this to happen, that means that the last thing we did was store an entry into it (or zero it, which is irrelevant here). Now there are two cases:
1. We are looking up the same position that we stored last time. The other CPU that overwrote the entry read the entry that we stored, failed the hash key check, and then stored the results for a different position once it was done. Technically the other processor could be storing the same position if we stored the entry after it probed, but this case will be pretty rare, and it hurts anyway. So the entry we needed got overwritten with a useless one.
2. We are looking up a different position. There are two sub-cases:
2a. The other CPU overwrote the entry with the position we want
2b. The other CPU overwrote the entry with a third position (or possibly the first, due to a race condition)
1 is obviously bad. 2b is bad too, but probably 2a happens more, and that is the only case that helps. I'd say that 1 is much more common than 2, though.
Bob does have a good point about shared L3, where each CPU increases the cache footprint. But depending on the size of the table, it's not clear to me that dividing the table up (or even making it shared between a subset of the processors) wouldn't be better.

This can be solved. I always check to see if the current pawn hash signature is the same as the signature the last time I did a probe. If so, I use the copy of that entry I cleverly saved in thread-local memory and don't go back to the table itself, where the entry might well be gone by now... A great majority of the positions are reached using this trick, since most moves are not pawn moves.

One big benefit, however: a cache miss does not mean a memory access. It may well mean (in your last two cases above) that the data gets forwarded from the cache with the good copy, which is way faster than a memory access.
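[Editor's note: a sketch of the thread-local "last entry" trick Bob describes. The idea is his; the names (PawnHashEntry, pawn_probe, shared_pawn_table) and the exact code are invented for illustration.]

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t key;       /* pawn hash signature */
    int32_t  score;     /* ... plus whatever else the evaluation needs ... */
} PawnHashEntry;

extern PawnHashEntry shared_pawn_table[];  /* shared table, defined elsewhere */
extern size_t shared_pawn_mask;

/* One saved entry per thread; _Thread_local (C11) gives each searcher its own. */
static _Thread_local PawnHashEntry last_entry;

const PawnHashEntry *pawn_probe(uint64_t pawn_key)
{
    /* Most moves are not pawn moves, so this test usually succeeds and we
       never re-read the shared entry, which may have been overwritten. */
    if (last_entry.key == pawn_key)
        return &last_entry;

    PawnHashEntry *e = &shared_pawn_table[pawn_key & shared_pawn_mask];
    if (e->key == pawn_key) {
        last_entry = *e;            /* snapshot the whole entry locally */
        return &last_entry;
    }
    return NULL;                    /* miss: caller recomputes and stores */
}
```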
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: multi thread
Daniel Shawul wrote:
I have used globally shared pawn hash and eval hashtables for some time. When I switched to allocating separate tables, it made a big difference in NPS scaling. I always wondered why I couldn't get NPS scaling close to 4 on quads (it was 3.2 max). After that I tested again by allocating 2x bigger pawn/eval cache sizes, but still the NPS scaling was not as good. Also, if you share them, you can have collisions which could break things if you don't care to lock them or do a sanity check. Lesson learned: don't share anything except the main transposition table.

I have a shared pawn hash table and it has no effect on my NPS scaling whatsoever, which has always been near-optimal. For 8-core boxes it is about as good as it can be. I don't see how pawn hash would cause an NPS scaling issue unless you are using bits and pieces of the hash entry throughout your evaluation, in which case you should make a local copy of the entire entry anyway.
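[Editor's note: a sketch of the contrast Bob draws between reading "bits and pieces" of a shared entry and snapshotting the whole entry once. All names are made up.]

```c
#include <stdint.h>

typedef struct {
    uint64_t key;
    int16_t  score_mg, score_eg;
} PawnHashEntry;

/* Risky under sharing: another thread may store into *shared between the two
   reads, so the two scores can come from two different positions. */
int eval_bits_and_pieces(const PawnHashEntry *shared)
{
    int s = shared->score_mg;
    /* ... lots of other evaluation work; a store may land here ... */
    return s + shared->score_eg;   /* possibly from a different position now */
}

/* Better: one local copy used everywhere, with one key check validating it
   all. (The copy itself can still be torn by a concurrent store; engines
   tolerate the rare garbage, lock, or use a lockless XOR scheme to detect it.) */
int eval_local_copy(const PawnHashEntry *shared, uint64_t key, int *valid)
{
    PawnHashEntry local = *shared;
    *valid = (local.key == key);
    return local.score_mg + local.score_eg;
}
```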
-
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: multi thread
I keep local copies of hash table entries after a successful probe. There is one pawn_record_entry local to each thread, which it uses to keep a copy. In the case of the eval table, I have copies for each thread at each ply, so that I can compare the effect of moves on evaluations. These copies are not shared between the searchers. A mistake I made when comparing NPS scaling was forgetting to allocate 2x larger tables for the dual-core speedup test, 4x for quad, etc. But increasing the table sizes while sharing them was not as effective as un-sharing them and allocating the tables separately.
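[Editor's note: a rough sketch of the per-thread/per-ply copy layout Daniel describes. Every name here (PawnRecord, EvalRecord, Searcher, MAX_PLY) is a guess for illustration, not Scorpio's code.]

```c
#include <stdint.h>

#define MAX_PLY 128

typedef struct { uint64_t key; int32_t score; } PawnRecord;
typedef struct { uint64_t key; int32_t score; } EvalRecord;

typedef struct {
    PawnRecord pawn_record;           /* one copy per thread */
    EvalRecord eval_stack[MAX_PLY];   /* one copy per ply of this thread,
                                         so move effects on the evaluation
                                         can be compared up the stack */
    int ply;
} Searcher;                           /* never shared between threads */
```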
Redoing the test with a shared pawn table did not affect the NPS scaling much (same result as you). But sharing the eval cache did impact it a lot. I have this result but can only guess at the causes:
a) lower hit rate for the eval TT compared to the pawn TT
b) the different storage structure for the pawn/eval copies I mentioned above
I can post numbers after I get my computer back in a few days.
I think that not sharing is also better for cluster computing, NUMA, etc.
Last edited by Daniel Shawul on Thu Jun 04, 2009 2:46 am, edited 1 time in total.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: multi thread
wgarvin wrote:
bah. didn't see that page 2 at all

??
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: multi thread
bob wrote:
wgarvin wrote:
bah. didn't see that page 2 at all
??

I posted the direct link to your lockless hashing page, not noticing that he had already found it and replied.

On the plus side, it led to me wandering through the DTS page for the first time in a couple of years. That page is always interesting to read (though I am in over my head when I try to understand the little details).
-
- Posts: 1260
- Joined: Sat Dec 13, 2008 7:00 pm
Re: multi thread
Zach Wegner wrote:
1. We are looking up the same position that we stored last time. The other CPU that overwrote the entry read the entry that we stored, failed the hash key check, and then stored the results for a different position once it was done. Technically the other processor could be storing the same position if we stored the entry after it probed, but this case will be pretty rare, and it hurts anyway. So the entry we needed got overwritten with a useless one.
2. We are looking up a different position. There are two sub-cases:
2a. The other CPU overwrote the entry with the position we want
2b. The other CPU overwrote the entry with a third position (or possibly the first, due to a race condition)
1 is obviously bad. 2b is bad too, but probably 2a happens more, and that is the only case that helps. I'd say that 1 is much more common than 2, though.

Only if you're using tiny tables because they're CPU-local... with a large table and hit rates over 90%, the typical case should be 2, no?