ChessUSA.com TalkChess.com
Hosted by Your Move Chess & Games
 

info about zappa on 512 cores ?
TalkChess.com Forum Index -> Computer Chess Club: Programming and Technical Discussions
Robert Hyatt



Joined: 27 Feb 2006
Posts: 15816
Location: Birmingham, AL

Post subject: Re: numa scaling    Posted: Mon Jun 25, 2012 10:05 am

Daniel Shawul wrote:
Quote:

Your "infiniband" says it all. It is really not a "NUMA" box in the sense I was using, i.e. AMD using their HyperTransport bus as opposed to InfiniBand. I would agree that a NUMA box using InfiniBand has a significant latency issue that I have not touched on in Crafty at all.

Well, Crafty also displays it as a NUMA machine, 8 x 4 ways, so you have inconsistencies there. What is confusing is that it has no problems scaling to 8 processors even though the Opterons are listed as 4-core boxes. On another AMD machine with 24 cores (4 x 6 cores), the same thing happens after using more than 12 threads. Again, one would expect the interconnect to be used after 6 cores, but it is not... So the pattern seems to indicate that two nodes are glued together by something other than InfiniBand (such as the HyperTransport bus).
Quote:

The only NUMA-related thing I do in Crafty (which fits Intel/AMD NUMA perfectly when you are talking about single-motherboard NUMA boxes) is to make sure that the split blocks for a thread are first touched by the core that thread will run on, so that those virtual pages fault into physical pages that are local to that core. But Crafty is not designed at all for a machine with the high NUMA latency InfiniBand introduces compared to direct bus connections...
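A minimal sketch of the first-touch idea described above, not Crafty's actual code. The `thread_ctx` struct and `thread_init` function are hypothetical names. It assumes the operating system places a page on the node of the core that first writes to it, that the thread is already running on (and ideally pinned to) its intended core, and that the allocation is large enough to be served from fresh, untouched pages.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-thread state; the real split-block layout is not
   shown in this thread. */
typedef struct {
    size_t bytes;   /* size of the block this thread should own */
    void  *block;   /* filled in by the owning thread itself */
} thread_ctx;

/* Thread entry point (same signature as a pthread start routine).
   Both the allocation and the first write happen inside the owning
   thread, so under a first-touch placement policy the pages fault in
   on the node local to the core running this thread. */
void *thread_init(void *arg)
{
    thread_ctx *ctx = arg;
    assert(ctx != NULL && ctx->bytes > 0);

    ctx->block = malloc(ctx->bytes);
    if (ctx->block != NULL)
        memset(ctx->block, 0, ctx->bytes);  /* first touch: writes every page */
    return ctx->block;
}
```

The point of the design is only the ordering: if the main thread zeroed all split blocks before starting the workers, every page would be touched first by the main thread's core and would tend to land on that one node.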

I am also using the implicit memory allocation method to improve NUMA performance, as suggested in the Intel manual here. So I allocate the pawn TT, eval TT and split blocks local to a thread and "touch" them first with that thread. I also tried a distributed shared main hash table. This is the suggested method in the manual, and Feldmann and co. also used it, even for clusters using message passing. But on the NUMA machine I test on, it seems the division and modulo required for accessing a section of the hash table slow down access time a lot. Thus for now I am using a global hash table bound to one node, which performed a little better than the other methods. From a quick look at Crafty, I understand you do not distribute the TT for NUMA. There also seems to be dead code for allocating interleaved memory that doesn't get used anywhere. It looks like libnuma is used only for displaying the number of NUMA cores...
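To make the division/modulo cost mentioned above concrete, here is a hedged sketch of locating an entry in a hash table split into one segment per node. This is not Scorpio's or Crafty's code; `tt_loc`, `locate_divmod` and `locate_pow2` are hypothetical names. The second variant assumes both the node count and the segment size are powers of two, so the divide and remainders reduce to a shift and two masks. Whether that would recover the lost speed on the machine discussed here is not something this thread establishes.

```c
#include <assert.h>
#include <stdint.h>

/* Where a key lives in a table split into per-node segments. */
typedef struct {
    uint32_t node;  /* which node's segment */
    uint64_t slot;  /* index inside that segment */
} tt_loc;

/* General form: works for any node count and segment size, but costs
   an integer division and two remainders on every probe. */
tt_loc locate_divmod(uint64_t key, uint32_t nodes, uint64_t slots_per_node)
{
    tt_loc loc;
    assert(nodes > 0 && slots_per_node > 0);
    loc.node = (uint32_t)(key % nodes);
    loc.slot = (key / nodes) % slots_per_node;
    return loc;
}

/* Power-of-two form: node_bits = log2(nodes),
   slot_bits = log2(slots per node). */
tt_loc locate_pow2(uint64_t key, unsigned node_bits, unsigned slot_bits)
{
    tt_loc loc;
    assert(node_bits < 32 && slot_bits < 32);
    loc.node = (uint32_t)(key & ((1ULL << node_bits) - 1));
    loc.slot = (key >> node_bits) & ((1ULL << slot_bits) - 1);
    return loc;
}
```

For power-of-two sizes the two functions return the same location, so the cheaper form can stand in for the general one whenever the table dimensions are chosen that way.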


It's a NUMA box, so Crafty will say so. But it is a VERY POOR NUMA box compared to the typical single-MB AMD machines I have used in the past, where you get 4-8 physical processor chips on a single MB, connected through the HyperTransport bus rather than remotely through InfiniBand. The minute you go remote with InfiniBand, memory latency becomes huge compared to the usual bus architecture AMD/Intel uses, and Crafty has not even been remotely optimized for such an architecture. A simple spin lock is horrible there.

The "mallocinterleaved" is a Windows mechanism that interleaves pages of a large memory region over the physical RAM. Not for split blocks and the like, just for the hash memory stuff... Otherwise you end up with all of the hash table in one node's local memory, which creates a "hot spot" that hurts performance. By interleaving pages, each physical processor's memory gets an equal share of the hash table, which spreads out the accesses.
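The hot-spot argument above can be illustrated with a small arithmetic model. This is only a sketch: it does not allocate NUMA memory or call any operating system interleaving API. It assumes 4096-byte pages assigned to nodes round-robin, and `node_interleaved` and `count_probes` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u  /* assumed page size */

/* Node holding a byte offset when pages are dealt round-robin
   across the nodes. */
unsigned node_interleaved(size_t offset, unsigned nodes)
{
    assert(nodes > 0);
    return (unsigned)((offset / PAGE_BYTES) % nodes);
}

/* Count how many of `probes` pseudo-random table accesses land on each
   node. With interleave == 0 the whole table is modeled as living on
   node 0; otherwise pages are interleaved across all nodes. */
void count_probes(size_t table_bytes, unsigned nodes, unsigned probes,
                  int interleave, unsigned *hits)
{
    uint64_t x = 88172645463325252ULL;  /* xorshift64 state, nonzero seed */
    assert(table_bytes > 0 && nodes > 0 && hits != NULL);

    for (unsigned n = 0; n < nodes; n++)
        hits[n] = 0;

    for (unsigned i = 0; i < probes; i++) {
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        size_t offset = (size_t)(x % table_bytes);
        unsigned node = interleave ? node_interleaved(offset, nodes) : 0;
        hits[node]++;
    }
}
```

In the single-node case every probe is charged to node 0, which is the hot spot. In the interleaved case the same probes are split across the nodes according to which page each offset falls in.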
Subject Author Date/Time
info about zappa on 512 cores ? Daniel Shawul Sat Jun 23, 2012 2:05 pm
      numa scaling Daniel Shawul Sun Jun 24, 2012 10:20 am
            Re: numa scaling Robert Hyatt Sun Jun 24, 2012 5:01 pm
                  Re: numa scaling Daniel Shawul Sun Jun 24, 2012 5:41 pm
                        Re: numa scaling Robert Hyatt Mon Jun 25, 2012 12:37 am
                              Re: numa scaling Daniel Shawul Mon Jun 25, 2012 1:23 am
                                    Re: numa scaling Robert Hyatt Mon Jun 25, 2012 10:05 am
                                          Re: numa scaling Daniel Shawul Mon Jun 25, 2012 4:00 pm
                                                Re: numa scaling Daniel Shawul Mon Jun 25, 2012 5:21 pm
                                                Re: numa scaling Daniel Shawul Mon Jun 25, 2012 8:09 pm
                                                      Re: numa scaling Robert Hyatt Mon Jun 25, 2012 8:38 pm
                                                            Re: numa scaling Daniel Shawul Mon Jun 25, 2012 8:53 pm
                                                                  Re: numa scaling Robert Hyatt Mon Jun 25, 2012 9:17 pm
                                                                        Re: numa scaling Daniel Shawul Mon Jun 25, 2012 9:46 pm
                                                                              Re: numa scaling Robert Hyatt Tue Jun 26, 2012 3:34 am
                                                                                    Re: numa scaling Daniel Shawul Tue Jun 26, 2012 11:06 am
                                                      Re: numa scaling Daniel Shawul Tue Jun 26, 2012 12:55 am
            transposition tables Daniel Shawul Sun Jun 24, 2012 5:53 pm
                  Re: transposition tables Daniel Shawul Wed Jun 27, 2012 11:17 pm
                        Re: transposition tables Ronald de Man Fri Jun 29, 2012 11:47 pm
                              Re: transposition tables Daniel Shawul Sat Jun 30, 2012 12:40 am
      Re: info about zappa on 512 cores ? Vincent Diepeveen Fri Jun 29, 2012 11:07 pm
            Re: info about zappa on 512 cores ? Daniel Shawul Sat Jun 30, 2012 12:32 am
                  Re: info about zappa on 512 cores ? Daniel Shawul Sat Jun 30, 2012 10:50 am