ChessUSA.com TalkChess.com
Hosted by Your Move Chess & Games
 

nvidia tesla
TalkChess.com Forum Index -> Computer Chess Club: Programming and Technical Discussions
Vincent Diepeveen



Joined: 09 Mar 2006
Posts: 1738
Location: The Netherlands

Post subject: Re: nvidia tesla    Posted: Thu Apr 05, 2012 3:39 pm

Apologies; as usual I posted and edited later:

Quote:

This is also the reason why, back in 2000, it was Donninger who got the FPGA job. ChessBase first asked a range of other programmers, who would have been better choices to carry it out in the first place; looking back, Donninger was a bad choice.

a) he doesn't know much about SMP coding and never got it working well there
b) it loses big time in efficiency
c) a stand-alone card back then could not beat other programs, 1 card against 1 program.


d) The biggest advantage of the FPGA hardware wasn't used: you can build a very large evaluation function in it at almost no loss in speed. Chrilly in fact built about the world's smallest evaluation function for Hydra.

e) computer chess requires good testing, and Chrilly wasn't able to test well.

You cannot blame Chrilly for either A or E. E is a money question: if a sponsor makes promises at the start of a project and they are not kept, so be it. I don't know all the details about E.

A was supposed to be solved by a specific university (let's not name names here). But this was someone not even remotely capable of writing a good parallel search.

Hydra's parallel search is really bad and doesn't scale, as it requires O(n^2) communication.
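
The post does not spell out Hydra's actual communication protocol, so the following is only a toy cost model to show what O(n^2) communication means in practice. It compares a hypothetical scheme in which every processor must send an update to every other processor (n*(n-1) messages per round) with one in which updates travel along a spanning tree (n-1 messages per round). The function names are made up for illustration.

```python
def all_to_all_messages(n: int) -> int:
    """Messages per round if every processor must inform every other processor: O(n^2)."""
    return n * (n - 1)


def tree_messages(n: int) -> int:
    """Messages per round if updates travel along a spanning tree (one link per non-root node)."""
    return n - 1


for n in (8, 64, 1024):
    print(n, all_to_all_messages(n), tree_messages(n))
```

At 8 processors the all-to-all scheme needs 56 messages per round, which is easy to miss when testing; at 1024 processors it needs 1,047,552, against 1,023 for the tree.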

So when I discussed building a 1024-node machine for Hydra with the Sheikh, I of course had to advise against it. Besides, as I told him, Chrilly had worked in Petten doing nuclear calculations on a computer. A 1024-node machine with FPGA cards would back then have been one of the world's strongest supercomputers, and that is asking for trouble, as I explained to him: Israel, for example, might have gotten very upset about the possibilities of a nuclear engineer who knows how to do those calculations writing software for a 1024-node supercomputer with 1 or more FPGA cards per node.

The initial goal, namely running on 1024 processors, could not have been achieved with Hydra anyway, as the algorithm didn't scale. They basically got it to scale by no longer using the transposition table in the last 6 plies or so.

Try such an experiment with your chess program and run it on 16 cores or more.

You will see that you instantly start losing a factor of 10.
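
The experiment above can be tried in miniature. The sketch below is a toy, not a chess program: a hypothetical game in which each move adds 1, 2 or 3 to a shared counter, searched with plain negamax (no alpha-beta) and a Python dict as transposition table. The hypothetical parameter `no_tt_plies` switches the table off in the last plies, as Hydra reportedly did in its last 6 or so. How large the blow-up is depends heavily on how many transpositions the game has; this toy has far more than chess, so its factor is larger than the factor of 10 mentioned above.

```python
MOVES = (1, 2, 3)  # toy game (assumption): each move adds 1, 2 or 3 to a shared counter


def evaluate(total: int) -> int:
    # Arbitrary deterministic leaf score from the side to move's point of view.
    return (total * 7919) % 101 - 50


def count_nodes(depth: int, no_tt_plies: int) -> int:
    """Run a negamax search of `depth` plies and return how many nodes were visited.

    The transposition table is only used at nodes with more than `no_tt_plies`
    plies of remaining depth, i.e. it is switched off near the leaves.
    """
    tt = {}  # (total, remaining depth) -> negamax score
    nodes = 0

    def negamax(total: int, d: int) -> int:
        nonlocal nodes
        nodes += 1
        if d == 0:
            return evaluate(total)
        use_tt = d > no_tt_plies
        key = (total, d)  # side to move is fixed by d, since the root is fixed
        if use_tt and key in tt:
            return tt[key]
        best = max(-negamax(total + m, d - 1) for m in MOVES)
        if use_tt:
            tt[key] = best
        return best

    negamax(0, depth)
    return nodes


print(count_nodes(12, 0), count_nodes(12, 6))
```

For a 12-ply search this visits 433 nodes with the table used at every ply, and 36,145 nodes with the table switched off in the last 6 plies, about 83 times more. Real alpha-beta with bounds in the table behaves differently in detail, but the direction of the effect is the same.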

Now add some latency to the hashtable: on bigger machines, fetching a hashtable entry has a much higher latency.
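
A back-of-envelope way to see why hashtable latency matters: if every node pays its own compute time plus the full latency of its hashtable probe, with no overlap or prefetching (an assumption), then nodes per second drop sharply once the table lives far away. All numbers below are hypothetical and only illustrate the shape of the effect.

```python
def nodes_per_second(compute_ns: float, probes_per_node: float, latency_ns: float) -> float:
    """Toy model: each node costs its compute time plus the latency of its hashtable probes,
    with no overlap or prefetching."""
    return 1e9 / (compute_ns + probes_per_node * latency_ns)


# Hypothetical numbers, for illustration only:
local_nps = nodes_per_second(compute_ns=1000, probes_per_node=1, latency_ns=100)    # about 909k nps
remote_nps = nodes_per_second(compute_ns=1000, probes_per_node=1, latency_ns=5000)  # about 167k nps
```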

That factor of 10 is about what Hydra lost in efficiency compared to the software search; with around 90k-100k searches a second per card, it of course did get a huge nps.

As usual, they had an algorithm that scaled well for the small number of nodes they could test on (8 or so), but it did so by burning too much overhead, and what they wrote does not scale at all in general.

I also want to void all speedup comparisons of Hydra, as basically they compared 1 Hydra processor not storing the last 6 plies in the hashtable with n such processors.

So they got the full nps, just like Deep Blue, but not the search depth they could have gotten.

Obviously, if you use just 1 processor, you can easily store all software plies in the hashtable. So the correct comparison would have been 1 CPU using the hashtable at all ply depths, as 1 CPU could easily do that, versus n CPUs not using it in the last 6 or so plies.
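
The baseline argument can be made concrete with the usual definition, speedup = time of the 1-processor baseline / time of n processors to reach the same depth. The times below are hypothetical, chosen only to show how a handicapped baseline inflates the reported number; they are not Hydra measurements.

```python
def speedup(t_one: float, t_n: float) -> float:
    """Speedup = time for the 1-processor baseline / time for n processors, same search depth."""
    return t_one / t_n


# Hypothetical times (seconds) to reach the same fixed search depth:
t_one_full_hash = 100.0     # 1 CPU, hashtable used at all ply depths
t_one_no_hash_tail = 400.0  # 1 CPU, hashtable switched off in the last plies
t_n_no_hash_tail = 50.0     # n CPUs, hashtable switched off in the last plies

claimed = speedup(t_one_no_hash_tail, t_n_no_hash_tail)  # 8.0, against the handicapped baseline
fair = speedup(t_one_full_hash, t_n_no_hash_tail)        # 2.0, against the best 1-CPU search
```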

Giving up a few plies of search depth in order to CLAIM a good speedup is, in my opinion, very bad science, and that is a huge understatement.

Quote:

In short, in software Brutus/Hydra would have played a lot stronger.

Please note that Chrilly has some very good excuses for why some things didn't go as they should have. But I'm not sure I can post those here.

I bet Julien would remove the posting right away.

They have to do with specific companies simply not paying at all what was agreed, and with specific universities that have no clue how to parallelize a chess program. At Paderborn University, all 16 CPUs together handled less nps than a single FPGA card of Chrilly's delivered: one FPGA card got close to 100k nps, whereas the 16 CPUs together could do only 16k searches a second, thanks to the slowdown of their parallel software framework.


Of course, by 100k nps I mean 100k searches per second carried out by the hardware (each machine had 1 FPGA card).
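
The Paderborn numbers above translate into a hardware utilization figure, assuming (as the post implies) that the 16k searches a second is the total for all 16 machines and that each card alone could run at close to 100k. This is only arithmetic on the figures quoted in the post.

```python
machines = 16                # 16 CPUs, each machine with 1 FPGA card (from the post)
per_card_sps = 100_000       # one card alone: close to 100k searches/s (from the post, rounded)
measured_total_sps = 16_000  # all 16 machines together through the parallel framework (from the post)

potential_sps = machines * per_card_sps           # 1,600,000 if every card ran at full speed
utilization = measured_total_sps / potential_sps  # 0.01, i.e. about 1% of the hardware
per_machine_sps = measured_total_sps / machines   # about 1,000 searches/s per card actually used
```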

The overall conclusion there, too, is that the focus should have been on SMP programming and on building a big evaluation in the FPGA, which is exactly what would really have profited from the FPGA.

The Hydra team exploited neither of those advantages.

GPGPU programming also has a lot of technical difficulties, but those Tesla cards offer possibilities you cannot easily exploit on your CPU.

Exploiting the advantages of the Tesla takes professionals, and they really can do well with those cards.

Look at it like this: it only takes 1 professional to write something that works really well, and then everyone can profit from it, in theory.

The big problem in GPGPU programming is the parallel search; it's easy to prove that the best way to solve it requires at least a three-layer solution:

a) SMP search between the GPUs, using the DDR3 RAM of the CPUs
b) SMP search between the compute units (each compute unit is SIMD, around 32 cores)
c) SMP search within 1 compute unit
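
To make the three layers concrete, here is a small sketch that maps a flat worker index onto (GPU, compute unit, lane) and reports which of the layers a), b), c) has to coordinate work shared between two workers. It only illustrates the addressing, not the search itself. The 32 lanes per compute unit follow the post; the number of compute units per GPU and all names are hypothetical.

```python
from typing import NamedTuple


class Worker(NamedTuple):
    gpu: int           # layer a): which GPU
    compute_unit: int  # layer b): which compute unit on that GPU
    lane: int          # layer c): which SIMD lane inside that compute unit


def locate(worker_id: int, cus_per_gpu: int, lanes_per_cu: int = 32) -> Worker:
    """Map a flat worker index onto the three layers (GPU, compute unit, lane)."""
    gpu, rest = divmod(worker_id, cus_per_gpu * lanes_per_cu)
    compute_unit, lane = divmod(rest, lanes_per_cu)
    return Worker(gpu, compute_unit, lane)


def split_layer(a: Worker, b: Worker) -> str:
    """Which of the three SMP layers coordinates work shared by workers a and b."""
    if a.gpu != b.gpu:
        return "a"  # different GPUs: shared state has to go through the CPUs' DDR3 RAM
    if a.compute_unit != b.compute_unit:
        return "b"  # same GPU, different compute units
    return "c"      # same compute unit, different SIMD lanes
```

With a hypothetical 16 compute units per GPU, `locate(545, 16)` gives `Worker(gpu=1, compute_unit=1, lane=1)`, and any split between workers 0 and 512 falls under layer a).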

For comparison, normal SMP searches in software have just 1 layer of SMP search, which already isn't easy to build. So there are 3 hurdles here, and there is an efficiency loss at each layer. Minimizing those losses determines how good the speedup is over 1 CPU core doing the same work while using a very efficient shared hashtable (that is, used everywhere in the tree).
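
One simple way to see why 3 layers hurt more than 1: assuming the losses at the layers are independent, the fraction of ideal speedup that survives is the product of the per-layer efficiencies. The per-layer values below are hypothetical, for illustration only.

```python
from math import prod


def overall_efficiency(layer_efficiencies) -> float:
    """Assuming independent losses at nested layers, total efficiency is their product."""
    return prod(layer_efficiencies)


# Hypothetical per-layer efficiencies (fraction of ideal speedup kept at each layer):
layers = {"a) between GPUs": 0.7, "b) between compute units": 0.6, "c) within a compute unit": 0.5}
total = overall_efficiency(layers.values())  # about 0.21
single_layer = overall_efficiency([0.7])     # a normal software SMP search has only 1 layer
```

Even with fairly good efficiency at each layer, only about a fifth of the ideal speedup remains in this example, which is why minimizing the loss at every layer matters.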

For Diep's software search on a shared-memory machine, the 2010 additions alone are 40-50 pages of A4 full of proofs.

Not surprisingly, you need to design such a search on paper, or it won't work.

I get the impression that the large amount of paperwork required for this gets underestimated, and laughed away, by those who simply have no idea what you need to do to get the maximum out of the hardware.

That requires a paper design that you PROVE correct.

Proving software correct on paper is a course taught at some of the better universities, and you can order books that show you how to do it. Usually it is the Einstein-level people who are good at this.



Vincent
Subject Author Date/Time
nvidia tesla Maurizio Maglio Wed Apr 04, 2012 3:58 pm
      Re: nvidia tesla Srdja Matovic Wed Apr 04, 2012 6:40 pm
      Re: nvidia tesla Vincent Diepeveen Thu Apr 05, 2012 12:12 pm
            Re: nvidia tesla Daniel Shawul Thu Apr 05, 2012 1:03 pm
                  Re: nvidia tesla Vincent Diepeveen Thu Apr 05, 2012 2:47 pm
                        Re: nvidia tesla Vincent Diepeveen Thu Apr 05, 2012 3:39 pm
                              Re: nvidia tesla Daniel Shawul Thu Apr 05, 2012 4:57 pm
                        Re: nvidia tesla Daniel Shawul Thu Apr 05, 2012 3:57 pm
                              Re: nvidia tesla Vincent Diepeveen Thu Apr 05, 2012 4:06 pm
                                    Re: nvidia tesla Daniel Shawul Thu Apr 05, 2012 4:28 pm
                                          Re: nvidia tesla Vincent Diepeveen Thu Apr 05, 2012 5:14 pm
                                                Re: nvidia tesla Daniel Shawul Thu Apr 05, 2012 6:08 pm
      Re: nvidia tesla Daniel Shawul Thu Apr 05, 2012 1:10 pm
            Re: nvidia tesla Vincent Diepeveen Thu Apr 05, 2012 3:15 pm
                  Re: nvidia tesla Daniel Shawul Thu Apr 05, 2012 4:10 pm