Get your own supercomputer for under $10,000 USD.

lmader · Post by **lmader** » Tue Feb 03, 2009 5:32 pm

From nVidia's website: Get your own supercomputer. Experience cluster level computing performance—up to 250 times faster than standard PCs and workstations—right at your desk. The NVIDIA® Tesla™ Personal Supercomputer is based on the revolutionary NVIDIA® CUDA™ parallel computing architecture and powered by up to 960 parallel processing cores. Program in C for Windows or Linux. Available from resellers worldwide for under $9,995.

Matthias Gemuh · Post by **Matthias Gemuh** » Tue Feb 03, 2009 5:56 pm

lmader wrote:From nVidia's website: Get your own supercomputer. Experience cluster level computing performance—up to 250 times faster than standard PCs and workstations—right at your desk. The NVIDIA® Tesla™ Personal Supercomputer is based on the revolutionary NVIDIA® CUDA™ parallel computing architecture and powered by up to 960 parallel processing cores. Program in C for Windows or Linux. Available from resellers worldwide for under $9,995.

Today is not April 1st.
What am I missing ?

Matthias.

M ANSARI · Post by **M ANSARI** » Tue Feb 03, 2009 6:04 pm

I think this is based on CPU's for graphics and thus these are not so good for integer calculations. I don't know if any engine is written to take advantage of such hardware. It would be great in using a system to generate a Monte Carlo type analysis of certain chess positions.

mhull · Post by **mhull** » Tue Feb 03, 2009 6:05 pm

lmader wrote:From nVidia's website: Get your own supercomputer. Experience cluster level computing performance—up to 250 times faster than standard PCs and workstations—right at your desk. The NVIDIA® Tesla™ Personal Supercomputer is based on the revolutionary NVIDIA® CUDA™ parallel computing architecture and powered by up to 960 parallel processing cores. Program in C for Windows or Linux. Available from resellers worldwide for under $9,995.

A bit overpriced for the equivalent of a couple of GPUs. That's basically what you're getting.

mhull · Post by **mhull** » Tue Feb 03, 2009 6:07 pm

M ANSARI wrote:I think this is based on CPU's for graphics and thus these are not so good for integer calculations. I don't know if any engine is written to take advantage of such hardware. It would be great in using a system to generate a Monte Carlo type analysis of certain chess positions.

Yes, it's GPU based. Perfect for protein folding. But you can build your own much cheaper with ganged-up ATI cards and a big power supply.

lmader · Post by **lmader** » Tue Feb 03, 2009 6:39 pm

I don't think it's quite that simple. I think that there must be some sort of software layer that the application is written to, or perhaps a compiler that builds an app with the proper instruction set, to take advantage of the GPU's. nVidia is providing this with their CUDA tools, I assume.

towforce · Post by **towforce** » Tue Feb 03, 2009 7:54 pm

lmader wrote:From nVidia's website: Get your own supercomputer. Experience cluster level computing performance—up to 250 times faster than standard PCs and workstations—right at your desk. The NVIDIA® Tesla™ Personal Supercomputer is based on the revolutionary NVIDIA® CUDA™ parallel computing architecture and powered by up to 960 parallel processing cores. Program in C for Windows or Linux. Available from resellers worldwide for under $9,995.

Independent benchmarks?

diep · Post by **diep** » Thu Feb 05, 2009 2:36 am

lmader wrote:From nVidia's website: Get your own supercomputer. Experience cluster level computing performance—up to 250 times faster than standard PCs and workstations—right at your desk. The NVIDIA® Tesla™ Personal Supercomputer is based on the revolutionary NVIDIA® CUDA™ parallel computing architecture and powered by up to 960 parallel processing cores. Program in C for Windows or Linux. Available from resellers worldwide for under $9,995.

What software are you going to run on it?

Games won't run on it. Nor any existing software you have.

In the hardware forum some intel fanboys are claiming it's actually 4 nodes of 8 vector cores each, so 32 cores in total. It is a valid viewpoint.

Note that as a software developer you can usually get it cheaper than the commercial price.

diep · Post by **diep** » Thu Feb 05, 2009 2:51 am

lmader wrote:I don't think it's quite that simple. I think that there must be some sort of software layer that the application is written to, or perhaps a compiler that builds an app with the proper instruction set, to take advantage of the GPU's. nVidia is providing this with their CUDA tools, I assume.

A few months ago i sent a few emails to Nvidia regarding this hardware, about porting some source code to it (not chess of course). Such a project would be a pilot project. Getting approval for a pilot project is already very tough. In those emails i asked Nvidia for technical information on the hardware, as this hardware is so special and definitely interesting for the future if you see its potential. However, to optimize for hardware that is so special you need to know its exact limitations and possibilities. A very important limitation is not having the instruction set that CUDA compiles to, with the throughput and latency of each instruction. From that you can easily calculate how fast specific software CAN potentially run and what algorithms or level of parallelization you need to get things done. Still, of course, there can be a lot of other obstacles.

Nvidia indicated they would not give any support on the hardware at all, let alone reveal this information. Realize that AFAIK not a single other manufacturer is being difficult about revealing this.

Probably only if you are gonna buy 1000 of those cards can you get SOME information from nvidia.

That's very, very few organisations, and certainly not ones that publicly brag about using nvidia cards.

I hope you realize the downside of that. Nvidia can still claim any nonsense regarding their cards, including that it is a supercomputer of some kind. All baloney.

More powerful on paper and a lot cheaper than 4 tesla cards, is of course AMD with their 4870x2 cards.

Get a skulltrail mainboard, put in 4 4870x2 cards and you have something the same speed as those 4 tesla cards, yet those four AMD cards will cost you under 2000. That's a total of 1600 * 4 = 6400 execution units. A lot more than 960 nvidia stream processors, isn't it?

Realize that you can make any paper claim for videocards; there is no serious software to demonstrate they're dead wrong with their claims. That goes for both nvidia and amd/ati regarding the gpu's.

Do you really want 32-bit hardware that can only multiply positive integers, 16 x 16 bits == 32 bits?

Even for my limited bank account i need more than that (namely i need a minus sign additional)
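
For anyone wondering what working around such a limit looks like, here is a minimal Python sketch. It assumes, as claimed above, that the only multiply available is unsigned 16 x 16 bits giving 32 bits; that is the claim in this post, not a documented hardware fact. The names `mul16x16`, `mul32_low` and `mul32_signed` are hypothetical, made up for illustration.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def mul16x16(a, b):
    """Stand-in for the assumed primitive: unsigned 16 x 16 -> 32 bit multiply."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b

def mul32_low(a, b):
    """Low 32 bits of an unsigned 32 x 32 product, using three 16 x 16 multiplies."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16
    low = mul16x16(a_lo, b_lo)
    cross = mul16x16(a_hi, b_lo) + mul16x16(a_lo, b_hi)
    # a_hi * b_hi would be shifted left by 32 bits, so it never reaches the low word.
    return (low + ((cross & MASK16) << 16)) & MASK32

def mul32_signed(a, b):
    """Signed product wrapped to 32-bit two's complement; the sign is handled in software."""
    negative = (a < 0) != (b < 0)
    magnitude = mul32_low(abs(a) & MASK32, abs(b) & MASK32)
    result = (-magnitude if negative else magnitude) & MASK32
    return result - (1 << 32) if result & 0x80000000 else result
```

So one ordinary signed multiply turns into three hardware multiplies plus shifts, masks and sign handling, which is the kind of extra instruction count that matters when estimating real throughput.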

Vincent

diep · Post by **diep** » Thu Feb 05, 2009 3:29 am

Matthias Gemuh wrote:
lmader wrote:From nVidia's website: Get your own supercomputer. Experience cluster level computing performance—up to 250 times faster than standard PCs and workstations—right at your desk. The NVIDIA® Tesla™ Personal Supercomputer is based on the revolutionary NVIDIA® CUDA™ parallel computing architecture and powered by up to 960 parallel processing cores. Program in C for Windows or Linux. Available from resellers worldwide for under $9,995.

Today is not April 1st.
What am I missing ?

Matthias.

Well, see my previous posting. It's quite possible Nvidia is gonna go bankrupt because of that in the long term. They obstruct developers.

Suppose you're a sergeant who's resupplying the electronic soldier of the future. The soldier comes to you: "give me a new gun, the old one is getting old".
Sergeant: "i don't have a new gun for you"

He puts on the desk the nvidia Tesla supercomputer.

"I have something MUCH BETTER. It has 960 stream guns"

Soldier: "what can a stream gun do for me?"

Sergeant: "stream guns can execute CUDA rounds".

Soldier: "how big are those rounds?"

Sergeant: "look in the manual"

Soldier: "which manual, can you ship me the tech documents?"

Sergeant: "we do not deliver support for tech documents, the propaganda on the net that it has 960 stream guns with deadly impact is more than enough for you"

Soldier: "how far can it shoot"

Sergeant: "look in the manual"

Soldier: "if i put the Tesla together in the dark, which instructions can i execute?"

Sergeant: "we do not deliver support on it"

That's basically the answers i got from Nvidia. They're dead in gpgpu if they continue like that.

As for computer chess, the only problem i see is the latency to RAM and the fact that with this 4 node box, consisting of 4-8 gpu's in total (each card has 1-2), basically you have 4-8 independent memory ranges.

Also known as non-shared memory segments.
So that's a BIG hurdle to take to make a chess proggie run on it.
Others here will see a lot more hurdles probably.

Another hurdle is a potentially very low IPC.

What is IPC? Well, that's the effective number of instructions executed per cycle.

As i just said, you can effectively use just 120 cores or so.

If ipc is like 0.3 there, versus 2.0 for nehalem, then Nehalem already starts at an advantage: 2.0 / 0.3 = 6.67

That is a realistic comparison.

So then for a simple quadcore nehalem it is:
4 cores * 3.2 GHz * 2.0 instructions per cycle = 25.6 G instructions per second

Now tesla: 120 cores * 1.2 GHz * 0.3 instructions per cycle = 43.2 G instructions per second

You'll realize the "theoretical" peak it has.

Now comes the catch. We factor in first the extra OVERHEAD.

Overhead comes from 2 sides:
a) parallel speedup
b) nvidia has a simpler processor, so it needs a lot more instructions to get the same thing done that you do easily in C on an x64 processor

25% parallel efficiency at 120 cores is what most would get i suppose, if they're real good (IQ180 at least). That is a factor 4 lost.

For nvidia i estimate you lose at least another factor 4 to the extra instruction overhead.

So that's another factor 4 * 4 = 16 difference in total.

Get that extra overhead factor smaller and it's already attractive to take a look at it, if you had those tech documents, which would reveal more hurdles to take, and o boy, they sure are there.

Note i also skipped other problems, such as parallelism between thread blocks, which means extra code, on top of parallelism within 1 thread block.

In a quadcore chip you just have 1 parallel search going; in such a gpu you already have 2 layers of parallelism. That brings efficiency well below the 25% i quoted.

Maybe 5% efficiency?
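
A rough Python sketch of the arithmetic above, for anyone who wants to plug in their own guesses. Every input is an estimate quoted in this post (core counts, clocks, IPC, parallel efficiency, the factor 4 instruction overhead), not a measurement, and the function names are made up for illustration. As in the post, the Nehalem figure is left at its peak.

```python
def peak_ginstr(cores, ghz, ipc):
    """Peak throughput in G instructions per second: cores * clock * IPC."""
    return cores * ghz * ipc

def effective_ginstr(peak, parallel_efficiency, instruction_overhead):
    """Scale the peak down by parallel efficiency and by the extra instruction count."""
    return peak * parallel_efficiency / instruction_overhead

# Estimates taken from the post, not benchmarks.
nehalem_peak = peak_ginstr(cores=4, ghz=3.2, ipc=2.0)
tesla_peak = peak_ginstr(cores=120, ghz=1.2, ipc=0.3)

# 25% parallel efficiency and a factor 4 instruction overhead: factor 16 in total.
tesla_at_25 = effective_ginstr(tesla_peak, 0.25, 4.0)
# The more pessimistic 5% efficiency guess with the same instruction overhead.
tesla_at_5 = effective_ginstr(tesla_peak, 0.05, 4.0)

print(f"nehalem peak:        {nehalem_peak:.2f} G instructions/s")
print(f"tesla peak:          {tesla_peak:.2f} G instructions/s")
print(f"tesla at 25% / x4:   {tesla_at_25:.2f} G instructions/s")
print(f"tesla at  5% / x4:   {tesla_at_5:.2f} G instructions/s")
```

With these guesses the 43.2 peak shrinks to roughly 2.7 at 25% efficiency and roughly 0.54 at 5%, against 25.6 for the quadcore.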

On paper it is not impossible to write a chess program for a GPU, and it sure would be a challenge. It is fulltime work though, and who pays for your time?

Maybe the biggest hurdle is that you already need to have a good parallel search written for a multicore chip, to get experience. If you have already done that, why make another one for a GPU?

Vincent

p.s. the biggest problem nvidia has is: "what if i simply put 16 nodes on my 16 node cluster switch?"

Then i've got very easily 16 quadcores and 64 cores.

With Nvidia, as you see here, you get just 4 nodes. Nvidia simply doesn't scale very well. If you would buy 16 of these 'supercomputers', it is very complicated to get some sort of shared memory search going.

Well that is, unless you want the efficiency of a deep blue or hydra - no thanks.
