Strongest MPI-capable (cluster) engine?
-
Daniel Shawul
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Strongest MPI-capable (cluster) engine?
I don't know why that happens, but I just compiled the latest version and it seems to work. Go to the Makefile and set COMPILER=cluster. I do not expect much of a speedup, especially from cheap RasPi clusters. If you don't have it by now, here is cluster Toga: https://dl.dropbox.com/u/55295461/toga.zip
-
diep
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: Strongest MPI-capable (cluster) engine?
ZirconiumX wrote: Does Diep run on a Raspberry Pi? I thought not. Matthew:out
C compiler?
-
ZirconiumX
- Posts: 1334
- Joined: Sun Jul 17, 2011 11:14 am
Re: Strongest MPI-capable (cluster) engine?
@Mike RS are terrible. Ask for a refund and buy from Farnell. They have a 3 week lead time as opposed to RS's 6 weeks.
@Daniel COMPILER=cluster DEBUG=0 OPTIMISE=1 MAX_HOSTS=128 MAX_CPUS=8
Using MPICH2.
@Vincent GCC 4.6 and GCC 4.7 run natively on Raspbian.
Matthew:out
Some believe in the almighty dollar.
I believe in the almighty printf statement.
-
mike_bike_kite
- Posts: 98
- Joined: Tue Jul 26, 2011 12:18 am
- Location: London
Re: Strongest MPI-capable (cluster) engine?
ZirconiumX wrote: You go and order your AMD. And your motherboard. And your heatsink. Etc. I'm pretty sure the Pi is cheaper.
mike_bike_kite wrote: About £150 for the processor, £100 for the motherboard, £40 for 4GB RAM and a fan £20 = £310 for 8 cores running at nearly 4GHz with 4GB RAM. 8 Raspberry Pi's will cost approx £160 which is definitely cheaper but then your 8 "cores" will only be running at 0.7GHz with 256MB RAM.
I forgot to add the price of an SD card, which is required to hold the OS etc. (£8), which knocks the price up to (21+8)*8 = £232. You'd also need a moderate amount of cabling and 8 power supplies. I suspect it would be hard to justify on price alone. It would be a good fun project though - I suspect it might be quite challenging as well, trying to get them all working together. Good luck!
You're both right about ordering my RasPi from Farnell - I'll do this now.
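The price comparison above can be turned into a rough cost-per-GHz figure. This is only a sketch built from the numbers quoted in the posts: it treats each Pi as a single 0.7 GHz core, takes "nearly 4GHz" as 4.0 (an upper bound), and leaves out cabling and power supplies because no price is given for them.

```python
# Figures quoted in the thread; cabling and power supplies are not priced.
PI_BOARD_GBP = 21        # one Raspberry Pi
SD_CARD_GBP = 8          # SD card per Pi
PI_COUNT = 8
PI_CLOCK_GHZ = 0.7

AMD_PARTS_GBP = {"processor": 150, "motherboard": 100, "ram_4gb": 40, "fan": 20}
AMD_CORES = 8
AMD_CLOCK_GHZ = 4.0      # "nearly 4GHz" in the post, so this is an upper bound

pi_total = (PI_BOARD_GBP + SD_CARD_GBP) * PI_COUNT
amd_total = sum(AMD_PARTS_GBP.values())

pi_ghz = PI_COUNT * PI_CLOCK_GHZ
amd_ghz = AMD_CORES * AMD_CLOCK_GHZ

pi_cost_per_ghz = pi_total / pi_ghz
amd_cost_per_ghz = amd_total / amd_ghz

print(f"Pi cluster: £{pi_total} for {pi_ghz:.1f} GHz, £{pi_cost_per_ghz:.2f} per GHz")
print(f"AMD box:    £{amd_total} for {amd_ghz:.1f} GHz, £{amd_cost_per_ghz:.2f} per GHz")
```

Raw clock totals say nothing about per-clock performance or memory, so this only frames the price argument; it does not settle which setup plays better chess.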
-
Don
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Strongest MPI-capable (cluster) engine?
ZirconiumX wrote: I will be getting another RasPi soon. Since RasPi's do not have any way of connecting them as if they were a dual-core computer
mike_bike_kite wrote: I'm still waiting 3 months after ordering mine. At this rate it would take over a year to build a 4 core cluster. Wouldn't you do better just buying an AMD FX-8150 with 8 cores on chip? If you got really bored then you could try clustering those things together. I suspect it would be way cheaper too. I can't decide what to do with my RasPi (assuming they ever send me one) - it might go into an arcade machine I'm building or I might have a go at a touch screen MP3 jukebox for the garage.
The most bang for your buck is to put together your own Beowulf cluster. In simple terms you basically buy some inexpensive commodity i3, i5 or i7 quads, link them together with fast Ethernet and install a version of Linux that treats them as a single supercomputer. Anything else you do will cost more and give you less power - especially for chess.
Keep in mind that power usage is an issue. You can go for the lower-power laptop chips and save a lot on electricity, but when you give up CPU power to have more CPUs you have to look very closely at the numbers to see if you are making a reasonable tradeoff. For example, if they are each half the speed of a commodity i5, is running twice as many of them going to save power? Also, for the same "total" performance you would rather have fewer cores, as very few applications, including chess, scale well to more processors, but they always scale well to faster processors.
I looked into this and you can get almost twice as many AMD processors for the same price, but I calculated that it would still be a close call which is better, even for automated testing, which scales perfectly. At this moment in time Intel wins over AMD. Maybe that will change but right now it's not the case.
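The point about fewer, faster cores can be illustrated with Amdahl's law. This is a generic textbook model brought in for illustration, not something from the post, and the parallel fraction of 0.9 is a hypothetical value, not a measurement of any chess engine.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core when only part of the work runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def effective_speed(core_speed: float, cores: int, parallel_fraction: float) -> float:
    """Throughput relative to one full-speed core."""
    return core_speed * amdahl_speedup(parallel_fraction, cores)


P = 0.9  # hypothetical parallel fraction

# Same "total" raw speed: 4 cores at full speed vs 8 cores at half speed.
fast_few = effective_speed(1.0, 4, P)
slow_many = effective_speed(0.5, 8, P)
print(f"4 fast cores: {fast_few:.2f}x, 8 half-speed cores: {slow_many:.2f}x")

# A workload that scales perfectly (such as independent test games) sees no difference.
perfect_few = effective_speed(1.0, 4, 1.0)
perfect_many = effective_speed(0.5, 8, 1.0)
print(f"perfect scaling: {perfect_few:.2f}x vs {perfect_many:.2f}x")
```

Under this model the faster cores win whenever any part of the work is serial, and the two setups tie only when scaling is perfect.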
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
-
diep
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: Strongest MPI-capable (cluster) engine?
Most bang for the buck is still what I bought: 2-socket L5420 or E53xx nodes.
That's 8 cores at roughly 2.5 GHz.
They are $150 a machine, completely working.
Such nodes eat less power than any modern high-clocked i7 or high-clocked AMD Bulldozer. Under full load a node draws 170 watts here, of which roughly 10 watts goes to the drives.
Anything else high-clocked is either too slow or twice as expensive.
Beating 2 sockets with a single CPU is nearly impossible; you only manage that with a high-clocked CPU. Realize you can't overclock if you want to do things cheaply.
A good watercooling kit is $500. The simple $110 watercooling kits are not good enough to really overclock in a stable manner.
In the first place it's not wise to overclock clusters. For the price of that overclocking gear you can buy a bunch of new nodes.
The only disadvantage of such clusters is the huge power usage when all nodes are working. You really need a room with enough ventilation.
If you want to build a cluster of 8 machines or fewer, then a cluster is not so efficient right now. At 16 nodes it really wins big time over any other solution.
Note that a crucial factor for clusters is low latency between the machines. This requires a low-latency network. There are a lot of alternatives there, yet TCP is not going to do it for you. So Ethernet is a bad idea there.
No one on the planet has ever gotten a serious SMP algorithm to work in real time over Ethernet.
There are some PDFs, but it's all total BS, to put it VERY POLITELY.
Realize that 16 L5420 machines is 16 * 20 = 320 GHz in total of Core 2 power. By fiddling a tad with the hashtables you will always be able to cope with the local RAM.
Putting 8 GB of RAM in a machine is relatively easy (I didn't even do that; I put in 2 GB and a few machines have 4).
320 GHz for $150 * 16 = $2400. Note many InfiniBand switches go up to 36 ports.
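The totals above follow from simple multiplication. A minimal sketch, assuming 4 cores per L5420 socket at 2.5 GHz (which matches the "8 cores 2.5 GHz" per 2-socket node given earlier); the `Node` class and `cluster_totals` helper are names made up for this illustration, and network hardware is not included in the price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    sockets: int
    cores_per_socket: int
    clock_ghz: float
    price_usd: float

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def ghz(self) -> float:
        """Aggregate clock of the node (cores times clock), as used in the post."""
        return self.cores * self.clock_ghz


def cluster_totals(node: Node, count: int) -> tuple[float, float]:
    """Return (aggregate GHz, total price in USD) for `count` identical nodes."""
    return node.ghz * count, node.price_usd * count


L5420_NODE = Node(sockets=2, cores_per_socket=4, clock_ghz=2.5, price_usd=150)

total_ghz, total_usd = cluster_totals(L5420_NODE, 16)
print(f"16 nodes: {total_ghz:.0f} GHz for ${total_usd:.0f}, "
      f"${total_usd / total_ghz:.2f} per GHz")
```

Summed GHz is only a crude yardstick; how much of it a parallel search actually uses depends on the software and the network.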
If you really want to kick butt, in fact the old 4x InfiniBand is already more than sufficient and cheap on eBay. Not seldom you can also find motherboards with built-in InfiniBand. So all you need to buy then is a switch and cables (roughly $30).
There are lower-latency networks than the dirt-cheap InfiniBand. Of course newer InfiniBand is faster than that yet more expensive, and then getting cables is usually a big problem, as is scaling to many nodes.
Switches were cheap a few months ago on eBay, yet right now only 4x InfiniBand is cheap there. The much lower-latency ConnectX, especially Mellanox, is difficult to get cheap.
The 4x has roughly 3 microseconds of network latency to get hashtable entries from remote nodes. That's a lot worse than alternatives that are tougher to install, yet each node is so dirt cheap that building a cluster of 36 machines this way is really easy.
In fact it's easy to connect switches to each other, so scaling to bigger clusters really is dirt cheap.
It just consumes lots of power when you turn it on.
As for the Intel CPUs, by the way, don't look at what Intel writes in terms of power usage. Under full load the E5420 versus the L5420 won't be a big difference in power usage, unlike the big fairy tales on Intel's homepage if you just look at TDP. See the tests published online.
The E54xx series from Intel is basically a strong Core 2 processor; just the RAM latency is a tad slower. If you build software that can work with InfiniBand, obviously slightly slower RAM latency is not a problem either.
So those are really attractive and dirt cheap for building small clusters (small like 36 nodes or so).
If you don't intend to build a cluster, then currently unbeatable is a 4-socket AMD machine with 48 real cores. So previous generation.
Those 6180SE CPUs are roughly $400 on eBay. 4 of them is $1600.
The motherboard will be a tad more expensive. Say $900.
Then you have 120 GHz in real CPU power and 48 real cores, for just a couple of thousand.
That's 48 x 2.5 GHz.
Then all you need is good software.
Realize, when adding up SMP losses, that beating a 120 GHz machine costing just a few thousand is rather complicated with a cluster.
It's just about software quality now and nothing else.
Here at home I have 8 L5420 nodes with a Mellanox QDR network.
I can scale up the cluster pretty easily if I want to, as the switch has 36 ports and I have plenty of network cards.
Yet I considered 8 nodes the minimum to do quite some testing here; the intention is to grow to 16 nodes one day.
The machines simply boot over the network. This won't be a surprise to those who know about this, yet most here will suddenly realize how easy things are.
In itself 8 nodes here is 160 GHz of Core 2 power. That's NOT faster than a 48-core machine of 120 GHz, or than a really high-clocked i7 system.
Say 16 real cores @ 4.5 GHz or so, that's 8 * 9 = 72 GHz.
That's all "on par" with each other, yet the cluster is more fun and of course just a fraction of the price of the 2-socket box.
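One way to read the "on par" remark is to ask what share of its raw GHz the cluster must actually deliver to match each single box. The sketch below uses only the GHz figures from this post and makes a simplifying assumption that is not in the post: the single boxes are taken at full efficiency, which flatters them. The dictionary keys and the `break_even_efficiency` helper are illustrative names.

```python
# Raw aggregate clock figures as given in the post.
RAW_GHZ = {
    "8-node L5420 cluster": 8 * 2 * 4 * 2.5,   # 8 nodes, 2 sockets, 4 cores, 2.5 GHz
    "4-socket 48-core AMD": 48 * 2.5,
    "16-core box @ 4.5 GHz": 16 * 4.5,
}


def break_even_efficiency(cluster_ghz: float, rival_ghz: float) -> float:
    """Share of raw cluster GHz that must be useful to equal the rival's raw GHz."""
    return rival_ghz / cluster_ghz


cluster = RAW_GHZ["8-node L5420 cluster"]
for name, ghz in RAW_GHZ.items():
    if name == "8-node L5420 cluster":
        continue
    share = break_even_efficiency(cluster, ghz)
    print(f"{name}: cluster needs {share:.0%} of its {cluster:.0f} GHz to match")
```

If the cluster loses more than a quarter of its raw power to parallel overhead, the 48-core machine pulls ahead on this crude measure, which is consistent with the post's point that software quality decides the outcome.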
-
Don
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Strongest MPI-capable (cluster) engine?
diep wrote: Most bang for the buck is still what I bought: 2-socket L5420 or E53xx nodes. That's 8 cores at roughly 2.5 GHz.
That may be a good setup but this discussion was about clusters, not a high performance single board system.
I was looking into something for my testing - where scalability is not a big factor. I could use a big Beowulf cluster and Ethernet speed would not be a big issue. If your suggestions are good then those would be good choices for a Beowulf too.
There are various versions of Linux available to provide an operating system for such a system. In fact, if you have 2 or 3 PCs lying around you can make a Beowulf from them by installing the OS software. Knoppix has a solution available, and there is Mosix (openMosix) I think, and others - so the OS need not cost anything. It does load balancing and such for you - basically making your setup look like a single machine.
-
diep
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: Strongest MPI-capable (cluster) engine?
Don,
My cluster consists of 2-socket L5420 nodes. They are $150 on eBay, fully configured.
That's what I use to cluster.
The 2-socket machines are the cheapest to cluster GHz-wise and do not eat more power than single-socket consumer CPUs, which always eat too much power to get to that 3 GHz.
It gets a few dollars more expensive if you do either of these:
a) build it yourself in order to not have it noisy - that's a constraint here in my office - which adds costs and takes a lot of effort
b) buy motherboards with a built-in InfiniBand connector
Note B doesn't rule out A.
Those built-in ones are DDR InfiniBand; that's 20 Gbit.
Here I've got 2 x 40 Gbit (40 Gbit bidirectional).
The simple 4x InfiniBand is 10 Gbit and dirt cheap: about $60 a card, add $30 for a cable, and a cheap 36-port 4x switch is about $300 on eBay.
DDR is a lot more expensive (switch price), and ConnectX or QDR is a lot above that.
Most supercomputers right now use QDR InfiniBand. The new supercomputers now use FDR InfiniBand from Mellanox.
QLogic is still only at QDR, though they also intend to release FDR.
Just pick what you need.
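The 4x InfiniBand prices above give a quick estimate of what the network adds per cluster size. A minimal sketch using the eBay prices from this post; the `network_cost` helper is a made-up name, and the assumptions that two nodes can be cabled directly without a switch and that one cable per node suffices with a single switch are simplifications of this sketch.

```python
CARD_USD = 60        # 4x InfiniBand card
CABLE_USD = 30       # one cable
SWITCH_USD = 300     # cheap 36-port 4x switch
SWITCH_PORTS = 36


def network_cost(nodes: int) -> int:
    """Estimated 4x InfiniBand cost in USD for a single-switch cluster."""
    if nodes < 2:
        return 0  # a single machine needs no interconnect
    if nodes > SWITCH_PORTS:
        raise ValueError("more than one switch needed; not modelled here")
    cards = nodes * CARD_USD
    if nodes == 2:
        # Assumption: two nodes are connected back to back with one cable.
        return cards + CABLE_USD
    return cards + nodes * CABLE_USD + SWITCH_USD


for n in (2, 8, 16, 36):
    print(f"{n:2d} nodes: ${network_cost(n)}")
```

Boards with a built-in InfiniBand connector would remove the card cost from this estimate.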
-
diep
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: Strongest MPI-capable (cluster) engine?
The cheapest 2-machine cluster is probably a twin node bought on eBay.
That's 4 sockets in total. They're there for $200 right now. It needs CPUs.
$19 * 4 for L5420s right now on eBay.
It already comes with heatsinks and PSU,
and you probably want 2 x 8 GB RAM.
As it's just 2 nodes, you only need an InfiniBand cable to connect them.
Only above 2 nodes do you need a switch.
So for $300 you have in total a 40 GHz Core 2 cluster with 4 sockets, 2 machines and 16 cores.
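The twin-node figures add up as follows. A small sketch using the prices in this post and assuming 4 cores at 2.5 GHz per L5420; RAM and the InfiniBand cable are left out because the post gives no price for them here.

```python
TWIN_NODE_USD = 200      # twin chassis: 2 machines, 4 sockets, heatsinks and PSU included
L5420_USD = 19           # per CPU
SOCKETS = 4
CORES_PER_CPU = 4        # assumption: quad-core L5420
CLOCK_GHZ = 2.5

hardware_usd = TWIN_NODE_USD + SOCKETS * L5420_USD
cores = SOCKETS * CORES_PER_CPU
aggregate_ghz = cores * CLOCK_GHZ

print(f"${hardware_usd} for {cores} cores, {aggregate_ghz:.0f} GHz aggregate")
print(f"left from a $300 budget for RAM and cable: ${300 - hardware_usd}")
```

The chassis and CPUs alone come to a bit under the quoted $300, so that figure apparently leaves a small margin for the remaining parts.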
-
Don
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: Strongest MPI-capable (cluster) engine?
diep wrote: Don, My cluster exists out of 2 socket L5420 nodes. They are $150 on ebay fully configured.
What is total system price? I assume the $150 is per chip and then the price of the motherboard and other components, memory, etc.
So let's say I wanted to put together a single 2-processor box - approximately what would it cost me to have a running system, assuming I already had memory, drives and whatever other peripherals I needed?
When I bought my 6-core system (i7-980x) I considered less expensive alternatives such as a dual-processor system and high-end quads, but I was surprised at the amount of performance you give up and the price increase. The 2-processor systems I looked at would not perform as well. Again, note that my calculation for performance is simple: total nodes per second when all cores are loaded with an SP chess program. I realize that for an MPI-capable chess program there are other considerations, but that is not my usage pattern.
Of course if I were building a serious cluster (with say over 16 cores) then power usage becomes part of the equation and I would sacrifice a little performance for a big gain in power efficiency.