Adhoc Supercomputer in a Day

Discussion of chess software programming and technical issues.

Moderator: Ras

CRoberson
Posts: 2094
Joined: Mon Mar 13, 2006 2:31 am
Location: North Carolina, USA

Adhoc Supercomputer in a Day

Post by CRoberson »

Adhoc Supercomputer in a Day

This chronicles my experience trying to create an ad-hoc supercomputer of borrowed machines in a day.
Why an ad-hoc supercomputer? For those of us who enter computer chess events on a regular basis
(multiple times per year), it can be expensive to keep hardware at competitive performance
levels. Every year somebody has bought newer, cheaper and faster hardware, and every few years
the newer technology allows for more CPUs in a machine. My idea for resolving that issue is to
create a temporary ad-hoc supercomputer from borrowed personal computers. If enough machines of
reasonable performance can be enlisted, one might remain competitive every year without a major
yearly expense.

The 2012 CCT took place on the weekend of February 25. I borrowed several machines
on Thursday night and Friday. The goal was to form a cluster with them (26 processors: 4 quads,
1 oct and a dual) and run the three-year-old cluster code I had written and embedded
in my chess program.

1) Previous test experience.
I once tried running two quads and a dual in a 110 square foot room. It was fall in North
Carolina, USA, and the computers kept rebooting due to heat. Even with the windows open and
three large fans running, the computers rebooted about every 5 minutes.

2) The plan.
I plan to solve the heat problem by assembling the cluster in my garage in February. If it gets
too hot, I'll open the garage door. The expected outdoor temperature is 40 to 59 degrees Fahrenheit.
All machines will be configured and tested in my kitchen and then migrated to the garage.
I'll use an inexpensive Gigabit switch to connect the computers with Cat 6 cables. Only the primary
computer needs access to the internet, so with the system in the garage I'll need some sort of
Wi-Fi solution to bridge to my Wi-Fi network and give the primary machine internet access. Where
possible, an extra hard drive will be installed in the borrowed computers for the Linux install.
On the other machines, I'll use an Ubuntu live CD to run Linux without mounting the owner's hard
drive. This consumes some RAM for a RAM disk, but that should be fine assuming the computer has
enough memory.

3) The network hardware.
An inexpensive Gigabit switch was used as the network backbone, connected to each computer with
Cat 6 cables. This worked reasonably well, but there was an issue with the lack of port buffering
on the switch; more on this in the programming architecture section. For internet connectivity,
I chose a $50 Wi-Fi device with switchable functionality: a 3-position switch allows it to operate
as an access point, a repeater or a Wi-Fi router. I set it to repeater mode and connected it to
the Gigabit switch via a Cat 6 cable. This had a major benefit over a USB Wi-Fi adapter: I didn't
need to install device drivers on any computer, and it gave all the computers internet access,
letting me change which machine was the primary at will. One machine being considered as the
primary had two Gigabit ethernet ports; the rest had only one.

4) Getting Linux on all the machines.
a) The machines with spare hard drives.
This took a lot of time. There were issues with CD reader sensitivity. One computer ran the Ubuntu
10.04 Live CD fine but failed during installation. After burning multiple CDs and observing the
failure at the same point each time, I switched CD burners and achieved a successful install.
Neither the second machine nor the third would run Ubuntu 10.04. (My own two machines had had
Linux installed for the last several years.) I created a Live/Install CD of Ubuntu 11.10, and that
Live CD ran on the second and third loaners. I had the second loaner running Linux by 3:00 AM
Saturday morning. Much of the day had been consumed by purchasing parts and diagnosing the CD
burner issue.

b) The machines using a Linux Live CD.
On Sunday morning (around 3 AM), I tried setting up a node with an Ubuntu Live CD. This went rather
well at first: the ssh server installed (see the next section) as before, network connectivity
worked out easily, and a simple single-processor benchmark of my chess program ran cleanly. However,
I then ran into a problem. To run an MPI program across several computers, you need to use the
same user account on all machines, and ssh needs this as well. So I proceeded to add a user to the
newly booted system. This didn't work, because Ubuntu's Live CD allows only two user accounts and
both are already taken. The two machines that were to run this way were the weakest ones, so I
dropped the idea at 4 AM Sunday morning. This left me with 4 nodes: 2 quads and an oct.
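
For reference, launching an MPI program across machines looks roughly like this (a sketch only -
the host names, slot counts and binary name are made up, but the hostfile syntax and mpirun flags
are standard Open MPI):

Code:

# hosts - one line per machine; slots = processes to launch there
node1 slots=4
node2 slots=4
node3 slots=8

# mpirun reaches every host over passwordless ssh as the SAME user,
# which is exactly what the Live CD's fixed accounts prevented
mpirun -np 16 --hostfile hosts ./telepath_mpi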

5) The machines can talk to each other.
a) ssh
When I awoke Saturday morning (7 AM, after 3 hours of sleep), I had 4 machines running Linux, but
only two of them would talk to each other without requiring me to type a password. My computers had
an older version of Ubuntu, which still allowed the use of rsh (an easier mechanism to deal with).
The newer versions of Ubuntu drop rsh completely and force ssh on you. For most uses this may be
the best decision, but not necessarily in this case. After trying several combinations of ssh
configuration files, it was 8:30 AM, giving me 30 minutes before the first round. So I ran a
single-processor version on one of the borrowed computers, which happened to run Telepath with a
10% speedup over my fastest machine. It also had twice the memory of my best machine.

b) sshd
Once the first round was well underway, I asked the group about ssh expertise. Thanks to Dr. Bob
Hyatt, Jon Dart and some others, the root problem was found: the new versions of Ubuntu don't
install sshd, the ssh server, by default. I installed it on the machines and received example
config files from the online group. Eventually, I found a workable combination of ssh configuration
parameters that allowed either of the two borrowed machines to access the other without a login
prompt. With more effort, I got my fastest machine to access the others the same way. However, my
second machine was never able to communicate with the others, leaving me with 3 operational
machines: 2 quads and an oct.
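
For reference, the whole passwordless-ssh recipe is short once you know sshd is missing. These are
standard Ubuntu/OpenSSH commands; the user and host names are made up:

Code:

sudo apt-get install openssh-server    # the sshd that newer Ubuntu omits
ssh-keygen -t rsa                      # accept defaults, empty passphrase
ssh-copy-id chessuser@node2            # appends the public key to node2's authorized_keys
ssh chessuser@node2 hostname           # should print "node2" with no password prompt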

6) Open MPI versions.
My machines had Open MPI version 1.3.3, a version no longer available on the Open MPI web site. The
Ubuntu package manager had a fairly new 1.4.4 version, which I installed straight from the package
manager on the new machines. Each machine was tested individually, with all its processors,
successfully. This was very good: I didn't need to recompile the program for the new version of
Open MPI. Then I ran a full-system (16 processor) test, which failed: the older version of Open MPI
wouldn't work with the newer one, and installing the newer version on my machine failed too. Given
the hour (Sunday morning, 5:30 AM), I decided that two machines with a total of 12 processors might
be sufficient. This worked! However, an issue had already shown up at 8 PM Saturday night, and I
had decided then to get some sleep and resolve it in the morning.
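
As I understand it, Open MPI of that era made no interoperability promise between release series,
so 1.3.x and 1.4.4 refusing to work together is expected rather than bad luck. A quick pre-flight
check like this sketch (standard commands; the host names are made up) would have caught it before
the full-system test:

Code:

# every node should report the same Open MPI release before a mixed run
for h in node1 node2 node3; do
    ssh $h 'mpirun --version' | head -1
done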

7) The Programming Architecture.
I used Open MPI and coded an implementation of the YBW (Young Brothers Wait) concept - my own
design and implementation, which I had coded and tested 3 years earlier.
I awoke Sunday morning at 3 AM. After the failure of the Live CD machines, I decided to address
the 2-machine/12-processor mini-cluster problems from the night before. Years ago, I had noticed
that the distributed version searched 3 times as many nodes as the single-CPU implementation. I
thought this was due to the loss of shared transposition tables, and some of it was. Now that I had
more CPUs to test with, I found that wasn't all of it. An 8-CPU test on the 8-processor system
yielded a 10x node count explosion, and using an oct and a quad together for 12 processors yielded
a 15x node count explosion. Clearly, this isn't due just to the lack of shared transposition
tables. After a little thought, I realized the issue was in how I handled split points for type 2
and type 3 nodes (expected CUT and ALL nodes). I came up with a quick fix that was quite simple.
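
A minimal C sketch of the gating idea at stake (not my actual fix - the thresholds are made up for
illustration): in a YBW scheme a node may only become a split point after its eldest brother, the
first move, has been searched, and an expected CUT node deserves a stricter test than an expected
ALL node, because a late cutoff there throws away all the parallel work.

Code:

#include <stdbool.h>
#include <stdio.h>

typedef enum { PV_NODE, CUT_NODE, ALL_NODE } NodeType;  /* Knuth/Moore types 1, 2, 3 */

typedef struct {
    NodeType type;
    int moves_searched;    /* moves already searched at this node */
    int moves_remaining;   /* moves still waiting to be searched  */
    int depth;             /* remaining depth in plies            */
} Node;

/* YBW rule: never split before the first move is finished. Illustrative
 * extra policy: demand more evidence (several moves searched without a
 * cutoff) at expected CUT nodes before handing work to other processors. */
static bool may_split(const Node *n, int min_split_depth)
{
    if (n->depth < min_split_depth || n->moves_remaining < 2)
        return false;
    if (n->moves_searched < 1)      /* eldest brother not finished yet  */
        return false;
    if (n->type == CUT_NODE)        /* cutoff still likely: wait longer */
        return n->moves_searched >= 3;
    return true;                    /* PV or ALL node: split freely now */
}

int main(void)
{
    Node n = { CUT_NODE, 1, 20, 8 };
    printf("may split: %s\n", may_split(&n, 4) ? "yes" : "no");  /* prints "no" */
    return 0;
}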

8) Performance
Before the first round on Sunday morning, the 12-processor distributed system benchmarked around
30% faster than the single-processor version. Of course, this is less than I hoped for. For
perspective, each processor of the 8-processor system had benchmarked at 75% of the nodes per
second of my best system.
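
A rough way to see why the gain is so modest: time-to-depth speedup is approximately aggregate NPS
divided by the node-count inflation. With purely illustrative numbers (not measurements), if the
12 processors deliver about 10x the NPS of one fast core while the distributed search visits about
7.5x as many nodes, the net gain is 10 / 7.5, roughly 1.3 - about the 30% observed.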

Conclusion
In my academic and professional life, I have had dry runs in advance of events for any systems that
needed to work on a given day. For situations like this, it is difficult to borrow all the machines
for more than a day or two. Getting them for two consecutive weekends would be excellent, but
impractical. On the other hand, everyone who loaned machines agreed to do it for the next event and
to give me an extra day. Given the successful configuration of the 12-processor/two-machine system,
I am optimistic that the next effort will meet with more success. Also, I never made it to the
garage, and my wife was a very good sport about all the equipment in the kitchen.

I won't try this for the ACCA World Computer Rapid Chess Championships, since it takes place in
July and high temperatures are expected. Maybe I'll try for the Pan American event in
October/November. Certainly the next CCT will be a prime time to try again.
jshriver
Posts: 1371
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: Adhoc Supercomputer in a Day

Post by jshriver »

Interesting read. :)

Have you considered Amazon VMs? You can get compute nodes and network them together. I've been considering this myself.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Adhoc Supercomputer in a Day

Post by bob »

jshriver wrote:Interesting read. :)

Have you considered Amazon VMs? You can get compute nodes and network them together. I've been considering this myself.
You REALLY need dedicated hardware with consistent performance. Not a "cloud machine" whose speed can vary wildly from minute to minute...
jshriver
Posts: 1371
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: Adhoc Supercomputer in a Day

Post by jshriver »

Agreed - that was a big reason I ditched the idea. Even though they sell a VM with 1-x CPUs, you have no idea if you're really getting 100% of that CPU or if it's being shared with a dozen other VMs, or where they are located. You could fire up 4 different VMs and each one could be on a different continent.

Then again, I've never really tried, and I see they will sell CPU time for CPU-intensive crunching (an on-the-fly video render farm, for example). So they might do things differently with that bundle of VMs.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Adhoc Supercomputer in a Day

Post by bob »

jshriver wrote:Agreed - that was a big reason I ditched the idea. Even though they sell a VM with 1-x CPUs, you have no idea if you're really getting 100% of that CPU or if it's being shared with a dozen other VMs, or where they are located. You could fire up 4 different VMs and each one could be on a different continent.

Then again, I've never really tried, and I see they will sell CPU time for CPU-intensive crunching (an on-the-fly video render farm, for example). So they might do things differently with that bundle of VMs.
The Sun Grid Engine software we use will give you dedicated nodes, which is what you want. But I don't know what Amazon is doing, exactly...
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: Adhoc Supercomputer in a Day

Post by rbarreira »

bob wrote:
jshriver wrote:Interesting read. :)

Have you considered Amazon VMs? You can get compute nodes and network them together. I've been considering this myself.
You REALLY need dedicated hardware with consistent performance. Not a "cloud machine" whose speed can vary wildly from minute to minute...
I believe that the "cluster compute" EC2 instances are dedicated hardware, and not virtualized, as you can read on this page:

http://aws.amazon.com/ec2/instance-types/

When I tried it, the real problem with Amazon EC2 was latency between the machines, especially since they had a bug which prevented you from choosing closely-located machines from the web interface (I didn't try the instance-launching API). But I've read in the EC2 forums that they have since fixed this bug, so it may be worth trying now.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Adhoc Supercomputer in a Day

Post by diep »

rbarreira wrote:
bob wrote:
jshriver wrote:Interesting read. :)

Have you considered Amazon VMs? You can get compute nodes and network them together. I've been considering this myself.
You REALLY need dedicated hardware with consistent performance. Not a "cloud machine" whose speed can vary wildly from minute to minute...
I believe that the "cluster compute" EC2 instances are dedicated hardware, and not virtualized, as you can read on this page:

http://aws.amazon.com/ec2/instance-types/

When I tried it, the real problem with Amazon EC2 was latency between the machines, especially since they had a bug which prevented you from choosing closely-located machines from the web interface (I didn't try the instance-launching API). But I've read in the EC2 forums that they have since fixed this bug, so it may be worth trying now.
Amazon claims a big victory in selling electronic books now, though I bet the income from them is less than from hardcopy books, which are smaller in number by now.

Yet their cloud computing has totally failed. At first they served cloud computing from many locations on the planet.

Now just one data center still has any 'reasonable' i7-type hardware nodes with 8 cores.

The reason it failed so miserably is of course the high price. I can deliver it cheaper from here, and of course I pay more for electricity than major companies like Amazon do.

I've got 8-core Xeon L5420 machines.

Their total cost was 2500 euro altogether for 8 nodes, including shipment of nearly all components from the USA to the Netherlands. The network I got from Mellanox (QDR). I'm using expensive power supplies (gold-rated ones); only at the bitter end did I find a cheaper PSU that's gold-rated at 350 watts.

Each node is 170 watts, so you can do the math. The switch uses around 100+ watts, though if I loaded it with 36 nodes it has a maximum rated power draw of 300 watts.

So that's 2500 euro for 64 cores. If I burn it at full power for a year, I initially pay a high price for electricity, thanks to tax - especially to support our nation's civil servantry, which is 10 times larger than all the people working in industry put together (we have 5.5 million (semi-)civil servants, and all of industry has roughly 600k workers, with many getting benched as we speak) - not to mention total failures like solar power and other power sources you can prove don't work at all, as they aren't 24/24. But I get distracted: the extra tax on energy each time pays for solar panels produced in China (speaking of throwing away cash - the entire EU crisis would be over if Spain stopped sponsoring solar power, since people there can EARN on solar power and do so massively, with payment guaranteed until the year 2025 or so - billions a year wasted from the taxpayers).

The price before taxes get put on top is 4.4 euro cents a kilowatt-hour here. Realize that's only if you use more than 10k kilowatt-hours a year. It's a lot cheaper if you're a plastics factory eating 7 megawatts, of course.

Add taxes and you're at the insane rate of 10 cents a kilowatt-hour here in the Netherlands - and realize we burn gas for that.

Now that's still a pretty workable rate for the computers.

Say I run this 24/24. Add some more power for the extra fans venting to the outside and some control equipment I want to add.

I don't fully run the cluster yet - only today I figured out the RAM is rated to work up to 85C, so I can risk it reaching 60C (cooling the RAM is, incredibly, the biggest problem in a self-built case of MDF wood - 40 euro in total cost for the entire case, which is 100 kilos or so just in wood weight - totally overdesigned).

So if we calculate the price from production costs here - and realize this is just a simple household with a small office I sit in with the cluster - that's 8760 hours * 11 cents (I factored in the higher sales tax coming soon) * n kilowatts =

963.6 * n euro a year.

8 NODES * 170 WATTS = 1360 watts. So say 1500 watts, adding the switch and some fans. No need for airco; it's 30C here only 2 days a year, and the next 30C day is probably June or July 2013.

1.5 * 963.6 euro = roughly 1500 euro a year. I like rounded numbers.

If I run it for 3 years, the usual economic lifetime, I pay 4500 euro in electricity costs. Add 2500 euro for the nodes. That's 7000 euro.

The network I don't need to add - for EC2 nodes of 8 cores you don't have an InfiniBand QDR network either.

3 years = 8760 hours * 3 = 26280 hours.

Expected downtime: maybe 2 days in total, and that's internet, not electricity. Electricity disturbances hardly ever happen here. The last one was 10 years or so ago, on a Monday morning - I still remember it - and lasted about an hour.

7000 euro / 26280 hours = 26 cents an hour I need to make for the 8 nodes @ 64 cores.

Or per node, that's 26 / 8 = 3.3 cents an hour.

Now of course I need to compensate for downtime. HPC centers count on running 70% efficient: of all the hours in a year (100%, or the 26280 hours above), they WILL effectively number-crunch 70%.

That's the government way of calculating. Note that for the first year they count on 30%.

So over 3 years that's 30% + 70% + 70% = 170% out of 300%.

Scaling 3.3 cents an hour per node by 300/170:

x = 3.3 * 300 / 170 = 5.8 cents per node per hour. Say 6 cents an hour per node, rounding up.

That's what an HPC outfit would sell it for commercially if they could buy it in at this price.

Anything on top of that is therefore profit, if we really want to run it commercially.
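
Restating my arithmetic as a throwaway program (my numbers and the 30%/70%/70% utilization model; without the rounding it lands a touch under the 5.8 cents above):

Code:

#include <stdio.h>

int main(void)
{
    double hardware_eur   = 2500.0;   /* 8 nodes, 64 cores            */
    double draw_kw        = 1.5;      /* nodes + switch + fans        */
    double eur_per_kwh    = 0.11;     /* taxed Dutch household rate   */
    double hours_per_year = 8760.0;
    double years          = 3.0;      /* economic lifetime            */

    double electricity = draw_kw * eur_per_kwh * hours_per_year * years;
    double total_eur   = hardware_eur + electricity;    /* ~6840, rounded to 7000 above */
    double node_hour   = total_eur / (hours_per_year * years) / 8.0;  /* ~3.3 cents */

    /* effective utilization: 30% the first year, 70% each year after */
    double utilization = (0.30 + 0.70 + 0.70) / 3.0;

    printf("break-even: %.1f euro cents per node-hour at %.0f%% utilization\n",
           node_hour / utilization * 100.0, utilization * 100.0);
    return 0;
}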

What's Amazon asking for a similar thing? Their initial price was 1.8 euro an HOUR for a node of 8 cores.

Initially there was also the sneaky condition that each core is based upon a 1GHz 'Core 2 unit'.

So you'd effectively get a quad-core @ 2.0GHz with 8 GB of RAM for that, as their initial EC2 unit was of course based upon a 1GHz unit with next to no RAM - useless for chess.

Do you find it weird that Amazon only has a few stupid civil servants as clients for this?

And we've skipped 100 other problems such as security, safety, military/company espionage, storage, bandwidth to the crunching station, etc.

Amazon can of course sign any paper promising not to betray your work, but the things that get attacked most on this planet are HPC centers where calculation occurs. If YOU want to calculate something, you are looking into the future, so others want to know it too - even if they have to ship a fully armed division of spies (armed with cameras, scanners and the newest espionage technology; as reported 2 days ago, wireless routers can be used to determine whether people are in a room - too inaccurate to scan the room, but the millivolts are enough to detect whether people or other moving objects are present).

But the bottom line is: companies and those who are serious will start taking a look when Amazon can deliver it for some sort of reasonable price.

Their price is not 6 cents for a node of 8 cores. It's closer to 2.5 euro an hour for a node of 8 cores.

If you use things in July you can get it cheaper, of course - everyone is on holiday then. But if you're calculating in July, you just need to send one email to a PhD student at a university and you get it for free in the first place.

I did get invaluable work done in this 'PhD student manner' thanks to the big effort of Renze Steenhuisen, who really put in a lot of work, which gave me valuable insights in parameter tuning. He ran on up to 80 or so cores of 8-core Xeon machines.

I have those at home now and I'm happy with them.

Let's take a look at a serious test. That usually takes 4 weeks or so and at least 4 nodes.

4 weeks * 7 days * 24 hours * 4 nodes = 2688 node-hours.

http://aws.amazon.com/ec2/pricing/

For 8 cores with a few gigabytes of RAM you need the $2.40-an-hour thing:

Eight Extra Large: $2.400 per Hour

Hah, that's just Linux. Other than the open source engines, what runs on Linux?

Oh yeah, Diep does, since around 1994 or so (I had Linux 1.0 back then!). Sure, no news there.

$2.40 * 2688 = do you really want to calculate that? = 6451 DOLLARS.

Just for a single RUN.

That's more than 3 years of nonstop crunching here, including the entire price I paid for the nodes. It just ate a lot of my time, of course.

Amazon is not realistic, you know. Only billionaires can afford this. Now, billionaires are rich because they never spend money, so that leaves the government guys...

Note:
Hah - mistake in my calculation. Amazon dicks you once again. The 8 cores of the EC2 instance are 'virtual cores'; it's a quad-core box with hyperthreading - or rather, it's whatever the EC2 definition says it is.

http://aws.amazon.com/ec2/instance-types/

The thing one needs to order is actually:

High-Memory Quadruple Extra Large Instance

68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
EBS-Optimized Available: 1000 Mbps
API name: m2.4xlarge

That's a quad-core @ 3.25GHz; I assume it's an i7.

It's a tad slower than the 8-core Xeons I have here, which are clocked at 2.5GHz.

The price of the High-Memory Quadruple Extra Large isn't shown. That price table is a total mess.

You simply want to know WHAT you get if you pay X.

An EC2 unit is defined as 1GHz for 1 hour. Could be AMD, could be Intel Core 2.

The price per NODE is not clearly defined, as they just say 'quadruple extra large'. If that's 4 virtual EC2 nodes, that's 4GHz - or simply a dual-core chippie at 2.0GHz.

Or, more likely, a 3.2GHz Core 2 single core running 4 threads...

It's a mess. You simply keep paying.

No sane company can *ever* agree to what's quoted there. It's not even interesting to ASK.
jdart
Posts: 4420
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Adhoc Supercomputer in a Day

Post by jdart »

Your two choices - home cluster and Amazon - are not really comparable.

Your machines are Xeon L5420 - Amazon cluster compute instances are Xeon E5-2670 - big difference.

Also if you are using it 24x7 it is not $2.40 an hour - they have other pricing options if you want continuous usage (see http://aws.amazon.com/ec2/purchasing-options/ - but it's still not cheap). Long term you are better off buying your own box but you are talking $5K a node or so for something that is similar to what Amazon is providing.

With Amazon or another hosting provider, you also are paying for having the machines in a reliable, secure data center with support, vs. your own setup with no support except yourself. Maybe you don't need that but it is part of the cost.

--Jon
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Adhoc Supercomputer in a Day

Post by diep »

jdart wrote:Your two choices - home cluster and Amazon - are not really comparable.

Your machines are Xeon L5420 - Amazon cluster compute instances are Xeon E5-2670 - big difference.

Also if you are using it 24x7 it is not $2.40 an hour - they have other pricing options if you want continuous usage (see http://aws.amazon.com/ec2/purchasing-options/ - but it's still not cheap). Long term you are better off buying your own box but you are talking $5K a node or so for something that is similar to what Amazon is providing.

With Amazon or another hosting provider, you also are paying for having the machines in a reliable, secure data center with support, vs. your own setup with no support except yourself. Maybe you don't need that but it is part of the cost.

--Jon
A single core of the L5420 is FASTER than a single thread of the E5-2670.

The E5-2670 is 2.6GHz. For Diep it's effectively IPC 1.72 with 2 threads per core, so one logical core is exactly IPC 0.86.

So per core, the L5420 is a LOT faster than the E5-2670.

If you want more cores and more RAM than Amazon can offer you in a single node:

$2700 for 48 cores @ 128GB RAM on eBay.

http://www.ebay.com/itm/Super-Micro-48- ... 3f14d1da86

That's a lot faster than a 2-socket E5-2670, and if you rent that E5-2670 you don't pay $2700 a year - you pay a LOT more.

Of course, Diep is one of the few chess programs that works very well on this hardware - the same issue applies to the E5-2670 - but that's not the discussion here.

"Spot Instances enable you to bid for unused Amazon EC2 capacity. Instances are charged the Spot Price, which is set by Amazon EC2 "

An EC2 unit is based upon a somewhat older core @ roughly 1GHz. So for renting the E5-2670 with 8 cores they probably charge a tad more; they will probably call each logical core an EC2 unit per 1GHz.

So we're speaking of:

http://aws.amazon.com/ec2/spot-instances/#6

High-CPU spot instance: $0.07 an hour

$0.07 * 32 * 2.6GHz = $5.824 a node an hour

I don't think renting that E5-2670 is going to be cheap if you rent it for a year.

If you reserve it for a year: $5.8 * 8k+ hours ==> $50k+ a YEAR.

NPS comparison of the 2-socket E5-2670 node versus the 4-socket 48-core AMD box:

An i7 core is about 11.5% faster than the AMD cores.

AMD box faster by: 48 * 2.5GHz / (16 * 2.6 * 1.115) = 2.6 times.

So yes, Amazon is not comparable for people who want performance. It's simply a lot of cash. I guess THAT is why they have basically shut down calculation at the other sites; the only 'new i7 nodes' left are at one central location in the USA now.

So I guess the whole cloud computing thing at Amazon is not a big success. I don't wonder why.

They ask too much.

P.S. Commercially, asking $50k is realistic for those E5s - as a company you want to make a profit - yet for HPC crunchers who need big computing power, crunching at companies' commercial rates is VERY PRICEY.

This is the big suicide of doing cloud computing at Amazon: great buzzword, too high a price - especially when you realize that the HPC crunchers usually cannot crunch at very high prices.

Building your own cluster is not easy - but it's worth doing if you want to crunch.
jdart
Posts: 4420
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Adhoc Supercomputer in a Day

Post by jdart »

I am not disputing you can get less $ per compute unit by rolling your own at home.

But you can't really expect Amazon to provide the same or better pricing than you'd get buying a used server off eBay and running it at home. They don't buy their servers on eBay and they don't run them in a garage.

That said, there are dedicated hosting providers that are cheaper than Amazon for high-end hardware used long-term.

--Jon