Don wrote:
diep wrote:
The cheapest 2-machine cluster is probably a twin node bought from eBay.
That's 4 sockets in total. They're there for $200 right now. It needs CPUs.
L5420s are $19 * 4 on eBay right now.
It already comes with heatsinks and PSU,
and you probably want 2 x 8 GB RAM.
As it's just 2 nodes, you only need an InfiniBand cable to connect them.
Only above 2 nodes do you need a switch.
So for $300 you have a Core 2 cluster of 2 machines with 4 sockets and 16 cores in total, worth 40 GHz of combined clock.
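The arithmetic behind those totals can be checked with a short Python sketch. The prices and counts are the ones quoted above, the 2.5 GHz clock is the L5420 figure given later in the thread, and the variable names are only illustrative. RAM and the InfiniBand cable are not priced in the post, so they are left out here.

```python
# Figures as quoted in the post (eBay prices at the time of writing).
TWIN_NODE_USD = 200    # twin-node chassis: 2 boards, heatsinks and PSU included
CPU_USD = 19           # one L5420
SOCKETS = 4            # 2 machines x 2 sockets
CORES_PER_CPU = 4      # the L5420 is a quad core
CLOCK_GHZ = 2.5        # L5420 clock

# RAM and cable are not priced in the post, so they are not included.
hardware_usd = TWIN_NODE_USD + SOCKETS * CPU_USD
total_cores = SOCKETS * CORES_PER_CPU
aggregate_ghz = total_cores * CLOCK_GHZ

print(f"chassis + CPUs: ${hardware_usd}")
print(f"cores: {total_cores}, combined clock: {aggregate_ghz} GHz")
```

Chassis plus CPUs comes out a bit under the $300 mentioned, which leaves some room for the parts that are not priced.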
That sounds good but I'll bet the performance is not much better than a Sandy Bridge i5. Do you have a sense of performance compared to the commodity i5s you can now get anywhere? I.e. measured by total nodes per second for any chess program?
For reference, Komodo 5 gives me about 1.3 million nodes per second on my budget desktop box, which is an Intel(R) Core(TM) i5-2320 CPU @ 3.00GHz
For reference this is more than my i7-980x gives per core and that machine is running at 3.33 GHz.
The $150 per machine includes 2 L5420 CPUs. If you buy the L5420s yourself, they're $19 apiece on eBay.
Older-generation Xeon and Opteron CPUs usually go for peanuts on eBay if they lack the latest floating point technology and are not clocked very high.
So a single Sandy Bridge CPU is more expensive than 4 nodes or so. Diep gets benchmarked very precisely by LostCircuits. That guy is really good at benchmarking. I've not seen him make mistakes.
Realize however that Intel really cares about performance. In those tests hyperthreading always works magnificently. So all i7 test results are with hyperthreading, and even at full load the CPUs profit big time from it. If you cool the CPUs and motherboard really well and keep everything at around 15C, the machine eats less power and turbo boost works better. At home the inside of your machine and the motherboard will be more like 50C, and you can forget turbo boost, and hyperthreading as well.
In all those benchmarks, hyperthreading increases Diep's NPS by at least 25% or so.
You won't get the same hyperthreading performance unless you watercool and overclock.
Never assume that if you build the same system without overclocking you will get the same performance as these guys - they really know how to test.
On the other hand, AMD's boost technology doesn't work at those benchmark sites.
For much of the performance in those tests the RAM speed is therefore critical. In a cluster you already lose big time to the network anyway, so RAM speed is not so critical there. What you can't prefetch is simply overhead you lose. For chess on a cluster an i5 doesn't offer the same benefit as it does in a really well carried out benchmark.
In the test results from LostCircuits you can see the following:
http://www.lostcircuits.com/mambo//inde ... itstart=13
The test is carried out everywhere with the same Diep version, which scales really well SMP-wise.
The fastest i5 in his test is the Core i5 661 at 672k nps.
That i5 is a 2-core CPU with 4 threads, and it runs at 3.33 GHz by default.
Its max turbo frequency is 3.6 GHz, so you can assume it ran at 3.6 GHz on all 4 logical cores.
I doubt you will manage that at home without overclocking. These guys have special test motherboards on which they can force turbo boost to stay on, as they know they cool well.
Also, the RAM used in the tests at LostCircuits is the fastest RAM available for every manufacturer.
You will not buy such RAM.
The L5420 doesn't turbo boost at all of course; it's a real Core 2.
Now these cpu's don't scale 100% from 1 core to 4 cores. You lose something. You can see this in the test.
Just extrapolate it, that works:
In his results at LostCircuits, Michael also lists the Q6600.
Now you'll argue that's an older-generation CPU than the L5420.
That's entirely true, yet if you extrapolate from the Q6600, realize all those
CPUs are Core 2s. It's the same executable, you know. A Core 2 is a Core 2.
From a performance viewpoint the i7 is just a Core 2 with an on-die memory controller.
Note this benchmark executable is using a few SIMD instructions but it doesn't matter much (under 1% or so).
The Q6600 does 718k nps at 2.4 GHz. Multiply that by 2 and you've got my Xeon machines, which are 8 cores at 2.5 GHz. That's somewhere around 1.4+ M nps.
If you scroll up, then to BEAT that nps you need a six-core Intel.
In fact, without hyperthreading a six-core Intel is on par in performance with the L5420 machine.
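Here is that extrapolation as a rough Python sketch. The 718k nps, 2.4 GHz and 2.5 GHz figures come from the post; the Q6600 being a quad core is a known spec. The `scaling_efficiency` parameter and the function name are my own assumptions for illustration, since the post only says you lose "something" when scaling to more cores.

```python
Q6600_NPS = 718_000    # Diep on the Q6600, from the LostCircuits results
Q6600_GHZ = 2.4
Q6600_CORES = 4
L5420_GHZ = 2.5
MACHINE_CORES = 8      # 2 x L5420 in one machine


def extrapolate_nps(scaling_efficiency=1.0):
    """Scale the Q6600 result linearly by core count and clock.

    scaling_efficiency is a hypothetical fudge factor (1.0 = perfect
    scaling); the post gives no measured value for it.
    """
    core_ratio = MACHINE_CORES / Q6600_CORES
    clock_ratio = L5420_GHZ / Q6600_GHZ
    return Q6600_NPS * core_ratio * clock_ratio * scaling_efficiency


plain_doubling = Q6600_NPS * 2          # ignores the small clock advantage
linear_upper_bound = extrapolate_nps()  # perfect scaling, clock included

print(f"plain doubling:     {plain_doubling:,.0f} nps")
print(f"linear upper bound: {linear_upper_bound:,.0f} nps")
```

Plain doubling gives about 1.44M nps and the perfect-scaling bound is just under 1.5M, so "1.4+ M" is a reasonable estimate once some scaling loss is taken off.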
The newer high-end Sandy Bridge CPUs (Sandy Bridge-E) have 4 memory channels.
It's not clear to me how much hyperthreading gives on those machines.
My impression is that the improved performance for Diep comes from improved hyperthreading performance.
Does Komodo work with hyperthreading?
If not, then subtract 30% from the NPS figures you see there for the i7s.
Can you in fact run with hyperthreading on an i5?
Now the new line of Intels with 4 cores and 8 logical cores WILL be faster than the L5420 machines I've got here when you overclock them. But how much do you profit from hyperthreading on them, given that without it they lose 25%?
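The hyperthreading percentages in this post are not all the same adjustment. A 25% gain from hyperthreading means roughly a 20% drop when it is switched off, while "subtract 30%" is a flat cut. A minimal Python sketch of both, where the function names and the 1,000,000 nps input are made up for illustration and are not measurements:

```python
def nps_without_ht(nps_with_ht, ht_gain=0.25):
    """Undo a relative hyperthreading gain (0.25 = HT adds 25% nps)."""
    return nps_with_ht / (1.0 + ht_gain)


def nps_flat_discount(nps_with_ht, discount=0.30):
    """Take a flat percentage off the published figure."""
    return nps_with_ht * (1.0 - discount)


published_nps = 1_000_000  # hypothetical i7 result measured with HT on

print(f"25% HT gain removed: {nps_without_ht(published_nps):,.0f} nps")
print(f"flat 30% discount:   {nps_flat_discount(published_nps):,.0f} nps")
```

Which adjustment applies depends on how much a given engine actually gains from hyperthreading, which the thread leaves as an open question for Komodo.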
For the price of a single six-core Intel node I can buy 4 machines.
Those L5420s have REAL cores. That's 4 REAL cores, which of course annihilates any i5 that is a 2-core CPU.
The only cheap CPU that really is on par is the i7-2600K and newer incarnations of it; however, you will need to overclock it to get the same performance as a real 8-core machine.
Power consumption is similar to an L5420 machine; however, you CAN of course watercool the i7-2600K very well with a relatively cheap kit.
Maybe run it stable at 4Ghz in fact.
It is faster than a 2 x L5420 machine, yet the grand total PRICE of such an i7-2600K machine is not so interesting.
Did you consider that for a cluster, nodes that are too fast are not interesting to have?
Many reasonable nodes is better than a few fast ones of course.
The more cores you got, the tougher it is to communicate through that single network card.
The problem with all the i7 machines is the same: the CPU cost is too high, and on eBay the high-clocked CPUs always stay too expensive compared to CPUs that are clocked a lot lower but where you can put 2 of them in a single motherboard.
My original plan for building a cluster was to get some i7s relatively cheap and then watercool them.
Doing the money math, the real problem is the CPU cost of the i7s.
Forget anything underneath that. Realize also that the i5s have far fewer memory channels than the i7s, and they always burn up quicker if you want to overclock.
Overclocking in a cluster - I don't need to talk about that to someone who was administrator of some big supercomputers.