Strongest MPI-capable (cluster) engine?

Discussion of chess software programming and technical issues.

Moderator: Ras

diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Strongest MPI-capable (cluster) engine?

Post by diep »

Don wrote:Hyperthreading is a benefit in our automated testing.
You cannot do time-limited testing when using 1 core with hyperthreading. If 1 process by accident gets a real core to itself, with no other process keeping the other logical core busy, then I don't know about your IPC, but suddenly that 1 process gets the whole core.

So say you have a system with 4 real cores @ 8 logical cores.

If you run your engine on 1 core, without another engine on it, the IPC in the case of Diep is around 1.5 (I use rounded-off numbers here to illustrate the example; actually Diep's IPC is far above 1.5 on Core 2 CPUs, and on i7 it's far above 1.7 with 2 logical cores using 1 core).

When 2 processes run on 1 core, each getting a logical core, their *total* IPC is 1.7. So the IPC for a single process is 0.85.

If you now run matches just using logical cores, regularly some processes will get that IPC=1.5 for 1 or more moves while their opponent is at 0.85 at that moment. So that isn't a fair match, to put it politely.

It's not a factor 2 advantage, but nearly (1.5 / 0.85 is about 1.76), and most likely on some critical moves.
This creates huge noise in your test results and it happens nonstop.
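
To put numbers on the example above, here is a minimal Python sketch. It uses only the rounded IPC figures from this post; the function names are made up for illustration, and the even split of the shared IPC between the two processes is the same simplification the post makes.

```python
# Rounded figures from the post, for illustration only.
IPC_ALONE = 1.5          # one process has the physical core to itself
IPC_SHARED_TOTAL = 1.7   # two processes share one physical core (2 logical cores)


def per_process_ipc(total_ipc: float, processes: int) -> float:
    """Assume the combined IPC is split evenly across the processes on one core."""
    return total_ipc / processes


def speed_advantage(ipc_fast: float, ipc_slow: float) -> float:
    """How many times faster the luckier process runs during that move."""
    return ipc_fast / ipc_slow


shared = per_process_ipc(IPC_SHARED_TOTAL, 2)
advantage = speed_advantage(IPC_ALONE, shared)

print(f"IPC per process when sharing a core: {shared:.2f}")
print(f"Advantage of the process that runs alone: {advantage:.2f}x")
```

In a time-limited match this ratio applies only to the moves where one engine happens to have the core to itself, which is why it shows up as noise rather than as a constant bias.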

To filter that out would require a million games or so if not more, this for a single run.

Hyperthreading creates huge non-determinism; that creates huge noise.

Anything that works in real time and requires accurate measurement won't work well with hyperthreading, as measuring gets more complicated: your process X never knows whether another process is running AT THIS MOMENT on the other logical core or not. It all switches so quickly that you just never know.

Note that the i5 @ 2300 series doesn't have hyperthreading. Neither does the Xeon 5400 series.

That's a huge advantage for speedy testing.

If you run single-core matches, just go for the system with the most combined GHz, I'd argue.

That's what I did and I don't regret it.

I'm a bit amazed that the L5420 is still the best deal on ebay. The CPUs are 19 dollars apiece, each is 4 x 2.5GHz, a single machine can hold 2 of them, and it's a low-power CPU, so power usage isn't high.

That's 8 * 2.5Ghz = 20Ghz for $38 in cpu costs.
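
The same arithmetic as a small Python sketch, in case you want to plug in other parts. The figures are the ones quoted above (2 sockets, 4 cores per CPU, 2.5GHz, about $19 per CPU); the helper names are hypothetical, and "combined GHz" is only a crude proxy for engine speed.

```python
def aggregate_ghz(sockets: int, cores_per_cpu: int, ghz_per_core: float) -> float:
    """Sum of the clock speeds of all cores in one machine."""
    return sockets * cores_per_cpu * ghz_per_core


def ghz_per_dollar(total_ghz: float, cpu_price: float, sockets: int) -> float:
    """Combined GHz bought per dollar of CPU cost (CPUs only, nothing else)."""
    return total_ghz / (cpu_price * sockets)


# Dual-socket L5420 box with the prices mentioned in the post.
total = aggregate_ghz(sockets=2, cores_per_cpu=4, ghz_per_core=2.5)
cpu_cost = 2 * 19.0
value = ghz_per_dollar(total, cpu_price=19.0, sockets=2)

print(f"{total:.1f} GHz combined for ${cpu_cost:.0f} in CPUs")
print(f"{value:.3f} GHz per dollar")
```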

p.s. Of course the fastest way for you to test would involve quite some programming work: you could port your engine to CUDA or something and then test on thousands of cores at the same time. It's a tad of work, but it'll speed up your testing by a factor of 100 or so.
ZirconiumX
Posts: 1361
Joined: Sun Jul 17, 2011 11:14 am
Full name: Hannah Ravensloft

Re: Strongest MPI-capable (cluster) engine?

Post by ZirconiumX »

Vincent. What the hell are you on about?

A Xeon L5420 is $386. Hardly $19.

Value for money? Here's your Xeon. Completely trounced by the i5-2500K.

2xL5420 is around $800.
4xi5-2500K is around $800. Double the number of processors.

I work with facts, not stories.

Matthew:out
tu ne cede malis, sed contra audentior ito
syzygy
Posts: 5843
Joined: Tue Feb 28, 2012 11:56 pm

Re: Strongest MPI-capable (cluster) engine?

Post by syzygy »

ZirconiumX wrote:Vincent. What the hell are you on about?

A Xeon L5420 is $386. Hardly $19.
Vincent said ebay.
Used L5420 for $15.99
Manufacturer refurbished L5420 for $24.99
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Strongest MPI-capable (cluster) engine?

Post by diep »

ZirconiumX wrote:Vincent. What the hell are you on about?

A Xeon L5420 is $386. Hardly $19.

Value for money? Here's your Xeon. Completely trounced by the i5-2500K.

2xL5420 is around $800.
4xi5-2500K is around $800. Double the number of processors.

I work with facts, not stories.

Matthew:out
http://www.ebay.com/itm/Intel-Xeon-L542 ... 20c959b6c1

$19.90 each, and it's not like they offer just '1' or so. They offer them by the THOUSANDS at the same time. If you want, say, 8 of them for $15 each, you can easily bid that and will GET them for that.

Motherboards with 2 sockets are there for $50-$60 on ebay.
That's 2 CPUs + motherboard for under $100.

Passive heatsinks for it, by the way, are under $20, yes, WITH heatpipes. 1 fan of 14 cm on top of it is what I do over here. Works well. Actually 12 cm is a tad better, plus a tiny fan on the RAM.

Now the speed comparison. The i5-2500K is 3.3GHz. That's a lot.
Overclocked to 3.7GHz and really well cooled, this is what it gets for Diep; see the benchmark here:

http://www.lostcircuits.com/mambo//inde ... itstart=13

1.08M nps, rounded to 1.1M nps

The 8-core L5420 box is getting high in the 1.4M nps; rounded off it's 1.5M nps.
Performance ratio: 1.5 / 1.1 = 1.36. Of course that uses rounded numbers, but call it roughly a 40% difference.

If we do a simple comparison:

i5-2500K ==> 3.7GHz * 4 = 14.8GHz
L5420 ==> 2.5GHz * 8 = 20GHz

difference: 20 / 14.8 = 1.35
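
A short Python sketch of that comparison, using only the rounded numbers from this post (1.1M nps for the overclocked i5-2500K, 1.5M nps for the dual L5420). The variable names are hypothetical. It simply puts the combined-clock ratio next to the measured nps ratio.

```python
# Combined clock per machine, from the figures above.
I5_GHZ = 3.7 * 4      # overclocked i5-2500K, 4 cores
XEON_GHZ = 2.5 * 8    # two L5420s, 8 cores in total

# Rounded Diep speeds quoted in the post, in million nodes per second.
I5_MNPS = 1.1
XEON_MNPS = 1.5

clock_ratio = XEON_GHZ / I5_GHZ
nps_ratio = XEON_MNPS / I5_MNPS

print(f"combined clock ratio: {clock_ratio:.2f}")
print(f"measured nps ratio:   {nps_ratio:.2f}")
```

With these inputs the two ratios land within a couple of percent of each other, which is the point being made: for Diep, combined GHz predicts the measured speed fairly well across these two chips.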

The i5-2500K is not better than a Core 2 in any benchmark for Diep. It's just not a good chip at all, and realize it HAS been overclocked here from 3.3GHz to 3.7GHz. That's an overclock of 12%.

As for the price of the i5-2500K: the cheapest offer I see on ebay is $175, but you can only get 1 for that, not build a cluster with it.

Cheapest offer for a lot is http://www.ebay.com/itm/Intel-Core-i5-2 ... 1e71621a22

$210.95 a piece.

That's too high a price for a cheap cluster, and its performance above is based upon overclocking.
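
As a rough illustration of the price argument, the sketch below computes Diep speed per $100 of CPU cost from the ebay prices and rounded nps figures quoted in this post. It is deliberately CPU-only: motherboard, RAM, power supply and electricity are ignored, and the function name is hypothetical.

```python
def mnps_per_100_dollars(mnps: float, cpu_cost: float) -> float:
    """Million nodes per second bought per $100 of CPU cost."""
    return mnps / cpu_cost * 100.0


# Prices and speeds as quoted in the post (CPUs only).
xeon_cost = 2 * 19.90   # two L5420s for one 8-core machine
i5_cost = 210.95        # one i5-2500K from the bulk offer

xeon_value = mnps_per_100_dollars(1.5, xeon_cost)
i5_value = mnps_per_100_dollars(1.1, i5_cost)

print(f"2x L5420: {xeon_value:.2f} Mnps per $100")
print(f"i5-2500K: {i5_value:.2f} Mnps per $100")
print(f"ratio:    {xeon_value / i5_value:.1f}x")
```

Adding the rest of each machine would narrow the gap, so treat the ratio as an upper bound on the price advantage rather than a full system comparison.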

The idea of clustering is lots of machines at a cheap price which combined have formidable crunching power.

The discussion about ECC is not so relevant here, I assume, so let's not put too much effort into it. Yet I do want to note that the Xeons here have ECC RAM and the i5 doesn't. If I build a cluster I first look at the budget. If the price difference between a small cluster with ECC and one without is small, I'll look at what I want to run on it. But for larger clusters it's not even a question:

ECC is a requirement there.
Adam Hair
Posts: 3226
Joined: Wed May 06, 2009 10:31 pm
Location: Fuquay-Varina, North Carolina

Re: Strongest MPI-capable (cluster) engine?

Post by Adam Hair »

It appears that it might be cheaper to buy used servers rather than buying the parts separately:

http://www.ebay.com/itm/SuperMicro-1U-S ... 6rk%3D1%26

$205.00 each.

Of course, shipping plays a role also.
jshriver
Posts: 1371
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: Strongest MPI-capable (cluster) engine?

Post by jshriver »

diep wrote: The idea of clustering is lots of machines at a cheap price which combined have formidable crunching power.
Mental note: on my next cluster build I need to ask you for help :)

Then again, I'm still content with Olympus One but some Xeons would be nice over the aging P4 system.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Strongest MPI-capable (cluster) engine?

Post by diep »

jshriver wrote:
diep wrote: The idea of clustering is lots of machines at a cheap price which combined have formidable crunching power.
Mental note on my next cluster build I need to ask you for help :)

Then again, I'm still content with Olympus One but some Xeons would be nice over the aging P4 system.
Within not too many months I will probably also put some text online on how to build a cluster, and especially how to cool it and keep it quiet.

For a single machine you have many options. Doing it dirt cheap for a cluster is not so easy.

Those ready-made $200 rackmounts (note I also saw them for $150) make HUGE NOISE.
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: Strongest MPI-capable (cluster) engine?

Post by abulmo »

diep wrote:The i5-2500k is not better than a core2 in any benchmark for Diep.
I guess your mileage may vary. I am not a Chess but an Othello programmer, and when switching from a Core 2 Q9650 to a Sandy Bridge (i7-2600K), I was really impressed by the speed improvement. One of my speed tests dropped from 5'15s to 3'30s, with both CPUs running at 3.6 GHz. Move generation in Othello is somewhat similar to generating attacks by a Queen in Chess, so I suppose bitboard-based Chess programs should benefit from the Sandy/Ivy Bridge architecture.
Richard
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Strongest MPI-capable (cluster) engine?

Post by diep »

abulmo wrote:
diep wrote:The i5-2500k is not better than a core2 in any benchmark for Diep.
I guess your mileage may vary. I am not a Chess but an Othello programmer, and when switching from a Core 2 Q9650 to a Sandy Bridge (i7-2600K), I was really impressed by the speed improvement. One of my speed tests dropped from 5'15s to 3'30s, with both CPUs running at 3.6 GHz. Move generation in Othello is somewhat similar to generating attacks by a Queen in Chess, so I suppose bitboard-based Chess programs should benefit from the Sandy/Ivy Bridge architecture.
Seen from a game tree search viewpoint it is the same CPU core. So your code should be equally fast.

The only difference is a built-in memory controller.

That should be worth a few percent to you, not a factor 2 difference in speed.

Measured on 1 core of course, as you are now comparing an 8-thread CPU with a 4-thread one.
abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: Strongest MPI-capable (cluster) engine?

Post by abulmo »

diep wrote:Seen from a game tree search viewpoint it is the same CPU core. So your code should be equally fast. The only difference is a built-in memory controller. That should be worth a few percent to you, not a factor 2 difference in speed.


According to the technical articles I have read, there are more differences, some of them coming from the intermediate CPU architecture.
* µop cache (higher instruction bandwidth)
* better branch prediction unit
* New instructions (popcount makes my program 5% faster).
* faster memory (DDR3 vs DDR2).
* built-in system agent
* hyperthreading
* etc.
diep wrote:Measured on 1 core of course, as you are now comparing an 8-thread CPU with a 4-thread one.

Of course not. A fair comparison does not disable half of the capabilities of the Sandy Bridge. Both CPUs have 4 cores. One can support 8 threads, the other cannot. The fair comparison is 8 threads against 4 threads. That said, the HT acceleration is only 20% (vs 75% if using 8 real cores).

There are many small improvements that add up to make the CPU running my program 50% faster.
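
For reference, the 50% figure follows directly from the two run times quoted earlier in the thread (5'15s on the Q9650, 3'30s on the i7-2600K). A minimal Python sketch, with a hypothetical helper name:

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a m'ss run time to seconds."""
    return minutes * 60 + seconds


before = to_seconds(5, 15)   # Core 2 Q9650 run time
after = to_seconds(3, 30)    # Sandy Bridge i7-2600K run time

speedup = before / after            # throughput ratio
time_saved = 1.0 - after / before   # fraction of the run time removed

print(f"speedup: {speedup:.2f}x")
print(f"run time reduced by {time_saved:.0%}")
```

Note the two ways of stating the same result: the program runs 1.5 times as fast, which is the same as the run time shrinking by one third.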
Richard