Ivy Bridge vs Sandy Bridge for computer chess
Moderator: Ras
-
syzygy
- Posts: 5838
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Ivy Bridge vs Sandy Bridge for computer chess
diep wrote: It seems though that Ivy Bridge CPUs are less consistent in how well they overclock than Sandy Bridge, unless you manage to get a good chip and modify the CPU's heatspreader. That would make overclocking Ivy Bridge a lot more complicated. Any experience there?

I don't have experience with Ivy Bridge, but while they run hotter, their critical temperature is also higher. As you already mentioned, the reason they run hotter is Intel's use of thermal paste instead of fluxless solder between the chip and the heat spreader. The higher temperature of the cores therefore does not mean that IB chips produce more heat than SB chips. In fact, they are more energy-efficient and therefore produce less heat. Cooling IB chips (i.e. reaching a stable situation in which as much heat is removed from the system as is produced) is therefore actually easier than with SB chips. One only has to accept that core temperatures are higher (or modify/hexedit whatever utility one uses to read out those temperatures).
-
Joost Buijs
- Posts: 1670
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: Ivy Bridge vs Sandy Bridge for computer chess
diep wrote: What sort of cooling do you use, Joost? Over here I'm aircooling, among which 100 CFM fans for the cluster nodes, and heatsinks with heatpipes, but no liquid/water cooling. What frequency do you run them at in tournaments? It seems though that Ivy Bridge CPUs are less consistent in how well they overclock than Sandy Bridge, unless you manage to get a good chip and modify the CPU's heatspreader. That would make overclocking Ivy Bridge a lot more complicated. Any experience there?

Hi Vincent,

Over here I'm also using aircooling; watercooling is not worthwhile IMHO. I use big aircoolers like the Noctua NH-D14 and Thermalright Silver Arrow.
The Nehalem runs stable at 4.0 GHz without overvolting.
The Sandy Bridge runs stable at 4.8 GHz, but then it gets very hot (about 90C), so I usually run it at 4.4 GHz; at that speed the core temperature stays below 70C.
For chess the performance of Nehalem and Sandy Bridge is about the same, at least for my program; for others I don't know.
I bought an Ivy Bridge just because I was curious what it could do. It was a very big disappointment: it runs very hot because of its shitty heatspreader design, and performance-wise it is not any better than a 2700K. So I'll stick with Sandy Bridge for a while.
I'm thinking about building a small cluster with 2700K nodes, maybe 4 nodes (16 cores) to start with, and I can always add the two 6-core boxes for a total of 28 cores.
-
diep
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: Ivy Bridge vs Sandy Bridge for computer chess
Joost Buijs wrote: The Sandy Bridge runs stable at 4.8 GHz, but then it gets very hot (about 90C), so I usually run it at 4.4 GHz; at that speed the core temperature stays below 70C.

The SB is an i7-3960X?

With AMD non-existent for now in terms of CPU performance, it seems Intel can do what they want. They released Ivy Bridge long after it was obvious that AMD has only quad-core CPUs (with 8 mini-cores) in the desktop market, and Intel with its 6-core CPUs totally owns that segment.
Intel must have known the grease trick for many years already and waited to use it until they knew AMD was dead. At the release of SB they didn't yet know how strong AMD would be with Bulldozer, as SB released some months before Bulldozer.
Additionally, AMD's Bulldozer CPUs do not have a memory controller that scales. The memory controller delivers OK up to 4 cores, but when you scale that up to 8 cores the latency to RAM deteriorates by 60% or so.
So AMD has 5 problems to solve:
- double the number of cores
- build a memory controller that can serve 4x more cores than it serves now
- use less power (a consequence of using too many transistors)
- move to newer process technologies
- their R&D has been moved to India, which is cheap yet backfiring everywhere
Once again they focus on really cheap CPUs.
So AMD is going to fall behind in process technology for CPUs.
I don't see AMD solving all those problems.
It seems the only thing they're busy with now that will definitely boost the IPC of the mini-cores is decoding more instructions per clock. That doesn't solve the 5 problems above. It's also unclear whether it means the chip really decodes more per cycle or whether this is another paper truth; usually they can fuse 2 specific instructions together and then pretend the new chip is better - the same trick Intel did with Ivy Bridge. Yet in computer chess we usually hardly profit from that...
So with Intel's new 100% monopoly after the Bulldozer fiasco, I wouldn't be amazed if that is why they decided on this cheapskate solution for Ivy Bridge, preventing an easy overclock. Only some real hardcore overclockers who actually modify the CPUs will be able to overclock. In the meantime Intel has already moved some crucial components inside the CPU's logic, making it impossible to modify those.
With such a monopoly giving Intel the power to force you to buy the most expensive CPUs, getting a high clock is going to become ever more expensive and complicated.
Clustering seems the only cheapskate solution now, as the low-clocked CPUs go for junk prices on eBay.
-
diep
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: Ivy Bridge vs Sandy Bridge for computer chess
syzygy wrote: Cooling IB chips is therefore in fact easier than with SB chips. One only has to accept that core temperatures are higher.

You obviously never overclocked anything yourself.
-
diep
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: Ivy Bridge vs Sandy Bridge for computer chess
Joost Buijs wrote: I'm thinking about building a small cluster with 2700K nodes, maybe 4 nodes (16 cores) to start with, and I can always add the two 6-core boxes for a total of 28 cores.

As for clustering: it depends of course on what you want to accomplish.
If you want to use the 4 nodes as one big computer, then I'm not sure the 2700Ks are the right chip for that.
The problem with the i7-2700Ks is their high price: $300 on eBay right now.
They deliver 4 cores. If you get them stable at 4.0 GHz without running too hot, that's 4.0 * 4 = 16 GHz aggregated.
Now you'll say that's faster than a 2 x L5420 machine, which delivers an aggregated 20 GHz; but realize that over a cluster you lose big latency to the hashtables anyway, so you have to prefetch cleverly there in any case.
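For the shared-memory half of that trick, a minimal sketch of what "prefetch the hash entry early" can look like; `tt`, `tt_mask`, and the entry layout are illustrative names, not Diep's actual code:

```cpp
#include <cstdint>

struct TTEntry {              // one transposition-table slot
    uint64_t key;
    int16_t  score;
    uint8_t  depth, flags;
};

extern TTEntry  tt[];         // the node-local hashtable
extern uint64_t tt_mask;      // table size minus 1 (size is a power of two)

// Call this right after make_move, as soon as the child's hash key is
// known; the cache line is then already in flight by the time the real
// probe happens a few hundred instructions later.
inline void prefetch_tt(uint64_t hash_key) {
    __builtin_prefetch(&tt[hash_key & tt_mask]);
}
```

On a cluster the analogue is firing off the remote lookup early and doing useful work while it is in flight.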
Realize that a single machine with two L5420 CPUs here eats 170 watts with Diep running on 8 cores. That is including hard drive power and the DVD active; it drops to 160 watts otherwise.
An overclocked i7-2700K? 220 watts or so. At Tom's Hardware it already draws 155 watts running Linpack, and hits 81C with the stock cooler.
For clusters, price per node is simply what matters, assuming you can afford the power in the room where you want to toy with the cluster; with a lot of power usage you need some additional ventilation to the outside.
The L5420 clusters you can build for $150 a node right now.
If you want the faster memory controller you could go for second-hand 5500-series Xeons.
Maybe there will be some cheap offers on eBay soon for 2-socket Xeon machines.
For clustering, the 2-socket machines with low-clocked CPUs are simply unbeatable on price in any segment.
The i7s are simply too expensive for cheap clusters.
Note that with a small cluster it's very difficult to beat a 48-core AMD machine using previous-generation CPUs like the 6180SE. Sometimes those are cheap on eBay.
Oh doh, I see they have gone up in price now, to $455 a CPU.
Get 4 of them and you have a 48-core box. No simple cluster is going to beat it.
If you really want a simple cluster, another idea is to just use 2 machines with one cable in between.
Not only is that a fast way to connect 2 machines, it also saves you from buying a switch. So you could invest in a low-latency network card.
Maybe get 2 low-latency cards and connect them with a cable.
Right now the dominant standard is InfiniBand. Low-latency communication would require installing Linux.
Doing SMP between 2 machines is still relatively simple, despite the high latency between the machines.
You'll sort it out.
Lots of possibilities.
Above 2 machines you need a switch, which is usually not extremely cheap.
10 Gbit (4x) InfiniBand is dirt cheap right now; that's first-generation InfiniBand.
Over a factor of 2 worse latency, though.
The worse the latency, the better it is to have many machines that are a tad lower clocked than just 4 that are really highly clocked.
The SMP algorithm for more machines is more complicated and it might suck at blitz, yet the global hashtable, which gives an exponential speedup, is easier to exploit then.
In itself there is no real limit to how many short messages per second you can ship with the newer InfiniBand types (QDR). It's in the many millions, yet those cards are more expensive to get.
I see them at $295 a card now on eBay.
Sometimes there are some cheap offers, and switches are dropping in price now.
I see some at $2300 now.
Cables are dirt cheap.
-
Joost Buijs
- Posts: 1670
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: Ivy Bridge vs Sandy Bridge for computer chess
diep wrote: The SB is an i7-3960X?

Yes, it is an i7-3960X. I bought this one because at that time the 3930K was not yet available.
-
Joost Buijs
- Posts: 1670
- Joined: Thu Jul 16, 2009 10:47 am
- Location: Almere, The Netherlands
Re: Ivy Bridge vs Sandy Bridge for computer chess
diep wrote: The i7s are simply too expensive for cheap clusters.

I don't think the 2700K is too expensive for a small cluster. A single node will cost you something like 500 euros including everything.
A single 2700K overclocked to 4.4 GHz has about 45% more horsepower than a dual L5420 ticking at 2.5 GHz. For chess the difference will be even larger, because you'll have more SMP overhead on a dual L5420 compared to a single 2700K.
Anyway, I'm planning to make a cluster version of my program. It will make use of a small shared hashtable for positions close to the root of the tree. I don't know whether this will have a beneficial effect or not, but that is something I want to find out.
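A minimal sketch of how such a root-proximity gate could look; `TTEntry`, the helper functions, and the depth threshold are hypothetical names and values, not Joost's actual design:

```cpp
#include <cstdint>

struct TTEntry { uint64_t key; int16_t score; uint8_t depth, flags; };

// Hypothetical helpers: local probes hit the node's own table, remote
// probes go over the network to whichever node owns the key.
bool probe_local (uint64_t key, TTEntry &out);
bool probe_remote(uint64_t key, TTEntry &out);
void store_local (uint64_t key, const TTEntry &e);
void store_remote(uint64_t key, const TTEntry &e);

const int SHARED_TT_MIN_DEPTH = 6;        // plies of remaining depth (tunable)

bool probe_tt(uint64_t key, int depth, TTEntry &out) {
    if (probe_local(key, out))            // always try the local table first
        return true;
    if (depth >= SHARED_TT_MIN_DEPTH)     // near the root: worth the round-trip
        return probe_remote(key, out);
    return false;                         // deep in the tree: skip the network
}

void store_tt(uint64_t key, int depth, const TTEntry &e) {
    store_local(key, e);                  // always store locally
    if (depth >= SHARED_TT_MIN_DEPTH)
        store_remote(key, e);             // can be fire-and-forget
}
```

Gating on remaining depth keeps the number of remote probes tiny compared to the node count of the tree, while the expensive positions near the root, which benefit most from sharing, still hit the shared table.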
-
diep
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: Ivy Bridge vs Sandy Bridge for computer chess
Joost Buijs wrote: I don't think the 2700K is too expensive for a small cluster. A single node will cost you something like 500 euros including everything.

If you don't find 500 euros a lot, you can do it.
You can buy 2+ L5420 machines for 500 euros. The rackmounts come ready to use, with 8 GB RAM and everything including an efficient PSU, for $150-$200 on eBay.
I bought 8 of them.
I have already experimented a lot with the trade-off between slowdown and hashtable loss for the last X plies, especially on the supercomputer.
Things depend slightly on whether you use the hashtable in qsearch; a fast program probably isn't doing that anyway. If I move Diep from a fully shared hashtable to doing the qsearch hashing locally on each node, so not globally, I lose 20% in time to depth.
The second loss is the compiler. I don't have a good compiler for the cluster; GCC 4.7.0 is the best I have and it sucks, though it's a lot less bad than it used to be.
It loses me 11% compared to Intel C++.
Then doing lookups to remote nodes is slow, of course. I'll write a benchmark program for MPI (I had intended to do that anyway) to measure exactly the latency there.
This also costs you overhead.
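A minimal sketch of such a latency benchmark (not Diep's actual program): a plain MPI ping-pong bouncing a hash-entry-sized message between two ranks.

```cpp
// Rank 0 and rank 1 bounce a small message back and forth; half the
// averaged round-trip time is the one-way latency a remote hash
// lookup would at minimum pay.
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int ITER = 100000;
    char buf[16] = {0};                // roughly one hash entry
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < ITER; i++) {
        if (rank == 0) {
            MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();
    if (rank == 0)
        printf("one-way latency: %.2f us\n", (t1 - t0) / ITER / 2 * 1e6);
    MPI_Finalize();
    return 0;
}
```

Run it with one rank per machine (e.g. `mpirun -np 2 --host node1,node2 ./pingpong`). Roughly speaking, gigabit ethernet lands in the tens of microseconds, InfiniBand in the low single digits.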
(In all cases "not hashing" means that I don't do a global lookup then; I still hash locally, of course.)
Not hashing the last ply is a much bigger hit. Not hashing the last 3 plies loses Diep about a factor of 3 in time to depth. It's kind of a constant loss though; the branching factor stays the same.
Not using the global hashtable for the last 6 plies is around a factor of 10 loss in time to depth.
With 2 nodes there are a lot of tricks to avoid most losses and limit them, yet there is some overhead if you don't use a globally shared hashtable.
Clustering is usually easier to get to break even when using a massive number of nodes.
On clusters/supercomputers I usually don't use the hashtable globally in qsearch, and keep using a global hashtable everywhere else. Of course I still store qsearch positions in the normal local hashtable on each node; that still makes up for a bit and limits the losses.
Doing a parallel search without a hashtable for the last few plies is very complicated.
There will be ways to avoid losing, though. What's cheap on InfiniBand is streaming entire blocks of data from one machine to another. The problem isn't the network.
In fact QDR InfiniBand has 8 GB/s of bandwidth, and basically no machine can really handle that; machines usually give out at around 2.7 GB/s.
The bandwidth manufacturers claim their machines can deliver is a theoretical number, based on what they can achieve within one component somewhere on planet Mars.
Yet there are still lots of tricks possible to avoid huge losses without doing global lookups in the last few plies.
Most of those algorithms are O(n^2), though. Easiest is to get a good network and not bother.
You can, for example, generate and order moves while awaiting a lookup; see the sketch below.
That means that in, say, 6% of the cases you still get a cutoff, so in those 6% of cases you wasted the system time spent generating and ordering the moves.
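A sketch of that overlap, assuming a non-blocking MPI transport and a service thread on the owning node that answers probe requests; all names here (`owner_node`, `generate_moves`, `order_moves`, the message tags) are hypothetical:

```cpp
#include <mpi.h>
#include <cstdint>

struct TTEntry { uint64_t key; int16_t score; uint8_t depth, flags; };

// Hypothetical engine-side pieces: which rank owns a hash slot, plus
// the usual move generation and ordering.
struct Position;
struct MoveList;
int  owner_node(uint64_t key);
void generate_moves(const Position &pos, MoveList &moves);
void order_moves(const Position &pos, MoveList &moves);

// Fire off the remote probe, then generate and order moves while the
// reply is in flight. If the reply yields a cutoff, the movegen work
// was wasted (the ~6% case above); otherwise it came for free.
void probe_overlapped(const Position &pos, uint64_t key,
                      TTEntry &reply, MoveList &moves) {
    int owner = owner_node(key);
    MPI_Request req;
    // Post the receive first so the reply can never arrive unexpected.
    MPI_Irecv(&reply, sizeof reply, MPI_BYTE, owner, 1, MPI_COMM_WORLD, &req);
    MPI_Send(&key, sizeof key, MPI_BYTE, owner, 0, MPI_COMM_WORLD);

    generate_moves(pos, moves);            // useful work during the round-trip
    order_moves(pos, moves);

    MPI_Wait(&req, MPI_STATUS_IGNORE);     // 'reply' is now valid
}
```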
Last edited by diep on Sat Sep 22, 2012 1:11 pm, edited 1 time in total.
-
syzygy
- Posts: 5838
- Joined: Tue Feb 28, 2012 11:56 pm
Re: Ivy Bridge vs Sandy Bridge for computer chess
diep wrote: You obviously never overclocked anything yourself.

A funny way of admitting that I have a bit more understanding of thermodynamics than you.
-
diep
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: Ivy Bridge vs Sandy Bridge for computer chess
syzygy wrote: A funny way of admitting that I have a bit more understanding of thermodynamics than you.

Is it painful for you to admit something?
It's again another nonsense post of yours about Ivy Bridge.
To overclock an Ivy Bridge you first need a good chip, and the odds that you have one are small. Then you need to remove the heatspreader and replace the grease. There goes your warranty. 99.9% of the guys who now overclock a Nehalem or Sandy Bridge will not remove the heatspreader of an Ivy Bridge, and Intel knew that.
Also, some components nowadays are inside the chip, so you can't modify those.
Then of course you find out you don't have an Ivy Bridge that overclocks well. Variation at 22 nm is far greater than in previous process technologies. And of course in the Netherlands the odds are bigger than elsewhere that you'll get one that doesn't overclock well.
A few hardware architects and hardcore overclockers will manage this, after destroying the first 2 Ivy Bridge CPUs they get their hands on.
So in practice Sandy Bridge will overclock a lot better, and Ivy Bridge is a flop for the computer chess guys.