Why is Core 2 Duo preferred for chess programming?

Carey
Posts: 313
Joined: Wed Mar 08, 2006 8:18 pm

Why is Core 2 Duo preferred for chess programming?

Post by Carey »

Why is the Core 2 Duo preferred over the Athlon / Turion X2 for chess programming?

Is it just because it has a larger cache than the Athlon X2?

With the X2's lower memory latency, I'd think that would be a big advantage over Intel's Core 2 Duo.

Are there specific instructions (bsf, etc.) that are much faster?

Is the C2D's 64 bit stuff more efficient?

The Athlon X2 supposedly has a less efficient FP & SSE unit, but I doubt that affects chess programs all that much.

I thought the X2 had better communications between cores and a more efficient memory controller, which I would have thought would be a major advantage. Especially with chess programs using so many large tables these days.

I'm just curious.

Based on price, you should be able to get a faster X2 that could make up any minor performance difference. And most of the people I've talked with about other stuff still suggest the Athlon X2.

But yet chess people still seem to prefer the Core 2 Duo.

There's bound to be a reason...
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Why is Core 2 Duo preferred for chess programming?

Post by hgm »

The Core 2 Duo has a 4-wide pipeline, versus only 3-wide for the K8. The old advantage of the AMD architecture, that each of the uOps it can execute per clock could specify both an ALU operation and a memory access, and thus really should be counted double (making it 6-wide), has evaporated now that Intel does the same (and calls it uOp fusion). Many key instructions used by bitboarders are now single-cycle on the Core 2 Duo.

I was under the impression that in AMD multi-core chips each core has its own private L2 cache. In Core Duo and Core 2 Duo the L2 cache is shared by the two cores. This also makes for very efficient communication between the cores.

The memory controller definitely is an advantage of the AMD architecture. But the FSB frequency on Core 2 is very high (1066 MHz data rate due to the quad-pumping), so with a single CPU package, communication between north bridge and CPU is not yet a bottleneck. The latency for memory access is a bit longer than with AMD's dedicated memory controller, though. But it makes little difference whether a memory access slows you down by a factor of 800 or 1000: if you have to wait for it, you are totally dead. Big tables in chess programs are only competitive if they can be mostly cached, and there the larger L2 is more likely to offer an advantage than the lower memory latency.
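To put rough numbers on that argument, here is a minimal sketch in C of the expected cost per table probe. The function name and the cost model (a hit costs 1 unit, a miss costs 800 or 1000 units, taking the factors above at face value) are assumptions made purely for illustration, as are the hit rates used below; none of this is measured data.

```c
/* Expected cost of one table probe, in units of a cache hit.
 * hit_rate  : fraction of probes served from cache (hypothetical input)
 * hit_cost  : cost of a cached probe
 * miss_cost : cost of a probe that goes to main memory
 */
double expected_probe_cost(double hit_rate, double hit_cost, double miss_cost)
{
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost;
}
```

With a hypothetical 99% hit rate, a miss factor of 800 gives about 8.99 units per probe and a factor of 1000 gives about 10.99. Raising the hit rate to 99.9% with the slower memory gives about 2.0. Under these assumptions, a table that caches better helps far more than a somewhat shorter memory latency.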

With a larger number of cores, memory access on Core 2 Duo might become a severe bottleneck, and AMD probably has the advantage. But how many of us can afford an 8-core machine?
frankp
Posts: 228
Joined: Sun Mar 12, 2006 3:11 pm

Re: Why is Core 2 Duo preferred for chess programming?

Post by frankp »

I am thinking of upgrading my old 2.0 single cpu amd box and was going for amd64 3.0 dual core, largely based on my previous experience of amd versus P4. Are you saying that Core 2 does hardware (or very fast) popcount and identify first set bit? That would be useful for bitboarders.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Why is Core 2 Duo preferred for chess programming?

Post by Gerd Isenberg »

On an AMD64 box one has a hard time traversing a bitboard, with either the 9-cycle vector-path bsf instruction (bsr is 10 cycles) or some alternative De Bruijn method. The Core 2 Duo has a two-cycle bsf with a reciprocal throughput of one.
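For readers who have not seen the De Bruijn alternative, here is a minimal sketch in C of a software bit scan and the usual traversal loop. The function names are made up for this example, and the lookup table is built at startup from the widely used multiplier 0x03f79d71b4cb0a89 rather than hard-coded. This is one common way to do it, not necessarily what any particular engine uses.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit De Bruijn sequence: every 6-bit window is distinct. */
#define DEBRUIJN64 UINT64_C(0x03f79d71b4cb0a89)

static int debruijn_index[64];

/* Build the lookup table: each single-bit board maps to a unique 6-bit key. */
void init_bitscan(void)
{
    for (int sq = 0; sq < 64; sq++)
        debruijn_index[((UINT64_C(1) << sq) * DEBRUIJN64) >> 58] = sq;
}

/* Index (0..63) of the least significant set bit; bb must be non-zero. */
int bitscan_forward(uint64_t bb)
{
    assert(bb != 0);
    /* bb & (~bb + 1) isolates the lowest set bit (same as bb & -bb). */
    return debruijn_index[((bb & (~bb + 1)) * DEBRUIJN64) >> 58];
}

/* Traverse a bitboard: store each set square in squares[], return the count. */
int traverse(uint64_t bb, int *squares)
{
    int n = 0;
    while (bb) {
        squares[n++] = bitscan_forward(bb);
        bb &= bb - 1;   /* clear the lowest set bit */
    }
    return n;
}
```

Call init_bitscan() once before the first scan. The multiply-and-lookup avoids the slow bsf on K8, while on a CPU with a fast bsf the hardware instruction (or a compiler intrinsic that maps to it) would take the place of bitscan_forward.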

Popcnt (and lzcnt) instructions are not yet available (I think), but have been announced by Intel together with the 50 new SSE4 instructions, and by AMD with K8L.
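Until a hardware popcnt is available, the count has to be done in software. A minimal sketch in C of two common approaches follows; the function names are chosen for this example only.

```c
#include <stdint.h>

/* Parallel (SWAR) population count: sums bits in 2-, 4-, then 8-bit fields. */
int popcount64(uint64_t x)
{
    x = x - ((x >> 1) & UINT64_C(0x5555555555555555));
    x = (x & UINT64_C(0x3333333333333333))
      + ((x >> 2) & UINT64_C(0x3333333333333333));
    x = (x + (x >> 4)) & UINT64_C(0x0f0f0f0f0f0f0f0f);
    /* The multiply adds all eight byte counts into the top byte. */
    return (int)((x * UINT64_C(0x0101010101010101)) >> 56);
}

/* Loop variant: one iteration per set bit, cheap for sparse bitboards. */
int popcount_sparse(uint64_t x)
{
    int n = 0;
    while (x) {
        n++;
        x &= x - 1;   /* clear the lowest set bit */
    }
    return n;
}
```

The parallel version takes the same time for any input, while the loop version is attractive when only a few bits are set, as is typical for piece bitboards.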

Agner Fog has instruction tables for all recent cpus:

http://www.agner.org/optimize/
http://www.agner.org/optimize/instruction_tables.pdf

Gerd
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Why is Core 2 Duo preferred for chess programming?

Post by hgm »

frankp wrote:I am thinking of upgrading my old 2.0 single cpu amd box and was going for amd64 3.0 dual core, largely based on my previous experience of amd versus P4. Are you saying that Core 2 does hardware (or very fast) popcount and identify first set bit? That would be useful for bitboarders.
P4 is trash. Anything would look good compared to P4, so this is really a totally meaningless comparison. Core 2 Duo for Chess seems to be at least 3 times faster than P4.
frankp
Posts: 228
Joined: Sun Mar 12, 2006 3:11 pm

Re: Why is Core 2 Duo preferred for chess programming?

Post by frankp »

Thanks Gerd. I would call this a complete answer :-)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Why is Core 2 Duo preferred for chess programming?

Post by bob »

It is simply faster in every test I have run. Faster in 64 bits. Faster in 32 bits. Faster everywhere. Bigger cache. Just the best there is right now...
frankp
Posts: 228
Joined: Sun Mar 12, 2006 3:11 pm

Re: Why is Core 2 Duo preferred for chess programming?

Post by frankp »

Bob

Given that my practical choice for the next machine is a 3GHz AMD or a 2.4GHz Core 2 (about the same price where I buy them), do you have any feel for whether the Core 2 will still be faster in 64-bit mode? That is, is the Core 2 that much better in your experience?

Frank
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Why is Core 2 Duo preferred for chess programming?

Post by bob »

A single core-2 processor (just one CPU) is more than twice as fast as a single PIV 2.8ghz xeon. My office machine using both cpus hits 1.5-2.0M nps. My 2.0ghz core-2 laptop hits 5-6M nps.

the comparison is startling...

Note that the xeon box in my office is 32 bit, while my core-2 is 64 bit and I'm running 64 bit linux (suse 10.2)...
nczempin

Re: Why is Core 2 Duo preferred for chess programming?

Post by nczempin »

bob wrote:A single core-2 processor (just one CPU) is more than twice as fast as a single PIV 2.8ghz xeon. My office machine using both cpus hits 1.5-2.0M nps. My 2.0ghz core-2 laptop hits 5-6M nps.

the comparison is startling...

Note that the xeon box in my office is 32 bit, while my core-2 is 64 bit and I'm running 64 bit linux (suse 10.2)...
Umm... forgive me if I'm missing something, but isn't it obvious that the 64-bit factor plays a role? Or is that exactly the point you were making?