Ivy Bridge vs Sandy Bridge for computer chess

lkaufman
Posts: 5942
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Ivy Bridge vs Sandy Bridge for computer chess

Post by lkaufman »

lkaufman wrote: The new machine is definitely faster than the old by more than the 2.6 to 2.2 GHz clock ratio, so something has been improved.
syzygy wrote: Given that the old one is 1.5 years old and is dual socket, it cannot be Sandy Bridge. So the old one is Westmere-EP / Gulftown. Sandy Bridge gave significant performance improvements over Westmere (more than IB over SB).
lkaufman wrote: So which technology would be in my standard 4 core i7 from about 2 1/2 years ago?
syzygy wrote: Westmere-based processors were launched January 7, 2010, but if this overview is correct, there is no regular 4 core i7 based on Westmere. So this 4 core i7 should be based on Nehalem (either Lynnfield or Bloomfield). See also here.
So this means that the poor performance of Komodo is limited to the machine with the Westmere technology; on the machines with the Nehalem and with the Sandy Bridge technology it runs fine. Can you think of any sensible explanation for this?
syzygy
Posts: 5554
Joined: Tue Feb 28, 2012 11:56 pm

Re: Ivy Bridge vs Sandy Bridge for computer chess

Post by syzygy »

lkaufman wrote: So this means that the poor performance of Komodo is limited to the machine with the Westmere technology; on the machines with the Nehalem and with the Sandy Bridge technology it runs fine. Can you think of any sensible explanation for this?
I'm not surprised that Sandy Bridge performs better than older microarchitectures. I have no explanation for why the older Nehalem processor would do better than a Westmere processor (on Komodo). But the performance of a computer is determined by more factors than the processor microarchitecture. Much also depends on the method of testing, especially if you want to avoid comparing apples to oranges.
lkaufman
Posts: 5942
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Ivy Bridge vs Sandy Bridge for computer chess

Post by lkaufman »

syzygy wrote:
lkaufman wrote: So this means that the poor performance of Komodo is limited to the machine with the Westmere technology; on the machines with the Nehalem and with the Sandy Bridge technology it runs fine. Can you think of any sensible explanation for this?
I'm not surprised that Sandy Bridge performs better than older microarchitectures. I have no explanation for why the older Nehalem processor would do better than a Westmere processor (on Komodo). But the performance of a computer is determined by more factors than the processor microarchitecture. Much also depends on the method of testing, especially if you want to avoid comparing apples to oranges.
Well, all tests were run in the same way, in particular the number of tests was always twice the number of cores so as to make full use of hyperthreading. Is it plausible that simply using a machine with two physical processors (the 12 and 16 core machines) would affect different engines in a very different way than using just one?
This is a big issue for us as it casts a cloud over results obtained by combining tests on different hardware. Previously we thought this was a minor issue, but these results make it seem rather significant.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: Ivy Bridge vs Sandy Bridge for computer chess

Post by Rein Halbersma »

lkaufman wrote: So this means that the poor performance of Komodo is limited to the machine with the Westmere technology; on the machines with the Nehalem and with the Sandy Bridge technology it runs fine. Can you think of any sensible explanation for this?
Intel's development process follows a "tick-tock" model: http://en.wikipedia.org/wiki/Intel_Tick-Tock
http://en.wikipedia.org/wiki/List_of_In ... processors

Your dual E5-2670 is on the "tock" (new microarchitecture) of the Sandy Bridge generation. Ivy Bridge is the corresponding "tick" (a shrink to 22nm from 32nm, but essentially the same architecture). Only single-socket E3-series Ivy Bridge parts have been released so far (with quad-cores as the high-end versions), with the dual- and higher-socket parts coming next year. Your old dual-socket six-cores were probably Xeon E56xx. They were on the "tick" (Westmere, the 32nm shrink) of the older Nehalem architecture. Let's assume they were two E5649s, which came out in Feb 2011. A big difference from your current E5-2670s is the memory bandwidth: max 51 GB/s for the E5-2670 vs 32 GB/s for the E5649. Part of that is because the newer CPUs support 4 instead of 3 memory channels.
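
For reference, the 51 vs 32 figures above fall out of simple arithmetic: channels times transfer rate times 8 bytes per transfer. The sketch below assumes DDR3-1600 on the E5-2670 and DDR3-1333 on the E5649 (the fastest memory each is specified for, as far as I know); the function name is made up for illustration, and these are theoretical peaks, not measured throughput.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    Each DDR3 channel is 64 bits wide, i.e. 8 bytes per transfer.
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


# Assumed memory configurations (not taken from the thread):
new_cpu = peak_bandwidth_gb_s(channels=4, mega_transfers_per_s=1600)  # E5-2670
old_cpu = peak_bandwidth_gb_s(channels=3, mega_transfers_per_s=1333)  # E5649

print(f"E5-2670: {new_cpu:.1f} GB/s, {new_cpu / 8:.2f} GB/s per core (8 cores)")
print(f"E5649:   {old_cpu:.1f} GB/s, {old_cpu / 6:.2f} GB/s per core (6 cores)")
```

Note that the per-core gap is smaller than the per-socket gap, since the newer chip also has more cores sharing the channels.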

It's pure speculation, but my guess is that the copy-make approach of Komodo (which is pretty rare among chess engines, but I also use it) is heavier on the memory bandwidth than the usual make-undo approach. BTW, I envy your 16-core workhorse, I'm currently assembling a single-socket 6-core E5-1650 machine which comes in just under my budget of $2K. :)
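
To make the copy-make vs make-undo distinction concrete, here is a toy sketch. It is not Komodo's code (or anyone's real engine code): the board is just a 64-element list, all function names are invented for illustration, and castling, en passant, promotion and hashing are ignored. The point is only that make-undo touches a couple of squares in place, while copy-make duplicates the whole position at every node.

```python
EMPTY = "."


def make_move(board, src, dst):
    """Make-undo style: mutate the board in place, return what undo needs."""
    captured = board[dst]
    board[dst] = board[src]
    board[src] = EMPTY
    return captured


def undo_move(board, src, dst, captured):
    """Restore the board to its state before make_move."""
    board[src] = board[dst]
    board[dst] = captured


def copy_make(board, src, dst):
    """Copy-make style: parent stays untouched, child is a full copy."""
    child = list(board)  # the whole position is copied at each node
    child[dst] = child[src]
    child[src] = EMPTY
    return child


start = [EMPTY] * 64
start[0], start[8] = "R", "p"  # a rook that can capture a pawn

# Make-undo: one board, modified and then restored.
board = list(start)
captured = make_move(board, 0, 8)
undo_move(board, 0, 8, captured)

# Copy-make: no undo needed, the parent position is still intact.
child = copy_make(start, 0, 8)
print(board == start, child[8], start[8])
```

Whether the extra copying actually reaches main memory or stays in cache depends on the size of the position structure and the search, which is why this remains a guess.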
syzygy
Posts: 5554
Joined: Tue Feb 28, 2012 11:56 pm

Re: Ivy Bridge vs Sandy Bridge for computer chess

Post by syzygy »

lkaufman wrote:Well, all tests were run in the same way, in particular the number of tests was always twice the number of cores so as to make full use of hyperthreading. Is it plausible that simply using a machine with two physical processors (the 12 and 16 core machines) would affect different engines in a very different way than using just one?
What do you get if you benchmark a single instance of single-threaded Komodo on the 4, 12 and 16 core machines, then do the same for a single instance of another (single-threaded) engine? Are the ratios for the two engines similar, or is Komodo still relatively slower on the 12 core machine?
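
A minimal sketch of the comparison I mean, in case it helps: normalize each engine's NPS to its own NPS on one baseline machine, then divide Komodo's normalized speed by the other engine's. The numbers below are made-up placeholders, not measurements, and the machine and engine labels are just dictionary keys I chose.

```python
def normalized_nps(nps, baseline):
    """Scale each engine's NPS so that its value on `baseline` is 1.0."""
    return {
        engine: {m: v / per_machine[baseline] for m, v in per_machine.items()}
        for engine, per_machine in nps.items()
    }


def relative_to(norm, engine, reference):
    """Per machine: how `engine` scales compared with `reference`."""
    return {m: norm[engine][m] / norm[reference][m] for m in norm[engine]}


# Placeholder single-instance, single-thread NPS values (NOT real data).
nps = {
    "komodo": {"4core": 1000, "12core": 800, "16core": 1300},
    "other": {"4core": 2000, "12core": 1600, "16core": 2600},
}

norm = normalized_nps(nps, baseline="4core")
ratios = relative_to(norm, "komodo", "other")
for machine, r in ratios.items():
    print(f"{machine}: komodo/other = {r:.2f}")
```

A ratio near 1.0 on every machine means both engines scale alike; a ratio clearly below 1.0 on the 12 core only would point at something specific to Komodo on that machine.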
lkaufman
Posts: 5942
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Ivy Bridge vs Sandy Bridge for computer chess

Post by lkaufman »

Rein Halbersma wrote:
lkaufman wrote: So this means that the poor performance of Komodo is limited to the machine with the Westmere technology; on the machines with the Nehalem and with the Sandy Bridge technology it runs fine. Can you think of any sensible explanation for this?
Intel's development process follows a "tick-tock" model: http://en.wikipedia.org/wiki/Intel_Tick-Tock
http://en.wikipedia.org/wiki/List_of_In ... processors

Your dual E5-2670 is on the "tock" (new microarchitecture) of the Sandy Bridge generation. Ivy Bridge is the corresponding "tick" (a shrink to 22nm from 32nm, but essentially the same architecture). Only single-socket E3-series Ivy Bridge parts have been released so far (with quad-cores as the high-end versions), with the dual- and higher-socket parts coming next year. Your old dual-socket six-cores were probably Xeon E56xx. They were on the "tick" (Westmere, the 32nm shrink) of the older Nehalem architecture. Let's assume they were two E5649s, which came out in Feb 2011. A big difference from your current E5-2670s is the memory bandwidth: max 51 GB/s for the E5-2670 vs 32 GB/s for the E5649. Part of that is because the newer CPUs support 4 instead of 3 memory channels.

It's pure speculation, but my guess is that the copy-make approach of Komodo (which is pretty rare among chess engines, but I also use it) is heavier on the memory bandwidth than the usual make-undo approach. BTW, I envy your 16-core workhorse, I'm currently assembling a single-socket 6-core E5-1650 machine which comes in just under my budget of $2K. :)
Thanks, that sounds very plausible. But would this explain why Komodo looked relatively good on the old Nehalem quad i7 as well as the new machine, but not on the one in-between?
Regarding my old twelve core, I believe you are correct that it was E56xx, whichever one was 2.2 GHz.
Finally, can you think of any plausible ways to make a chess program benefit from the new architectures, aside from what we already do?
lkaufman
Posts: 5942
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: Ivy Bridge vs Sandy Bridge for computer chess

Post by lkaufman »

syzygy wrote:
lkaufman wrote:Well, all tests were run in the same way, in particular the number of tests was always twice the number of cores so as to make full use of hyperthreading. Is it plausible that simply using a machine with two physical processors (the 12 and 16 core machines) would affect different engines in a very different way than using just one?
What do you get if you benchmark a single instance of single-threaded Komodo on the 4, 12 and 16 core machines, then do the same for a single instance of another (single-threaded) engine? Are the ratios for the two engines similar, or is Komodo still relatively slower on the 12 core machine?
When we did this, the ratios were pretty constant. In fact Komodo actually ran slightly BETTER on the 12 core relative to the other engines when tested this way. What does that suggest?
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: Ivy Bridge vs Sandy Bridge for computer chess

Post by Rein Halbersma »

syzygy wrote:
lkaufman wrote:Well, all tests were run in the same way, in particular the number of tests was always twice the number of cores so as to make full use of hyperthreading. Is it plausible that simply using a machine with two physical processors (the 12 and 16 core machines) would affect different engines in a very different way than using just one?
What do you get if you benchmark a single instance of single-threaded Komodo on the 4, 12 and 16 core machines, then do the same for a single instance of another (single-threaded) engine? Are the ratios for the two engines similar, or is Komodo still relatively slower on the 12 core machine?
One thing to test if you suspect memory bandwidth problems: increase the number of single-threaded Komodo instances running simultaneously from K = 1 up to N on an N-core machine, and see whether the NPS of each process drops significantly as K approaches N.
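
A rough sketch of such a test harness, under some loud assumptions: `cmd` is a placeholder for whatever command runs one fixed benchmark search and then exits, and the output is assumed to contain a UCI-style `nps <number>` field (adjust the regex to your engine's actual output). None of this is specific to Komodo.

```python
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Assumption: the benchmark output contains "nps <number>" somewhere.
NPS_RE = re.compile(r"\bnps\s+(\d+)")


def parse_nps(output):
    """Return the last reported NPS value in `output`, or None."""
    found = NPS_RE.findall(output)
    return int(found[-1]) if found else None


def run_instances(cmd, k):
    """Run k copies of `cmd` at the same time; return each one's NPS."""
    def one(_):
        done = subprocess.run(cmd, capture_output=True, text=True)
        return parse_nps(done.stdout)

    with ThreadPoolExecutor(max_workers=k) as pool:
        return list(pool.map(one, range(k)))


def per_process_scaling(nps_by_k):
    """Mean per-process NPS at each K, relative to the K = 1 run."""
    base = mean(nps_by_k[1])
    return {k: mean(v) / base for k, v in sorted(nps_by_k.items())}
```

Usage would be along the lines of `results = {k: run_instances(cmd, k) for k in range(1, n + 1)}` followed by `per_process_scaling(results)`. If the values stay near 1.0 up to K = N, memory bandwidth is probably not the bottleneck; a steady decline as K grows would support the bandwidth theory (though Turbo Boost will also lower per-core clocks as more cores get busy).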
syzygy
Posts: 5554
Joined: Tue Feb 28, 2012 11:56 pm

Re: Ivy Bridge vs Sandy Bridge for computer chess

Post by syzygy »

lkaufman wrote:When we did this, the ratios were pretty constant. In fact Komodo actually ran slightly BETTER on the 12 core relative to the other engines when tested this way. What does that suggest?
It might be that Komodo is more memory-bandwidth hungry, as Rein speculates. Turbo Boost kicking in or not can also make a difference.
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: Ivy Bridge vs Sandy Bridge for computer chess

Post by Rein Halbersma »

lkaufman wrote: Thanks, that sounds very plausible. But would this explain why Komodo looked relatively good on the old Nehalem quad i7 as well as the new machine, but not on the one in-between?
Best guess? With only 4 cores for the i7, memory bandwidth is not as easily exhausted as with 12 cores, the older technology of the former notwithstanding.
lkaufman wrote: Regarding my old twelve core, I believe you are correct that it was E56xx, whichever one was 2.2 GHz.
Finally, can you think of any plausible ways to make a chess program benefit from the new architectures, aside from what we already do?
Heh, I guess you should put this question to people like Gerd Isenberg, who know all the new instruction sets that can be used with the newer architectures. But in general, I would think that exploiting vectorization and data parallelism (SIMD and GPGPU!) is the way to go, so you might want to have the latest and greatest compiler technology to make that work (Microsoft C++ AMP, Intel 13.0 Beta etc.) without having to get a degree in electrical engineering :-)