Ghz vs Raw processing power

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
jshriver
Posts: 1368
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Ghz vs Raw processing power

Post by jshriver »

For a long time, clock speed (MHz/GHz) was the de facto standard for how fast a processor was: P120, then P200, and so on.

Then we seemed to hit a 3 GHz barrier, and companies briefly pushed 64-bit (AMD64) and then multiple cores.

Core count aside, newer CPUs seem to be faster than older ones even at lower clock rates. For example, I have a Core 2 running at 2 GHz that seems to blow my older 3 GHz P4 out of the water, even in single-threaded, one-core tests.

Anyone know why? It seems the rule of GHz no longer holds.
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: Ghz vs Raw processing power

Post by rbarreira »

There are a ton more things which are important besides GHz:

- cache sizes which keep getting bigger (this means the CPU spends less time waiting for slow RAM to feed it data)
- more complex instructions, or instructions which process more data in the same number of cycles (things like MMX and SSE, 64-bit instead of 32-bit)
- smarter instruction reordering and branch prediction, which reduce the average fraction of the CPU that sits idle
- probably more...
Last edited by rbarreira on Sun Jul 18, 2010 6:41 pm, edited 1 time in total.
User avatar
Greg Strong
Posts: 388
Joined: Sun Dec 21, 2008 6:57 pm
Location: Washington, DC

Re: Ghz vs Raw processing power

Post by Greg Strong »

jshriver wrote:For a long time, clock speed (MHz/GHz) was the de facto standard for how fast a processor was: P120, then P200, and so on.

Then we seemed to hit a 3 GHz barrier, and companies briefly pushed 64-bit (AMD64) and then multiple cores.

Core count aside, newer CPUs seem to be faster than older ones even at lower clock rates. For example, I have a Core 2 running at 2 GHz that seems to blow my older 3 GHz P4 out of the water, even in single-threaded, one-core tests.

Anyone know why? It seems the rule of GHz no longer holds.
GHz measures the number of baby steps per second a CPU performs, but many instructions require several of these steps to complete. Until recently, multiplication and division, for example, were very expensive. Processors now need fewer clock ticks per calculation than they used to, so modern CPUs can do more actual work in a single tick of the clock. Faster RAM and larger on-chip caches also mean fewer cycles wasted waiting for data to load. Actually, clock speed has always been a poor way to compare the computational power of processors. A 500 MHz Intel was not equivalent to a 500 MHz AMD, for example.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Ghz vs Raw processing power

Post by wgarvin »

rbarreira wrote:There are a ton more things which are important besides GHz:

- cache sizes which keep getting bigger (this means the CPU spends less time waiting for slow RAM to feed it data)
- more complex instructions, or instructions which process more data in the same number of cycles (things like MMX and SSE, 64-bit instead of 32-bit)
- smarter instruction reordering and branch prediction, which reduce the average fraction of the CPU that sits idle
- probably more...
Newer processor designs also fuse together related uops and execute them as a unit, increasing overall throughput. They are better at decoding several instructions at once, they have more execution units for executing several instructions at once, they have larger reorder buffers, more load/store buffers, a better balance of units for hyperthreading, etc. They have special hardware for re-executing short loops with no decoding at all. Their branch prediction is amazingly good, and pipeline latencies are shorter than they were in the bad old days of the Pentium 4. And probably lots of other little details like that.

If you want to know about gory details of the microarchitecture (or just good suggestions for optimizing), Agner Fog has a site with a lot of detailed info.
Mincho Georgiev
Posts: 454
Joined: Sat Apr 04, 2009 6:44 pm
Location: Bulgaria

Re: Ghz vs Raw processing power

Post by Mincho Georgiev »

rbarreira wrote:There are a ton more things which are important besides GHz:

- cache sizes which keep getting bigger (this means the CPU spends less time waiting for slow RAM to feed it data)
- more complex instructions, or instructions which process more data in the same number of cycles (things like MMX and SSE, 64-bit instead of 32-bit)
- smarter instruction reordering and branch prediction, which reduce the average fraction of the CPU that sits idle
- probably more...
Add:
an improved instructions-per-clock-cycle (IPC) ratio, which is the most important factor.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Ghz vs Raw processing power

Post by bob »

jshriver wrote:For a long time, clock speed (MHz/GHz) was the de facto standard for how fast a processor was: P120, then P200, and so on.

Then we seemed to hit a 3 GHz barrier, and companies briefly pushed 64-bit (AMD64) and then multiple cores.

Core count aside, newer CPUs seem to be faster than older ones even at lower clock rates. For example, I have a Core 2 running at 2 GHz that seems to blow my older 3 GHz P4 out of the water, even in single-threaded, one-core tests.

Anyone know why? It seems the rule of GHz no longer holds.
Sure. The PIV was a dog: its very long pipeline meant a bad branch prediction killed performance. Newer processors can also keep more instructions issuing per clock cycle, and the pipes are more independent, which prevents many stalls. Then there is better cache, including the L3 added on the newest i* processors to reduce the L2 miss penalty, which used to be simply the main-memory latency.
Mangar
Posts: 65
Joined: Thu Jul 08, 2010 9:16 am

Re: Ghz vs Raw processing power

Post by Mangar »

Hi,

as everybody thinks computation speed is measured by clock frequency, Intel had the idea to create a chip that could reach very high frequencies, regardless of whether it really got faster.

It was never true that GHz is the only, or even the primary, driver of computation speed. Early CPUs especially needed many clock cycles per instruction. As far as I remember, an 8088 needed 3 clock cycles for a simple INC and about 130 clock cycles for an 8-bit multiplication.
Current processors have pipelines. In the ideal case an instruction needs about 10-18 clock cycles to walk through the pipeline, but one instruction finishes on every cycle - even multiplications. So a modern CPU may well retire ten times as many instructions per clock cycle as the old Intel CPUs did.

Greetings Volker
Mangar Spike Chess
User avatar
Bo Persson
Posts: 262
Joined: Sat Mar 11, 2006 8:31 am
Location: Malmö, Sweden
Full name: Bo Persson

Re: Ghz vs Raw processing power

Post by Bo Persson »

jshriver wrote:For a long time clock speed (mhz/ghz) was the defacto standard for how fast a processor was. P120 then P200, etc.

Then it seemed we hit a 3ghz barrier then companies started pushing 64bit briefly (AMD64) then multiple cores.

Core count aside, it seems that newer cpu's are faster than older ones even with lower clock rates. For example I have a Core 2 running at 2ghz that seems to blow my older P4 3ghz out of the water, even when running single thread 1 core tests.
Yes, the Pentium 4 was designed to reach as many GHz as possible by doing very little work on each clock tick. It ALMOST reached 4 GHz before overheating.

Intel was a bit traumatized when the AMD Athlon reached 1 GHz at a time when the Pentium III could "only" do 966 MHz. The Pentium 4 made sure that would never happen again!
User avatar
hgm
Posts: 28426
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Ghz vs Raw processing power

Post by hgm »

I know the P-IV is not popular for chess, but there are many tasks (in particular floating-point number crunching) for which the architecture was very suitable. Unpredictable branches killed you, for sure, but some tasks don't have many of those. For them, increasing parallelism by subdividing the pipeline into more stages that can each be traversed in a shorter time (hence the higher clock rate) makes perfect sense. So writing it off as mere marketing hype does Intel a bit of an injustice.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Ghz vs Raw processing power

Post by wgarvin »

Mangar wrote:Hi,

as everybody thinks computation speed is measured by clock frequency, Intel had the idea to create a chip that could reach very high frequencies, regardless of whether it really got faster.

It was never true that GHz is the only, or even the primary, driver of computation speed. Early CPUs especially needed many clock cycles per instruction. As far as I remember, an 8088 needed 3 clock cycles for a simple INC and about 130 clock cycles for an 8-bit multiplication.
Current processors have pipelines. In the ideal case an instruction needs about 10-18 clock cycles to walk through the pipeline, but one instruction finishes on every cycle - even multiplications. So a modern CPU may well retire ten times as many instructions per clock cycle as the old Intel CPUs did.

Greetings Volker
The 80386 required a minimum latency of 2 cycles per instruction, because it could not detect when an instruction's result was used in the address calculation of the very next instruction, so it always stalled for one cycle in case it was. The 80486 could detect this situation (called an AGI, for address generation interlock), so it usually executed one instruction every cycle; if you fed it instructions that caused an AGI, it delayed the dependent instruction by a few cycles until the result was ready. The next generation was the Pentium, which had two pipelines (the U pipe and the V pipe) and issued instructions alternately to each, but the V pipe could only handle simple instructions. It could therefore execute 2 instructions per cycle in well-scheduled code, though an average of around 1.5 was more typical. All of these were in-order designs.

Then came the Pentium Pro and Pentium II, the first out-of-order designs from Intel. These could decode, execute, and retire multiple instructions per cycle if the instructions were scheduled to keep them happy. The Pentium III was a slight evolution of that. The Pentium 4 was quite a different architecture, with high clock rates and high throughput for certain simple operations, but it had a very long pipeline and long penalties for things that disrupted it (like branch mispredictions or partial-register stalls), so overall performance was not that great for many kinds of applications. After that they went back to the Pentium III architecture and made a new and improved version of it, which became Core, and after that Core 2. Current designs can decode, execute, and retire several instructions per cycle. I'm less familiar with the history of AMD's chips, but theirs is a similar story of many incremental improvements.

The competition between the two companies pushed them to keep innovating and improving their designs; the chips we have today are the result of a lot of very clever ideas and an enormous amount of engineering effort. Looking at the history of these chips, the evolution from the simple designs of 20+ years ago to where we are today is pretty impressive.

In 1979, an Intel 8088 chip contained about 29,000 transistors. By 1985, the 80386 design consisted of about 275,000 transistors. A Pentium III in 1999 had about 9.5 million transistors in it. The AMD K8 had about 106 million transistors in 2003. A Core 2 Duo had about 291 million transistors in 2006. Today, a Core i7 with six cores contains about 1,170 million transistors! A lot of these transistors are cache SRAM, but a good chunk of them are for the ever-more-complicated CPU logic of each generation.

By the way, this is also why the x86 instruction set has gotten "less bad" compared to RISC, over time: the transistor cost of all the front-end decoding of x86 instructions into something more RISC-y (i.e. uops) and register renaming, etc. has become a smaller and smaller fraction of the overall chip cost. An 80486 was at least 10x more complicated than an ARM chip with comparable performance, because of the x86 instruction set. But with more modern chips, the overhead (in extra transistors) of decoding x86 instructions is much lower, maybe 5% or something. The variable-sized instructions are more dense than 32-bit RISC instructions, leading to fewer cache misses, TLB misses and page faults.