Developments of the last two years

hgm · Post by **hgm** » Sun Mar 03, 2013 7:18 pm

Pairing??? You can't mean the Pentium I concept?

Don · Post by **Don** » Sun Mar 03, 2013 7:21 pm

Joost Buijs wrote:
Don wrote:
Joost Buijs wrote:
Rebel wrote: Using inline 64-bit ASM on the time critical parts should give a skilled ASM programmer a 20-25% speed-up.
Maybe you are talking about the past?
My experience is that the code-generators/optimizers of modern C/C++ compilers are so good that it is very difficult or even impossible to produce something faster with handwritten assembler.
You probably read that somewhere and are just repeating it as this is one of those often repeated myths that you can see all over the place on the web.

What is probably true is that most people cannot write better assembly than a good compiler. But someone with real skill who understands the issues and how modern CPUs work can write better code than a dumb program can.
I did not read this anywhere, I actually tried it myself. I'm not a skilled assembler programmer, but I had my share of assembler programming in the past, that is probably more than you can say.

I wrote an entire chess engine in assembler several years ago, so I understand the concepts and principles.

The only reason a compiler can SOMETIMES write good assembler is because a human who understands what it takes designed it. I stand by my statement that a human expert in assembler can do a better job.

I am not claiming that YOU can do a better job or even most people or even myself.

hgm · Post by **hgm** » Sun Mar 03, 2013 7:31 pm

I don't think that reasoning is sound. I can build an engine that plays better Chess than I do. Why shouldn't I be able to write a compiler that makes better assembly than I do? Even if I were the world's number one expert. We have engines that beat the World Champion at Chess...

Don · Post by **Don** » Sun Mar 03, 2013 7:38 pm

hgm wrote: I don't think that reasoning is sound. I can build an engine that plays better Chess than I do. Why shouldn't I be able to write a compiler that makes better assembly than I do? Even if I were the world's number one expert. We have engines that beat the World Champion at Chess...

The reasoning is that a human cannot beat a compiler unless he has the same knowledge that a compiler has. The good compilers "understand" many of the important details of instruction scheduling. That can be worked out by hand and improved upon (because compilers STILL do not do it very well.) Chess is a different task which requires massive calculation.

ZirconiumX · Post by **ZirconiumX** » Sun Mar 03, 2013 7:42 pm

hgm wrote: I don't think that reasoning is sound. I can build an engine that plays better Chess than I do. Why shouldn't I be able to write a compiler that makes better assembly than I do? Even if I were the world's number one expert. We have engines that beat the World Champion at Chess...

Well, XBoard would be much faster

Matthew:out

Don · Post by **Don** » Sun Mar 03, 2013 7:54 pm

hgm wrote: I don't think that reasoning is sound. I can build an engine that plays better Chess than I do. Why shouldn't I be able to write a compiler that makes better assembly than I do? Even if I were the world's number one expert. We have engines that beat the World Champion at Chess...

There are some examples on the web of fairly simple programs being optimized in assembler to get a 2x speedup. The truth of the matter is that compilers are much better than most of us, but people that really understand the low level details, specifically compiler writers themselves, can do a much better job.

The idea that a human cannot match the performance of compilers is a myth that has been propagated by being uttered too many times, and people repeat it like a mantra.

I found several examples on the web. Here is one:

http://www.codinghorror.com/blog/2008/0 ... -code.html

There are several dark corners where compilers have no clue but people do.

Look at some of the writings of Michael Abrash and I think you will change your tune - your thinking on this is not really your thinking, it's someone else's thinking. Don't believe things just because they are in print.

Don

hgm · Post by **hgm** » Sun Mar 03, 2013 7:56 pm

Don wrote:The reasoning is that a human cannot beat a compiler unless he has the same knowledge that a compiler has. The good compilers "understand" many of the important details of instruction scheduling. That can be worked out by hand and improved upon (because compilers STILL do not do it very well.) Chess is a different task which requires massive calculation.

But that is exactly my point: current CPUs are so complex that it would be a computationally intensive task to produce optimum code. Elementary operations that describe a task can be ordered in combinatorially many sequences. It is far from trivial how each of these performs; that would require complex simulation of the core. And what is worse, the knowledge to do such a simulation is often a jealously guarded company secret. (But compiler writers that are associated with the CPU manufacturer do have access to it.)
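
To give a feel for "combinatorially many sequences", here is a minimal sketch that counts the instruction orderings that respect a set of data dependencies. The function name `count_valid_orders` and the example dependency list are hypothetical, chosen only for illustration; the brute-force enumeration is only practical for a handful of operations, which is rather the point.

```python
from itertools import permutations

def count_valid_orders(n_ops, deps):
    """Count orderings of operations 0..n_ops-1 that respect every
    (before, after) dependency pair in deps. Brute force, so keep n_ops small."""
    count = 0
    for order in permutations(range(n_ops)):
        # position[op] = slot at which op is issued in this candidate ordering
        position = {op: slot for slot, op in enumerate(order)}
        if all(position[a] < position[b] for a, b in deps):
            count += 1
    return count

# Hypothetical example: two independent dependency chains of 4 operations each,
# 0 -> 1 -> 2 -> 3 and 4 -> 5 -> 6 -> 7.
chain_deps = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7)]
two_chains = count_valid_orders(8, chain_deps)
print(two_chains)  # 70 legal interleavings, i.e. C(8, 4)
```

Even this toy case with eight operations has 70 legal schedules, and the count grows quickly with more operations and more independent chains. Scoring each schedule would then need a model of the core, which is the hard part.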

In the Pentium-II days I did try to optimize an FFT routine. This pretty much proved a hopeless task. Many of the bottlenecks in the CPU were completely undocumented. Many of them at the beginning of the pipeline could be discovered by reverse engineering. But I always ran into the situation where, after designing the code so well that it would perfectly satisfy all known restrictions, there was a disastrous collapse of performance. It was very easy to calculate how many MFLOPS you could in theory do, based on the throughput of the multiplier and adder units, and the 3-uOps/cycle capacity of the retirement unit. Unoptimized code would reach about 50% of that. Well-optimized code could reach 75%, and you could always point out what the bottleneck was that caused the pipeline bubbles. Perfectly optimized code would then achieve only 25%...
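
A rough sketch of that kind of theoretical bound: the loop cannot run faster than its most contended resource allows. Only the 3-uOps/cycle retirement figure comes from the post. The function name `peak_mflops`, the unit throughputs (one add per cycle, one multiply every two cycles), the 300 MHz clock and the 20 uOps per iteration are assumptions for illustration, not measured values. The 6 adds and 4 multiplies are the floating-point operation counts of one radix-2 complex butterfly.

```python
def peak_mflops(adds, muls, uops, clock_mhz,
                adds_per_cycle=1.0, muls_per_cycle=0.5, retire_per_cycle=3.0):
    """Upper bound on MFLOPS for one loop iteration, limited by whichever of
    the adder, the multiplier or the retirement unit needs the most cycles.
    All throughput defaults except retire_per_cycle are assumed values."""
    cycles = max(adds / adds_per_cycle,
                 muls / muls_per_cycle,
                 uops / retire_per_cycle)
    return (adds + muls) / cycles * clock_mhz

# Hypothetical loop body: one radix-2 butterfly (6 adds, 4 muls),
# assumed to decode into 20 uOps, on an assumed 300 MHz clock.
peak = peak_mflops(adds=6, muls=4, uops=20, clock_mhz=300)
print(peak)  # 375.0, limited here by the multiplier (8 cycles per butterfly)

# The fractions of peak mentioned in the post, applied to this assumed bound.
for label, fraction in [("unoptimized", 0.50),
                        ("well-optimized", 0.75),
                        ("'perfectly' optimized", 0.25)]:
    print(label, peak * fraction)
```

The bound only accounts for the documented resources. The collapse to 25% described above is exactly what such a model cannot predict, because the limiting factor is not among its inputs.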

Don · Post by **Don** » Sun Mar 03, 2013 8:02 pm

hgm wrote:
Don wrote:The reasoning is that a human cannot beat a compiler unless he has the same knowledge that a compiler has. The good compilers "understand" many of the important details of instruction scheduling. That can be worked out by hand and improved upon (because compilers STILL do not do it very well.) Chess is a different task which requires massive calculation.
But that is exactly my point: current CPUs are so complex that it would be a computationally intensive task to produce optimum code.

It's true that they are much more complex than they used to be, but they still follow simple rules and people who really understand the low level details can out-code a compiler. Very few people will bother to try because compilers are good enough and in fact probably much better than most of us - who probably know almost nothing about instruction scheduling issues, register utilization and other gory low level details. So maybe to YOU and me it seems hopelessly complicated, but to a beginner at chess the moves of a master seem hopelessly out of reach too.

But maybe more to the point is that compilers are downright stupid about many things. They cannot figure out everything and perhaps they write assembly like a chess program plays very closed positions, with a great deal of naivety. And yet even in a closed position a chess program will probably beat you and me, but not a really good player.

Elementary operations that describe a task can be ordered in combinatorially many sequences. It is far from trivial how each of these performs; that would require complex simulation of the core. And what is worse, the knowledge to do such a simulation is often a jealously guarded company secret. (But compiler writers that are associated with the CPU manufacturer do have access to it.)

In the Pentium-II days I did try to optimize an FFT routine. This pretty much proved a hopeless task. Many of the bottlenecks in the CPU were completely undocumented. Many of them at the beginning of the pipeline could be discovered by reverse engineering. But I always ran into the situation where, after designing the code so well that it would perfectly satisfy all known restrictions, there was a disastrous collapse of performance. It was very easy to calculate how many MFLOPS you could in theory do, based on the throughput of the multiplier and adder units, and the 3-uOps/cycle capacity of the retirement unit. Unoptimized code would reach about 50% of that. Well-optimized code could reach 75%, and you could always point out what the bottleneck was that caused the pipeline bubbles. Perfectly optimized code would then achieve only 25%...

Rebel · Post by **Rebel** » Sun Mar 03, 2013 9:17 pm

Don wrote: Look at some of the writings of Michael Abrash and I think you will change your tune

Precisely, the man has been my teacher

Gerd Isenberg · Post by **Gerd Isenberg** » Sun Mar 03, 2013 10:47 pm

Rebel wrote:
Don wrote: Look at some of the writings of Michael Abrash and I think you will change your tune
Precisely, the man has been my teacher

Hey, you never told us before

Thanks to Don for link and hint.
