You're making a statement that has no logical basis.
bob wrote: Vincent, any _good_ compiler person could explain this to you. The idea is to (a) optimize branch prediction and (b) optimize the prefetch that occurs in cache blocks.
diep wrote: In Diep's hashtable code in summer 2008 I rewrote some code in order to speed it up. I was very amazed to see Diep actually SLOWER with the new code on AMD.
So for once I took a look at the assembler output.
Here is the C code of the fragment:
(Everything here is a 32-bit signed or unsigned integer.)
Code:
Hashtable code:
  ..
  deltax = 0;
  if( bestscore >= MATEVALUE-1000 )
    deltax = realply;
  if( bestscore <= -MATEVALUE+1000 )
    deltax = -realply;
  (a few lines more code)
} // end of function
I was using the latest released GCC compiler at the time, which I had downloaded just hours before doing this compile. It made no difference which flags I used, for example:
gcc -O3 -march=k8 -mtune=k8 -S -DUNIXPII -c diepab.c
Other flags, like -O2 or tuning for k7 instead of k8, didn't help either.
Each time the above code fragment produced the same kind of output.
Next is what it generated. It regularly jumps away to label L837 first,
then it executes 4 instructions there and jumps back. I don't want it to
jump around in the code. I want fall-through code; that's lightyears faster on AMD. How they got the bizarre idea to generate this type of code I do not know.
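For anyone who wants to inspect the assembler output themselves, the fragment can be reduced to a standalone function. What follows is only a sketch under assumptions: the MATEVALUE value is a placeholder (Diep's real value is not given here), and the function names are hypothetical. The second variant writes the same logic as conditional expressions, which a compiler may, or may not, turn into conditional moves instead of jumps.

```c
#include <assert.h>

/* Assumption: placeholder value, Diep's real MATEVALUE is not shown in the post. */
#define MATEVALUE 100000

/* The fragment from the post, wrapped in a hypothetical function. */
int delta_branchy(int bestscore, int realply)
{
    int deltax = 0;
    if (bestscore >= MATEVALUE - 1000)
        deltax = realply;
    if (bestscore <= -MATEVALUE + 1000)
        deltax = -realply;
    return deltax;
}

/* Same logic as conditional expressions. Whether this becomes cmov or
   jumps is up to the compiler; nothing in C guarantees either. */
int delta_select(int bestscore, int realply)
{
    int deltax = 0;
    deltax = (bestscore >= MATEVALUE - 1000) ? realply : deltax;
    deltax = (bestscore <= -MATEVALUE + 1000) ? -realply : deltax;
    return deltax;
}

/* Checks that both variants agree on scores around the two thresholds. */
void check_delta_variants(void)
{
    static const int scores[] = { -100000, -99000, -98999, 0, 98999, 99000, 100000 };
    unsigned i;
    for (i = 0; i < sizeof scores / sizeof scores[0]; i++)
        assert(delta_branchy(scores[i], 7) == delta_select(scores[i], 7));
}
```

Compiling a file like this with gcc -O3 -S and reading the resulting .s file shows which form the compiler picked for each variant; the result depends on compiler version and tuning flags.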
If you do this:
if (c) {
rarely executed code
}
Where is your proof the code gets executed rarely if the optimizations happen without PGO?
What we do know is that AMD suffers a bunch of penalties if we jump around in the code, whereas 'fall through' works perfectly.
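Without PGO the compiler can only guess which side of a branch is hot, but GCC does let the programmer state the expectation by hand with __builtin_expect. A minimal sketch, assuming mate scores are the rare case (that is an assumption for illustration, not something measured here); the macro names, the function name and the MATEVALUE value are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper macro; __builtin_expect itself is a GCC builtin. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Assumption: placeholder value, not Diep's real MATEVALUE. */
#define MATEVALUE 100000

/* The hashtable fragment with both conditions marked as rarely true,
   so the compiler is told which path to treat as the common one. */
int delta_hinted(int bestscore, int realply)
{
    int deltax = 0;
    if (UNLIKELY(bestscore >= MATEVALUE - 1000))
        deltax = realply;
    if (UNLIKELY(bestscore <= -MATEVALUE + 1000))
        deltax = -realply;
    return deltax;
}

/* The hint must never change the result, only the code layout. */
void check_delta_hinted(void)
{
    assert(delta_hinted(0, 5) == 0);
    assert(delta_hinted(MATEVALUE, 5) == 5);
    assert(delta_hinted(-MATEVALUE, 5) == -5);
}
```

The hint only affects code layout, never the result. If real profile data is available, GCC's -fprofile-generate and -fprofile-use options let measured branch frequencies replace the hand-written guess.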