So for once I took a look at the assembler output.
Here is the C code of the fragment:
(Everything here is a 32-bit signed or unsigned integer)
Hashtable code:
..
deltax = 0;
if( bestscore >= MATEVALUE-1000 )
deltax = realply;
if( bestscore <= -MATEVALUE+1000 )
deltax = -realply;
(a few lines more code)
} // end of function
I was using the latest released GCC compiler at the time, which I had downloaded just hours before doing this compile. Whatever flags I used, the result was the same.
gcc -O3 -march=k8 -mtune=k8 -S -DUNIXPII -c diepab.c
Other flags, like -O2 or tuning for k7 instead of k8, didn't help either.
Each time, the above code fragment produced the same output.
Next is what it generated. It first regularly jumps away to label L837,
then it executes 4 instructions there and jumps back. I don't want it to
jump around in the code. I want fall-through code; that is lightyears faster on AMD. How they got the bizarre idea to generate this type of code, I do not know.
On Intel such jumps are a lot cheaper than on AMD. Jumps with a bunch of instructions in between are ugly slow on K8. With short jumps of just a few instructions, AMD is a lot faster. I noticed that big time when optimizing some code a few years ago.
Note that each time I take a look, over the past years, I see the same pattern repeat: I try to make code faster in C, and the compiler screws up. It is just not funny.
What I do not know is whether 40 instructions are just within the 72-byte lookahead buffer of Core 2; I'm sure they are out of range on AMD, which has a shorter lookahead than Intel.
I just wanted 2 CMOVs. In fact, the compiler could have reordered the CMOVs so that they are not within the same 16 bytes. Thing is, it is generating the crap code below:
cmpl $498999, 28(%ebp)
movl 12(%ebp), %eax
jle L837
L822:
movl 28(%ebp), %edx
...
here are another 35 lines of assembler
...
orl %ecx, -68(%ebp)
xorl -68(%ebp), %edx
movl %edx, 16(%edi)
addl $68, %esp
popl %ebx
popl %esi
popl %edi
leave
ret
L836:
xorl %eax, %eax
cmpl 44(%ebp), %edx
setl %al
incl %eax
movl %eax, -48(%ebp)
jmp L801
L837:
negl 12(%ebp)
xorl %eax, %eax
cmpl $-498999, 28(%ebp)
cmovl 12(%ebp), %eax
jmp L822
L814:
movl 36(%ebp), %ecx
cmpl %ecx, %edx
movl %ecx, -16(%ebp)
jbe L817
movl -28(%ebp), %eax
movl %edx, -16(%ebp)
andl $-131072, %eax
movl %eax, -44(%ebp)
jmp L817
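For comparison, here is a minimal sketch of how the deltax fragment could be written so the compiler has less reason to branch. It is an illustration under assumptions, not the actual Diep source: the function names are hypothetical, and MATEVALUE = 500000 is only inferred from the constants 498999 and -498999 in the listing. The conditional-expression form often gets turned into CMOVs, but that is not guaranteed for any particular GCC version or flag set. The mask form has no conditional control flow in the source at all, though the compiler is still free to do what it likes with it.

```c
#include <stdint.h>

/* Assumption: inferred from the constants in the assembler listing
   (MATEVALUE - 1000 == 499000); the real definition is not shown. */
#define MATEVALUE 500000

/* The original shape: two if-statements in a row. */
int32_t deltax_branchy(int32_t bestscore, int32_t realply)
{
    int32_t deltax = 0;
    if (bestscore >= MATEVALUE - 1000)
        deltax = realply;
    if (bestscore <= -MATEVALUE + 1000)
        deltax = -realply;
    return deltax;
}

/* Conditional-expression form: each line is a plain select, which a
   compiler may (but need not) translate into one CMOV. */
int32_t deltax_select(int32_t bestscore, int32_t realply)
{
    int32_t deltax = (bestscore >= MATEVALUE - 1000) ? realply : 0;
    deltax = (bestscore <= -MATEVALUE + 1000) ? -realply : deltax;
    return deltax;
}

/* Mask form: each comparison yields 0 or 1, negated to 0 or all ones.
   The two conditions cannot both be true, so OR-ing the parts is safe. */
int32_t deltax_mask(int32_t bestscore, int32_t realply)
{
    int32_t hi = -(int32_t)(bestscore >= MATEVALUE - 1000);
    int32_t lo = -(int32_t)(bestscore <= -MATEVALUE + 1000);
    return (realply & hi) | (-realply & lo);
}
```

All three functions return the same value for the same inputs, so any of them can be dropped into a test file and compiled with -S to see which one the compiler turns into fall-through code.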
Vincent