Uri Blass wrote: My mistake was that I simply did not know that using / has a different meaning than using >>.
I thought they were the same, which is why I did not even think about using / to get a symmetric evaluation.
Had I known that / makes things symmetric, I would simply have used it without asking questions, because I had no reason to believe that / can be significantly slower than >> (I trust the compiler to do a good job for every function).
I have some /24 in my code and I never cared about the speed of those cases.
I used >>3 instead of /8 simply because I thought they were the same, and did not care whether I wrote >>3 or /8.
Even worse: in C, the result of an arithmetic right shift of a negative signed int is implementation-defined (signed int types need not even use two's complement!). It may depend on the compiler and/or the target architecture.
With today's processors and compilers I guess this expression is almost always true:
assert (-1 >> 1 == ~1 + 1)
If some rare system has only a logical shift right, so that zeros are always shifted in from the left, one has to use something like this (adapted from http://swox.com/~tege/divcnst-pldi94.pdf):
Code:
int shiftArithmeticalRight(int x, int s)
{
    /* assumes 32-bit int and 0 <= s <= 31 */
    unsigned int u = x ^ 0x80000000;
    return (u >> s) - (1u << (s ^ 31));
}
with K > 0
is only true if the division leaves no remainder, that is, if the i least significant bits are zero.
Otherwise the shift rounds toward minus infinity, while idiv truncates toward zero.
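To make the rounding difference concrete, here is a small sketch. The function names shift_floor and shift_trunc are made up for illustration, and the code assumes a 32-bit int and an arithmetic >> on negative values (implementation-defined, as said above). The bias trick in the second function is roughly what a compiler has to do when it turns a signed division by a power of two into a shift.

```c
#include <assert.h>

/* Assumes 32-bit int and an arithmetic >> on negative values
   (implementation-defined in C, but common on mainstream compilers). */

/* Plain shift: rounds toward minus infinity. */
int shift_floor(int x, int k)
{
    assert(k >= 1 && k <= 30);
    return x >> k;
}

/* Shift with bias: gives the same result as x / (1 << k),
   which truncates toward zero and is therefore symmetric. */
int shift_trunc(int x, int k)
{
    assert(k >= 1 && k <= 30);
    int bias = (x >> 31) & ((1 << k) - 1);  /* 2^k - 1 for negative x, else 0 */
    return (x + bias) >> k;
}
```

With these, -7 shifted right by 1 gives -4 from shift_floor but -3 from shift_trunc, and -3 is also what -7 / 2 gives. When the low bits are all zero, as in -8 with k = 3, both agree.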
Many programmers are aware of some tricks to avoid very expensive instructions like division or modulo, e.g. computing hash indices with "and" instead of modulo table size for power-of-two sized tables. The same goes for shift versus division by power-of-two divisors with unsigned values.
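A minimal sketch of these two tricks, with hypothetical names (slot_by_and, slot_by_mod, div8_by_shift) that are not from any real engine. The table size is passed as a variable here on purpose, since that is the case where the compiler cannot do the replacement for you.

```c
#include <assert.h>
#include <stdint.h>

/* Slot index by masking. Only valid when size is a power of two. */
uint32_t slot_by_and(uint64_t key, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (uint32_t)(key & (uint64_t)(size - 1));
}

/* Slot index by modulo. Works for any nonzero size,
   but needs a real division when size is a variable. */
uint32_t slot_by_mod(uint64_t key, uint32_t size)
{
    assert(size != 0);
    return (uint32_t)(key % size);
}

/* For unsigned values, a shift by 3 and a division by 8 always agree. */
uint32_t div8_by_shift(uint32_t x)
{
    return x >> 3;
}
```

For a power-of-two size both slot functions return the same index for every key, so the "and" version is a drop-in replacement in that case only.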
In the meantime, recent compilers (that is, their code generators) will likely use algorithms similar to the ones I posted from the AMD manual to replace division by invariant integers with a reciprocal imul or sar.
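As an illustration of the reciprocal multiplication idea, here is a sketch for the unsigned case x / 3. The names div3_by_mul and check_div3 are mine, and this is not claimed to be the exact sequence any particular compiler emits. The signed case needs an extra correction step for negative values and is left out here.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned division by 3 via multiplication with a scaled reciprocal.
   0xAAAAAAAB == (2^33 + 1) / 3, so the 64-bit product shifted right
   by 33 equals x / 3 for every 32-bit x. */
uint32_t div3_by_mul(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}

/* Spot check against the ordinary division operator. */
void check_div3(uint32_t x)
{
    assert(div3_by_mul(x) == x / 3u);
}
```

The rounding error introduced by the scaled reciprocal stays below 1/6 over the whole 32-bit range, which is too small to carry the quotient over to the next integer.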
Therefore it is recommended to use "/" with constants rather than your own shift or other tricks. But I think it does not hurt to be aware of what the compiler does, and to be aware that div/idiv is really expensive, so as to avoid variable divisors in critical code.
Btw., K10 idiv latency is 23 cycles plus the number of significant bits in the absolute value of the dividend. A 32*32=64 bit imul still takes 3 cycles.