Desperado wrote: or did you already test it?

No, I didn't... and you?
P.S.: I am not very fond of assembler, because MSVC and gcc use different inline-assembly syntax, so you end up maintaining a separate version for each compiler. If you do that, there should be a very good reason for it, meaning at least a 5% increase in speed. But I bet you that the code above, with all the added ancillary stuff (the bb != 0 test and the bit reset), is not 5% faster than mine. I think it is not faster at all, because it needs two bsf instructions where mine needs one multiply plus one branch.
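For readers who want something concrete to benchmark, here is a minimal sketch of a compiler-independent, multiply-based bit scan together with the usual bb != 0 loop and bit reset. It is not the code from either post: it assumes the well-known de Bruijn multiplication technique, and the names DEBRUIJN64, index64, init_index64, bit_scan_forward and serialize are made up for illustration. The lookup table is built at start-up rather than hard-coded.

```c
#include <assert.h>
#include <stdint.h>

/* A widely used 64-bit de Bruijn constant (assumption: any valid one works). */
static const uint64_t DEBRUIJN64 = UINT64_C(0x03f79d71b4cb0a89);

/* Maps the top 6 bits of (isolated bit * DEBRUIJN64) back to the bit index. */
static int index64[64];

/* Build the table once; must be called before bit_scan_forward. */
static void init_index64(void) {
    for (int i = 0; i < 64; ++i)
        index64[((UINT64_C(1) << i) * DEBRUIJN64) >> 58] = i;
}

/* Index of the least significant set bit; bb must be non-zero. */
static int bit_scan_forward(uint64_t bb) {
    assert(bb != 0);
    uint64_t lsb = bb & (0 - bb);          /* isolate the lowest set bit */
    return index64[(lsb * DEBRUIJN64) >> 58];
}

/* Typical caller: scan, then reset the bit, until the board is empty. */
static int serialize(uint64_t bb, int out[64]) {
    int n = 0;
    while (bb != 0) {                      /* the bb != 0 test */
        out[n++] = bit_scan_forward(bb);
        bb &= bb - 1;                      /* the bit reset */
    }
    return n;
}
```

The same source compiles unchanged with MSVC and gcc, which is the point of the argument above. Whether it is faster or slower than a bsf-based version on a given CPU is something only a measurement can settle.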