Don't remember a cmovnc instruction. Is it a newer instruction? What does it do?
wgarvin wrote:
Michael Sherwin wrote:
However, if the latest processors are faster at executing 32-bit bsf/bsr, then I might be interested in testing something like the following:
Note: it's been a long time since I've written assembly, so I am not sure if the following code is correct. However, it (the correct code) did test slower when I tried it on older machines.
Code:
__inline s32 FirstBit(u64 bits) {
    __asm {
        mov ebx, 32                       ; I guess that 32 is correct here
        mov ecx, 32
        bsf ebx, dword ptr bits
        bsf ecx, dword ptr bits+4
        shl ebx, 6
        mov eax, [bsfTbl + ebx + ecx]
    }
}
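The contents of bsfTbl are not shown in the thread, so the following is only a guess at how such a table could be laid out, written as portable C rather than inline assembly. The names bsf_tbl, bsf32_or, init_bsf_tbl and first_bit_tbl are made up for this sketch, and it assumes that bsf leaves the preset value (32) in the register when its source is zero.

```c
#include <stdint.h>

/* Hypothetical stand-in for bsfTbl. Index = (low_index << 6) + high_index,
   where each index is 0..31, or 32 when that 32-bit half is empty. */
static uint8_t bsf_tbl[(32 << 6) + 32 + 1];

/* Models bsf on one half, ASSUMING the destination keeps its preset value
   when the source is zero. */
static unsigned bsf32_or(uint32_t src, unsigned preset) {
    if (src == 0) return preset;
    unsigned i = 0;
    while (((src >> i) & 1u) == 0) i++;
    return i;
}

/* Fill every reachable slot: the low half wins, then the high half + 32,
   and 64 means "no bit set at all". */
static void init_bsf_tbl(void) {
    for (unsigned lo = 0; lo <= 32; lo++)
        for (unsigned hi = 0; hi <= 32; hi++)
            bsf_tbl[(lo << 6) + hi] =
                (uint8_t)(lo < 32 ? lo : (hi < 32 ? 32 + hi : 64));
}

/* Same shape as the assembly: two independent scans, one table lookup. */
static int first_bit_tbl(uint64_t bits) {
    unsigned lo = bsf32_or((uint32_t)bits, 32);
    unsigned hi = bsf32_or((uint32_t)(bits >> 32), 32);
    return bsf_tbl[(lo << 6) + hi];
}
```

Call init_bsf_tbl() once before the first lookup. The two scans do not depend on each other, which is the parallelism this approach is after; the price is the extra memory access.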
If you're willing to rely on bsf to not change the register contents when it fails, you could try something like this:
Code:
__inline s32 FirstBit(u64 bits) {
    __asm {
        mov ebx, 32
        bsf ebx, dword ptr bits+4         ; high dword; ebx stays 32 if it is empty
        add ebx, 32
        bsf ebx, dword ptr bits           ; low dword overrides when it has a bit set
    }
}
If you don't trust bsf or don't like the length of that dependence chain, then you can try something like this instead:
Code:
__inline s32 FirstBit(u64 bits) {
    __asm {
        bsf ecx, dword ptr bits+4
        cmovnc ecx, 32                    ; as posted: cmov takes no immediate operand, and bsf reports an empty source in ZF
        bsf ebx, dword ptr bits
        lea ecx, [ecx+32]                 ; lea leaves the flags from bsf intact
        cmovnc ebx, ecx                   ; as posted: intended to pick the high result when the low dword is empty
    }
}
I didn't compile those, YMMV... they both return 64 if the bitboard is empty; if you want -1 instead, you could just replace the first 32 with 0xFFFFFFDF.
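For anyone who wants to check the logic without an assembler, here is a rough C model of the bsf-only method and of the 0xFFFFFFDF trick. It is a sketch, not tested inline assembly: bsf_keep and first_bit_chain are made-up names, and bsf_keep simply assumes the "destination unchanged on a zero source" behaviour that the method relies on.

```c
#include <stdint.h>

/* Models bsf under the ASSUMPTION the method depends on: the destination
   register is left untouched when the source is zero. */
static uint32_t bsf_keep(uint32_t dest, uint32_t src) {
    if (src == 0) return dest;
    uint32_t i = 0;
    while (((src >> i) & 1u) == 0) i++;
    return i;
}

/* preset is the value loaded by the first mov:
   32         gives 64 for an empty bitboard,
   0xFFFFFFDF gives -1, because 0xFFFFFFDF + 32 wraps to 0xFFFFFFFF. */
static int32_t first_bit_chain(uint64_t bits, uint32_t preset) {
    uint32_t r = preset;                        /* mov ebx, preset           */
    r = bsf_keep(r, (uint32_t)(bits >> 32));    /* bsf ebx, dword ptr bits+4 */
    r += 32;                                    /* add ebx, 32 (mod 2^32)    */
    r = bsf_keep(r, (uint32_t)bits);            /* bsf ebx, dword ptr bits   */
    /* The conversion is implementation-defined in C for values above
       INT32_MAX; mainstream two's-complement compilers yield -1 here. */
    return (int32_t)r;
}
```

Each step consumes the result of the previous one, which is the dependence chain mentioned above.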
Your first method looks good to me as bsf should not change the register if there are no bits set.
I was aiming for a bit more parallelism. However, your method does not need a table lookup, so I will test it.
Edit: I do not have an i7 so hopefully someone else will test it.