mambofish wrote:Yep, I think Matt Taylor and Robert Harley were the guys who came up with this originally, back in 1996 or thereabouts.
I wasn't aware of the 64-bit magic number though. I'll see if I can get it to work, because the (int) casts in java are s l o w . It might not work though because java doesn't do unsigned arithmetic.
Its a toy language really
Java has explict logical shift right >>>
Here is Matt's first posting about his folded trick it from Jun 2003.
http://groups.google.de/group/comp.lang ... a3044afa7c
I was not aware of Robert Harley. Do you have some references?
Java 1.5 has Long.numberOfTrailingZeros, which might be the best choice - as it might use lzcnt of future intel and amd processors.
Here some Java-bitscans and some timing results by Klaus Friedel by varous JREs:
Code: Select all
/**
* @author GIsenberg
* @return index 0..63 of LS1B -1023 if passing zero
* @param b a 64-bit word to bitscan
*/
static public int bitScanForwardDbl(long b)
{
double x = (double)(b & - b);
int exp = (int) (Double.doubleToLongBits(x) >>> 52);
return (exp & 2047) - 1023;
}
Code: Select all
/**
* @author Matt Taylor
* @return index 0..63
* @param bb a 64-bit word to bitscan, should not be zero
*/
static private final int[] foldedTable = {
63,30, 3,32,59,14,11,33,
60,24,50, 9,55,19,21,34,
61,29, 2,53,51,23,41,18,
56,28, 1,43,46,27, 0,35,
62,31,58, 4, 5,49,54, 6,
15,52,12,40, 7,42,45,16,
25,57,48,13,10,39, 8,44,
20,47,38,22,17,37,36,26,
};
static public int bitScanForwardMatt(long b) {
b ^= (b - 1);
int folded = ((int)b) ^ ((int)(b >>> 32));
return foldedTable[(folded * 0x78291ACF) >>> 26];
}
Code: Select all
/**
* @author Gerd Isenberg based on
* "Using de Bruijn Sequences to Index a 1 in a Computer Word" by
* Charles E. Leiserson
* Harald Prokop
* Keith H. Randall
* @return index 0..63
* @param bb a 64-bit word to bitscan, should not be zero
*/
static private final long deBruijn = 0x03f79d71b4cb0a89L;
static private final int[] magicTable = {
0, 1,48, 2,57,49,28, 3,
61,58,50,42,38,29,17, 4,
62,55,59,36,53,51,43,22,
45,39,33,30,24,18,12, 5,
63,47,56,27,60,41,37,16,
54,35,52,21,44,32,23,11,
46,26,40,15,34,20,31,10,
25,14,19, 9,13, 8, 7, 6,
};
static public int bitScanForwardDeBruijn64 (long b) {
int idx = (int)(((b & -b) * deBruijn) >>> 58);
return magicTable[idx];
}
Code: Select all
public static void main(String[] args) {
int n = 1000000, i;
Date t1 = new Date();
for (i = 0; i < n; i++) {
long set = -1L;
while (set != 0)
set ^= 1L<<bitScanForwardMatt(set);
}
Date t2 = new Date();
for (i = 0; i < n; i++) {
long set = -1L;
while (set != 0)
set ^= 1L<<bitScanForwardDeBruijn64 (set);
}
Date t3 = new Date();
for (i = 0; i < n; i++) {
long set = -1L;
while (set != 0)
set ^= 1L<<bitScanForwardDbl(set);
}
Date t4 = new Date();
for (i = 0; i < n; i++) {
long set = -1L;
while (set != 0)
set ^= 1L<<Long.numberOfTrailingZeros(set);
}
Date t5 = new Date();
double time1 = (t2.getTime() - t1.getTime()) / 64;
double time2 = (t3.getTime() - t2.getTime()) / 64;
double time3 = (t4.getTime() - t3.getTime()) / 64;
double time4 = (t5.getTime() - t4.getTime()) / 64;
System.out.println("bitScanForwardMatt takes " + time1 + " ns");
System.out.println("bitScanForwardDeBruijn64 takes " + time2 + " ns");
System.out.println("bitScanForwardDbl takes " + time3 + " ns");
System.out.println("numberOfTrailingZeros takes " + time4 + " ns");
}
Execution times under various JREs on my AMD X2 4200:
Sun JDK 1.5.0_05 Linux i586 (-server):
Code: Select all
bitScanForwardMatt takes 11.1 ns
bitScanForwardDeBruijn64 takes 15.2 ns
bitScanForwardDbl takes 31.8 ns
numberOfTrailingZeros takes 23.5 ns
Sun JDK 1.5.0_08 Linux AMD64 (-server):
Code: Select all
bitScanForwardMatt takes 9.1 ns
bitScanForwardDeBruijn64 takes 9.4 ns
bitScanForwardDbl takes 13.9 ns
numberOfTrailingZeros takes 13.2 ns
JRockit JDK 1.5.0_6 Linux AMD64:
Code: Select all
bitScanForwardMatt takes 10.9 ns
bitScanForwardDeBruijn64 takes 8.6 ns
bitScanForwardDbl takes 16.6 ns
numberOfTrailingZeros takes 16.6 ns
Sun JDK 1.6.0 beta 2 Linux AMD64 (-server):
Code: Select all
bitScanForwardMatt takes 8.9 ns
bitScanForwardDeBruijn64 takes 9.3 ns
bitScanForwardDbl takes 14.5 ns
numberOfTrailingZeros takes 13.0 ns
Gerd