I have a question about runtime checking of CPU capabilities.
How can this be used for a chess engine that is released as a binary, to be downloaded and run by anyone on any computer?
I mean, I cannot use a function pointer to redirect at runtime to a fallback standard C implementation when the host CPU does not support POPCNT; that would be far too slow. Functions of this kind must be inlined. So, once I have checked whether the host CPU has the POPCNT capability, what can I do with that?
A chess engine is not supposed to be compiled on every PC that will run it; it is compiled once with the best optimizations and then distributed.
The only possibility I foresee is to create two builds, one for CPUs with POPCNT and another for CPUs without it. But considering that we also need separate 32-bit and 64-bit versions, that is already four binaries, and the combinations escalate quickly.
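For reference, the check itself is the easy part. Below is a minimal C++ sketch, assuming GCC or Clang on x86/x86-64, where <cpuid.h> provides __get_cpuid; CPUID leaf 1 reports POPCNT support in bit 23 of ECX. The names has_popcnt() and cpu_has_popcnt() are illustrative, not from any library, and other compilers would need their own intrinsic (for example MSVC's __cpuid). Because CPUID is a slow, serializing instruction, the sketch queries it once and caches the answer.

```cpp
#include <cassert>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#endif

// Returns true if the CPU reports POPCNT support.
// Assumption: GCC or Clang on x86/x86-64; elsewhere we conservatively say no.
inline bool has_popcnt() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                  // CPUID leaf 1 not available
    return (ecx & (1u << 23)) != 0;    // leaf 1, ECX bit 23 = POPCNT
#else
    return false;                      // unknown compiler/architecture
#endif
}

// Query CPUID once and reuse the cached answer afterwards.
inline bool cpu_has_popcnt() {
    static const bool cached = has_popcnt();
    return cached;
}
```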
I agree with you that this would not be useful inside the engine. But you could then release two binaries, one using hardware popcnt and one using the traditional c = 0; while (p) { p &= p - 1; c++; } type loop. Then, in the popcnt binary, you could at least verify at startup that it will work properly. Executing the popcnt instruction on a non-popcnt machine probably won't immediately crash your program; without the check it would just start a search and go nuts because there is no popcnt in hardware.
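The portable fallback loop works because p &= p - 1 clears the lowest set bit, so the loop body runs exactly once per set bit. Here is a small C++ sketch of it (popcount_loop and popcount_naive are illustrative names), with a bit-by-bit reference for checking. Note that the test has to happen before the clear: a compact form such as while (p &= p - 1) c++; skips the last set bit and returns one less than the true count for any nonzero p.

```cpp
#include <cassert>
#include <cstdint>

// Kernighan's loop: each p &= p - 1 clears the lowest set bit,
// so the loop runs once per set bit.
inline int popcount_loop(uint64_t p) {
    int c = 0;
    while (p) {
        p &= p - 1;
        ++c;
    }
    return c;
}

// Naive reference: test all 64 bits one at a time.
inline int popcount_naive(uint64_t p) {
    int c = 0;
    for (int i = 0; i < 64; ++i)
        c += static_cast<int>((p >> i) & 1u);
    return c;
}
```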
It is much nicer for a program to tell the user that it needs popcnt but is running on hardware that doesn't support it, rather than just crashing and burning mysteriously.
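A startup guard along these lines could live in the popcnt build. This is a hedged C++ sketch: popcnt_startup_check() is a hypothetical name, and the CPU test is passed in as a parameter (in practice it would come from a CPUID query) so the guard can be exercised on any machine. One caveat: the code that runs before and during this check should itself be compiled without POPCNT enabled, otherwise the compiler may emit the instruction before the guard has a chance to run.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical guard for the POPCNT build of the engine.
// cpu_has_popcnt: result of a CPUID-based check done by the caller.
// Returns true if the engine may continue.
bool popcnt_startup_check(bool cpu_has_popcnt, std::FILE* out) {
    assert(out != nullptr);
    if (cpu_has_popcnt)
        return true;
    std::fprintf(out,
                 "This version of the engine requires a CPU with the POPCNT "
                 "instruction, but this CPU does not report it.\n"
                 "Please use the non-POPCNT version instead.\n");
    return false;
}
```

In main(), something like if (!popcnt_startup_check(has_popcnt(), stderr)) return 1; would exit cleanly before any search starts, where has_popcnt() stands for whatever CPUID-based test the engine uses.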
bob wrote:
I agree with you that this would not be useful inside the engine. But you could then release two binaries, one using hardware popcnt and one using the traditional c = 0; while (p) { p &= p - 1; c++; } type loop. Then, in the popcnt binary, you could at least verify at startup that it will work properly. Executing the popcnt instruction on a non-popcnt machine probably won't immediately crash your program; without the check it would just start a search and go nuts because there is no popcnt in hardware.
Yes. As a possible compromise between two separate binaries (one with popcnt, one without) and your option of bundling both binaries in one package, we could use a runtime check with the redirection done at a much higher level than the bit-count routine.
For example, in the same source we could have an evaluation function like this:
int evaluate()
{
    return has_popcnt() ? evaluate_popcnt() : evaluate_std();
}
In this case the check is done once per evaluate() call rather than once per bit count, so its cost is almost zero. Below that you have two identical evaluate_xxxx() functions that differ only at the bottom level, where one calls the popcnt() intrinsic and the other the C version. It works, but it is not nice, and personally I don't like replicating 99% of the evaluation code in two functions. Perhaps by templatizing the evaluation you would not need to make every change to your evaluation in two places.
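The templatized variant could look roughly like this C++ sketch. Position, evaluate_impl, popcount_soft, popcount_hw and g_has_popcnt are hypothetical names, and the "evaluation" is only a toy material count. The evaluation body is written once and instantiated twice; only the bit-count primitive differs, and the runtime branch sits at the evaluate() level. One assumption to flag: with GCC/Clang, __builtin_popcountll becomes a single POPCNT instruction only where POPCNT is enabled (for example with -mpopcnt), so in a real build the <true> instantiation would have to be compiled that way, e.g. in a separate source file.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy position: one bitboard of pieces per side.
struct Position {
    uint64_t white;
    uint64_t black;
};

// Portable fallback: clears the lowest set bit once per iteration.
inline int popcount_soft(uint64_t b) {
    int c = 0;
    while (b) {
        b &= b - 1;
        ++c;
    }
    return c;
}

// GCC/Clang builtin: a single POPCNT instruction only when POPCNT is
// enabled for this code (e.g. -mpopcnt), otherwise a library routine.
inline int popcount_hw(uint64_t b) {
    return __builtin_popcountll(b);
}

// The evaluation is written once; only the bit-count primitive differs.
template <bool UsePopcnt>
int evaluate_impl(const Position& pos) {
    auto count = [](uint64_t b) {
        return UsePopcnt ? popcount_hw(b) : popcount_soft(b);
    };
    // Toy material term: 100 per piece, from White's point of view.
    return 100 * (count(pos.white) - count(pos.black));
}

// Set once at startup from a CPUID check (not shown).
static bool g_has_popcnt = false;

// One branch per evaluation call instead of one per bit count.
int evaluate(const Position& pos) {
    return g_has_popcnt ? evaluate_impl<true>(pos) : evaluate_impl<false>(pos);
}
```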
Gerd Isenberg wrote:
As a first exercise, you may try to find a fast hash function in 32-bit mode simply for the 12 possible first-rank attack sets of a center rook.
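One way to attack that exercise is the usual trial-and-error magic search, sketched here in C++ under these assumptions: rank 1 is mapped to bits 0 (a1) through 7 (h1), the rook stands on d1, and the relevant occupancy mask covers b1, c1, e1, f1 and g1 (the edge squares never change the attack set). A random 32-bit multiplier is accepted if (occ * magic) >> (32 - bits) never sends two occupancies with different attack sets to the same index; occupancies with equal attack sets may share a slot. All names are illustrative, bits is assumed to be between 1 and 31, and this is not necessarily how Gerd intended the exercise to be solved.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Rank 1 as bits 0 (a1) .. 7 (h1); rook on d1 (bit 3).
const int kRookSq = 3;
// Relevant blockers: b1, c1, e1, f1, g1. Edge squares a1/h1 cannot stop a ray.
const uint32_t kMask = (1u << 1) | (1u << 2) | (1u << 4) | (1u << 5) | (1u << 6);

// Slow reference: rook attacks along rank 1 for a given occupancy.
uint32_t rank_attacks(uint32_t occ) {
    uint32_t att = 0;
    for (int s = kRookSq + 1; s <= 7; ++s) {   // towards h1
        att |= 1u << s;
        if (occ & (1u << s)) break;
    }
    for (int s = kRookSq - 1; s >= 0; --s) {   // towards a1
        att |= 1u << s;
        if (occ & (1u << s)) break;
    }
    return att;
}

// All 32 subsets of kMask (Carry-Rippler enumeration).
std::vector<uint32_t> occupancies() {
    std::vector<uint32_t> v;
    uint32_t sub = 0;
    do {
        v.push_back(sub);
        sub = (sub - kMask) & kMask;
    } while (sub != 0);
    return v;
}

// True if `magic` maps every occupancy into a 2^bits table with no
// destructive collision (same index, different attack set).
bool is_valid_magic(uint32_t magic, int bits) {
    std::vector<uint32_t> table(1u << bits, 0);
    std::vector<bool> used(1u << bits, false);
    for (uint32_t occ : occupancies()) {
        uint32_t idx = (occ * magic) >> (32 - bits);
        uint32_t att = rank_attacks(occ);
        if (!used[idx]) {
            used[idx] = true;
            table[idx] = att;
        } else if (table[idx] != att) {
            return false;
        }
    }
    return true;
}

// Random search with sparse candidates (AND of three random numbers, a common
// heuristic). Returns 0 if nothing is found within max_tries.
uint32_t find_magic(int bits, int max_tries, uint32_t seed = 12345) {
    std::mt19937 rng(seed);
    for (int i = 0; i < max_tries; ++i) {
        uint32_t m = static_cast<uint32_t>(rng() & rng() & rng());
        if (m != 0 && is_valid_magic(m, bits)) return m;
    }
    return 0;
}
```

Lowering bits toward 4, the minimum that can hold 12 distinct attack sets, is where the search gets interesting.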
$ g++ main.cpp -O3 -o mc
$ ./mc 56 10
Finding magic for square 56 with index size 10 vector size is 64...
Longest after 0 iterations is 16
Longest after 34 iterations is 64
Magic is 0x12002020b001
$ ./mc 32 8
Finding magic for square 32 with index size 8 vector size is 160...
Longest after 0 iterations is 160
Magic is 0x4020100800c40401
$