Code: Select all
while(todo) {
uint64_t b = -todo & todo; // isolate lowest 1 bit
todo -= b; // clear it
res = table[b*DEBRUIJN>>58]; // look up result (e.g. bit number)
}
Moderator: Ras
Code: Select all
while(todo) {
uint64_t b = -todo & todo; // isolate lowest 1 bit
todo -= b; // clear it
res = table[b*DEBRUIJN>>58]; // look up result (e.g. bit number)
}
I also wasn't aware of that. I guess I should get to work on a bitboard engine... it won't become any more convenient than that.emadsen wrote: ↑Wed Nov 17, 2021 5:45 pm I just realized System.Numerics.BitOperations class "provides utility methods for intrinsic bit-twiddling operations. The methods use hardware intrinsics when available on the underlying platform; otherwise, they use optimized software fallbacks." See the doc. So no need for preprocessor directives, assuming Microsoft's software fallback code performs well.
Sherlock - you found a 10 year old api.lithander wrote: ↑Thu Nov 18, 2021 2:51 pmI also wasn't aware of that. I guess I should get to work on a bitboard engine... it won't become any more convenient than that.emadsen wrote: ↑Wed Nov 17, 2021 5:45 pm I just realized System.Numerics.BitOperations class "provides utility methods for intrinsic bit-twiddling operations. The methods use hardware intrinsics when available on the underlying platform; otherwise, they use optimized software fallbacks." See the doc. So no need for preprocessor directives, assuming Microsoft's software fallback code performs well.![]()
I'm curious about where there is any time savings using pre computed byte tables vs. pre computed bit boards. In fact, because byte tables can handle 128x more information than a bit, it may be even faster to compute both pseudo and legal moves since more information can be stored in the byte once the move is computed. The bit scan forward is nice, but, you can do the same thing by examining the board with a memory lookup and bail out of the move vector if the square is occupied. You can also simultaneously test if the move is a check upon the enemy king with the same lookup which is no slower than bit bit check mask compare.lithander wrote: ↑Thu Nov 18, 2021 2:51 pmI also wasn't aware of that. I guess I should get to work on a bitboard engine... it won't become any more convenient than that.emadsen wrote: ↑Wed Nov 17, 2021 5:45 pm I just realized System.Numerics.BitOperations class "provides utility methods for intrinsic bit-twiddling operations. The methods use hardware intrinsics when available on the underlying platform; otherwise, they use optimized software fallbacks." See the doc. So no need for preprocessor directives, assuming Microsoft's software fallback code performs well.![]()
What? Its quite exactly the opposite. A byte table uses 8x more memory at least. So you have char[64] instead of uint64_t. I dont know what you are referring to. When using PEXT on a single bitboard you get any pattern you want and are not limited to 256 element lookups.Chessnut1071 wrote: ↑Thu Nov 18, 2021 6:38 pmI'm curious about where there is any time savings using pre computed byte tables vs. pre computed bit boards. In fact, because byte tables can handle 128x more information than a bit, it may be even faster to compute both pseudo and legal moves since more information can be stored in the byte once the move is computed. The bit scan forward is nice, but, you can do the same thing by examining the board with a memory lookup and bail out of the move vector if the square is occupied. You can also simultaneously test if the move is a check upon the enemy king with the same lookup which is no slower than bit bit check mask compare.lithander wrote: ↑Thu Nov 18, 2021 2:51 pmI also wasn't aware of that. I guess I should get to work on a bitboard engine... it won't become any more convenient than that.emadsen wrote: ↑Wed Nov 17, 2021 5:45 pm I just realized System.Numerics.BitOperations class "provides utility methods for intrinsic bit-twiddling operations. The methods use hardware intrinsics when available on the underlying platform; otherwise, they use optimized software fallbacks." See the doc. So no need for preprocessor directives, assuming Microsoft's software fallback code performs well.![]()
I'm programming a bit board alternative to document the speed difference just out of curiosity. I have a test on a 7-move mate puzzle for comparative purposes.
My movegen has a lower limit of 6x faster than qperft. When its an engine it will not be much slower.hgm wrote: ↑Thu Nov 18, 2021 6:55 pm The problem is that current-day hardware doesn't support anything like an instant memory lookup in a 64-byte memory area.
I eagerly await your alternative for bitboard, as I am also of the opinion that conventional bitboard methods do not result in a particularly fast engine. I have in fact been working on such a project myself ('the mailbox trials'), and I am curious how the speed that you can achieve will compare to that.
Solving a mating problem is a meaningless test, however. The time will be critically dependent on the move ordering, rather than the raw speed, and different methods for move generation are unlikely to produce the same move ordering. To make any sensible comparison of performance you would have to compare the nodes per second on a normal search + quiescence search, for some standard evaluation function.
What's wrong with using 2 metrics: 1) number of nodes visited, 2) nodes visited/ time. The first compares the evaluation function and the second compares the engine's speed.hgm wrote: ↑Thu Nov 18, 2021 7:44 pm It doesn't matter how fast the move generator is, but how fast the entire engine is. So write an engine (or just a search) using that move generator, then you can compare.
Fast perft is of course useful in itself. But beating qperft is not that much of an achievement. Qperft is a program from 2007, written for a 32-bit machine (with no out-of-order execution: a Pentium-M). How fast would your perft be when you run it on a 32-bit CPU?
If two engines use the same pruning and move ordering, the one with the highest raw speed has the advantage.