Bitboards ?: C# vs C++

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Bitboards ?: C# vs C++

Post by hgm »

Just to be explicit: an extraction loop without bsf looks like

Code: Select all

while(todo) {
  uint64_t b = -todo & todo; // isolate lowest 1 bit
  todo -= b; // clear it
  res = table[b*DEBRUIJN>>58]; // look up result (e.g. bit number)
  
}
User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Bitboards ?: C# vs C++

Post by lithander »

emadsen wrote: Wed Nov 17, 2021 5:45 pm I just realized System.Numerics.BitOperations class "provides utility methods for intrinsic bit-twiddling operations. The methods use hardware intrinsics when available on the underlying platform; otherwise, they use optimized software fallbacks." See the doc. So no need for preprocessor directives, assuming Microsoft's software fallback code performs well.
I also wasn't aware of that. I guess I should get to work on a bitboard engine... it won't become any more convenient than that. ;)
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Bitboards ?: C# vs C++

Post by dangi12012 »

lithander wrote: Thu Nov 18, 2021 2:51 pm
emadsen wrote: Wed Nov 17, 2021 5:45 pm I just realized System.Numerics.BitOperations class "provides utility methods for intrinsic bit-twiddling operations. The methods use hardware intrinsics when available on the underlying platform; otherwise, they use optimized software fallbacks." See the doc. So no need for preprocessor directives, assuming Microsoft's software fallback code performs well.
I also wasn't aware of that. I guess I should get to work on a bitboard engine... it won't become any more convenient than that. ;)
Sherlock - you found a 10 year old api.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
Chessnut1071
Posts: 313
Joined: Tue Aug 03, 2021 2:41 pm
Full name: Bill Beame

Re: Bitboards ?: C# vs C++

Post by Chessnut1071 »

lithander wrote: Thu Nov 18, 2021 2:51 pm
emadsen wrote: Wed Nov 17, 2021 5:45 pm I just realized System.Numerics.BitOperations class "provides utility methods for intrinsic bit-twiddling operations. The methods use hardware intrinsics when available on the underlying platform; otherwise, they use optimized software fallbacks." See the doc. So no need for preprocessor directives, assuming Microsoft's software fallback code performs well.
I also wasn't aware of that. I guess I should get to work on a bitboard engine... it won't become any more convenient than that. ;)
I'm curious about where there is any time savings using pre computed byte tables vs. pre computed bit boards. In fact, because byte tables can handle 128x more information than a bit, it may be even faster to compute both pseudo and legal moves since more information can be stored in the byte once the move is computed. The bit scan forward is nice, but, you can do the same thing by examining the board with a memory lookup and bail out of the move vector if the square is occupied. You can also simultaneously test if the move is a check upon the enemy king with the same lookup which is no slower than bit bit check mask compare.

I'm programming a bit board alternative to document the speed difference just out of curiosity. I have a test on a 7-move mate puzzle for comparative purposes.
User avatar
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Bitboards ?: C# vs C++

Post by hgm »

The problem is that current-day hardware doesn't support anything like an instant memory lookup in a 64-byte memory area.

I eagerly await your alternative for bitboard, as I am also of the opinion that conventional bitboard methods do not result in a particularly fast engine. I have in fact been working on such a project myself ('the mailbox trials'), and I am curious how the speed that you can achieve will compare to that.

Solving a mating problem is a meaningless test, however. The time will be critically dependent on the move ordering, rather than the raw speed, and different methods for move generation are unlikely to produce the same move ordering. To make any sensible comparison of performance you would have to compare the nodes per second on a normal search + quiescence search, for some standard evaluation function.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Bitboards ?: C# vs C++

Post by dangi12012 »

Chessnut1071 wrote: Thu Nov 18, 2021 6:38 pm
lithander wrote: Thu Nov 18, 2021 2:51 pm
emadsen wrote: Wed Nov 17, 2021 5:45 pm I just realized System.Numerics.BitOperations class "provides utility methods for intrinsic bit-twiddling operations. The methods use hardware intrinsics when available on the underlying platform; otherwise, they use optimized software fallbacks." See the doc. So no need for preprocessor directives, assuming Microsoft's software fallback code performs well.
I also wasn't aware of that. I guess I should get to work on a bitboard engine... it won't become any more convenient than that. ;)
I'm curious about where there is any time savings using pre computed byte tables vs. pre computed bit boards. In fact, because byte tables can handle 128x more information than a bit, it may be even faster to compute both pseudo and legal moves since more information can be stored in the byte once the move is computed. The bit scan forward is nice, but, you can do the same thing by examining the board with a memory lookup and bail out of the move vector if the square is occupied. You can also simultaneously test if the move is a check upon the enemy king with the same lookup which is no slower than bit bit check mask compare.

I'm programming a bit board alternative to document the speed difference just out of curiosity. I have a test on a 7-move mate puzzle for comparative purposes.
What? Its quite exactly the opposite. A byte table uses 8x more memory at least. So you have char[64] instead of uint64_t. I dont know what you are referring to. When using PEXT on a single bitboard you get any pattern you want and are not limited to 256 element lookups.

If you look for a fast "ischeck" method its simplest to do one queen lookup and one knight lookup from the kings position and AND with the corresponding enemy pieces.
With mailslot you always get if statements inside a loop.
With bitboards you always can get away with a single AND instruction that does the same thing as 8 ifs....

So its 8 ifs vs 1 AND.
Bitboard is superior in any comparison - except its much harder to get right.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Bitboards ?: C# vs C++

Post by dangi12012 »

hgm wrote: Thu Nov 18, 2021 6:55 pm The problem is that current-day hardware doesn't support anything like an instant memory lookup in a 64-byte memory area.

I eagerly await your alternative for bitboard, as I am also of the opinion that conventional bitboard methods do not result in a particularly fast engine. I have in fact been working on such a project myself ('the mailbox trials'), and I am curious how the speed that you can achieve will compare to that.

Solving a mating problem is a meaningless test, however. The time will be critically dependent on the move ordering, rather than the raw speed, and different methods for move generation are unlikely to produce the same move ordering. To make any sensible comparison of performance you would have to compare the nodes per second on a normal search + quiescence search, for some standard evaluation function.
My movegen has a lower limit of 6x faster than qperft. When its an engine it will not be much slower.
In the end that raw speed doesnt even matter because correct move pruning and ordering is much more important for any engine. LC0 gets away with 50k nps - and is not so far worse than SF.

When you do monte carlo simulation move ordering is everything and speed is 2nd.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Bitboards ?: C# vs C++

Post by hgm »

It doesn't matter how fast the move generator is, but how fast the entire engine is. So write an engine (or just a search) using that move generator, then you can compare.

Fast perft is of course useful in itself. But beating qperft is not that much of an achievement. Qperft is a program from 2007, written for a 32-bit machine (with no out-of-order execution: a Pentium-M). How fast would your perft be when you run it on a 32-bit CPU?

If two engines use the same pruning and move ordering, the one with the highest raw speed has the advantage.
Chessnut1071
Posts: 313
Joined: Tue Aug 03, 2021 2:41 pm
Full name: Bill Beame

Re: Bitboards ?: C# vs C++

Post by Chessnut1071 »

hgm wrote: Thu Nov 18, 2021 7:44 pm It doesn't matter how fast the move generator is, but how fast the entire engine is. So write an engine (or just a search) using that move generator, then you can compare.

Fast perft is of course useful in itself. But beating qperft is not that much of an achievement. Qperft is a program from 2007, written for a 32-bit machine (with no out-of-order execution: a Pentium-M). How fast would your perft be when you run it on a 32-bit CPU?

If two engines use the same pruning and move ordering, the one with the highest raw speed has the advantage.
What's wrong with using 2 metrics: 1) number of nodes visited, 2) nodes visited/ time. The first compares the evaluation function and the second compares the engine's speed.
User avatar
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Bitboards ?: C# vs C++

Post by hgm »

Nothing. But the first doesn't really give you much information. Because there is no meaningful definition for when a search is finished. You can continue it forever, and while you do the number of nodes keeps growing. You can determine the time needed to complete a certain depth, but in the face of reductions and extensions depth is an ill-defined concept as well. You can determine how much time it needs to solve a given problem (like mate, or gaining a piece), but for a given search this would depend enormously on the problem at hand, and what will be very fast in one position will be excedingly slow in another. So you would have to average over many positions, and then make sure these positions are a representative example of what you would encounter in games. And weight them by importance. That boils down to actually playing games, and measuring Elo from the results. Attempts to take shortcuts on this, e.g. by optimizing speed for solving a test suite, almost universally backfire, and the better you are at solving, the lower the Elo usually goes.

Comparing nodes/sec on similar trees (from the same test position) is easy, though. It can be used to directly compare different algorithms for the same tasks (move generation, move sorting, updating the game state, evaluation) speedwise.