bitboards like mailbox

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
hgm
Posts: 28396
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: bitboards like mailbox

Post by hgm »

I have been thinking of how one would make an efficient check-evasion generator in mailbox. I don't often use a dedicated move generator for that, because in-check nodes are not very abundant, so the impact on overall speed is not very large. But of course all little bits can help.

In bitboard engines limiting the generation to evasions is pretty easy: you just calculate a bitboard mask that indicates all squares on the checking ray, including the checker. Then you do the normal move generation, but before extracting the moves you AND with this mask. (For non-royal pieces; king moves are automatically good evasion candidates, and have to be checked for legality in their own way.)

In mailbox engines this is more tricky. King moves can be generated as always, and would have to be tested for legality in the usual way, as always, e.g. through the IsSquareAttacked() routine described earlier in this discussion. For double checks it would remain at that. Captures of the checker could be generated by a very similar routine, which stores the moves rather than just returning TRUE if it finds one (either in a conventional move list, or by creating an attack-map element just for the checker).

The nasty part occurs with distant (slider) checks, where you have to generate the interpositions. This could be done with the aid of a table that lists all potential interposition squares as a function of king location, checker location, location of the piece we try to interpose, and type of that piece. That would be a huge table: there are 5 basic types (R, B, N and two kind of Pawns), so it would require 64 x 64 x 64 x 5 = 1280k entries. But we can economize on that by indicating the location of the piece relative to king, so that we don't have to use king location as index. Relative locations (i.e. sqr - kingSqr) will require a 240-entry (16 x 15) table rather than 64, however. But the checker has to be on a ray through the king (or it could not be a distant checker), and we could specify its location as one of 8 directions, and a distance 1-7.

For the moment, let us forget about the distance. Then we have a 240 x 8 x 5 table indexed by relative location of piece w.r.t. king, direction of the check ray, and piece type. Each entry would have to indicate where we can interpose, which is potentially on multiple squares. (E.g. a Queen might interpose on 3 different squares of the check ray.) We could of course use a list of 3 square numbers for that (relative to King), but that is a bit bulky. More compact is to associate a bit flag with each square of the check ray, and set the bit to 1 when the piece can be interposed there. On an 8x8 board there can at most be 6 empty squares between checker and king, so this would require only 6 bits. One could call this a 'bit ray', similar in spirit to a bitboard, but encoding a much smaller area. In fact, the 5 bit rays for the different piece types can all be packed into a 32-bit integer (5x6 = 30), and extracted through ANDing with a piece-dependent mask. So we have a 240 x 8 table of 4-byte integers. This is still a bit larger than I would prefer, but we will get to that later. For now assume that during detection of the check we determine its direction, and set a pointer to the corresponding 240-entry table interpose[240] of packed bit rays.

The bit rays are all initialized as if the checker is indeed 7 squares away from the king. In reality it might be closer, meaning that some of the moves to the ray indicated in the bitrays of the interpose[] table are actually not interpositions at all, because they land on the wrong side of the checker. During check detection we can figure out how far away the checker is (e.g. distance[checker - kingSqr]), and use this in a 7-entry table to retrieve a mask 'between' in bitray format that indicates the squares on the ray that are between king and checker. We are then set for generating evasions.

For each non-royal piece in the piece list we would then fetch interpose[location[piece] - kingSqr], and AND it with 'between', and with a rayMask[piece], which picks out the part relevant for the piece under consideration. (A Queen would leave both the Rook and the Bishop bits.) That way we get all squares on the check ray between king and checker that the piece might be able to reach if it is not blocked. We can extract the remaining bits one by one, and look up the corresponding to-square (relative to king) in a 30-entry table for the check direction. (So there would have to be 8 such tables. Or we would need a single table that just encodes the distance to the king, and multiply that with the unit step in the check direction.)

For sliders we would still have to test if the extracted to-square is reachable. This can be done in the standard way: look up the direction in which the to-square lies (w.r.t. the piece), consult the neighbor table to get the nearest obstacle in that direction, look up the direction of the to-square w.r.t. that obstacle. If the directions are different, the obstacle is on the other side of the to-square, and the move is not blocked. leapers would not have to be tested this way, as they could never be blocked.

Code: Select all

// evasion generator
// assumes in-check detection set 'interpose', 'between' and 'checkStep' based on direction and distance of check
// sliders
for(i=3; i<=7; i++) { // loop over sliders
  int fromSqr = location[stm + i]; // square slider is on
  int bitray = interpose[fromSqr - kingSqr]; // full ray maps for 5 piece types
  bitray &= between; // limit to squares between king and checker
  bitray &= rayMask[stm + i]; // limit to moves this slider can make
  while(bitray) {
    int n = bsf(bitray);   // extract next bit
    int dist = nr2dist[n]; // get the distance to king
    int toSqr = kingSqr + dist * checkStep; // calculate square number
    int d = direction[fromSqr - toSqr]; // direction of interposition move
    int obstacle = neighbor[fromSqr][d]; // nearest obstacle in that direction
    if(d != direction[obstacle - toSqr]) // is on other side of toSqr
      EmitMove(fromSqr, toSqr); // so move is valid
    bitray &= bitray - 1; // remove trated bit
  }
  // knights
  for(i=1; i<=2; i++) { // loop over leapers (but not king, who can never interpose)
  int fromSqr = location[stm + i]; // square slider is on
  int bitray = interpose[fromSqr - kingSqr]; // full ray maps for 5 piece types
  bitray &= between; // limit to squares between king and checker
  bitray &= rayMask[stm + i]; // limit to moves this slider can make
  while(bitray) {
    int n = bsf(bitray);   // extract next bit
    int dist = nr2dist[n]; // get the distance to king
    int toSqr = kingSqr + dist * checkStep; // calculate square number
    EmitMove(fromSqr, toSqr); // so move is valid
    bitray &= bitray - 1; // remove trated bit
  }
Pawns are still a bit tricky, because of the double-push. Because bitray tabulation is by location relative to king, it cannot take account of whether the Pawn is on 2nd rank or not. So the best approach would probably be to initialize the bit rays as if a pawn always has a double-push. But that makes the Pawn different from a regular leaper, as the double-push can be blocked, and the Pawn might not have one in the first place. An advantage is that it can be blocked only on a single square, so the test for it is not that complex. The extra code in this case (slipped in just before the Emit() call in the loop over Pawns) could be:

Code: Select all

  if((toSqr - fromSqr & 1) == 0) { // move is double push
    if((fromSqr & 0xF0) != secondRank) continue; // Pawn doesn't have one
    if(board[fromSqr ^ 0x30] != EMPTY) continue; // double-push is blocked
  }
This might be a stupid way to do things, though, as there can be 8 pawns, and at most 6 squares on which to interpose. So it would be better to just do a conventional ray scan over the board along the check ray, and test whether the square in front of it contains a Pawn. Or, when it is empty and on 3rd rank, whether the square before that contains a Pawn.

Code: Select all

int sqr = checker;
while((sqr -= checkStep) != kingSqr) { // loop over check ray
  int fromSqr = sqr - forward; // square below it
  int piece = board[fromSqr]; // look what is there
  if(!piece & (fromSqr & 0xF0) == 0x30) { // square was empty but on 3th rank
    fromSqr -= forward; // one more below it
    piece = board[fromSqr];
    if(piece & 16) // it is a Pawn! (exact test depends on piece numbering)
      Emit(fromSqr, toSqr); // double-push interposes
  } else if(piece & 16) // it is a Pawn! (exact test depends on piece numbering)
    Emit(fromSqr, toSqr); // normal move interposes
}
Somewhere during the game the balance might trip, though, when only very few Pawns are left, and checks will typically come from a larger distance because there are fewer pieces to block them. Then the loop over Pawns would be faster. Especially if there are no Pawns at all.
User avatar
Rebel
Posts: 7388
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: bitboards like mailbox

Post by Rebel »

In my first engine (TRS-80) in BASIC it could do only one ply and if memory serves me well that already took 2-3 minutes from the start position. Because of that QS wasn't an option. So to discover a checkmate I had to solve that in eval, generating king escapes and if none it was a checkmate. Later, moving to Z80 assembler (I believe it was a factor of ~100 faster) I could do 4-6 plies and adding QS (introduced first on the 6502) the checkmate code in eval was no longer needed as after all 90% (or so) of checks are not checkmate and I moved the code to the move generator for the next ply, generating legal moves when in check. It helps, but no miracle.
90% of coding is debugging, the other 10% is writing bugs.
tcusr
Posts: 325
Joined: Tue Aug 31, 2021 10:32 pm
Full name: tcusr

Re: bitboards like mailbox

Post by tcusr »

after a bit of tinkering i'm finally convinced that move generation speed doesn't really matter:
1) i removed magic bitboards from my engine and replaced them with this code here . in the terms of NPS for perft it was 20% slower (who cares) and 7% slower in actual search. i'm sure that with other pruning techniques this gap will be even smaller
2) i modified olithink's perft to NOT bulk count on horizon nodes and removed useless stuff for perft in his 'move' function, these are the results:

Code: Select all

Depth: 1 Nodes: 20 in 0.000053 Seconds
Depth: 2 Nodes: 400 in 0.000436 Seconds
Depth: 3 Nodes: 8902 in 0.004865 Seconds
Depth: 4 Nodes: 197281 in 0.094793 Seconds
Depth: 5 Nodes: 4865609 in 2.419352 Seconds
Depth: 6 Nodes: 119060324 in 60.229432 Seconds
3) i found many engines that with a far from optimized move gen (for example i found one that uses a 8x8 board) are still rated 2500-2700
OliverBr
Posts: 865
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: bitboards like mailbox

Post by OliverBr »

tcusr wrote: Thu Dec 02, 2021 10:42 pm 3) i found many engines that with a far from optimized move gen (for example i found one that uses a 8x8 board) are still rated 2500-2700
This is not a surprise. They probably have a very large and good evaluation.
Evaluation is more important to chess strength than speed.

The purpose of OliThink has always been to determine the maximum possible strength of a chess engine with a very small and simple evaluation (only mobility).
OliThink GitHub: https://github.com/olithink
Nice arcticle about OlIThink: https://www.chessengeria.eu/post/olithink-oldie-goldie
Chess Engine OliThink Homepage: http://brausch.org/home/chess
User avatar
hgm
Posts: 28396
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: bitboards like mailbox

Post by hgm »

The point is that speed doesn't count for much. A 10% spead increase buys you about 10 Elo. Whether an engine is 2200 Elo or 2700 Elo thus isn't speed-related at all.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: bitboards like mailbox

Post by dangi12012 »

hgm wrote: Sun Dec 05, 2021 10:46 pm The point is that speed doesn't count for much. A 10% spead increase buys you about 10 Elo. Whether an engine is 2200 Elo or 2700 Elo thus isn't speed-related at all.
That on its own is true - but 10% speed gives you room to implement better heuristics that you end up with same nps again but with better overall play.
Also ELO has logarithmic behaviour where +50 means a certain win ratio against the weaker player.
https://www.chess.com/forum/view/genera ... -stockfish
So as soon as you have 50% speed advantage its most likely winning (given the same engine)
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: bitboards like mailbox

Post by dangi12012 »

hgm wrote: Mon Nov 29, 2021 10:32 am I have been thinking of how one would make an efficient check-evasion generator in mailbox. I don't often use a dedicated move generator for that, because in-check nodes are not very abundant, so the impact on overall speed is not very large. But of course all little bits can help.

In bitboard engines limiting the generation to evasions is pretty easy: you just calculate a bitboard mask that indicates all squares on the checking ray, including the checker. Then you do the normal move generation, but before extracting the moves you AND with this mask. (For non-royal pieces; king moves are automatically good evasion candidates, and have to be checked for legality in their own way.)

In mailbox engines this is more tricky. King moves can be generated as always, and would have to be tested for legality in the usual way, as always, e.g. through the IsSquareAttacked() routine described earlier in this discussion. For double checks it would remain at that. Captures of the checker could be generated by a very similar routine, which stores the moves rather than just returning TRUE if it finds one (either in a conventional move list, or by creating an attack-map element just for the checker).
Yes in bitboards you can set multiple bits at once and you get the perfect result.

Have you ever tried to pack the normal mailslot board into a 4bit structure?
Because then 64x4 = 256 that would fit in an avx2 register and that can do move pruning in 1 instruction again.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
User avatar
hgm
Posts: 28396
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: bitboards like mailbox

Post by hgm »

I think Gerd Isenberg experimented with such 'nibble boards' once. I am not sure though whether bit-scan-forward exists for 256-bit quantities.
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: bitboards like mailbox

Post by dangi12012 »

hgm wrote: Mon Dec 06, 2021 8:54 am I think Gerd Isenberg experimented with such 'nibble boards' once. I am not sure though whether bit-scan-forward exists for 256-bit quantities.
No it doesnt exist. But you can always union a _mm256 with an array of whatever type fits best.

Also I think even with char[64] you can always alias to 2x __mm256 and ANDNOT with that to remove whole squares (bytes) in two instructions.
Thats infinitely faster than a loop with an branch inside! :)
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
tcusr
Posts: 325
Joined: Tue Aug 31, 2021 10:32 pm
Full name: tcusr

Re: bitboards like mailbox

Post by tcusr »

OliverBr wrote: Sun Dec 05, 2021 9:59 pm
tcusr wrote: Thu Dec 02, 2021 10:42 pm 3) i found many engines that with a far from optimized move gen (for example i found one that uses a 8x8 board) are still rated 2500-2700
This is not a surprise. They probably have a very large and good evaluation.
Evaluation is more important to chess strength than speed.

The purpose of OliThink has always been to determine the maximum possible strength of a chess engine with a very small and simple evaluation (only mobility).
good to know. i really like olithink and knowing that such a strong engine has a 'slow' move generator comforts me