I don't understand any of the C++ stuff, but step-by-step generation of slider moves becomes indeed an inefficient method when you are only interested in captures. And unfortunately that is exacty what you are, in the overwhelming majority of the tree nodes.
In the 'mailbox trials' I solved this problem by maintaining a 'neighbor table', which for all occupied squares holds the square numbers of the nearest neigbors. There then never is the need to scan the board to see what a certain move direction will bump into; you can just directly look it up in the table.
I guess you could make a bitboard equivalent of this: tabulate the attack sets (given the current board occupancy) for every occupied square in an array of 64 bitboards. (The elements for the unoccupied squares would be unused.) It would probaby be best to have two such tables:
To get the attacks for a given Rook or Bishop, you could then read those directly from these small tables.
But how to update these tables? Captures (about 95% of all moves in the tree) only evacuate a single square. This affects the attacks boards that were hitting this evacuated square. But since Rook and Bishop moves are symmetric, those are the attacks that are coming from the (occupied) squares that were attacked from the evacuated one.
So when square s gets evacuated, you take orth[s], intersect it with 'occupied' (orth[s] & occupied), and extract the thus hit squares t. For each of those, you add the attacks from s to the old attacks from t (orth[s] | orth[t]). This also adds the moves from s perpendicular to the t-s ray, which would not be reachable from t without turning a corner. To get rid of those, you also keep (completely static) bitboard tables orth0[64] and diag0[64] which contain (i.e. were initialized with) the attacks from each square on a completely empty board. And intersect with those ((orth[s] | orth[t]) & orth0[t]). So:
Code: Select all
Bitboard orth[64], orth0[64], diag[64], diag[64];
void Evacuate(int s)
{
  Bitboard todo = orth[s] & occupied; // all orthogonal neighbors
  while(todo) {
    int t = BSF(todo);
    todo -= 1LL<<t;
    orth[t] = (orth[t] | orth[s]) & orth0[t]; // add discovered ray to neighbor set
  }
  // same for diag
}
With a modest amount of work this would make all Rook and Bishop attacks sets for occupied squares available at all times in small tables (4 x 64 x 8 = 2048 bytes).
Problem is that you would also have to undo it when UnMaking a move. (The curse of incremental update...) One way to do that would be to put the original value of every modified orth[t] and diag[t] in the while loops on a stack (stack[i++] = orth[t]). And for the UnMake then use similar while loops, and copy them back (orth[t] = stack[i++]).