Well not completely. You should also consider underpromotions to a rook. A simple case is where the promotion to a queen leads to a stalemate whereas promotion to a rook wins. A more tricky example is the Saavedra position in a KRKP endgame.There's the tacky issue of underpromotions, which are probably completely irrelevant except maybe promotion to knight (with check).
On-demand tablebase generation
Moderator: Ras
- 
				tpetzke
- Posts: 686
- Joined: Thu Mar 03, 2011 4:57 pm
- Location: Germany
Re: On-demand tablebase generation
- 
				syzygy
- Posts: 5774
- Joined: Tue Feb 28, 2012 11:56 pm
Re: On-demand tablebase generation
For the purpose of on-the-fly generation of tables used only to play out the current game, it seems OK to ignore exceptional situations.tpetzke wrote:Well not completely. You should also consider underpromotions to a rook. A simple case is where the promotion to a queen leads to a stalemate whereas promotion to a rook wins. A more tricky example is the Saavedra position in a KRKP endgame.There's the tacky issue of underpromotions, which are probably completely irrelevant except maybe promotion to knight (with check).
- 
				hgm  
- Posts: 28395
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: On-demand tablebase generation
To add some remarks on high-speed EGT generation:
When done properly, this is a task completely dominated by storage access, even when the storage is DRAM. So optimizing the code by clever move generation, using multiple cores etc. all bring you exactly zero speedup. Only making more efficient use of memory access will help.
For this reason it is very beneficial to have a compact data set. One bit per position in general works better than one byte per position, because you can load 8 times as many positions from memory in the same time. So even when you want to do DTM or DTZ rather than just WDL, it is better to store the data as single-bit flags "won-in-N" and "won-in-less-than-N". This drives up total storage, as eventually you will need Nmax single-bit sets, while you could have done with a single set of 2log(Nmax) bits per position. But the crux is that the generation algorithm, to generate won-in-N positions, only needs to know whether a position is already won (i.e. in less than N), and keep track of which positions you add to that for efficiently doing the next iteration. The older bit-sets of "won-in-N" are never needed anymore once you have generated won-in-(N+1) from it. They just sit there, and when DRAM becomes scarce, they can be flushed to disk by sequential writing.
It is also good to realize that EGT generation in general knows two phases with completely different characteristics. There usually is a lengthy (in terms of DTx) initial phase, where only a very small fraction of all positions turn into won-in-N for any particular iteration N. This eventually peters out (for generally drawn end-games), or it suddenly explodes, and in around 10 iterations completely turns the EGT into wins (thus assigning ~10% of all positions the won-in-N state in a single iteration).
To be fast in total, you need to be fast in this initial phase, in particular much faster than it would take to scan through the memory once to find the won/lost-in-N of the previous iteration. The cache-line structure of the CPU forces you to do these accesses in groups of 64 bytes (512 bits), but on most initial iterations the density of lost-in-N positions will be so low that many sets of 512 positions will not have a single lost-in-N in them. So it pays to keep a lost-in-N "directory" next to the lost-in-N bitmap, where you reserve 1 bit for every cache line, to be set whenever a win/loss is assigned to that cache-line. This allows you to quickly skip all those cache lines that contain nothing of interest. The overhead of this is actually small enough to reserve 1 bit per 64 positions; then you can test the emptiness of cache lines in the directory by testing bytes, and when the latter are non-zero, use the set bits in them to actually decide which 64-bit words in the cache line to scan for positions that need treatment (using BSF techniques known from bitboards to quickly extract the set bits).
In the "bulk stage" practically every 32-bit word in every cache line would contain positions that need treatment. So you cannot avoid a complete scan, but that is only part of your problems. The treatment consists of probing or writing to other positions that can be reached by moves from the one under treatment. To avoid having to do cache loads for these accesses you can exploit the fact that data is already present in the cache because of the scan through the bitmap. All possible combinations of pieces from the innermost 3 scanning loops (256K positions, taking up 32KB memory) would fit in L1, so moves with those pieces would only hit already-cached data. If you go beyond that (depending on how large your L2 is) you might have to do extra cache-line fills for each move during treatment. It is then often faster to do two scans through the entire data set with different loop nesting, treating the moves of half the pieces in one scan, and treating those of the other pieces (then in the inner loop) in the next scan.
A great time saver in generating the initial checkmates is to first place all pieces except the black King on the board (which you of course would do incrementally, so it is only one piece that has to be displaced all the time during the scan over positions), then generate all moves of the white pieces to see which squares they attack, and then only place the black King there (for generating the index) to mark them as won wtm positions (won by King capture). Only an attacked square where all neighboring squares (for an orthodox King) are also attacked becomes a potential checkmate (i.e. lost with btm), and would need to be verified by moves of the other black pieces. So you don't generate all positions and then test them for checkmate, but you generate only checkmate candidates. Note that it might not be needed to mark king-capture positions as won wtm, when your move generation for the verification step (where "potentially won" btm positions are trying to find a move to a non-won wtm position) is smart enough to never put its own King in check. It can be faster to directly determine this from the board position, than from doing a probe into the won-in-less-than-N bitmap: the latter might require a memory access, while board and piece list will always be in L1.
			
			
									
						
										
						When done properly, this is a task completely dominated by storage access, even when the storage is DRAM. So optimizing the code by clever move generation, using multiple cores etc. all bring you exactly zero speedup. Only making more efficient use of memory access will help.
For this reason it is very beneficial to have a compact data set. One bit per position in general works better than one byte per position, because you can load 8 times as many positions from memory in the same time. So even when you want to do DTM or DTZ rather than just WDL, it is better to store the data as single-bit flags "won-in-N" and "won-in-less-than-N". This drives up total storage, as eventually you will need Nmax single-bit sets, while you could have done with a single set of 2log(Nmax) bits per position. But the crux is that the generation algorithm, to generate won-in-N positions, only needs to know whether a position is already won (i.e. in less than N), and keep track of which positions you add to that for efficiently doing the next iteration. The older bit-sets of "won-in-N" are never needed anymore once you have generated won-in-(N+1) from it. They just sit there, and when DRAM becomes scarce, they can be flushed to disk by sequential writing.
It is also good to realize that EGT generation in general knows two phases with completely different characteristics. There usually is a lengthy (in terms of DTx) initial phase, where only a very small fraction of all positions turn into won-in-N for any particular iteration N. This eventually peters out (for generally drawn end-games), or it suddenly explodes, and in around 10 iterations completely turns the EGT into wins (thus assigning ~10% of all positions the won-in-N state in a single iteration).
To be fast in total, you need to be fast in this initial phase, in particular much faster than it would take to scan through the memory once to find the won/lost-in-N of the previous iteration. The cache-line structure of the CPU forces you to do these accesses in groups of 64 bytes (512 bits), but on most initial iterations the density of lost-in-N positions will be so low that many sets of 512 positions will not have a single lost-in-N in them. So it pays to keep a lost-in-N "directory" next to the lost-in-N bitmap, where you reserve 1 bit for every cache line, to be set whenever a win/loss is assigned to that cache-line. This allows you to quickly skip all those cache lines that contain nothing of interest. The overhead of this is actually small enough to reserve 1 bit per 64 positions; then you can test the emptiness of cache lines in the directory by testing bytes, and when the latter are non-zero, use the set bits in them to actually decide which 64-bit words in the cache line to scan for positions that need treatment (using BSF techniques known from bitboards to quickly extract the set bits).
In the "bulk stage" practically every 32-bit word in every cache line would contain positions that need treatment. So you cannot avoid a complete scan, but that is only part of your problems. The treatment consists of probing or writing to other positions that can be reached by moves from the one under treatment. To avoid having to do cache loads for these accesses you can exploit the fact that data is already present in the cache because of the scan through the bitmap. All possible combinations of pieces from the innermost 3 scanning loops (256K positions, taking up 32KB memory) would fit in L1, so moves with those pieces would only hit already-cached data. If you go beyond that (depending on how large your L2 is) you might have to do extra cache-line fills for each move during treatment. It is then often faster to do two scans through the entire data set with different loop nesting, treating the moves of half the pieces in one scan, and treating those of the other pieces (then in the inner loop) in the next scan.
A great time saver in generating the initial checkmates is to first place all pieces except the black King on the board (which you of course would do incrementally, so it is only one piece that has to be displaced all the time during the scan over positions), then generate all moves of the white pieces to see which squares they attack, and then only place the black King there (for generating the index) to mark them as won wtm positions (won by King capture). Only an attacked square where all neighboring squares (for an orthodox King) are also attacked becomes a potential checkmate (i.e. lost with btm), and would need to be verified by moves of the other black pieces. So you don't generate all positions and then test them for checkmate, but you generate only checkmate candidates. Note that it might not be needed to mark king-capture positions as won wtm, when your move generation for the verification step (where "potentially won" btm positions are trying to find a move to a non-won wtm position) is smart enough to never put its own King in check. It can be faster to directly determine this from the board position, than from doing a probe into the won-in-less-than-N bitmap: the latter might require a memory access, while board and piece list will always be in L1.
- 
				Evert  
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: On-demand tablebase generation
Yes, but the problem is that most of the time, this will be a waste of time.tpetzke wrote:Well not completely. You should also consider underpromotions to a rook. A simple case is where the promotion to a queen leads to a stalemate whereas promotion to a rook wins. A more tricky example is the Saavedra position in a KRKP endgame.
The difference, I guess, is that promotion to knight more typically gets you out of positions you would otherwise lose (because the knight promotes with check, gaining you a crucial tempo), whereas promotion to rook (or bishop) would typically avoid a draw when you should win. Perhaps that's based on what's usually presented in studies though.
Anyway, the second case we can detect: if we expect, naively, that the current position is won, allowing only queen promotions, but the tablebase finds that it isn't, we can ask what would be different if we allow promotions to a rook instead. For the underpromotion to knight, the question that would need to be asked is: is the position still lost if I promote to knight instead?
Maybe that doesn't work in practice (maybe the whole idea of on-the-fly tablebases doesn't work in practice either), but it's what I'd try anyway. Or, if I have the time and memory anyway, just generate the additional tablebases.
- 
				tpetzke
- Posts: 686
- Joined: Thu Mar 03, 2011 4:57 pm
- Location: Germany
Re: On-demand tablebase generation
I find the whole idea of on-demand tb creation interesting. Currently I have the 3 men table bases build in.
Here I just store a large array of values
But I always wanted one day to replace that with a table base I compute at startup. My old algorithm is to slow for that (and written in Pascal) so re-engineering it will be a bit of effort. It is still on my list.
			
			
									
						
										
						Here I just store a large array of values
I wrote my own table base generator to build that listconst short KPK_TBL[KPK_TBL_SIZE] = {
-4080,
-4080,
-3968,
-3968,
-3825,
-4080,
...

But I always wanted one day to replace that with a table base I compute at startup. My old algorithm is to slow for that (and written in Pascal) so re-engineering it will be a bit of effort. It is still on my list.
- 
				Evert  
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: On-demand tablebase generation
I have re-written my tablebase generator so it is properly using retrograde analysis (first it marks positions that are lost, then marks positions one ply up as "won", then looks at progenitors of those and marks them as "lost" if all successors are also "lost").
The code is fairly general, so it can handle arbitrary pieces and (in principle) an arbitrary number of them. Right now, it does not try to be particularly clever and so it doesn't use things like a bitlist to keep track of which positions need to be updated. The code is significantly faster than my previous attempt, which is nice. However, it is still clearly slower than a few other examples I've tried; it takes a few seconds to generate a 4-men table, which I'd like to speed up more.
However, after spending a few rather frustrating evenings on it, I can't figure out what the problem is. So I'll post what I have here in the hope that someone with more experience with writing tablebase generators has an idea.
To enumerate the positions, I restrict the black king to A1-D1-D4, if the black king is on the diagonal the white king should be below the diagonal. If both kings are on the diagonal a store_tb() call updates both the indicated position and the position mirrored in the diagonal. My tablebase data structure looks like this:
The signature is used to match the material when probing the tablebase, movegen is an array of function pointers of type movegen(int square, bitboard_t occ) which returns the appropriate attack bitboard for that particular piece type. num_pieces is the number of black and white pieces, piece_types is not actually used, size is the size of the allocated aray and data holds the score for each position. The "finalised" flag is set to false whenever a position in the tablebase is updated.
Special scores are
so the score stored is the distance-to-mate (odd for the winning side, even for the losing side).
The index in the table (for 3 men) is calculated as
for 4 men
Positions are mapped to the "standard" representative position (with the kings as described above) using (for 3 pieces):
It uses lookup tables rather than direct computation (square ^= 007;, square ^= 070; etc); I've tried using direct computation instead, but this measured as (slightly) slower.
The outer loop to generate the tablebase is
Generating all possible permutations of pieces and manipulating them is handled by the recurse_loop() function:
The function calls itself recursively to place all pieces on the board, discarding some equivalent positions. If there is work to be done for the current position, it will mark all precursors, either as won (if the current position is lost) or as lost (if all its other descendants are also won).
The mark_precursor() function is:
Finally, the function that tests if all successors are lost:
Not included is the code for the tablebase verification, which tests whether the score stored for a position is consistent with that of its successors. I've also tested the generated KRK and KBNK by comparing random positions with the result of the Nalimov tables (through an on-line interface) and it all seems to work.
Some of the code feels a bit clunky and perhaps I'll think of a better way to do it if I leave it for a bit. For now, I'd be very grateful for any suggestions for where to look for improvements. I've tried gprof, but I didn't find it particularly helpful...
			
			
									
						
										
						The code is fairly general, so it can handle arbitrary pieces and (in principle) an arbitrary number of them. Right now, it does not try to be particularly clever and so it doesn't use things like a bitlist to keep track of which positions need to be updated. The code is significantly faster than my previous attempt, which is nice. However, it is still clearly slower than a few other examples I've tried; it takes a few seconds to generate a 4-men table, which I'd like to speed up more.
However, after spending a few rather frustrating evenings on it, I can't figure out what the problem is. So I'll post what I have here in the hope that someone with more experience with writing tablebase generators has an idea.
To enumerate the positions, I restrict the black king to A1-D1-D4, if the black king is on the diagonal the white king should be below the diagonal. If both kings are on the diagonal a store_tb() call updates both the indicated position and the position mirrored in the diagonal. My tablebase data structure looks like this:
Code: Select all
typedef struct tablebase_t {
   uint64_t signature;
   movegen_func_t movegen[NUM_SIDES][MAX_TB_PIECES];
   int num_pieces[NUM_SIDES];
   int piece_types[NUM_SIDES][MAX_TB_PIECES];
   size_t size;
   int8_t *data;
   bool finalised;
} tablebase_t;
Special scores are
Code: Select all
#define TB_INVAL  127
#define TB_DRAW   120
#define TB_MATE     0
The index in the table (for 3 men) is calculated as
Code: Select all
   int king_index = king_king_index[black_king][white_king];
   index = side * 462*64 + king_index*64 + piece;
Code: Select all
   int king_index = king_king_index[black_king][white_king];
   index = side * 462*64*64 + king_index*64*64 + piece1*64 + piece2;
Code: Select all
static inline void flip3(int *dks, int *aks, int *as)
{
   /* Flip all squares so that the defending king is on ranks 1-4 */
   if (flip_ranks_bb & make_bitboard_square(*dks)) {
      *dks = flip_ranks[*dks];
      *aks = flip_ranks[*aks];
      *as  = flip_ranks[*as];
   }
   /* Flip all squares so that the defending king is on files a-d */
   if (flip_files_bb & make_bitboard_square(*dks)) {
      *dks = flip_files[*dks];
      *aks = flip_files[*aks];
      *as  = flip_files[*as];
   }
   /* Flip all squares so that the defending king is on or below the main diagonal */
   if (flip_diag_bb & make_bitboard_square(*dks)) {
      *dks = flip_diagonal[*dks];
      *aks = flip_diagonal[*aks];
      *as  = flip_diagonal[*as];
   }
   /* Flip all squares so that the attacking king is below the main
    * diagonal if the defending king is on it.
    */
   if ((flip_diag2_bb & make_bitboard_square(*dks)) && (flip_diag3_bb & make_bitboard_square(*aks))) {
      *dks = flip_diagonal[*dks];
      *aks = flip_diagonal[*aks];
      *as  = flip_diagonal[*as];
   }
   assert(make_bitboard_square(*dks) & (~flip_diag_bb) & board_qwing & board_white);
}
The outer loop to generate the tablebase is
Code: Select all
static void initialise_tb3(tablebase_t *tb)
{
   uint64_t time = get_timer();
   /* Initially we mark all positions as "TB_DRAW" */
   memset(tb->data, TB_DRAW, tb->size);
   int pieces[NUM_SIDES][MAX_TB_PIECES];
   int iter = 0;
   tb->finalised = true;
   if (tb->num_pieces[WHITE] > 1) recurse_loop(tb, iter, TASK_INITIALISE, 0, board_empty, BLACK, pieces);
   if (tb->num_pieces[BLACK] > 1) recurse_loop(tb, iter, TASK_INITIALISE, 0, board_empty, WHITE, pieces);
   sides side = BLACK;
   while (!tb->finalised) {
      tb->finalised = true;
      iter++;
      for (int side = WHITE; side <= BLACK; side++) {
         if (tb->num_pieces[side] == 1) continue;
         /* Scan for positions that are "lost" and mark precursors as "won" */
         recurse_loop(tb, iter, TASK_MARK_NEW_WIN, 0, board_empty, next_side[side], pieces);
         /* Scan for positions that have just been marked as "won" and test if
          * their precursors, which are potentially lost, are in fact lost.
          */
         recurse_loop(tb, iter, TASK_MARK_NEW_LOSS, 0, board_empty, side, pieces);
      }
   }
   uint64_t end_time = get_timer();
   printf("Done after %d iterations and %gms\n", iter, (end_time-time)/1.0e3);
   /* Test the consistency of the tablebase */
   verify_tb(tb);
}
Code: Select all
static inline void recurse_loop(tablebase_t *tb,
                         int iter,
                         int task,
                         int current_piece,
                         bitboard_t occ,
                         sides side, int pieces[NUM_SIDES][MAX_TB_PIECES])
{
   /* If there are still pieces to place on the board... */
   if (current_piece < tb->num_pieces[WHITE] + tb->num_pieces[BLACK]) {
      /* place the next one. */
      if (current_piece == 0) {  /* Black king */
         for (int idks = 0; idks < sizeof dks_squares / sizeof *dks_squares; idks++) {
            int s = pieces[BLACK][0] = dks_squares[idks];
            recurse_loop(tb, iter, task, current_piece+1, occ | make_bitboard_square(s), side, pieces);
         }
         return;
      }
      if (current_piece == 1) {  /* White king */
         for (int aks = 0; aks < 64; aks++) {
            int s = pieces[WHITE][0] = aks;
            /* Skip positions that are redundant under symmetry
             * transformations.
             */
            if (king_king_index[pieces[BLACK][0]][pieces[WHITE][0]] < 0) continue;
            if ((flip_diag2_bb&make_bitboard_square(pieces[BLACK][0])) &&
                (flip_diag3_bb&make_bitboard_square(pieces[WHITE][0])))
               continue;
            recurse_loop(tb, iter, task, current_piece+1, occ | make_bitboard_square(s), side, pieces);
         }
         return;
      }
      int p = current_piece - 2;
      if (p <= tb->num_pieces[WHITE]-1 && tb->num_pieces[WHITE] > 1) {
         for (int as=0; as < 64; as++) {
            /* Skip positions where pieces overlap (mark them as illegal) */
            if (occ & make_bitboard_square(as)) continue;
            pieces[WHITE][p+1] = as;
            recurse_loop(tb, iter, task, current_piece+1, occ | make_bitboard_square(as), side, pieces);
         }
         return;
      } else {
         assert(tb->num_pieces[BLACK] > 1);
         p -= tb->num_pieces[WHITE]-1;
         if (p <= tb->num_pieces[BLACK]-1 && tb->num_pieces[BLACK] > 1) {
            /* TODO */
            assert(false);
         }
      }
   }
   /* Test whether the current position is mate */
   if (task == TASK_INITIALISE) {
      assert(iter == 0);
      sides other = next_side[side];
      /* Discard the position of the defending king when generating
       * attacks in order to include x-ray attacks.
       */
      bitboard_t kocc  = occ ^ make_bitboard_square(pieces[side][0]);
      bitboard_t king  = make_bitboard_square(pieces[side][0]);
      bitboard_t dk    = king | king_attack[pieces[side][0]];
      bitboard_t taboo = king_attack[pieces[other][0]];
      for (int n = 1; n<tb->num_pieces[other]; n++) taboo |= tb->movegen[other][n](pieces[other][n], kocc);
      if ((dk & taboo) == dk)
         store_tb(tb, pieces, side, TB_MATE);
      return;
   }
   int score = probe_tb(tb, pieces, side);
   if ((score < iter-1) || score > iter || score == TB_DRAW) return;
   /* Look for positions that have just been marked as "lost" and mark
    * precursors as "won".
    */
   if (task == TASK_MARK_NEW_WIN) {
      if ((score & 1)==0) {
         sides other = next_side[side];
         mark_precursor(tb, false, other, pieces, score+1);
      }
      return;
   }
   /* Look for positions that have just been marked as "won" and test if
    * precursors are "lost".
    */
   if (task == TASK_MARK_NEW_LOSS) {
      if ((score & 1)==1) {
         sides other = next_side[side];
         mark_precursor(tb, true, other, pieces, score+1);
         return;
      }
   }
}
The mark_precursor() function is:
Code: Select all
static void inline mark_precursor(tablebase_t *tb,
      bool test_successor,
      sides side, int pieces[NUM_SIDES][MAX_TB_PIECES],
      int score)
{
   sides other = next_side[side];
   bitboard_t patk[tb->num_pieces[side]];
   bitboard_t king = make_bitboard_square(pieces[side][0]);
   bitboard_t incl = board_all;     /* All squares allowed by default */
   /* Generate occupancy bitboards */
   bitboard_t occ = board_empty;
   for (int n = 0; n<tb->num_pieces[WHITE]; n++) occ |= make_bitboard_square(pieces[WHITE][n]);
   for (int n = 0; n<tb->num_pieces[BLACK]; n++) occ |= make_bitboard_square(pieces[BLACK][n]);
   /* Unmake last move */
   int n_check = 0;
   int checking_piece[2];
   /* Skip positions where the king starts out in check, they are
    * illegal in retrograde analysis.
    * We can skip the enemy king in this test, since positions where the
    * kings are next to eachother are excluded by construction.
    */
   for (int n=1; n<tb->num_pieces[other]; n++)
      if (king & tb->movegen[other][n](pieces[other][n], occ)) return;
   /* Loop over all pieces and generate all possible moves for the side
    * to move.
    */
   int ks = pieces[other][0]; /* Enemy king */
   for (int n = 0; n<tb->num_pieces[side]; n++) {
      int as = pieces[side][n];  /* Piece */
      patk[n] = tb->movegen[side][n](as, occ);
      /* Check if this piece is checking the enemy king.
       * If it is, other pieces are only allowed to generate
       * interposing moves. This piece is only allowed to move
       * away.
       */
      if (patk[n] & make_bitboard_square(ks)) {
         patk[n] &= ~tb->movegen[side][n](ks, occ ^ make_bitboard_square(as));
         incl &= connecting_ray[ks][as];
         checking_piece[n_check] = n;
         n_check++;
      }
      /* Captures are not allowed when un-making a move */
      patk[n] &= ~occ;
   }
   /* Limit the allowed moves if the enemy king is in check: we need to
    * un-check the king.
    */
   if (n_check == 1) {
      for (int n = 0; n<tb->num_pieces[side]; n++) {
         if (n == checking_piece[0]) continue;
         patk[n] &= incl;
      }
   } else if (n_check == 2) {
      for (int n = 0; n<tb->num_pieces[side]; n++) {
         if (n == checking_piece[0] || n == checking_piece[1]) continue;
         patk[n] &= incl;
      }
   }
   /* The kings are never allowed to be next to eachother */
   patk[0] &= ~king_attack[pieces[other][0]];
   /* Now unmake all moves in turn and mark the resulting position
    * as "won" - unless it is already won in fewer moves.
    */
   for (int n=0; n<tb->num_pieces[side]; n++) {
      int ps = pieces[side][n];
      while (patk[n]) {
         int square = bitscan64(patk[n]);
         patk[n] ^= make_bitboard_square(square);
         if (square == ks) continue;
         pieces[side][n] = square;
         int prev_score = probe_tb(tb, pieces, side);
         /* Skip positions that have already been found to be lost
          * previously.
          */
         if (prev_score <= score-2 || prev_score == TB_INVAL) continue;
         if (test_successor) {
            assert(prev_score == TB_DRAW || (prev_score ^ 1));
            if (prev_score != TB_DRAW) continue;
            /* The parent position is "won", so test if this position is
             * lost for the current side.
             */
            int new_score;
            if (!successors_are_lost(tb, side, pieces, &new_score))
                  continue;
            new_score++;
            /* This position is lost, but we don't know in how many moves.
             * Check all successor positions.
             */
            assert(new_score == get_best_successor_score(tb, side, pieces)+1);
            assert(score ^ 1);
            assert(new_score ^ 1);
            store_tb(tb, pieces, side, new_score);
         } else {
            /* The parent position is "lost", so this position is "won" */
            assert(score & 1);
            assert(prev_score == TB_DRAW || (prev_score & 1));
            if (prev_score == TB_DRAW || score < prev_score)
               store_tb(tb, pieces, side, score);
         }
      }
      pieces[side][n] = ps;
   }
}
Code: Select all
static inline bool successors_are_lost(const tablebase_t *tb,
      sides side, int pieces[NUM_SIDES][MAX_TB_PIECES], int *ret_score)
{
   /* Generate occupancy bitboards */
   bitboard_t occs[NUM_SIDES] = { board_empty, board_empty };
   for (int n = 0; n<tb->num_pieces[WHITE]; n++) occs[WHITE] |= make_bitboard_square(pieces[WHITE][n]);
   for (int n = 0; n<tb->num_pieces[BLACK]; n++) occs[BLACK] |= make_bitboard_square(pieces[BLACK][n]);
   bitboard_t occ = occs[WHITE] | occs[BLACK];
   /* Assume we are lost unless we can prove otherwise */
   bool lost = true;
   int best_score = 0;
   /* Try all moves.
    * Make sure we do not put the own king in check.
    * FIXME: perhaps we can simply mark positions with the enemy king in
    * check (which are illegal) as lost?
    * The main difficulty is pinned pieces, but this is of course not an
    * issue if the other side has a bare king. In other words, *TODO*
    */
   bitboard_t patk[tb->num_pieces[side]];
   bitboard_t incl = board_all;
   sides other = next_side[side];
   int n_check = 0;
   int checking_piece[2];
   /* Generate all possible moves for the side to move.
    * If this position is illegal (because the other king is in check)
    * retur false.
    * In fact, the retrograde analysis should make sure this does not
    * occur.
    */
   for (int n = 0; n<tb->num_pieces[side]; n++) {
      patk[n] = tb->movegen[side][n](pieces[side][n], occ);
      assert(!(patk[n] & make_bitboard_square(pieces[other][0])));
      patk[n] &= ~occs[side];
   }
   /* Make sure we do not move the king into check. Also make sure we
    * escape from check if we're in it.
    */
   bitboard_t king = make_bitboard_square(pieces[side][0]);
   for (int n = 0; n<tb->num_pieces[other]; n++) {
      bitboard_t batk = tb->movegen[other][n](pieces[other][n], occ ^ king);
      patk[0] &= ~batk;
      if (batk & king) {   /* Check */
         incl &= connecting_ray[pieces[side][0]][pieces[other][n]];
         checking_piece[n_check] = n;
         n_check++;
      }
   }
   /* If we're in check, make sure we only generate evasions. */
   if (n_check) {
      for (int n = 1; n<tb->num_pieces[side]; n++)
         patk[n] &= incl;
   }
   /* TODO: test for pinned pieces */
   /* The kings are never allowed to be next to eachother */
   patk[0] &= ~king_attack[pieces[other][0]];
   /* Now make all moves in turn and test if the resulting position
    * is won (for the other side).
    * If all successors are won (for the other side), then this position
    * is lost.
    */
   for (int n=0; n<tb->num_pieces[side]; n++) {
      int ps = pieces[side][n];
      while (patk[n]) {
         int square = bitscan64(patk[n]);
         patk[n] ^= make_bitboard_square(square);
         pieces[side][n] = square;
         /* Test if the new position is a draw (or unknown), invalid, or
          * even won (can't happen with a bare king, of course).
          * If it is, then this position is not lost and we can return
          * immediately.
          */
         int score = probe_tb(tb, pieces, next_side[side]);
         if (score > best_score && score != TB_DRAW && score != TB_INVAL) best_score = score;
         if (score == TB_DRAW || (score&1)==0) {
            lost = false;
            break;
         }
      }
      pieces[side][n] = ps;
      if (!lost) return false;
   }
   *ret_score = best_score;
   return true;
}
Some of the code feels a bit clunky and perhaps I'll think of a better way to do it if I leave it for a bit. For now, I'd be very grateful for any suggestions for where to look for improvements. I've tried gprof, but I didn't find it particularly helpful...
- 
				hgm  
- Posts: 28395
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: On-demand tablebase generation
I haven't studied your code yet, but for comparison I transferred my 4-men generator dedicated for finding 3-vs-1 mating potential to my 2.2GHz i3 laptop, to time it there. KBNK took 6.58 sec.
But note this was for a totally symmetryless calculation. I did not exploit the factor 8, which normally would also mean a factor 8 in time. This particular generator was intended to handle asymmetric pieces (Silver, Gold) and non-square boards. I think this is more relevant for on-the-fly building of EGTs anyway. Normally you would be interested in end-games with several Pawns, and, it is only there that the on-the-fly scheme makes sense. Doing 6-men pawnless end-games would be well out of reach even at long TC. Doing 5-men pawnless makes no sense, as under current technology they can easily be pre-computed and stored on disk. The big advantage comes only for pawny end-games, because then you can exploit that you know what pawn structures are still reachable. And in general a fixed Pawn structure breaks the symmetry completely. There will almost always be more than one Pawn, (or you would not have enough limitation of the number of reachable positions to do a 6-men on the fly), and typically you will be able to avoid multiple promotions. So you would only do end-games with Pawns. So we might as well forget about symmetry completely.
The 4-men generator I was using was a quite simple one, (also because it assumed that any single piece + K would not be able to inflict mate, so all conversions are legal draws, and there is no need to calculate or probe successors). It is also not super-fast: it still uses 1 byte per position to store DTx for btm (7 bits) as well as wtm won flags (1 bit). It only does a one-sided calculation, though (won or non-won for white). I think it is always better to calculate white wins and black wins separately. This should both be faster and require less memory.
			
			
									
						
										
						But note this was for a totally symmetryless calculation. I did not exploit the factor 8, which normally would also mean a factor 8 in time. This particular generator was intended to handle asymmetric pieces (Silver, Gold) and non-square boards. I think this is more relevant for on-the-fly building of EGTs anyway. Normally you would be interested in end-games with several Pawns, and, it is only there that the on-the-fly scheme makes sense. Doing 6-men pawnless end-games would be well out of reach even at long TC. Doing 5-men pawnless makes no sense, as under current technology they can easily be pre-computed and stored on disk. The big advantage comes only for pawny end-games, because then you can exploit that you know what pawn structures are still reachable. And in general a fixed Pawn structure breaks the symmetry completely. There will almost always be more than one Pawn, (or you would not have enough limitation of the number of reachable positions to do a 6-men on the fly), and typically you will be able to avoid multiple promotions. So you would only do end-games with Pawns. So we might as well forget about symmetry completely.
The 4-men generator I was using was a quite simple one, (also because it assumed that any single piece + K would not be able to inflict mate, so all conversions are legal draws, and there is no need to calculate or probe successors). It is also not super-fast: it still uses 1 byte per position to store DTx for btm (7 bits) as well as wtm won flags (1 bit). It only does a one-sided calculation, though (won or non-won for white). I think it is always better to calculate white wins and black wins separately. This should both be faster and require less memory.
- 
				Evert  
- Posts: 2929
- Joined: Sat Jan 22, 2011 12:42 am
- Location: NL
Re: On-demand tablebase generation
Mine is 5.3s on a 2.9GHz i7, which definitely seems too slow by comparison.hgm wrote:I haven't studied your code yet, but for comparison I transferred my 4-men generator dedicated for finding 3-vs-1 mating potential to my 2.2GHz i3 laptop, to time it there. KBNK took 6.58 sec.
I compared the speed with FairyGen as well, and that takes 1.6s for the same task.
All true - part of it is a bit of an exercise as well. I also quite like the appeal of being able to use a tablebase without the need to distribute large files. And when considering arbitrary variants with arbitrary pieces it becomes even more appealing, although then you can also just make it easy for whoever wants them to generate the table bases.But note this was for a totally symmetryless calculation. I did not exploit the factor 8, which normally would also mean a factor 8 in time. This particular generator was intended to handle asymmetric pieces (Silver, Gold) and non-square boards. I think this is more relevant for on-the-fly building of EGTs anyway. Normally you would be interested in end-games with several Pawns, and, it is only there that the on-the-fly scheme makes sense. Doing 6-men pawnless end-games would be well out of reach even at long TC. Doing 5-men pawnless makes no sense, as under current technology they can easily be pre-computed and stored on disk. The big advantage comes only for pawny end-games, because then you can exploit that you know what pawn structures are still reachable. And in general a fixed Pawn structure breaks the symmetry completely. There will almost always be more than one Pawn, (or you would not have enough limitation of the number of reachable positions to do a 6-men on the fly), and typically you will be able to avoid multiple promotions. So you would only do end-games with Pawns. So we might as well forget about symmetry completely.
But putting all that to one side: yes, the main benefit for retrograde analysis is in endings with pawns, and there you can toss the symmetry completely. It only gets you a factor 2 anyway, and you don't need it because the pawn structure you have to deal with is the one that is currently on the board.
Why is that?The 4-men generator I was using was a quite simple one, (also because it assumed that any single piece + K would not be able to inflict mate, so all conversions are legal draws, and there is no need to calculate or probe successors). It is also not super-fast: it still uses 1 byte per position to store DTx for btm (7 bits) as well as wtm won flags (1 bit). It only does a one-sided calculation, though (won or non-won for white). I think it is always better to calculate white wins and black wins separately. This should both be faster and require less memory.
At the moment I don't handle black wins at all, although the code could quite easily, but my thinking was that as I use a byte to store DTM, and DTM is only ever even if you're winning, I can simply process "lost" positions with either side to move blindly. The only caveat (that I haven't dealt with at all yet) is that you need to take care to set the score of a node that has children that are lost as well as children that are won. You want the largest DTM for one and the smallest for the other, so you need to do a different test depending on whether the position is lost or won.
- 
				hgm  
- Posts: 28395
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: On-demand tablebase generation
Indeed, but pieces from arbitrary variants often lack full symmetry. It was quite annoying that FairyGen (which does exploit 8-fold symmetry) could not handle the Spartan Lieutenant and Shogi Gold General. So I also wanted a completely unsymmetric generator that could handle arbitrary board size.Evert wrote:And when considering arbitrary variants with arbitrary pieces it becomes even more appealing, although then you can also just make it easy for whoever wants them to generate the table bases.
For one, you create yourself a memory problem. The fastest generators will be those that use the most compact representation. won/not-won only requires a single bit. Add a third state and that already doubles.Why is that? ...
A second problem is that the access pattern for propagating the wins for the other side is totally different. Speed can be greatly increased by optimizing the indexing scheme for caching. But the opposite wins would need a different indexing scheme. E.g. with the leap-frog algorithm (which is the most efficient one if you can fit its working set in fast storage) you would want to have the black pieces determining the low index bits when calculating white wins, and vice versa.
- 
				mvk
- Posts: 589
- Joined: Tue Jun 04, 2013 10:15 pm
Re: On-demand tablebase generation
I dug up my ancient (1996) 'pretty fast' kbnk and kqkr generators. It turns out they still work:Evert wrote:Mine is 5.3s on a 2.9GHz i7, which definitely seems too slow by comparison.hgm wrote:I haven't studied your code yet, but for comparison I transferred my 4-men generator dedicated for finding 3-vs-1 mating potential to my 2.2GHz i3 laptop, to time it there. KBNK took 6.58 sec.
I compared the speed with FairyGen as well, and that takes 1.6s for the same task.
KBNK takes 76 ms.
KQKR takes 342 ms.
Single core. Machine+compiler speed has improved by a factor 22000 over the years.
[Account deleted]