On-demand tablebase generation

tpetzke · Post by **tpetzke** » Tue Jan 21, 2014 3:29 pm

There's the tacky issue of underpromotions, which are probably completely irrelevant except maybe promotion to knight (with check).

Well not completely. You should also consider underpromotions to a rook. A simple case is where the promotion to a queen leads to a stalemate whereas promotion to a rook wins. A more tricky example is the Saavedra position in a KRKP endgame.

syzygy · Post by **syzygy** » Tue Jan 21, 2014 4:05 pm

tpetzke wrote:
There's the tacky issue of underpromotions, which are probably completely irrelevant except maybe promotion to knight (with check).
Well not completely. You should also consider underpromotions to a rook. A simple case is where the promotion to a queen leads to a stalemate whereas promotion to a rook wins. A more tricky example is the Saavedra position in a KRKP endgame.

For the purpose of on-the-fly generation of tables used only to play out the current game, it seems OK to ignore exceptional situations.

hgm · Post by **hgm** » Tue Jan 21, 2014 5:40 pm

To add some remarks on high-speed EGT generation:

When done properly, this is a task completely dominated by storage access, even when the storage is DRAM. So optimizing the code by clever move generation, using multiple cores etc. all bring you exactly zero speedup. Only making more efficient use of memory access will help.

For this reason it is very beneficial to have a compact data set. One bit per position in general works better than one byte per position, because you can load 8 times as many positions from memory in the same time. So even when you want to do DTM or DTZ rather than just WDL, it is better to store the data as single-bit flags "won-in-N" and "won-in-less-than-N". This drives up total storage, as eventually you will need Nmax single-bit sets, while you could have done with a single set of 2log(Nmax) bits per position. But the crux is that the generation algorithm, to generate won-in-N positions, only needs to know whether a position is already won (i.e. in less than N), and keep track of which positions you add to that for efficiently doing the next iteration. The older bit-sets of "won-in-N" are never needed anymore once you have generated won-in-(N+1) from it. They just sit there, and when DRAM becomes scarce, they can be flushed to disk by sequential writing.

It is also good to realize that EGT generation in general knows two phases with completely different characteristics. There usually is a lengthy (in terms of DTx) initial phase, where only a very small fraction of all positions turn into won-in-N for any particular iteration N. This eventually peters out (for generally drawn end-games), or it suddenly explodes, and in around 10 iterations completely turns the EGT into wins (thus assigning ~10% of all positions the won-in-N state in a single iteration).

To be fast in total, you need to be fast in this initial phase, in particular much faster than it would take to scan through the memory once to find the won/lost-in-N of the previous iteration. The cache-line structure of the CPU forces you to do these accesses in groups of 64 bytes (512 bits), but on most initial iterations the density of lost-in-N positions will be so low that many sets of 512 positions will not have a single lost-in-N in them. So it pays to keep a lost-in-N "directory" next to the lost-in-N bitmap, where you reserve 1 bit for every cache line, to be set whenever a win/loss is assigned to that cache-line. This allows you to quickly skip all those cache lines that contain nothing of interest. The overhead of this is actually small enough to reserve 1 bit per 64 positions; then you can test the emptiness of cache lines in the directory by testing bytes, and when the latter are non-zero, use the set bits in them to actually decide which 64-bit words in the cache line to scan for positions that need treatment (using BSF techniques known from bitboards to quickly extract the set bits).

In the "bulk stage" practically every 32-bit word in every cache line would contain positions that need treatment. So you cannot avoid a complete scan, but that is only part of your problems. The treatment consists of probing or writing to other positions that can be reached by moves from the one under treatment. To avoid having to do cache loads for these accesses you can exploit the fact that data is already present in the cache because of the scan through the bitmap. All possible combinations of pieces from the innermost 3 scanning loops (256K positions, taking up 32KB memory) would fit in L1, so moves with those pieces would only hit already-cached data. If you go beyond that (depending on how large your L2 is) you might have to do extra cache-line fills for each move during treatment. It is then often faster to do two scans through the entire data set with different loop nesting, treating the moves of half the pieces in one scan, and treating those of the other pieces (then in the inner loop) in the next scan.

A great time saver in generating the initial checkmates is to first place all pieces except the black King on the board (which you of course would do incrementally, so it is only one piece that has to be displaced all the time during the scan over positions), then generate all moves of the white pieces to see which squares they attack, and then only place the black King there (for generating the index) to mark them as won wtm positions (won by King capture). Only an attacked square where all neighboring squares (for an orthodox King) are also attacked becomes a potential checkmate (i.e. lost with btm), and would need to be verified by moves of the other black pieces. So you don't generate all positions and then test them for checkmate, but you generate only checkmate candidates. Note that it might not be needed to mark king-capture positions as won wtm, when your move generation for the verification step (where "potentially won" btm positions are trying to find a move to a non-won wtm position) is smart enough to never put its own King in check. It can be faster to directly determine this from the board position, than from doing a probe into the won-in-less-than-N bitmap: the latter might require a memory access, while board and piece list will always be in L1.

Evert · Post by **Evert** » Tue Jan 21, 2014 5:43 pm

tpetzke wrote:Well not completely. You should also consider underpromotions to a rook. A simple case is where the promotion to a queen leads to a stalemate whereas promotion to a rook wins. A more tricky example is the Saavedra position in a KRKP endgame.

Yes, but the problem is that most of the time, this will be a waste of time.

The difference, I guess, is that promotion to knight more typically gets you out of positions you would otherwise lose (because the knight promotes with check, gaining you a crucial tempo), whereas promotion to rook (or bishop) would typically avoid a draw when you should win. Perhaps that's based on what's usually presented in studies though.

Anyway, the second case we can detect: if we expect, naively, that the current position is won, allowing only queen promotions, but the tablebase finds that it isn't, we can ask what would be different if we allow promotions to a rook instead. For the underpromotion to knight, the question that would need to be asked is: is the position still lost if I promote to knight instead?

Maybe that doesn't work in practice (maybe the whole idea of on-the-fly tablebases doesn't work in practice either), but it's what I'd try anyway. Or, if I have the time and memory anyway, just generate the additional tablebases.

tpetzke · Post by **tpetzke** » Tue Jan 21, 2014 5:54 pm

I find the whole idea of on-demand tb creation interesting. Currently I have the 3 men table bases build in.

Here I just store a large array of values

const short KPK_TBL[KPK_TBL_SIZE] = {
-4080,
-4080,
-3968,
-3968,
-3825,
-4080,
...

I wrote my own table base generator to build that list

But I always wanted one day to replace that with a table base I compute at startup. My old algorithm is to slow for that (and written in Pascal) so re-engineering it will be a bit of effort. It is still on my list.

Evert · Post by **Evert** » Fri Jan 24, 2014 9:19 pm

I have re-written my tablebase generator so it is properly using retrograde analysis (first it marks positions that are lost, then marks positions one ply up as "won", then looks at progenitors of those and marks them as "lost" if all successors are also "lost").

The code is fairly general, so it can handle arbitrary pieces and (in principle) an arbitrary number of them. Right now, it does not try to be particularly clever and so it doesn't use things like a bitlist to keep track of which positions need to be updated. The code is significantly faster than my previous attempt, which is nice. However, it is still clearly slower than a few other examples I've tried; it takes a few seconds to generate a 4-men table, which I'd like to speed up more.

However, after spending a few rather frustrating evenings on it, I can't figure out what the problem is. So I'll post what I have here in the hope that someone with more experience with writing tablebase generators has an idea.

To enumerate the positions, I restrict the black king to A1-D1-D4, if the black king is on the diagonal the white king should be below the diagonal. If both kings are on the diagonal a store_tb() call updates both the indicated position and the position mirrored in the diagonal. My tablebase data structure looks like this:

Code: Select all

typedef struct tablebase_t {
   uint64_t signature;
   movegen_func_t movegen[NUM_SIDES][MAX_TB_PIECES];
   int num_pieces[NUM_SIDES];
   int piece_types[NUM_SIDES][MAX_TB_PIECES];
   size_t size;
   int8_t *data;
   bool finalised;
} tablebase_t;

The signature is used to match the material when probing the tablebase, movegen is an array of function pointers of type movegen(int square, bitboard_t occ) which returns the appropriate attack bitboard for that particular piece type. num_pieces is the number of black and white pieces, piece_types is not actually used, size is the size of the allocated aray and data holds the score for each position. The "finalised" flag is set to false whenever a position in the tablebase is updated.

Special scores are

Code: Select all

#define TB_INVAL  127
#define TB_DRAW   120
#define TB_MATE     0

so the score stored is the distance-to-mate (odd for the winning side, even for the losing side).
The index in the table (for 3 men) is calculated as

Code: Select all

   int king_index = king_king_index[black_king][white_king];
   index = side * 462*64 + king_index*64 + piece;

for 4 men

Code: Select all

   int king_index = king_king_index[black_king][white_king];
   index = side * 462*64*64 + king_index*64*64 + piece1*64 + piece2;

Positions are mapped to the "standard" representative position (with the kings as described above) using (for 3 pieces):

Code: Select all

static inline void flip3(int *dks, int *aks, int *as)
{
   /* Flip all squares so that the defending king is on ranks 1-4 */
   if (flip_ranks_bb & make_bitboard_square(*dks)) {
      *dks = flip_ranks[*dks];
      *aks = flip_ranks[*aks];
      *as  = flip_ranks[*as];
   }
   /* Flip all squares so that the defending king is on files a-d */
   if (flip_files_bb & make_bitboard_square(*dks)) {
      *dks = flip_files[*dks];
      *aks = flip_files[*aks];
      *as  = flip_files[*as];
   }
   /* Flip all squares so that the defending king is on or below the main diagonal */
   if (flip_diag_bb & make_bitboard_square(*dks)) {
      *dks = flip_diagonal[*dks];
      *aks = flip_diagonal[*aks];
      *as  = flip_diagonal[*as];
   }
   /* Flip all squares so that the attacking king is below the main
    * diagonal if the defending king is on it.
    */
   if ((flip_diag2_bb & make_bitboard_square(*dks)) && (flip_diag3_bb & make_bitboard_square(*aks))) {
      *dks = flip_diagonal[*dks];
      *aks = flip_diagonal[*aks];
      *as  = flip_diagonal[*as];
   }
   assert(make_bitboard_square(*dks) & (~flip_diag_bb) & board_qwing & board_white);
}

It uses lookup tables rather than direct computation (square ^= 007;, square ^= 070; etc); I've tried using direct computation instead, but this measured as (slightly) slower.

The outer loop to generate the tablebase is

Code: Select all

static void initialise_tb3(tablebase_t *tb)
{
   uint64_t time = get_timer();

   /* Initially we mark all positions as "TB_DRAW" */
   memset(tb->data, TB_DRAW, tb->size);

   int pieces[NUM_SIDES][MAX_TB_PIECES];

   int iter = 0;
   tb->finalised = true;
   if (tb->num_pieces[WHITE] > 1) recurse_loop(tb, iter, TASK_INITIALISE, 0, board_empty, BLACK, pieces);
   if (tb->num_pieces[BLACK] > 1) recurse_loop(tb, iter, TASK_INITIALISE, 0, board_empty, WHITE, pieces);

   sides side = BLACK;
   while (!tb->finalised) {
      tb->finalised = true;
      iter++;

      for (int side = WHITE; side <= BLACK; side++) {
         if (tb->num_pieces[side] == 1) continue;
         /* Scan for positions that are "lost" and mark precursors as "won" */
         recurse_loop(tb, iter, TASK_MARK_NEW_WIN, 0, board_empty, next_side[side], pieces);

         /* Scan for positions that have just been marked as "won" and test if
          * their precursors, which are potentially lost, are in fact lost.
          */
         recurse_loop(tb, iter, TASK_MARK_NEW_LOSS, 0, board_empty, side, pieces);
      }
   }

   uint64_t end_time = get_timer();
   printf("Done after %d iterations and %gms\n", iter, (end_time-time)/1.0e3);

   /* Test the consistency of the tablebase */
   verify_tb(tb);
}

Generating all possible permutations of pieces and manipulating them is handled by the recurse_loop() function:

Code: Select all

static inline void recurse_loop(tablebase_t *tb,
                         int iter,
                         int task,
                         int current_piece,
                         bitboard_t occ,
                         sides side, int pieces[NUM_SIDES][MAX_TB_PIECES])
{
   /* If there are still pieces to place on the board... */
   if (current_piece < tb->num_pieces[WHITE] + tb->num_pieces[BLACK]) {
      /* place the next one. */
      if (current_piece == 0) {  /* Black king */
         for (int idks = 0; idks < sizeof dks_squares / sizeof *dks_squares; idks++) {
            int s = pieces[BLACK][0] = dks_squares[idks];
            recurse_loop(tb, iter, task, current_piece+1, occ | make_bitboard_square(s), side, pieces);
         }
         return;
      }
      if (current_piece == 1) {  /* White king */
         for (int aks = 0; aks < 64; aks++) {
            int s = pieces[WHITE][0] = aks;

            /* Skip positions that are redundant under symmetry
             * transformations.
             */
            if (king_king_index[pieces[BLACK][0]][pieces[WHITE][0]] < 0) continue;
            if ((flip_diag2_bb&make_bitboard_square(pieces[BLACK][0])) &&
                (flip_diag3_bb&make_bitboard_square(pieces[WHITE][0])))
               continue;
            recurse_loop(tb, iter, task, current_piece+1, occ | make_bitboard_square(s), side, pieces);
         }
         return;
      }

      int p = current_piece - 2;
      if (p <= tb->num_pieces[WHITE]-1 && tb->num_pieces[WHITE] > 1) {
         for (int as=0; as < 64; as++) {
            /* Skip positions where pieces overlap (mark them as illegal) */
            if (occ & make_bitboard_square(as)) continue;

            pieces[WHITE][p+1] = as;
            recurse_loop(tb, iter, task, current_piece+1, occ | make_bitboard_square(as), side, pieces);
         }
         return;
      } else {
         assert(tb->num_pieces[BLACK] > 1);
         p -= tb->num_pieces[WHITE]-1;
         if (p <= tb->num_pieces[BLACK]-1 && tb->num_pieces[BLACK] > 1) {
            /* TODO */
            assert(false);
         }
      }
   }

   /* Test whether the current position is mate */
   if (task == TASK_INITIALISE) {
      assert(iter == 0);
      sides other = next_side[side];

      /* Discard the position of the defending king when generating
       * attacks in order to include x-ray attacks.
       */
      bitboard_t kocc  = occ ^ make_bitboard_square(pieces[side][0]);
      bitboard_t king  = make_bitboard_square(pieces[side][0]);
      bitboard_t dk    = king | king_attack[pieces[side][0]];
      bitboard_t taboo = king_attack[pieces[other][0]];
      for (int n = 1; n<tb->num_pieces[other]; n++) taboo |= tb->movegen[other][n](pieces[other][n], kocc);

      if ((dk & taboo) == dk)
         store_tb(tb, pieces, side, TB_MATE);
      return;
   }

   int score = probe_tb(tb, pieces, side);
   if ((score < iter-1) || score > iter || score == TB_DRAW) return;

   /* Look for positions that have just been marked as "lost" and mark
    * precursors as "won".
    */
   if (task == TASK_MARK_NEW_WIN) {
      if ((score & 1)==0) {
         sides other = next_side[side];
         mark_precursor(tb, false, other, pieces, score+1);
      }
      return;
   }

   /* Look for positions that have just been marked as "won" and test if
    * precursors are "lost".
    */
   if (task == TASK_MARK_NEW_LOSS) {
      if ((score & 1)==1) {
         sides other = next_side[side];
         mark_precursor(tb, true, other, pieces, score+1);
         return;
      }
   }
}

The function calls itself recursively to place all pieces on the board, discarding some equivalent positions. If there is work to be done for the current position, it will mark all precursors, either as won (if the current position is lost) or as lost (if all its other descendants are also won).

The mark_precursor() function is:

Code: Select all

static void inline mark_precursor(tablebase_t *tb,
      bool test_successor,
      sides side, int pieces[NUM_SIDES][MAX_TB_PIECES],
      int score)
{
   sides other = next_side[side];

   bitboard_t patk[tb->num_pieces[side]];
   bitboard_t king = make_bitboard_square(pieces[side][0]);
   bitboard_t incl = board_all;     /* All squares allowed by default */

   /* Generate occupancy bitboards */
   bitboard_t occ = board_empty;
   for (int n = 0; n<tb->num_pieces[WHITE]; n++) occ |= make_bitboard_square(pieces[WHITE][n]);
   for (int n = 0; n<tb->num_pieces[BLACK]; n++) occ |= make_bitboard_square(pieces[BLACK][n]);

   /* Unmake last move */
   int n_check = 0;
   int checking_piece[2];

   /* Skip positions where the king starts out in check, they are
    * illegal in retrograde analysis.
    * We can skip the enemy king in this test, since positions where the
    * kings are next to eachother are excluded by construction.
    */
   for (int n=1; n<tb->num_pieces[other]; n++)
      if (king & tb->movegen[other][n](pieces[other][n], occ)) return;

   /* Loop over all pieces and generate all possible moves for the side
    * to move.
    */
   int ks = pieces[other][0]; /* Enemy king */
   for (int n = 0; n<tb->num_pieces[side]; n++) {
      int as = pieces[side][n];  /* Piece */
      patk[n] = tb->movegen[side][n](as, occ);

      /* Check if this piece is checking the enemy king.
       * If it is, other pieces are only allowed to generate
       * interposing moves. This piece is only allowed to move
       * away.
       */
      if (patk[n] & make_bitboard_square(ks)) {
         patk[n] &= ~tb->movegen[side][n](ks, occ ^ make_bitboard_square(as));
         incl &= connecting_ray[ks][as];
         checking_piece[n_check] = n;
         n_check++;
      }

      /* Captures are not allowed when un-making a move */
      patk[n] &= ~occ;
   }

   /* Limit the allowed moves if the enemy king is in check: we need to
    * un-check the king.
    */
   if (n_check == 1) {
      for (int n = 0; n<tb->num_pieces[side]; n++) {
         if (n == checking_piece[0]) continue;
         patk[n] &= incl;
      }
   } else if (n_check == 2) {
      for (int n = 0; n<tb->num_pieces[side]; n++) {
         if (n == checking_piece[0] || n == checking_piece[1]) continue;
         patk[n] &= incl;
      }
   }

   /* The kings are never allowed to be next to eachother */
   patk[0] &= ~king_attack[pieces[other][0]];

   /* Now unmake all moves in turn and mark the resulting position
    * as "won" - unless it is already won in fewer moves.
    */
   for (int n=0; n<tb->num_pieces[side]; n++) {
      int ps = pieces[side][n];
      while (patk[n]) {
         int square = bitscan64(patk[n]);
         patk[n] ^= make_bitboard_square(square);
         if (square == ks) continue;
         pieces[side][n] = square;

         int prev_score = probe_tb(tb, pieces, side);

         /* Skip positions that have already been found to be lost
          * previously.
          */
         if (prev_score <= score-2 || prev_score == TB_INVAL) continue;

         if (test_successor) {
            assert(prev_score == TB_DRAW || (prev_score ^ 1));
            if (prev_score != TB_DRAW) continue;

            /* The parent position is "won", so test if this position is
             * lost for the current side.
             */
            int new_score;
            if (!successors_are_lost(tb, side, pieces, &new_score))
                  continue;

            new_score++;

            /* This position is lost, but we don't know in how many moves.
             * Check all successor positions.
             */
            assert(new_score == get_best_successor_score(tb, side, pieces)+1);
            assert(score ^ 1);
            assert(new_score ^ 1);

            store_tb(tb, pieces, side, new_score);
         } else {
            /* The parent position is "lost", so this position is "won" */
            assert(score & 1);
            assert(prev_score == TB_DRAW || (prev_score & 1));

            if (prev_score == TB_DRAW || score < prev_score)
               store_tb(tb, pieces, side, score);
         }
      }
      pieces[side][n] = ps;
   }
}

Finally, the function that tests if all successors are lost:

Code: Select all

static inline bool successors_are_lost(const tablebase_t *tb,
      sides side, int pieces[NUM_SIDES][MAX_TB_PIECES], int *ret_score)
{
   /* Generate occupancy bitboards */
   bitboard_t occs[NUM_SIDES] = { board_empty, board_empty };
   for (int n = 0; n<tb->num_pieces[WHITE]; n++) occs[WHITE] |= make_bitboard_square(pieces[WHITE][n]);
   for (int n = 0; n<tb->num_pieces[BLACK]; n++) occs[BLACK] |= make_bitboard_square(pieces[BLACK][n]);
   bitboard_t occ = occs[WHITE] | occs[BLACK];

   /* Assume we are lost unless we can prove otherwise */
   bool lost = true;
   int best_score = 0;

   /* Try all moves.
    * Make sure we do not put the own king in check.
    * FIXME: perhaps we can simply mark positions with the enemy king in
    * check (which are illegal) as lost?
    * The main difficulty is pinned pieces, but this is of course not an
    * issue if the other side has a bare king. In other words, *TODO*
    */
   bitboard_t patk[tb->num_pieces[side]];
   bitboard_t incl = board_all;
   sides other = next_side[side];
   int n_check = 0;
   int checking_piece[2];

   /* Generate all possible moves for the side to move.
    * If this position is illegal (because the other king is in check)
    * retur false.
    * In fact, the retrograde analysis should make sure this does not
    * occur.
    */
   for (int n = 0; n<tb->num_pieces[side]; n++) {
      patk[n] = tb->movegen[side][n](pieces[side][n], occ);
      assert(!(patk[n] & make_bitboard_square(pieces[other][0])));
      patk[n] &= ~occs[side];
   }

   /* Make sure we do not move the king into check. Also make sure we
    * escape from check if we're in it.
    */
   bitboard_t king = make_bitboard_square(pieces[side][0]);
   for (int n = 0; n<tb->num_pieces[other]; n++) {
      bitboard_t batk = tb->movegen[other][n](pieces[other][n], occ ^ king);
      patk[0] &= ~batk;
      if (batk & king) {   /* Check */
         incl &= connecting_ray[pieces[side][0]][pieces[other][n]];
         checking_piece[n_check] = n;
         n_check++;
      }
   }

   /* If we're in check, make sure we only generate evasions. */
   if (n_check) {
      for (int n = 1; n<tb->num_pieces[side]; n++)
         patk[n] &= incl;
   }

   /* TODO: test for pinned pieces */

   /* The kings are never allowed to be next to eachother */
   patk[0] &= ~king_attack[pieces[other][0]];

   /* Now make all moves in turn and test if the resulting position
    * is won (for the other side).
    * If all successors are won (for the other side), then this position
    * is lost.
    */
   for (int n=0; n<tb->num_pieces[side]; n++) {
      int ps = pieces[side][n];
      while (patk[n]) {
         int square = bitscan64(patk[n]);
         patk[n] ^= make_bitboard_square(square);
         pieces[side][n] = square;

         /* Test if the new position is a draw (or unknown), invalid, or
          * even won (can't happen with a bare king, of course).
          * If it is, then this position is not lost and we can return
          * immediately.
          */
         int score = probe_tb(tb, pieces, next_side[side]);
         if (score > best_score && score != TB_DRAW && score != TB_INVAL) best_score = score;
         if (score == TB_DRAW || (score&1)==0) {
            lost = false;
            break;
         }
      }
      pieces[side][n] = ps;
      if (!lost) return false;
   }

   *ret_score = best_score;

   return true;
}

Not included is the code for the tablebase verification, which tests whether the score stored for a position is consistent with that of its successors. I've also tested the generated KRK and KBNK by comparing random positions with the result of the Nalimov tables (through an on-line interface) and it all seems to work.

Some of the code feels a bit clunky and perhaps I'll think of a better way to do it if I leave it for a bit. For now, I'd be very grateful for any suggestions for where to look for improvements. I've tried gprof, but I didn't find it particularly helpful...

hgm · Post by **hgm** » Fri Jan 24, 2014 11:56 pm

I haven't studied your code yet, but for comparison I transferred my 4-men generator dedicated for finding 3-vs-1 mating potential to my 2.2GHz i3 laptop, to time it there. KBNK took 6.58 sec.

But note this was for a totally symmetryless calculation. I did not exploit the factor 8, which normally would also mean a factor 8 in time. This particular generator was intended to handle asymmetric pieces (Silver, Gold) and non-square boards. I think this is more relevant for on-the-fly building of EGTs anyway. Normally you would be interested in end-games with several Pawns, and, it is only there that the on-the-fly scheme makes sense. Doing 6-men pawnless end-games would be well out of reach even at long TC. Doing 5-men pawnless makes no sense, as under current technology they can easily be pre-computed and stored on disk. The big advantage comes only for pawny end-games, because then you can exploit that you know what pawn structures are still reachable. And in general a fixed Pawn structure breaks the symmetry completely. There will almost always be more than one Pawn, (or you would not have enough limitation of the number of reachable positions to do a 6-men on the fly), and typically you will be able to avoid multiple promotions. So you would only do end-games with Pawns. So we might as well forget about symmetry completely.

The 4-men generator I was using was a quite simple one, (also because it assumed that any single piece + K would not be able to inflict mate, so all conversions are legal draws, and there is no need to calculate or probe successors). It is also not super-fast: it still uses 1 byte per position to store DTx for btm (7 bits) as well as wtm won flags (1 bit). It only does a one-sided calculation, though (won or non-won for white). I think it is always better to calculate white wins and black wins separately. This should both be faster and require less memory.

Evert · Post by **Evert** » Sat Jan 25, 2014 12:14 am

hgm wrote:I haven't studied your code yet, but for comparison I transferred my 4-men generator dedicated for finding 3-vs-1 mating potential to my 2.2GHz i3 laptop, to time it there. KBNK took 6.58 sec.

Mine is 5.3s on a 2.9GHz i7, which definitely seems too slow by comparison.
I compared the speed with FairyGen as well, and that takes 1.6s for the same task.

But note this was for a totally symmetryless calculation. I did not exploit the factor 8, which normally would also mean a factor 8 in time. This particular generator was intended to handle asymmetric pieces (Silver, Gold) and non-square boards. I think this is more relevant for on-the-fly building of EGTs anyway. Normally you would be interested in end-games with several Pawns, and, it is only there that the on-the-fly scheme makes sense. Doing 6-men pawnless end-games would be well out of reach even at long TC. Doing 5-men pawnless makes no sense, as under current technology they can easily be pre-computed and stored on disk. The big advantage comes only for pawny end-games, because then you can exploit that you know what pawn structures are still reachable. And in general a fixed Pawn structure breaks the symmetry completely. There will almost always be more than one Pawn, (or you would not have enough limitation of the number of reachable positions to do a 6-men on the fly), and typically you will be able to avoid multiple promotions. So you would only do end-games with Pawns. So we might as well forget about symmetry completely.

All true - part of it is a bit of an exercise as well. I also quite like the appeal of being able to use a tablebase without the need to distribute large files. And when considering arbitrary variants with arbitrary pieces it becomes even more appealing, although then you can also just make it easy for whoever wants them to generate the table bases.

But putting all that to one side: yes, the main benefit for retrograde analysis is in endings with pawns, and there you can toss the symmetry completely. It only gets you a factor 2 anyway, and you don't need it because the pawn structure you have to deal with is the one that is currently on the board.

The 4-men generator I was using was a quite simple one, (also because it assumed that any single piece + K would not be able to inflict mate, so all conversions are legal draws, and there is no need to calculate or probe successors). It is also not super-fast: it still uses 1 byte per position to store DTx for btm (7 bits) as well as wtm won flags (1 bit). It only does a one-sided calculation, though (won or non-won for white). I think it is always better to calculate white wins and black wins separately. This should both be faster and require less memory.

Why is that?
At the moment I don't handle black wins at all, although the code could quite easily, but my thinking was that as I use a byte to store DTM, and DTM is only ever even if you're winning, I can simply process "lost" positions with either side to move blindly. The only caveat (that I haven't dealt with at all yet) is that you need to take care to set the score of a node that has children that are lost as well as children that are won. You want the largest DTM for one and the smallest for the other, so you need to do a different test depending on whether the position is lost or won.

hgm · Post by **hgm** » Sat Jan 25, 2014 12:35 am

Evert wrote:And when considering arbitrary variants with arbitrary pieces it becomes even more appealing, although then you can also just make it easy for whoever wants them to generate the table bases.

Indeed, but pieces from arbitrary variants often lack full symmetry. It was quite annoying that FairyGen (which does exploit 8-fold symmetry) could not handle the Spartan Lieutenant and Shogi Gold General. So I also wanted a completely unsymmetric generator that could handle arbitrary board size.

Why is that? ...

For one, you create yourself a memory problem. The fastest generators will be those that use the most compact representation. won/not-won only requires a single bit. Add a third state and that already doubles.

A second problem is that the access pattern for propagating the wins for the other side is totally different. Speed can be greatly increased by optimizing the indexing scheme for caching. But the opposite wins would need a different indexing scheme. E.g. with the leap-frog algorithm (which is the most efficient one if you can fit its working set in fast storage) you would want to have the black pieces determining the low index bits when calculating white wins, and vice versa.

mvk · Post by **mvk** » Sat Jan 25, 2014 1:16 am

Evert wrote:
hgm wrote:I haven't studied your code yet, but for comparison I transferred my 4-men generator dedicated for finding 3-vs-1 mating potential to my 2.2GHz i3 laptop, to time it there. KBNK took 6.58 sec.
Mine is 5.3s on a 2.9GHz i7, which definitely seems too slow by comparison.
I compared the speed with FairyGen as well, and that takes 1.6s for the same task.

I dug up my ancient (1996) 'pretty fast' kbnk and kqkr generators. It turns out they still work:
KBNK takes 76 ms.
KQKR takes 342 ms.
Single core. Machine+compiler speed has improved by a factor 22000 over the years.

On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation

Re: On-demand tablebase generation