Transposition table storage for tablebase assistance

sje · Post by **sje** » Fri Jul 17, 2009 9:26 pm

Tablebase probe results are different from regular transposition table probes because: 1) there is no move stored, 2) the score is always exact (therefore, has infinite draft). In recognition of this, Symbolic has a separate transposition table for all tablebase probes that has its own entry format and replacement strategy.

The TB transposition table is shared by all search threads and uses a mutex to insure integrity. The mutex overhead time is insignificant in practice due to the relative low frequency of TB probes.

Symbolic allocates 2^18 entries per color for the TB transposition table at program start up. The table uses a simple replace-always strategy and is never cleared.

bob · Post by **bob** » Sat Jul 18, 2009 2:52 am

sje wrote:Tablebase probe results are different from regular transposition table probes because: 1) there is no move stored, 2) the score is always exact (therefore, has infinite draft). In recognition of this, Symbolic has a separate transposition table for all tablebase probes that has its own entry format and replacement strategy.

The TB transposition table is shared by all search threads and uses a mutex to insure integrity. The mutex overhead time is insignificant in practice due to the relative low frequency of TB probes.

Symbolic allocates 2^18 entries per color for the TB transposition table at program start up. The table uses a simple replace-always strategy and is never cleared.

What's the advantage over just one table? You have to do two separate probes to two separate blocks of memory, which has cache/TLB issues associated with doing so. I've always just stored EGTB hits in the normal hash with infinite depth...

sje · Post by **sje** » Sat Jul 18, 2009 3:39 am

bob wrote:What's the advantage over just one table? You have to do two separate probes to two separate blocks of memory, which has cache/TLB issues associated with doing so. I've always just stored EGTB hits in the normal hash with infinite depth...

One advantage is that a TB transposition table entry can be much smaller than a main transposition entry. A TB score can fit into a single byte; there's no need to reserve space for a draft, a move, or any flags.

A 64 bit entry can suffice with the TB score spliced into the bottom bits of the hash signature, so that's half the space of a typical main transposition entry. Therefore, buy one and get the second one free.

bob · Post by **bob** » Sat Jul 18, 2009 8:46 pm

sje wrote:
bob wrote:What's the advantage over just one table? You have to do two separate probes to two separate blocks of memory, which has cache/TLB issues associated with doing so. I've always just stored EGTB hits in the normal hash with infinite depth...
One advantage is that a TB transposition table entry can be much smaller than a main transposition entry. A TB score can fit into a single byte; there's no need to reserve space for a draft, a move, or any flags.

A 64 bit entry can suffice with the TB score spliced into the bottom bits of the hash signature, so that's half the space of a typical main transposition entry. Therefore, buy one and get the second one free.

However, it isn't quite "free". You have to do the extra table probe, which has a definite cost. Rather than just a single probe to get a normal hash or EGTB score with one cache line fill.

sje · Post by **sje** » Sat Jul 18, 2009 9:10 pm

bob wrote:
sje wrote:
bob wrote:What's the advantage over just one table? You have to do two separate probes to two separate blocks of memory, which has cache/TLB issues associated with doing so. I've always just stored EGTB hits in the normal hash with infinite depth...
One advantage is that a TB transposition table entry can be much smaller than a main transposition entry. A TB score can fit into a single byte; there's no need to reserve space for a draft, a move, or any flags.

A 64 bit entry can suffice with the TB score spliced into the bottom bits of the hash signature, so that's half the space of a typical main transposition entry. Therefore, buy one and get the second one free.
However, it isn't quite "free". You have to do the extra table probe, which has a definite cost. Rather than just a single probe to get a normal hash or EGTB score with one cache line fill.

That would be the case if there were a second probe. But there isn't if there's a hit on the first probe.

And the first probe isn't done at all if there's no chance of a hit, either in the TB transposition table or the external tablebase files. So it works out to only one transposition probe for all TB positions.

Caller (from almost every node):

Code: Select all

  // Tablebase check

  if (!done)
  {
    if ((ply > 0) && pos.IsTablebaseEligible())
    {
      if (leftmost || ((fdepth >= 0) && pos.GetPriorMove().IsGainer()))
      {
        TBCache *tbcptr = DriverTaskPtr->GetTBCachePtr();

        if (tbcptr)
        {
          const Score tbscore = tbcptr->Probe(pos);

          if (!tbscore.IsBroken()) {tbhitcount++; score = tbscore; done = true;};
        };
      };
    };
  };

Probe code:

Code: Select all

Score TBCache::Probe(const Pos& pos)
{
  Score score = BrokenScore;
  Tbk tbk = TbkNil;
  bool direct = true;

  if (TBSigRec::Classify(pos, tbk, direct))
  {
    // Determine side color to assist TB stream selection

    Color side;

    if (IsColorWhite(pos.GetGood()))
      side = (direct ? ColorWhite : ColorBlack);
    else
      side = (direct ? ColorBlack : ColorWhite);

    // Proceed to the more time consuming code if selected TB stream is not broken

    TBCacheRec * const p = &cachevec[tbk][side];

    if (!p->broken)
    {
      // Acquire the TB cache object lock

      Lock();

      // Try the transposition table first

      if (!tbtbssptr || !tbtbssptr->Probe(pos, score))
      {
        // No transposition table score found, now try to ensure an open TB stream

        if (!p->active)
        {
          std::ostringstream oss;

          oss.str(""); oss << "Opening TB file: " << p->ifname << '\n';
          DriverTaskPtr->AddLog(oss.str());
          p->Open();
          if (p->broken)
          {
            oss.str(""); oss << "Broken/missing TB file: " << p->ifname << '\n';
            DriverTaskPtr->AddLog(oss.str());
          };
        };

        // Continue if not a broken TB stream

        if (!p->broken)
        {
          std::ifstream *ifsptr = p->ifsptr;
          TBIndexVec tbindexvec;

          tbindexvec.LoadFromPos(pos, tbk, direct);
          tbindexvec.Normalize(tbk);

          const ui offset = tbindexvec.CalculateStreamOffset(tbk);

          ifsptr->seekg(offset, std::ios::beg);
          if (ifsptr->good())
          {
            char ch;

            ifsptr->get(ch);

            const ByteScore bytescore((si8) ch);

            score = bytescore.MapToScore();
            if (tbtbssptr) tbtbssptr->Stash(pos, score);
          };
        };
      };

      // Release the TB cache object lock

      Unlock();
    };
  };

  return score;
}

bob · Post by **bob** » Sun Jul 19, 2009 7:09 am

sje wrote:
bob wrote:
sje wrote:
bob wrote:What's the advantage over just one table? You have to do two separate probes to two separate blocks of memory, which has cache/TLB issues associated with doing so. I've always just stored EGTB hits in the normal hash with infinite depth...
One advantage is that a TB transposition table entry can be much smaller than a main transposition entry. A TB score can fit into a single byte; there's no need to reserve space for a draft, a move, or any flags.

A 64 bit entry can suffice with the TB score spliced into the bottom bits of the hash signature, so that's half the space of a typical main transposition entry. Therefore, buy one and get the second one free.
However, it isn't quite "free". You have to do the extra table probe, which has a definite cost. Rather than just a single probe to get a normal hash or EGTB score with one cache line fill.
That would be the case if there were a second probe. But there isn't if there's a hit on the first probe.

And the first probe isn't done at all if there's no chance of a hit, either in the TB transposition table or the external tablebase files. So it works out to only one transposition probe for all TB positions.

Caller (from almost every node):
Code: Select all
  // Tablebase check

  if (!done)
  {
    if ((ply > 0) && pos.IsTablebaseEligible())
    {
      if (leftmost || ((fdepth >= 0) && pos.GetPriorMove().IsGainer()))
      {
        TBCache *tbcptr = DriverTaskPtr->GetTBCachePtr();

        if (tbcptr)
        {
          const Score tbscore = tbcptr->Probe(pos);

          if (!tbscore.IsBroken()) {tbhitcount++; score = tbscore; done = true;};
        };
      };
    };
  };
Probe code:
Code: Select all
Score TBCache::Probe(const Pos& pos)
{
  Score score = BrokenScore;
  Tbk tbk = TbkNil;
  bool direct = true;

  if (TBSigRec::Classify(pos, tbk, direct))
  {
    // Determine side color to assist TB stream selection

    Color side;

    if (IsColorWhite(pos.GetGood()))
      side = (direct ? ColorWhite : ColorBlack);
    else
      side = (direct ? ColorBlack : ColorWhite);

    // Proceed to the more time consuming code if selected TB stream is not broken

    TBCacheRec * const p = &cachevec[tbk][side];

    if (!p->broken)
    {
      // Acquire the TB cache object lock

      Lock();

      // Try the transposition table first

      if (!tbtbssptr || !tbtbssptr->Probe(pos, score))
      {
        // No transposition table score found, now try to ensure an open TB stream

        if (!p->active)
        {
          std::ostringstream oss;

          oss.str(""); oss << "Opening TB file: " << p->ifname << '\n';
          DriverTaskPtr->AddLog(oss.str());
          p->Open();
          if (p->broken)
          {
            oss.str(""); oss << "Broken/missing TB file: " << p->ifname << '\n';
            DriverTaskPtr->AddLog(oss.str());
          };
        };

        // Continue if not a broken TB stream

        if (!p->broken)
        {
          std::ifstream *ifsptr = p->ifsptr;
          TBIndexVec tbindexvec;

          tbindexvec.LoadFromPos(pos, tbk, direct);
          tbindexvec.Normalize(tbk);

          const ui offset = tbindexvec.CalculateStreamOffset(tbk);

          ifsptr->seekg(offset, std::ios::beg);
          if (ifsptr->good())
          {
            char ch;

            ifsptr->get(ch);

            const ByteScore bytescore((si8) ch);

            score = bytescore.MapToScore();
            if (tbtbssptr) tbtbssptr->Stash(pos, score);
          };
        };
      };

      // Release the TB cache object lock

      Unlock();
    };
  };

  return score;
}

This is a good bit of redundancy however. For example, Eugene has a big LRU cache for egtb probes to avoid doing I/O. I store the EGTB result to avoid even doing the probe into the EGTB cache. And I don't have any special test of any kind, I just probe the hash and if it happens to be an EGTB entry, so much the better as the score is exact anyway and depth is infinite. I still like the idea of one probe. It gives a nice hierarchy, first I probe the hash table which is very fast, then, if the board position is a known egtb position, I do an EGTB probe which is not as fast. After that I do the normal search stuff. No special-case code before the hash probe, and I don't want to do an EGTB probe before a hash probe as the hash probe is much less expensive.

sje · Post by **sje** » Sun Jul 19, 2009 8:01 am

At program initialization time, Symbolic checks the validity of the TB directory and any of the TB files within. Each class file for each color gets an open call. If the file isn't there or permissions are bad or whatever, then a broken status is recorded for that class/color. A successful open is followed by a close. Once the program gets rolling, TB files are opened on an as needed basis and remain open until an I/O error or when the program ends.

Symbolic's position class Pos keeps an incrementally updated piece count and a pair of material signature words, one per color that are also incrementally updated. These handy items are also used in other parts of the program in addition to TB operations, so there's no additional time penalty to be charged against TB support. The piece count is used to quickly disqualify TB probing for positions with too many pieces. There's also a quick check using the signatures to bounce positions where both sides have at least one pawn (no ep support). All of the above is done with quick, inline function calls.

The signature words combined with the side to move are used to probe, via binary search, a table that gives the TB class and direct/reverse status for the position. All of this happens very fast.

If a position survives the preliminary filtering, then the slower stuff starts. The master TB cache object, which includes the TB transposition table, is locked because of the existence of multiple search threads. The TB transposition table is probed and a match gives the score. If the TB transposition probe misses, only then is a file offset computed and a read request is sent to the OS. If the read fails, then the associated stream is closed and the class/color is marked broken. If the read succeeds, the score is stored in the TB transposition table. Finally, the master TB cache object is unlocked and the score (possibly the special BrokenScore) is returned.

With any luck, OpenBSD or Linux will have the needed disk block already in its filesystem data cache.

And with my tablebases, there's no compression so there's no need for decompression.

The only time there will be a probe of the main transposition table for a TB eligible position is when the TB file is missing, broken, or becomes broken. The only time TWO transposition probes occur for the same TB eligible position is the rare case when a TB file breaks during an access, and this can happen only once per class/color.

The design cost? Next to zero as there are already base classes for transposition tables and elements.

bob · Post by **bob** » Sun Jul 19, 2009 7:43 pm

sje wrote:At program initialization time, Symbolic checks the validity of the TB directory and any of the TB files within. Each class file for each color gets an open call. If the file isn't there or permissions are bad or whatever, then a broken status is recorded for that class/color. A successful open is followed by a close. Once the program gets rolling, TB files are opened on an as needed basis and remain open until an I/O error or when the program ends.

Symbolic's position class Pos keeps an incrementally updated piece count and a pair of material signature words, one per color that are also incrementally updated. These handy items are also used in other parts of the program in addition to TB operations, so there's no additional time penalty to be charged against TB support. The piece count is used to quickly disqualify TB probing for positions with too many pieces. There's also a quick check using the signatures to bounce positions where both sides have at least one pawn (no ep support). All of the above is done with quick, inline function calls.

The signature words combined with the side to move are used to probe, via binary search, a table that gives the TB class and direct/reverse status for the position. All of this happens very fast.

If a position survives the preliminary filtering, then the slower stuff starts. The master TB cache object, which includes the TB transposition table, is locked because of the existence of multiple search threads. The TB transposition table is probed and a match gives the score. If the TB transposition probe misses, only then is a file offset computed and a read request is sent to the OS. If the read fails, then the associated stream is closed and the class/color is marked broken. If the read succeeds, the score is stored in the TB transposition table. Finally, the master TB cache object is unlocked and the score (possibly the special BrokenScore) is returned.

With any luck, OpenBSD or Linux will have the needed disk block already in its filesystem data cache.

And with my tablebases, there's no compression so there's no need for decompression.

The only time there will be a probe of the main transposition table for a TB eligible position is when the TB file is missing, broken, or becomes broken. The only time TWO transposition probes occur for the same TB eligible position is the rare case when a TB file breaks during an access, and this can happen only once per class/color.

The design cost? Next to zero as there are already base classes for transposition tables and elements.

When you try on-the-fly decompression as we did in the Nalimov tables, we (Eugene and I) were somewhat surprised that decompression was far faster overall, because you end up doing less file I/O because you are reading in compressed blocks. Less I/O is always good.

I still prefer to follow the "order by increasing cost" approach. Repetition check is the cheapest operation I have, and I do that first. Hash probe is next. EGTBs are significantly more expensive and are deferred until the hash probe has failed. Once I do an EGTB probe on a position, it is unlikely I will repeat it since it promptly goes into the hash table with infinite draft making it very difficult to overwrite.

sje · Post by **sje** » Sun Jul 19, 2009 9:39 pm

bob wrote:When you try on-the-fly decompression as we did in the Nalimov tables, we (Eugene and I) were somewhat surprised that decompression was far faster overall, because you end up doing less file I/O because you are reading in compressed blocks. Less I/O is always good.

The optimality of the compression strategy to be used, or even if compression is to be used at all, is dependent upon the tablebase class, the end user's processor, the memory bandwidth, the disk speed, etc. Therefore, the strategy should be variable and this is the approach I'm taking with the Edwards' Version Two tablebases, possibly to appear early next year.

bob wrote:I still prefer to follow the "order by increasing cost" approach. Repetition check is the cheapest operation I have, and I do that first. Hash probe is next. EGTBs are significantly more expensive and are deferred until the hash probe has failed. Once I do an EGTB probe on a position, it is unlikely I will repeat it since it promptly goes into the hash table with infinite draft making it very difficult to overwrite.

Well, my TB probe approach is also in order of ascending cost where cost equals probability of success times expense of use. The big difference is that the TB values get their own space (half sized entries allow twice as efficient storage). The main transposition table becomes slightly more efficient as it now has more free slots that would be otherwise taken by long-lived TB score entries.

The only downside to Symbolic's current approach is the situation where tablebases are enabled but only a few of the actual TB files are available. This would mean that the TB classifier would be run on TB eligible positions that had no hope of actually being scored, and so a small amount of time would be wasted. This is why the classifier is fairly fast and can be made even faster by using a material hash signature instead of a material piece distribution signature. (Symbolic actually has both of these, but only the distribution signature gets used currently, and it gets used in many places.)

sje · Post by **sje** » Sun Jul 19, 2009 10:04 pm

bob wrote:Repetition check is the cheapest operation I have, and I do that first. Hash probe is next. EGTBs are significantly more expensive and are deferred until the hash probe has failed.

Symbolic:

1) User interrupt check

This looks at a volatile boolean variable that might be set by another thread; if set the search unwinds and exits.

2) General limit check

If the search has a time check in force, then the tick count variable that gives microseconds since program start is tested against the limit value for "try another node". The tick count is maintained by a separate thread activated by a ten millisecond resolution system interval timer, so there's no per node system call overhead. If the search has a node count limit in force, then that is checked.

3) Maximum ply check

Set at 256.

4) Fifty move draw

Very fast.

5) Insufficient material draw

Almost as fast due to material distribution signature.

6) Repetition draw

Need to roll through a (usually) short loop.

7) TB check

As described earlier.

At this point, if nothing has caught then the node processor figures out which of five different sub-processor routines are to be called. These are:

1) the base (ply zero) routine
2) the check evasion routine (uses the main transposition table)
3) the full width routine (uses the main transposition table)
4) the gainers and checks only routine (uses eval/pawn tables)
5) the gainers only routine (uses eval/pawn tables)

Transposition table storage for tablebase assistance

Transposition table storage for tablebase assistance

Re: Transposition table storage for tablebase assistance

Re: Transposition table storage for tablebase assistance

Re: Transposition table storage for tablebase assistance

Re: Transposition table storage for tablebase assistance

Re: Transposition table storage for tablebase assistance

Re: Transposition table storage for tablebase assistance

Re: Transposition table storage for tablebase assistance

Re: Transposition table storage for tablebase assistance

Re: Transposition table storage for tablebase assistance