Tablebase probe results are different from regular transposition table probes because: 1) there is no move stored, 2) the score is always exact (therefore, has infinite draft). In recognition of this, Symbolic has a separate transposition table for all tablebase probes that has its own entry format and replacement strategy.
The TB transposition table is shared by all search threads and uses a mutex to insure integrity. The mutex overhead time is insignificant in practice due to the relative low frequency of TB probes.
Symbolic allocates 2^18 entries per color for the TB transposition table at program start up. The table uses a simple replace-always strategy and is never cleared.
Transposition table storage for tablebase assistance
Moderator: Ras
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Transposition table storage for tablebase assistance
What's the advantage over just one table? You have to do two separate probes to two separate blocks of memory, which has cache/TLB issues associated with doing so. I've always just stored EGTB hits in the normal hash with infinite depth...sje wrote:Tablebase probe results are different from regular transposition table probes because: 1) there is no move stored, 2) the score is always exact (therefore, has infinite draft). In recognition of this, Symbolic has a separate transposition table for all tablebase probes that has its own entry format and replacement strategy.
The TB transposition table is shared by all search threads and uses a mutex to insure integrity. The mutex overhead time is insignificant in practice due to the relative low frequency of TB probes.
Symbolic allocates 2^18 entries per color for the TB transposition table at program start up. The table uses a simple replace-always strategy and is never cleared.
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Transposition table storage for tablebase assistance
One advantage is that a TB transposition table entry can be much smaller than a main transposition entry. A TB score can fit into a single byte; there's no need to reserve space for a draft, a move, or any flags.bob wrote:What's the advantage over just one table? You have to do two separate probes to two separate blocks of memory, which has cache/TLB issues associated with doing so. I've always just stored EGTB hits in the normal hash with infinite depth...
A 64 bit entry can suffice with the TB score spliced into the bottom bits of the hash signature, so that's half the space of a typical main transposition entry. Therefore, buy one and get the second one free.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Transposition table storage for tablebase assistance
However, it isn't quite "free". You have to do the extra table probe, which has a definite cost. Rather than just a single probe to get a normal hash or EGTB score with one cache line fill.sje wrote:One advantage is that a TB transposition table entry can be much smaller than a main transposition entry. A TB score can fit into a single byte; there's no need to reserve space for a draft, a move, or any flags.bob wrote:What's the advantage over just one table? You have to do two separate probes to two separate blocks of memory, which has cache/TLB issues associated with doing so. I've always just stored EGTB hits in the normal hash with infinite depth...
A 64 bit entry can suffice with the TB score spliced into the bottom bits of the hash signature, so that's half the space of a typical main transposition entry. Therefore, buy one and get the second one free.
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Transposition table storage for tablebase assistance
That would be the case if there were a second probe. But there isn't if there's a hit on the first probe.bob wrote:However, it isn't quite "free". You have to do the extra table probe, which has a definite cost. Rather than just a single probe to get a normal hash or EGTB score with one cache line fill.sje wrote:One advantage is that a TB transposition table entry can be much smaller than a main transposition entry. A TB score can fit into a single byte; there's no need to reserve space for a draft, a move, or any flags.bob wrote:What's the advantage over just one table? You have to do two separate probes to two separate blocks of memory, which has cache/TLB issues associated with doing so. I've always just stored EGTB hits in the normal hash with infinite depth...
A 64 bit entry can suffice with the TB score spliced into the bottom bits of the hash signature, so that's half the space of a typical main transposition entry. Therefore, buy one and get the second one free.
And the first probe isn't done at all if there's no chance of a hit, either in the TB transposition table or the external tablebase files. So it works out to only one transposition probe for all TB positions.
Caller (from almost every node):
Code: Select all
// Tablebase check
if (!done)
{
if ((ply > 0) && pos.IsTablebaseEligible())
{
if (leftmost || ((fdepth >= 0) && pos.GetPriorMove().IsGainer()))
{
TBCache *tbcptr = DriverTaskPtr->GetTBCachePtr();
if (tbcptr)
{
const Score tbscore = tbcptr->Probe(pos);
if (!tbscore.IsBroken()) {tbhitcount++; score = tbscore; done = true;};
};
};
};
};
Code: Select all
Score TBCache::Probe(const Pos& pos)
{
Score score = BrokenScore;
Tbk tbk = TbkNil;
bool direct = true;
if (TBSigRec::Classify(pos, tbk, direct))
{
// Determine side color to assist TB stream selection
Color side;
if (IsColorWhite(pos.GetGood()))
side = (direct ? ColorWhite : ColorBlack);
else
side = (direct ? ColorBlack : ColorWhite);
// Proceed to the more time consuming code if selected TB stream is not broken
TBCacheRec * const p = &cachevec[tbk][side];
if (!p->broken)
{
// Acquire the TB cache object lock
Lock();
// Try the transposition table first
if (!tbtbssptr || !tbtbssptr->Probe(pos, score))
{
// No transposition table score found, now try to ensure an open TB stream
if (!p->active)
{
std::ostringstream oss;
oss.str(""); oss << "Opening TB file: " << p->ifname << '\n';
DriverTaskPtr->AddLog(oss.str());
p->Open();
if (p->broken)
{
oss.str(""); oss << "Broken/missing TB file: " << p->ifname << '\n';
DriverTaskPtr->AddLog(oss.str());
};
};
// Continue if not a broken TB stream
if (!p->broken)
{
std::ifstream *ifsptr = p->ifsptr;
TBIndexVec tbindexvec;
tbindexvec.LoadFromPos(pos, tbk, direct);
tbindexvec.Normalize(tbk);
const ui offset = tbindexvec.CalculateStreamOffset(tbk);
ifsptr->seekg(offset, std::ios::beg);
if (ifsptr->good())
{
char ch;
ifsptr->get(ch);
const ByteScore bytescore((si8) ch);
score = bytescore.MapToScore();
if (tbtbssptr) tbtbssptr->Stash(pos, score);
};
};
};
// Release the TB cache object lock
Unlock();
};
};
return score;
}
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Transposition table storage for tablebase assistance
This is a good bit of redundancy however. For example, Eugene has a big LRU cache for egtb probes to avoid doing I/O. I store the EGTB result to avoid even doing the probe into the EGTB cache. And I don't have any special test of any kind, I just probe the hash and if it happens to be an EGTB entry, so much the better as the score is exact anyway and depth is infinite. I still like the idea of one probe. It gives a nice hierarchy, first I probe the hash table which is very fast, then, if the board position is a known egtb position, I do an EGTB probe which is not as fast. After that I do the normal search stuff. No special-case code before the hash probe, and I don't want to do an EGTB probe before a hash probe as the hash probe is much less expensive.sje wrote:That would be the case if there were a second probe. But there isn't if there's a hit on the first probe.bob wrote:However, it isn't quite "free". You have to do the extra table probe, which has a definite cost. Rather than just a single probe to get a normal hash or EGTB score with one cache line fill.sje wrote:One advantage is that a TB transposition table entry can be much smaller than a main transposition entry. A TB score can fit into a single byte; there's no need to reserve space for a draft, a move, or any flags.bob wrote:What's the advantage over just one table? You have to do two separate probes to two separate blocks of memory, which has cache/TLB issues associated with doing so. I've always just stored EGTB hits in the normal hash with infinite depth...
A 64 bit entry can suffice with the TB score spliced into the bottom bits of the hash signature, so that's half the space of a typical main transposition entry. Therefore, buy one and get the second one free.
And the first probe isn't done at all if there's no chance of a hit, either in the TB transposition table or the external tablebase files. So it works out to only one transposition probe for all TB positions.
Caller (from almost every node):Probe code:Code: Select all
// Tablebase check if (!done) { if ((ply > 0) && pos.IsTablebaseEligible()) { if (leftmost || ((fdepth >= 0) && pos.GetPriorMove().IsGainer())) { TBCache *tbcptr = DriverTaskPtr->GetTBCachePtr(); if (tbcptr) { const Score tbscore = tbcptr->Probe(pos); if (!tbscore.IsBroken()) {tbhitcount++; score = tbscore; done = true;}; }; }; }; };
Code: Select all
Score TBCache::Probe(const Pos& pos) { Score score = BrokenScore; Tbk tbk = TbkNil; bool direct = true; if (TBSigRec::Classify(pos, tbk, direct)) { // Determine side color to assist TB stream selection Color side; if (IsColorWhite(pos.GetGood())) side = (direct ? ColorWhite : ColorBlack); else side = (direct ? ColorBlack : ColorWhite); // Proceed to the more time consuming code if selected TB stream is not broken TBCacheRec * const p = &cachevec[tbk][side]; if (!p->broken) { // Acquire the TB cache object lock Lock(); // Try the transposition table first if (!tbtbssptr || !tbtbssptr->Probe(pos, score)) { // No transposition table score found, now try to ensure an open TB stream if (!p->active) { std::ostringstream oss; oss.str(""); oss << "Opening TB file: " << p->ifname << '\n'; DriverTaskPtr->AddLog(oss.str()); p->Open(); if (p->broken) { oss.str(""); oss << "Broken/missing TB file: " << p->ifname << '\n'; DriverTaskPtr->AddLog(oss.str()); }; }; // Continue if not a broken TB stream if (!p->broken) { std::ifstream *ifsptr = p->ifsptr; TBIndexVec tbindexvec; tbindexvec.LoadFromPos(pos, tbk, direct); tbindexvec.Normalize(tbk); const ui offset = tbindexvec.CalculateStreamOffset(tbk); ifsptr->seekg(offset, std::ios::beg); if (ifsptr->good()) { char ch; ifsptr->get(ch); const ByteScore bytescore((si8) ch); score = bytescore.MapToScore(); if (tbtbssptr) tbtbssptr->Stash(pos, score); }; }; }; // Release the TB cache object lock Unlock(); }; }; return score; }
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Transposition table storage for tablebase assistance
At program initialization time, Symbolic checks the validity of the TB directory and any of the TB files within. Each class file for each color gets an open call. If the file isn't there or permissions are bad or whatever, then a broken status is recorded for that class/color. A successful open is followed by a close. Once the program gets rolling, TB files are opened on an as needed basis and remain open until an I/O error or when the program ends.
Symbolic's position class Pos keeps an incrementally updated piece count and a pair of material signature words, one per color that are also incrementally updated. These handy items are also used in other parts of the program in addition to TB operations, so there's no additional time penalty to be charged against TB support. The piece count is used to quickly disqualify TB probing for positions with too many pieces. There's also a quick check using the signatures to bounce positions where both sides have at least one pawn (no ep support). All of the above is done with quick, inline function calls.
The signature words combined with the side to move are used to probe, via binary search, a table that gives the TB class and direct/reverse status for the position. All of this happens very fast.
If a position survives the preliminary filtering, then the slower stuff starts. The master TB cache object, which includes the TB transposition table, is locked because of the existence of multiple search threads. The TB transposition table is probed and a match gives the score. If the TB transposition probe misses, only then is a file offset computed and a read request is sent to the OS. If the read fails, then the associated stream is closed and the class/color is marked broken. If the read succeeds, the score is stored in the TB transposition table. Finally, the master TB cache object is unlocked and the score (possibly the special BrokenScore) is returned.
With any luck, OpenBSD or Linux will have the needed disk block already in its filesystem data cache.
And with my tablebases, there's no compression so there's no need for decompression.
The only time there will be a probe of the main transposition table for a TB eligible position is when the TB file is missing, broken, or becomes broken. The only time TWO transposition probes occur for the same TB eligible position is the rare case when a TB file breaks during an access, and this can happen only once per class/color.
The design cost? Next to zero as there are already base classes for transposition tables and elements.
Symbolic's position class Pos keeps an incrementally updated piece count and a pair of material signature words, one per color that are also incrementally updated. These handy items are also used in other parts of the program in addition to TB operations, so there's no additional time penalty to be charged against TB support. The piece count is used to quickly disqualify TB probing for positions with too many pieces. There's also a quick check using the signatures to bounce positions where both sides have at least one pawn (no ep support). All of the above is done with quick, inline function calls.
The signature words combined with the side to move are used to probe, via binary search, a table that gives the TB class and direct/reverse status for the position. All of this happens very fast.
If a position survives the preliminary filtering, then the slower stuff starts. The master TB cache object, which includes the TB transposition table, is locked because of the existence of multiple search threads. The TB transposition table is probed and a match gives the score. If the TB transposition probe misses, only then is a file offset computed and a read request is sent to the OS. If the read fails, then the associated stream is closed and the class/color is marked broken. If the read succeeds, the score is stored in the TB transposition table. Finally, the master TB cache object is unlocked and the score (possibly the special BrokenScore) is returned.
With any luck, OpenBSD or Linux will have the needed disk block already in its filesystem data cache.
And with my tablebases, there's no compression so there's no need for decompression.
The only time there will be a probe of the main transposition table for a TB eligible position is when the TB file is missing, broken, or becomes broken. The only time TWO transposition probes occur for the same TB eligible position is the rare case when a TB file breaks during an access, and this can happen only once per class/color.
The design cost? Next to zero as there are already base classes for transposition tables and elements.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Transposition table storage for tablebase assistance
When you try on-the-fly decompression as we did in the Nalimov tables, we (Eugene and I) were somewhat surprised that decompression was far faster overall, because you end up doing less file I/O because you are reading in compressed blocks. Less I/O is always good.sje wrote:At program initialization time, Symbolic checks the validity of the TB directory and any of the TB files within. Each class file for each color gets an open call. If the file isn't there or permissions are bad or whatever, then a broken status is recorded for that class/color. A successful open is followed by a close. Once the program gets rolling, TB files are opened on an as needed basis and remain open until an I/O error or when the program ends.
Symbolic's position class Pos keeps an incrementally updated piece count and a pair of material signature words, one per color that are also incrementally updated. These handy items are also used in other parts of the program in addition to TB operations, so there's no additional time penalty to be charged against TB support. The piece count is used to quickly disqualify TB probing for positions with too many pieces. There's also a quick check using the signatures to bounce positions where both sides have at least one pawn (no ep support). All of the above is done with quick, inline function calls.
The signature words combined with the side to move are used to probe, via binary search, a table that gives the TB class and direct/reverse status for the position. All of this happens very fast.
If a position survives the preliminary filtering, then the slower stuff starts. The master TB cache object, which includes the TB transposition table, is locked because of the existence of multiple search threads. The TB transposition table is probed and a match gives the score. If the TB transposition probe misses, only then is a file offset computed and a read request is sent to the OS. If the read fails, then the associated stream is closed and the class/color is marked broken. If the read succeeds, the score is stored in the TB transposition table. Finally, the master TB cache object is unlocked and the score (possibly the special BrokenScore) is returned.
With any luck, OpenBSD or Linux will have the needed disk block already in its filesystem data cache.
And with my tablebases, there's no compression so there's no need for decompression.
The only time there will be a probe of the main transposition table for a TB eligible position is when the TB file is missing, broken, or becomes broken. The only time TWO transposition probes occur for the same TB eligible position is the rare case when a TB file breaks during an access, and this can happen only once per class/color.
The design cost? Next to zero as there are already base classes for transposition tables and elements.
I still prefer to follow the "order by increasing cost" approach. Repetition check is the cheapest operation I have, and I do that first. Hash probe is next. EGTBs are significantly more expensive and are deferred until the hash probe has failed. Once I do an EGTB probe on a position, it is unlikely I will repeat it since it promptly goes into the hash table with infinite draft making it very difficult to overwrite.
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Transposition table storage for tablebase assistance
The optimality of the compression strategy to be used, or even if compression is to be used at all, is dependent upon the tablebase class, the end user's processor, the memory bandwidth, the disk speed, etc. Therefore, the strategy should be variable and this is the approach I'm taking with the Edwards' Version Two tablebases, possibly to appear early next year.bob wrote:When you try on-the-fly decompression as we did in the Nalimov tables, we (Eugene and I) were somewhat surprised that decompression was far faster overall, because you end up doing less file I/O because you are reading in compressed blocks. Less I/O is always good.
Well, my TB probe approach is also in order of ascending cost where cost equals probability of success times expense of use. The big difference is that the TB values get their own space (half sized entries allow twice as efficient storage). The main transposition table becomes slightly more efficient as it now has more free slots that would be otherwise taken by long-lived TB score entries.bob wrote:I still prefer to follow the "order by increasing cost" approach. Repetition check is the cheapest operation I have, and I do that first. Hash probe is next. EGTBs are significantly more expensive and are deferred until the hash probe has failed. Once I do an EGTB probe on a position, it is unlikely I will repeat it since it promptly goes into the hash table with infinite draft making it very difficult to overwrite.
The only downside to Symbolic's current approach is the situation where tablebases are enabled but only a few of the actual TB files are available. This would mean that the TB classifier would be run on TB eligible positions that had no hope of actually being scored, and so a small amount of time would be wasted. This is why the classifier is fairly fast and can be made even faster by using a material hash signature instead of a material piece distribution signature. (Symbolic actually has both of these, but only the distribution signature gets used currently, and it gets used in many places.)
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Transposition table storage for tablebase assistance
Symbolic:bob wrote:Repetition check is the cheapest operation I have, and I do that first. Hash probe is next. EGTBs are significantly more expensive and are deferred until the hash probe has failed.
1) User interrupt check
This looks at a volatile boolean variable that might be set by another thread; if set the search unwinds and exits.
2) General limit check
If the search has a time check in force, then the tick count variable that gives microseconds since program start is tested against the limit value for "try another node". The tick count is maintained by a separate thread activated by a ten millisecond resolution system interval timer, so there's no per node system call overhead. If the search has a node count limit in force, then that is checked.
3) Maximum ply check
Set at 256.
4) Fifty move draw
Very fast.
5) Insufficient material draw
Almost as fast due to material distribution signature.
6) Repetition draw
Need to roll through a (usually) short loop.
7) TB check
As described earlier.
At this point, if nothing has caught then the node processor figures out which of five different sub-processor routines are to be called. These are:
1) the base (ply zero) routine
2) the check evasion routine (uses the main transposition table)
3) the full width routine (uses the main transposition table)
4) the gainers and checks only routine (uses eval/pawn tables)
5) the gainers only routine (uses eval/pawn tables)