Gaviota EGTBs, interface proposal for programmers

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Edmund
Posts: 670
Joined: Mon Dec 03, 2007 3:01 pm
Location: Barcelona, Spain

Re: Gaviota EGTBs, interface proposal for programmers

Post by Edmund »

What about the following idea in combination with your probe_soft function.

Let's assume you have some array hash[depth] which stores the hash value for every ply you are searching. Engines usually have this implemented anyway for repetition detection.

So when probing you also pass the current hash value and the current depth to the probe_soft function. Then:
in case the value is in the cache: return the value
in case the file is not available: return Invalid
in case the file is available: {
tell the engine to continue its search
probe the egtb
if the hash[depth] still equals the passed hash, set a flag to tell the engine to unwind and use the returned value of the position
}
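As a rough sketch, the completion check at the heart of this idea might look like the following. The names (eng_hash_stack, tb_async_complete) are hypothetical, the state is reduced to a single array, and the actual asynchronous I/O is elided:

```c
#include <stdint.h>
#include <stdbool.h>

#define MAX_PLY 128

enum tb_result { TB_WIN, TB_DRAW, TB_LOSS, TB_UNKNOWN, TB_INVALID };

/* Hypothetical engine-side state: the hash value at every ply, as kept
   anyway for repetition detection. */
uint64_t eng_hash_stack[MAX_PLY];

/* Called by the TB layer when the asynchronous disk probe completes.
   'hash' and 'ply' are the values the engine passed when the probe was
   issued.  Returns true if the engine should unwind and use 'value';
   false means the search has already left this ply, so the result only
   serves to warm the cache. */
bool tb_async_complete(uint64_t hash, int ply, enum tb_result value,
                       enum tb_result *out)
{
    if (ply < 0 || ply >= MAX_PLY || eng_hash_stack[ply] != hash)
        return false;       /* position at this ply has changed */
    *out = value;           /* still there: flag the search to unwind */
    return true;
}
```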
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Gaviota EGTBs, interface proposal for programmers

Post by mcostalba »

IQ wrote: The other scheme with the async i/o is also flawed as it would have to be queued. It would pile up a huge amount of probes in one part of the tree.
I think you cannot do a one-to-one match: you cannot ask for a single position and retrieve a single position from disk, because the request frequency overwhelms the retrieval frequency.

We don't have to reinvent the wheel; we can look at what an OS does when, for instance, reading from a memory-mapped executable file.

When fetching the next instruction to execute, if the instruction is not in memory the program is blocked, a whole page (4 KB on Linux) is read from disk into RAM, and the program is then restarted; it executes not only the pending instruction but also the following ones.

The lesson to learn here is that if the request frequency is much higher than the retrieval frequency, you need to read _blocks_ of _similar_ positions from disk for each position requested. When the data fills the cache it will then satisfy not only another request for the same position but also requests for similar positions, where similar means positions that can be reached from the original one in a few moves.

In your example, you avoid flooding by filtering out requests to a position's block that is currently "in flight", in a very similar way to how the OS handles I/O requests.


BTW, I never said it is trivial to implement :-) but a good knowledge of OS I/O handling is IMHO very useful, because these kinds of problems are classic and were discussed, and solved, many many years ago.
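A tiny sketch of that in-flight filtering, under the assumption that tablebase positions map to fixed-size blocks (all names and sizes here are illustrative, not the Gaviota API):

```c
#include <stdbool.h>
#include <stdint.h>

#define POSITIONS_PER_BLOCK 4096   /* assumed block granularity */
#define MAX_INFLIGHT 64

/* Blocks currently being read from disk.  In a real implementation an
   entry would be cleared when its read completes (elided here). */
uint64_t inflight[MAX_INFLIGHT];
int n_inflight = 0;

uint64_t block_of(uint64_t tb_index)
{
    return tb_index / POSITIONS_PER_BLOCK;
}

/* Returns true if a disk read for this position's block should be
   issued, false if the block is already in flight and the request can
   be filtered out. */
bool request_block(uint64_t tb_index)
{
    uint64_t b = block_of(tb_index);
    for (int i = 0; i < n_inflight; i++)
        if (inflight[i] == b)
            return false;           /* duplicate request, drop it */
    if (n_inflight < MAX_INFLIGHT)
        inflight[n_inflight++] = b; /* remember it until the read lands */
    return true;
}
```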
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota EGTBs, interface proposal for programmers

Post by michiguel »

IQ wrote:Hi,

I wonder how effective this scheme really is. I would reckon that if TB probe results are also stored in the hash table, the number of probe_soft calls that are actually a TB cache hit would be minute. Do you have any numbers on the probe_soft hit rate?
The difference is huge!! In the example I posted above you can see this:

GTB CACHE STATS
probes: 144794
hard: 5406
soft: 139388
hits: 64661
misses: 1115
softmisses: 79018
efficiency: 98.3%
average searches 786.4
occupancy: 54.4%

There you have 5406 "hard probes", but the number of hits was 64661 (that is soft + hard hits). In other words, the total number of hits is 12x the number of "hard" probes itself!
Half of the soft probes are misses, but that's no big deal because they are relatively cheap.

There are many soft hits that are not in the hash table. Those are positions that were never visited in the search, but are neighbors in the TB blocks that were decompressed.
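The statistics quoted above are internally consistent: hard + soft probes account for all probes, and so do hits + misses + softmisses. The reported efficiency can also be reproduced, under the inferred (not documented) reading that efficiency = hits / (hits + hard misses):

```c
/* Sanity check of the GTB cache statistics quoted above.  Interpreting
   "efficiency" as hits / (hits + hard misses) reproduces the reported
   98.3%; that interpretation is an inference from the numbers, not
   taken from the Gaviota documentation. */
double gtb_efficiency(int hits, int misses)
{
    return 100.0 * hits / (hits + misses);   /* 64661/65776 -> ~98.3% */
}
```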

Miguel




The other scheme with the async i/o is also flawed as it would have to be queued. It would pile up a huge amount of probes in one part of the tree.

Therefore I would propose a prioritized async queue system; the prioritization can easily be done via search depth. This should ensure that the important TB probes actually get done in time to be meaningful for the search.

What do you guys think?
It deserves experimentation, but I think the return will be small if the soft probes are implemented, particularly because these queued systems will only work with uncompressed TBs.

Miguel

-IQ
michiguel wrote: It is because of this idea that I started to write my own TBs, so I can implement it. However, I found that something simpler could be done. This is what I already have in Gaviota, and it works very well (not in the released version yet):

tb_probe_hard()
{
    /* The normal blocking probe: it probes the cache and, if the info
       is not there, loads the cache with the needed data and returns
       the info. */
}

tb_probe_soft()
{
    /* Probes the cache only; if the info is not there, it returns
       tb_UNKNOWN. */
}

So, when the remaining depth is < 2 I only do tb_probe_soft(). At any other point, I do tb_probe_hard().

I see no decrease in nps when using remaining depth = 2, which indicates that probing is not that bad: we do not probe so often, many positions are in the hash table, and many are already in the cache. If I probe "softly" in the last two plies, those probes are free and the number of probes may increase by an order of magnitude.
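The depth-gated policy described above can be sketched as follows. The probing code is reduced to stubs with a one-flag "cache" so the policy itself can be exercised; the real calls take a full position:

```c
#include <stdbool.h>

enum tb_result { TB_WIN, TB_DRAW, TB_LOSS, TB_UNKNOWN };

/* Toy stand-ins for the real probing code: the cache is reduced to a
   single flag so the policy can be followed. */
bool cache_has_position = false;

enum tb_result tb_probe_hard_stub(void)
{
    cache_has_position = true;   /* a hard probe also loads the cache */
    return TB_DRAW;
}

enum tb_result tb_probe_soft_stub(void)
{
    return cache_has_position ? TB_DRAW : TB_UNKNOWN;
}

/* The policy itself: below remaining depth 2 only the free cache
   lookup is allowed; at greater depth a blocking probe is affordable. */
enum tb_result probe_policy(int remaining_depth)
{
    return (remaining_depth < 2) ? tb_probe_soft_stub()
                                 : tb_probe_hard_stub();
}
```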
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota EGTBs, interface proposal for programmers

Post by michiguel »

Edmund wrote:What about the following idea in combination with your probe_soft function.

Let's assume you have some array hash[depth] which stores the hash value for every ply you are searching. Engines usually have this implemented anyway for repetition detection.

So when probing you also pass the current hash value and the current depth to the probe_soft function. Then:
in case the value is in the cache: return the value
in case the file is not available: return Invalid
in case the file is available: {
tell the engine to continue its search
probe the egtb
if the hash[depth] still equals the passed hash, set a flag to tell the engine to unwind and use the returned value of the position
}
If I do probe_soft close to the leaves, realistically, searches will always finish first. On average, a probe to the HD will take around 0.5 to 1 ms, which is, depending on the engine, about ~1000 nodes (or more). That is easily a remaining depth of ~3 close to the leaves, if I am not mistaken. Anything lower than that, and the probe has no chance to compete. OTOH, close to the root, the search has no chance to compete. I estimate that this scheme could be useful when the remaining depth is 3-4 plies (because there is a fair competition between probe and search), and that is a very narrow window. However, it is worth considering, because that may be where the bulk of the hard probes is.
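The node count behind this estimate is simple arithmetic; at an assumed 1-2 Mnps, a 0.5-1 ms probe costs on the order of a thousand nodes:

```c
/* How many nodes the search gets through while one blocking disk probe
   is pending.  The nps figures used below are assumptions chosen for
   illustration, not measurements. */
long nodes_during_probe(long nps, double probe_ms)
{
    return (long)(nps * probe_ms / 1000.0);
}
```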

Miguel
Edmund
Posts: 670
Joined: Mon Dec 03, 2007 3:01 pm
Location: Barcelona, Spain

Re: Gaviota EGTBs, interface proposal for programmers

Post by Edmund »

michiguel wrote:
Edmund wrote:What about the following idea in combination with your probe_soft function.

Let's assume you have some array hash[depth] which stores the hash value for every ply you are searching. Engines usually have this implemented anyway for repetition detection.

So when probing you also pass the current hash value and the current depth to the probe_soft function. Then:
in case the value is in the cache: return the value
in case the file is not available: return Invalid
in case the file is available: {
tell the engine to continue its search
probe the egtb
if the hash[depth] still equals the passed hash, set a flag to tell the engine to unwind and use the returned value of the position
}
If I do probe_soft close to the leaves, realistically, searches will always finish first. On average, a probe to the HD will take around 0.5 to 1 ms, which is, depending on the engine, about ~1000 nodes (or more). That is easily a remaining depth of ~3 close to the leaves, if I am not mistaken. Anything lower than that, and the probe has no chance to compete. OTOH, close to the root, the search has no chance to compete. I estimate that this scheme could be useful when the remaining depth is 3-4 plies (because there is a fair competition between probe and search), and that is a very narrow window. However, it is worth considering, because that may be where the bulk of the hard probes is.

Miguel
First of all, I don't consider remaining depth a good indicator of the size of the subtree; it also depends very much on the type of node, the chance of re-searches, the move number in the parent node, etc. The advantage of my probe scheme is that it doesn't waste any time at all, and the worst thing that can happen is that the position (and neighboring positions) end up in the cache.

Concerning your point about the nodes close to the root: I am not in favour of hard probes at all (only when the position on the board itself is already in the EGTB). The usual case is that you have a position where one or more capture moves can transpose into the EGTB. So you could probe_soft all such moves already at the beginning of the node, and in the meantime use the time to search the other moves. This is especially relevant for PV and ALL nodes, where all moves need a value anyway.
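This node-entry pass could be sketched as below. The move representation, the callbacks, and the toy stubs are all illustrative assumptions; the point is only the control flow: soft-probe every converting move first, count the misses, and search other moves while the block fetches are in flight:

```c
#include <stdbool.h>

typedef int move_t;
enum tb_result { TB_WIN, TB_DRAW, TB_LOSS, TB_UNKNOWN };

/* Callbacks standing in for engine/TB code: does this move convert
   into tablebase material, and what does a cache-only probe of the
   resulting position say? */
typedef bool (*enters_tb_fn)(move_t);
typedef enum tb_result (*soft_probe_fn)(move_t);

/* At node entry, soft-probe every converting move.  Each miss leaves a
   block fetch in flight, so the values can be re-checked cheaply after
   the non-converting moves have been searched.  Returns the number of
   converting moves still unresolved. */
int collect_tb_values(const move_t *moves, int n,
                      enters_tb_fn enters, soft_probe_fn probe,
                      enum tb_result *value)
{
    int unresolved = 0;
    for (int i = 0; i < n; i++) {
        value[i] = enters(moves[i]) ? probe(moves[i]) : TB_UNKNOWN;
        if (enters(moves[i]) && value[i] == TB_UNKNOWN)
            unresolved++;       /* fetch now in flight; retry later */
    }
    return unresolved;
}

/* Toy callbacks for illustration: even-numbered moves "convert", and
   only move 0 is already in the cache. */
bool stub_enters_tb(move_t m)            { return m % 2 == 0; }
enum tb_result stub_soft_probe(move_t m) { return m == 0 ? TB_DRAW
                                                         : TB_UNKNOWN; }
```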
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota EGTBs, interface proposal for programmers

Post by michiguel »

Edmund wrote:
michiguel wrote:
Edmund wrote:What about the following idea in combination with your probe_soft function.

Let's assume you have some array hash[depth] which stores the hash value for every ply you are searching. Engines usually have this implemented anyway for repetition detection.

So when probing you also pass the current hash value and the current depth to the probe_soft function. Then:
in case the value is in the cache: return the value
in case the file is not available: return Invalid
in case the file is available: {
tell the engine to continue its search
probe the egtb
if the hash[depth] still equals the passed hash, set a flag to tell the engine to unwind and use the returned value of the position
}
If I do probe_soft close to the leaves, realistically, searches will always finish first. On average, a probe to the HD will take around 0.5 to 1 ms, which is, depending on the engine, about ~1000 nodes (or more). That is easily a remaining depth of ~3 close to the leaves, if I am not mistaken. Anything lower than that, and the probe has no chance to compete. OTOH, close to the root, the search has no chance to compete. I estimate that this scheme could be useful when the remaining depth is 3-4 plies (because there is a fair competition between probe and search), and that is a very narrow window. However, it is worth considering, because that may be where the bulk of the hard probes is.

Miguel
First of all, I don't consider remaining depth a good indicator of the size of the subtree; it also depends very much on the type of node, the chance of re-searches, the move number in the parent node, etc. The advantage of my probe scheme is that it doesn't waste any time at all, and the worst thing that can happen is that the position (and neighboring positions) end up in the cache.
My point is that your suggestion will be useful for nodes that are neither close to the root nor close to the leaves, and I think that is a narrow window in between. If you can predict when to use it (e.g. when the subtree is between 300 and 3000 nodes), I believe it will be good. If, as you say, depth is a bad indicator of the size of the subtree, that will make matters more difficult.

Concerning your point about the nodes close to the root: I am not in favour of hard probes at all (only when the position on the board itself is already in the EGTB). The usual case is that you have a position where one or more capture moves can transpose into the EGTB. So you could probe_soft all such moves already at the beginning of the node, and in the meantime use the time to search the other moves. This is especially relevant for PV and ALL nodes, where all moves need a value anyway.
Close to the root, probing is always faster than searching. You propose here some sort of ETC (enhanced transposition cutoff) applied to EGTBs, trying to get a soft probe rather than a hard probe. That is good, but if the soft probes fail, the cheapest route will be to probe the hard drive rather than keep searching.

As I said, all this needs experimentation, but we should not forget that all delayed-probe schemes need to be done with uncompressed TBs. That is a drawback by itself.

Miguel
Aaron Becker
Posts: 292
Joined: Tue Jul 07, 2009 4:56 am

Re: Gaviota EGTBs, interface proposal for programmers

Post by Aaron Becker »

The problem is that this introduces asynchronous communication channels between the tb code and the search code, and requires the search code to be able to roll back a partially completed search. This all adds substantial complexity.

However, if you're searching a tree under a node where the soft probe missed in the cache, then when it finishes fetching the appropriate block, the positions in your subtree should start to hit the cache, and the subtree search should finish very quickly without any rollback. Therefore I think it's fine to just have the soft probe fail and fetch the data in the background, without any notification sent later to the search.

edit: I should add that I don't think it's necessary to ever worry about the performance of probes failing near the root. If a position near the root is in your tb, then realistically you'll hit when you're doing a soft probe after a few levels of iterative deepening anyway.
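The fire-and-forget flow Aaron describes might be sketched like this. There is no real I/O or threading here: the queue is drained by an explicit service_one_fetch() step where a worker thread would loop, and the cache holds a single block so the flow can be followed:

```c
#include <stdbool.h>
#include <stdint.h>

#define QSIZE 32

/* Pending block fetches: a soft-probe miss enqueues the block and
   returns immediately; no result is ever pushed back to the search.
   The subtree below simply starts hitting the cache once the block
   lands. */
uint64_t fetch_queue[QSIZE];
int q_head = 0, q_tail = 0;
uint64_t cached_block = UINT64_MAX;   /* toy one-block "cache" */

enum tb_result { TB_DRAW, TB_UNKNOWN };

enum tb_result probe_soft_bg(uint64_t block)
{
    if (block == cached_block)
        return TB_DRAW;                      /* cache hit */
    fetch_queue[q_tail++ % QSIZE] = block;   /* miss: enqueue, move on */
    return TB_UNKNOWN;
}

/* One step of the background worker: "read" the oldest queued block
   into the cache.  Returns false when nothing is pending. */
bool service_one_fetch(void)
{
    if (q_head == q_tail)
        return false;
    cached_block = fetch_queue[q_head++ % QSIZE];
    return true;
}
```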
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: Gaviota EGTBs, interface proposal for programmers

Post by michiguel »

Aaron Becker wrote:The problem is that this introduces asynchronous communication channels between the tb code and the search code, and requires the search code to be able to roll back a partially completed search. This all adds substantial complexity.

However, if you're searching a tree under a node where the soft probe missed in the cache, then when it finishes fetching the appropriate block, the positions in your subtree should start to hit the cache, and the subtree search should finish very quickly without any rollback. Therefore I think it's fine to just have the soft probe fail and fetch the data in the background, without any notification sent later to the search.
I agree, but in this case there is no such thing as a "free background", because when the data is fetched from the HD it needs to be decompressed, which could be costly. So this will work only if the TBs are uncompressed, or if it is better to dedicate one CPU to this task rather than to the search. For instance, if I am playing on an octal and there is not much difference between a search with 7 or 8 cores, one core dedicated to decompression might help more. All this needs very serious tuning. What is the best proportion of cores dedicated to search versus decompression? OTOH, this works only with more than one processor. That is OK for competition, but may not be for testers. If the testers use one CPU per engine, this scheme will "cheat" them.

Miguel
Aaron Becker
Posts: 292
Joined: Tue Jul 07, 2009 4:56 am

Re: Gaviota EGTBs, interface proposal for programmers

Post by Aaron Becker »

It's definitely a sticky problem. My feeling is that you should allow the engine to set the maximum number of threads devoted to tablebase work, and then let the engine worry about whether or not to incur the overhead of tb threads doing decompression in the background. If the decompression process is lengthy, you could also use something like pthread_yield() to ensure that you're not hogging cpu time.

Unfortunately I'm not aware of a way to tell the OS that you'd like all of your threads scheduled on a particular number of cores, so managing that scenario might be touchy under testing conditions. I think in that case you should just trust the engine authors to sort things out, because you can't really control anything from inside the TB code aside from the number of threads you're running. The engine can then answer the tough questions about how many search threads and how many TB threads to run to give the best performance without exceeding its resource limits.
Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Re: Gaviota EGTBs, interface proposal for programmers

Post by Dirt »

michiguel wrote:PS: On top of all that, I should have a layer of bitbases, but let's leave that for later...
With bitbases it looks to me like you would only probe the full EGTBs at the root. What this means for your optimal implementation of tablebase probing should perhaps be considered now.

Much of this discussion looks like it would be more appropriate for future six-piece bitbases.