Natural TB

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: Natural TB

Post by syzygy »

Rein Halbersma wrote:
syzygy wrote: (What may be more efficient is using a higher probe depth for 6-piece positions than for 5-piece positions, knowing that the 5-piece positions are more likely to be cached in RAM. As it happens, this idea was already implemented in SF. But probing the same table in the leaves when it could have been done just once in a parent node is just silly.)
Pardon my ignorance, but your TBs still use memory-mapped I/O, correct?
Correct.
One avenue for experimentation might be to use conditional lookups after an is-in-RAM check. IIRC, at least on Linux there are OS calls to do this now.
Yes, I looked into those calls a while ago (I had to search for the name, as I did not remember it). At the time I thought it would be too expensive on every call, but it might still be worth a try.

I am also still looking for this instruction:
What would be ideal is an x86-64 load instruction that would not result in a page fault if the required memory page is not in RAM, but that would set an error flag.
Alternatively, it might be useful to have an OS call to page in a block or a series of blocks in the background (basically asynchronously, but without the callback).
Another thing is to use an asynchronous queue of positions-to-be-looked-up. You could then maintain a counter per position and do conditional lookups after a position is queried for a second time etc. But maybe the OS does this more efficiently than such a scheme.
Or use more search threads than physical or logical cores, the idea being that there will always be some waiting for I/O so that others can search on.
At least in draughts, in the old days before SSDs and huge RAM, the search would slow down 10x in the late middle game, e.g. with 4 GB RAM + 100 GB of TBs on disk. With RAM ~32 GB and SSDs, search speed is not affected significantly.
Yes, SSDs and lots of RAM are very nice :-)

And if some of the new memory technologies in development work out, we might at some point have terabytes of TB data accessible with the speed of RAM. And maybe TBs could be generated directly on such memories.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Natural TB

Post by Daniel Shawul »

Uri Blass wrote:
syzygy wrote:
mcostalba wrote: Assuming the above, it is possible that probing at the end of the search is more efficient because you call probe fewer times (for instance, null searches, razoring, etc. do not end up in a TB probe because the engine cuts off earlier)
Is it still so difficult to visualise that probing one time in an interior node and cutting the subtree is more efficient than probing at least once in a leaf below that interior node? (And it is also more accurate.) Probing closer to the leaves certainly does not result in probing less. It is pretty amazing what you are writing here...

That is not his claim, based on my understanding.

His claim is that probing once at an interior node and cutting the subtree may be worse than continuing the search, which might avoid the probe entirely because of pruning, or might still probe later (you do not know which at the time you are at the interior node).


I will be surprised if it happens, but if the average number of probes below the interior tablebase node is smaller than 1, then at least in theory he may be right.
IIRC KnightDreamer's bitbases were used entirely like Marco described. Also hand-crafted endgame interior-node recognizers, which have information only on the bounds, are used similarly, so the idea is not new. I see no way it could be better than cutting off immediately, as long as the engine can make progress with bitbase cutoffs.
Daniel
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Natural TB

Post by bob »

Uri Blass wrote:
syzygy wrote:
mcostalba wrote: Assuming the above, it is possible that probing at the end of the search is more efficient because you call probe fewer times (for instance, null searches, razoring, etc. do not end up in a TB probe because the engine cuts off earlier)
Is it still so difficult to visualise that probing one time in an interior node and cutting the subtree is more efficient than probing at least once in a leaf below that interior node? (And it is also more accurate.) Probing closer to the leaves certainly does not result in probing less. It is pretty amazing what you are writing here...

That is not his claim, based on my understanding.

His claim is that probing once at an interior node and cutting the subtree may be worse than continuing the search, which might avoid the probe entirely because of pruning, or might still probe later (you do not know which at the time you are at the interior node).


I will be surprised if it happens, but if the average number of probes below the interior tablebase node is smaller than 1, then at least in theory he may be right.
How? The probe being discussed terminates the search completely. So how can searching even one more node be "more efficient"? And this is about searching MANY more than one extra node.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Natural TB

Post by bob »

hgm wrote:
syzygy wrote:Yes, and then it still makes the "unnatural" move. To avoid that, the engine will somehow have to decide that it does not want the TB win: "My position is so good that there must be a win that does not involve sacking my queen". That should be possible to implement, but of course it will then fail to win won games in certain circumstances.
I don't think it has to be as bad as that. You can still do it in a way that guarantees the win:

In the root, decide which moves preserve the forced win. This can be done by never allowing alpha in the root to rise above the score for a tablebase win. Moves that fail low against this threshold are removed from the move list. Then redo the search using only the moves that preserve the win, with probing of all 'inferior' tables disabled.

The worst thing that can happen then is that it gets very reluctant to win in pathological cases (e.g. where the sacrifice is really the only way to win, because the superfluous piece was trapped anyway). It would then muddle on for a long time without sacrifice, until the 50-move barrier forces it to make the conversion. Arguably even this should count as 'natural play', as it is exactly what a human player would do when he is not bright enough to realize that the better material against normal expectation could not do the job.

It would still make the sacrifice early when the forced conversion was in danger of evaporating (i.e. none of the other moves can be proven to be a win). But in this case it could be seen as a brilliant sacrifice. (E.g. it could be that you sac a Queen, and some moves later gain it back, or get a new one through promotion, and only win the inferior end-game because of that.)

You could also do it the other way around: after you established the root contains a forced win, first search with probing inferior EGTs disabled. Each time a move beats alpha you repeat the search with the probing now enabled. If the move returns a tablebase win, you are done with this iteration. If not, the move is not acceptable (at this depth), and you continue the iteration with the next move. So at ply level 1 any fail low would trigger a verification search with all probing on to prove it fails 'very low' (i.e. a tablebase loss), and if it doesn't, you return beta instead of the fail-low score.
Isn't this right back to the original problem??? I.e., if just one move guarantees the win by sacrificing the queen...
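As an aside, the root filtering hgm describes above can be sketched in a few lines. The threshold value and the move/score arrays are purely illustrative, not engine code:

```c
#include <stddef.h>

/* Toy sketch of hgm's scheme: keep only root moves whose search score
   proves a tablebase win (alpha in the root is capped at the TB-win
   score); the surviving moves would then be re-searched with probing of
   'inferior' tables disabled.  TB_WIN_SCORE is an illustrative value. */
#define TB_WIN_SCORE 32000

static size_t filter_winning_moves(int *moves, int *scores, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (scores[i] >= TB_WIN_SCORE) {   /* move preserves the forced win */
            moves[kept] = moves[i];
            scores[kept] = scores[i];
            kept++;
        }
    }
    return kept;                           /* number of win-preserving moves */
}
```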
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Natural TB

Post by bob »

syzygy wrote:
Rein Halbersma wrote:
syzygy wrote: (What may be more efficient is using a higher probe depth for 6-piece positions than for 5-piece positions, knowing that the 5-piece positions are more likely to be cached in RAM. As it happens, this idea was already implemented in SF. But probing the same table in the leaves when it could have been done just once in a parent node is just silly.)
Pardon my ignorance, but your TBs still use memory-mapped I/O, correct?
Correct.
One avenue for experimentation might be to use conditional lookups after an is-in-RAM check. IIRC, at least on Linux there are OS calls to do this now.
Yes, I have looked into those calls a while ago (I had to search for it, as I did not remember the name of the call). I thought then that it would be too expensive on every call, but still it might be worth a try.

I am also still looking for this instruction:
What would be ideal is an x86-64 load instruction that would not result in a page fault if the required memory page is not in RAM, but that would set an error flag.
Alternatively, it might be useful to have an OS call to page in a block or a series of blocks in the background (basically asynchronously, but without the callback).
Another thing is to use an asynchronous queue of positions-to-be-looked-up. You could then maintain a counter per position and do conditional lookups after a position is queried for a second time etc. But maybe the OS does this more efficiently than such a scheme.
Or use more search threads than physical or logical cores, the idea being that there will always be some waiting for I/O so that others can search on.
At least in draughts, in the old days before SSDs and huge RAM, the search would slow down 10x in the late middle game, e.g. with 4 GB RAM + 100 GB of TBs on disk. With RAM ~32 GB and SSDs, search speed is not affected significantly.
Yes, SSDs and lots of RAM are very nice :-)

And if some of the new memory technologies in development work out, we might at some point have terabytes of TB data accessible with the speed of RAM. And maybe TBs could be generated directly on such memories.
I have not seen such an instruction. The CPU does something like this internally for speculative execution, where it will not speculatively execute an instruction that would cause a page fault (or even a complete L3 cache miss). But I have not seen any non-privileged hardware instruction that does this, and I have looked. That certainly doesn't mean it doesn't exist; it might be new, or not yet documented.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Natural TB

Post by mcostalba »

Laskos wrote:First they seriously deteriorate the 5-men play at the root: on randomly occurring in games 5-men wins Natural fails to convert about 5% of them. On 5-men draws, it loses about 0.4% of them. On MES1258 varied endgame testsuite I got a SPRT stop for (-4,1):
BTW, did you know that current syzygy, when the root position is in the TBs, completely disables probing at inner nodes?

So what you have tested is vanilla SF (with DTZ filtering at the root) vs. Natural.
Uri Blass
Posts: 10269
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Natural TB

Post by Uri Blass »

bob wrote:
Uri Blass wrote:
syzygy wrote:
mcostalba wrote: Assuming the above, it is possible that probing at the end of the search is more efficient because you call probe fewer times (for instance, null searches, razoring, etc. do not end up in a TB probe because the engine cuts off earlier)
Is it still so difficult to visualise that probing one time in an interior node and cutting the subtree is more efficient than probing at least once in a leaf below that interior node? (And it is also more accurate.) Probing closer to the leaves certainly does not result in probing less. It is pretty amazing what you are writing here...

That is not his claim, based on my understanding.

His claim is that probing once at an interior node and cutting the subtree may be worse than continuing the search, which might avoid the probe entirely because of pruning, or might still probe later (you do not know which at the time you are at the interior node).


I will be surprised if it happens, but if the average number of probes below the interior tablebase node is smaller than 1, then at least in theory he may be right.
How? The probe being discussed terminates the search completely. So how can searching even one more node be "more efficient"? And this is about searching MANY more than one extra node.
In theory, if searching the extra nodes is faster than the probe, then it can be more efficient.
Uri Blass
Posts: 10269
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Natural TB

Post by Uri Blass »

syzygy wrote:
Uri Blass wrote:
syzygy wrote:
mcostalba wrote: Assuming the above, it is possible that probing at the end of the search is more efficient because you call probe fewer times (for instance, null searches, razoring, etc. do not end up in a TB probe because the engine cuts off earlier)
Is it still so difficult to visualise that probing one time in an interior node and cutting the subtree is more efficient than probing at least once in a leaf below that interior node? (And it is also more accurate.) Probing closer to the leaves certainly does not result in probing less. It is pretty amazing what you are writing here...

That is not his claim, based on my understanding.

His claim is that probing once at an interior node and cutting the subtree may be worse than continuing the search, which might avoid the probe entirely because of pruning, or might still probe later (you do not know which at the time you are at the interior node).
Yes, but for that to work EVERYTHING would need to be pruned in that subtree before any node is probed. With a single probe in the subtree you have not saved any probes, and you lose on having searched the subtree and on accuracy. With two probes in the subtree you already lose on the number of probes.

As I wrote near the beginning of this thread, the only sensible thing (unless beta is in mate area) is EITHER probe immediately and cut the whole subtree OR don't probe in the subtree at all.

This is the idea of probedepth: either the search determines that (relatively costly) probing is worth the effort because enough depth remains, or the search determines that the subtree is too small to replace searching it with a probe, and then obviously the same applies to all subtrees within that subtree. (As stated a few posts above, one refinement is that it may make sense to use a lower probedepth for 5-piece positions than for 6-piece positions. This is in particular the case if probing a 6-piece TB is expected to require disk accesses while probing a 5-piece TB translates into a RAM lookup.)

Instead of a fixed probedepth value (or two values for 6- and 5-piece), more advanced criteria could surely be thought of. But the principle remains: try to probe as close to the root as you can.
I will be surprised if it happens, but if the average number of probes below the interior tablebase node is smaller than 1, then at least in theory he may be right.
But even then I am right: the decision not to probe the first TB position should translate into decisions not to probe at any of the nodes of the subtree (modulo the 5-piece vs 6-piece thing).
It seems logical but I can imagine also other possibilities.

In practice we do not need to probe if we are certain about the result: probing some KQRR vs KN position when you are sure the stronger side wins is a waste of time, and it is better to return a high score without probing.

Suppose that you are uncertain about the result without probing in some interior node.

You have 3 options:
1) To probe.
2) To search, and maybe the search reaches positions where you are certain about the result, so you never need to probe.
3) To search without probing, even if you are uncertain about the result.

I do not believe that option 2 is better than 1 and 3, but I see no theoretical reason why 2 cannot be better than both (faster than 1 and only slightly slower than 3, with the gain in accuracy more than compensating).

Of course, the average number of probes in option 2 would have to be significantly lower than 1 probe; in theory (I do not believe it happens) it might be 0.1, if in a large majority of cases you can simply avoid probing.
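The probedepth rule syzygy describes, together with Uri's expected-cost condition, can be sketched as follows. All constants here are purely illustrative, not Stockfish's actual values:

```c
/* Sketch of a piece-count-dependent probedepth rule: probe only when enough
   remaining depth justifies the (relatively costly) lookup, with a lower
   threshold for 5-piece positions that are likely cached in RAM.
   The thresholds 2 and 6 are illustrative. */
static int should_probe(int remaining_depth, int piece_count)
{
    int threshold = (piece_count <= 5) ? 2 : 6;
    return remaining_depth >= threshold;
}

/* Uri's condition as a toy cost model: deferring the interior probe pays off
   only if the expected number of probes below that node times the probe
   cost, plus the cost of searching the subtree, stays under the cost of one
   immediate probe. */
static double deferred_cost(double avg_probes_below, double probe_cost,
                            double subtree_cost)
{
    return avg_probes_below * probe_cost + subtree_cost;
}
```

With avg_probes_below = 0.1, probe_cost = 100 and subtree_cost = 5, the deferred cost is 15, below the immediate probe's 100, which is the (unlikely) regime Uri describes; with even one expected probe below, it is already 105, matching syzygy's point that a single probe in the subtree loses outright.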
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: Natural TB

Post by mcostalba »

mcostalba wrote:
Laskos wrote:First they seriously deteriorate the 5-men play at the root: on randomly occurring in games 5-men wins Natural fails to convert about 5% of them. On 5-men draws, it loses about 0.4% of them. On MES1258 varied endgame testsuite I got a SPRT stop for (-4,1):
BTW, did you know that current syzygy, when the root position is in the TBs, completely disables probing at inner nodes?

So what you have tested is vanilla SF (with DTZ filtering at the root) vs. Natural.
I have made some important improvements to Natural TB, so interested people, please pull the updated version from GitHub again.

Thanks.
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Natural TB

Post by hgm »

bob wrote: Isn't this right back to the original problem??? I.e., if just one move guarantees the win by sacrificing the queen...
But that is usually not the case. It is just that one move gives the fastest detectable win. If you are in KQNNKP with only 5-men TBs, any move that does not stalemate guarantees the win, as well as the possibility of a quick conversion to a won 5-men position; that is quite typical of positions where you have such overwhelming power that you can afford a sac for nothing. You can always force the sac later; Queen sacs are very easy to force. Playing the move that is best according to the heuristic evaluation tends to make them even easier to force, as the evaluation is designed to reward driving the losing King to the edge/corner in these cases.