By the way, in your example, I think there also exists many equivalently short lines to get to that position without involving double blunders, so it may not be the case that the only chance of expanding the node is by one particular bad sequence of moves that never worth exploring.
Sure, this was just a simple example to make the point, for an accurate one you'd need a somewhat more complicated position / variation.
noobpwnftw wrote: ↑
Wed Oct 23, 2019 12:08 am
koedem wrote: ↑
Tue Oct 22, 2019 11:16 pm
we discard 1.Rc7 and never look at it again
It still shouldn't stop you from expanding it under all circumstances. As the tree grows you will eventually expand one of these.
Now this is a tricky problem of whether you want to force expanding the node when it is your main line, if you do that, then there is a counter-example where you override a shorter line with a draw score when it isn't.
I'd say it is the expected behavior and the only way to get over this(without cutting corners) is to keep growing the tree until your policy eventually lets you to explore such nodes, a.k.a. horizon effect.
I can think of one quick corner-cutting method is to "assume" the known shortest depth when your main line visit hits such a terminal, so that it enables you to continue expanding as if you are on the shortest, however this pollutes the results with your longer move history, but the effect is minimal since you'd only do this for main lines, and the polluted results are rarely used since you'd hardly visit them from their real shortest path again. Now it is either the known shortest(least possible pollution) or the main line(best possible exploration) gets to modify this node, others sees this and just take the cached results, how about that?
An idea I mentioned in a half sentence here might actually solve the problems:
For each position store the length of the shortest line to it as well as a pointer to the previous position (and/or an un-move). That takes only a few Bytes and we can construct the entire line that lead to a position now. (a bit slow but since the NN is much slower anyway it might not be a problem)
Now if we want to expand a line that we can't due to our longer history we simply pretend our history is that of the shortest possible line. Since cycles aren't possible without repetitions (and those are handled separately) we can't get into a loop here. So we expand as we please while still avoiding polluted history problems.
Thoughts on that? There is one problem I can think of, namely that a poorly scoring move that transposes to e.g. a PV line might now not get explored anymore since the repeated visits caused by the PV expansion lead to its N being high and thereby its U being low to speak in UCT terms.
This might be worked around to replace the N used for U calculations by one that once again "weights" main line nodes more strongly in that calculation. In fact something like that already exists in the linked PR (the Nbeta) so in principle should be feasible.