odomobo wrote: ↑Thu Oct 28, 2021 7:15 pm
I wanted some feedback on some thoughts I had. This is all theoretical at this point. My premise is:
- One form of search inaccuracy comes from a path-dependent score being used directly from the TT when the path-dependency cannot be proved. That is, we know we arrived via different paths when we find a transposition, so repetitions can happen in different ways.
- Repetitions rarely significantly influence TT entries.
- TT entries are already slightly invalid (due to path-dependency and repetitions), so if we make them consistently slightly invalid, it should not weaken search strength.
Proposal: never store a score in the TT when it contains path-dependent information. One way to do this is to split the search tree by depth - let's call positions close to the root "shallow positions" and positions close to leaves "deep positions". Deep positions would no longer check for repetitions*, and would store scores in the TT. Shallow positions would still check for repetitions, but would never store score in the TT, and those entries would only be used for move ordering.
My concern with the current approach is that it leaves engines unpredictably blind to search inaccuracy from path-dependency. My proposal would make it predictably blind, which could potentially be improved with heuristics.
* deep positions could check for repetitions in the real game history, just not within the search tree. The real game history is fixed, so there is no path-dependency for such checks.
I had a similar idea recently that I posted about elsewhere, sadly got no replies but let me try again and just copy paste it here.
So, what if we don't return after the first repetition but instead keep going, noting down this first repeated position. Eventually we may repeat the position again, in which case we evaluate it as draw_score due to 3fold-rep. Now, as we back that score up, it might of course be influenced by the upcoming repetition, so we can at most store an appropriate bound. (so if we are +1 despite a repetition, that +1 can only be a lower bound, not an upper bound) So we now lose out on some possible TT information, but we would not be sure of that information. As we keep backing up, it might happen that the repetition is no longer relevant. That is (at least?) when the weaker side deviates away from the repetition line. Or, we eventually reach the position where we repeated the first time. From this position, the "attacking" player had a playing field free of repetitions to consider, so from now on we can treat this as an exact score (or bound, depending on our window). We can now back up further as normally.
Now, if the position is actually a perpetual, then the exact backed up score would still be 0, since there is no good way to deviate. If it isn't, then there is a winning line that does not involve that repetition, so we would have found that one.
We might at the point of the first repetition back up some eval - epsilon so that a variation not involving that first repetition in the first place, would get a better evaluation comparatively.
Now, where I see this breaking is when we probe the TT after that first repetition and get an incorrect score, not achievable in that position in the tree. However, from the first repetition position whatever winning line it is could be played some way, so backing up that score is not necessarily incorrect, so long as we don't store anything in the TT incorrectly. (which I think we would not? If we do, we could change the scheme to never store anything after the first repetition; this reduces the effectiveness of the TT, but those repeated positions should be rare anyway) Now, we would have to be careful there to choose the correct move in that first repetition line, but that should be possible. (at the latest, when this position becomes root, and we notice our root position is one that happened before, we can check for this specifically; before then, just backing up a correct score would be good enough presumably)
There are some more details to be weary of, but could an approach with an idea like this work? (possibly at the cost of performance) (edited)