Edit: I should note, currently it appears that SF will pick a move as best if it fails high even once.
Feel free to give it a try, BUT
even when the move has failed high only once, its still the best move 88% of the time. Why would we instead pick an old move which is the best move with only 12% chance?
rvida wrote:I did some research with Critter about handling fail highs at root. I measured the probability of unresolved fail highs being the new best move. Whenever the time management decided to stop the search and there was an unresolved fail high, I forced the engine to finish the iteration and logged the result to see if it really found a better move.
- if a move failed high ONCE, there was an 88% probability that this will be the new best move (this surprised me - I expected a higher number)
- if a move failed high TWICE, the probability increased to 97.5%
- on THIRD fail high the probability was 99.97%
I found out that if the time is really running out, it is quite safe to accept a new move after it failed high 2x. Obviously, if there is enough time, the safest way is to finish the iteration.
These numbers confirm my own testing. And when you fail-high at the first move there hardly is a risk at all, exceptions are very rare.
What's also interesting to research (changing the topic a bit) are the false fail-high's, often they will appear as best move one or two iterations later after all. In the 90's I tested 2 options:
1. Accept any false fail-high as best move;
2. Research a false fail-high again but now one ply deeper.
Due to the limited hardware of these days plus the fact false fail-high's are rare it was impossible to make an educated decision.
Here's a critical test. YEARS ago, I ran into an issue with Crafty failing high (at the root) on the null-window (PVS) search. I played that move regardless of whether it failed high on the re-search or got a real score, or failed low. I found Crafty failing high (only on the null-window) then failing low on the re-search, and occasionally playing a blunder. Bruce (Ferret) noticed the same thing. We simply chose to ignore the null-window fail-high unless it failed high on the re-search or got a good score (fail-high on null window, fail low on re-search [back to original window], ignore fail high). I am testing right now to see if that is still an issue or not...
Done some 30K matches, and it turns out even accepting the null-window search by itself, is a measurable improvement.
In summary, here's the fail-highs one can see on root moves, and what I have tested in accepting or rejecting them.
1. aspiration fail high (score lies outside the aspiration window). Accept with no exceptions.
2. PVS (null-window) fail high on a root move (obviously not the first root move). Again accept with no exceptions. Note that a second fail-high here would represent an aspiration window fail high (case 1 above).
3. Reduced-depth search at root. Do not accept this move as a new best move until the research with normal depth has been completed. If the re-search fails high, accept it as a new best move.
case 2 is pretty difficult to hit, because you have to (a) fail high on the first pvs null-window search, but then run out of time before completing the re-search. Only significant case is the fail-high on the original null-window search, followed by a fail-low on the re-search with normal window. Accepting this is more common and was the reason for the almost +10 Elo gain this produced...
Now got to clean that code up as it can "silently" change the best move without displaying anything which is undesirable.
bob wrote:Done some 30K matches, and it turns out even accepting the null-window search by itself, is a measurable improvement.
In summary, here's the fail-highs one can see on root moves, and what I have tested in accepting or rejecting them.
1. aspiration fail high (score lies outside the aspiration window). Accept with no exceptions.
2. PVS (null-window) fail high on a root move (obviously not the first root move). Again accept with no exceptions. Note that a second fail-high here would represent an aspiration window fail high (case 1 above).
3. Reduced-depth search at root. Do not accept this move as a new best move until the research with normal depth has been completed. If the re-search fails high, accept it as a new best move.
case 2 is pretty difficult to hit, because you have to (a) fail high on the first pvs null-window search, but then run out of time before completing the re-search. Only significant case is the fail-high on the original null-window search, followed by a fail-low on the re-search with normal window. Accepting this is more common and was the reason for the almost +10 Elo gain this produced...
Now got to clean that code up as it can "silently" change the best move without displaying anything which is undesirable.
I follow this same recipe in EXchess. On a related issue, I have long wondered how necessary it is to resolve fail-highs followed by fail-lows (or vice-versa) at the root. For the most recent versions of EXchess, I have not bothered to resolve these, and I have just treated these cases as if the score is bounded and moved on to the next iteration. This seems to work fine, but I have wondered what others think. To be clear, here is my aspiration search window driver (g_last is the score from the last iteration)...
The way I resolve search instability, whether at root or within internal nodes, is to take the average of the two fails (soft) as the exact score.
It seems to work fine. The exception is when one of the fails is a draw score of -1/0/+1 due to repetition, in which case a research with infinite window is done.
I am not sure if there should be other exceptions.
Thanks for mentioning that you treat draws differently. I also think this could be important for EXchess (along with treating MATE scores differently as well)... I am now testing a patch with this change. Given how rare these scores are when the game is not already decided, I don't think it is likely to give a big improvement, but it is worth getting these cases right. Right now the patched version is running slightly better, but I've only run 1500 games or so, so lots of uncertainty still.
The oroginal PVS, as defined/implemented by Ken Thompson in Belle worked like this:
If you fail high, immediately make that move best, but don't try to resolve it unless you get a second fail high. Now you have two moves and no way to know which is best, so you re-search with a relaxed beta bound (when I did this I searched the second fail-high move first since I just finished it and had killers/etc still "hot"), and then do a null-window search on the second move to see if it is better or worse. Repeat until done. Many times I would end up with a fail-high move but no score or PV.
For Belle, that worked fine since it didn't have any sort of ft-best-move stuff in the hardware anyway. But for me, I missed two key pieces of information. (1) The PV (with a move to ponder) and (2) the score so that I could use additional time if the score dropped (ken didn't use extra time either)...
Today I try to resolve the score immediately as I want to know what is happening so I can use that to search longer when something goes wrong.
The original fail-hi/fail-low problem doesn't seem to exist and was probably some sort of bug at the time...
dchoman wrote:Thanks for mentioning that you treat draws differently. I also think this could be important for EXchess (along with treating MATE scores differently as well)... I am now testing a patch with this change. Given how rare these scores are when the game is not already decided, I don't think it is likely to give a big improvement, but it is worth getting these cases right. Right now the patched version is running slightly better, but I've only run 1500 games or so, so lots of uncertainty still.
- Dan
In my implementaion, within internal nodes, mate scores do not matter as there is no need for a PVS research with mate scores failing - maybe +mate fail high.