gavanewijk wrote:
Hi all, I have recently used ROC curves to analyze different implementations of LMR. I wanted to share the method here because I found it very valuable, not only for LMR but for any kind of pruning, reduction, extension, etc. It might also spark some new ideas, which of course I would like to hear about.
Wikipedia is a good place to learn about ROC. Basically it is a graphical representation of the performance of binary classifiers. LMR is one such classifier: based on the move count, the result of a search at reduced depth, etc., we decide whether to search a move at full depth or prune it. If the model says we should prune, but a search at normal depth would actually return a score > alpha, we have a False Positive. If the model was right, it is a True Positive. A plot of True Positives/Real Positives against False Positives/Real Negatives is an ROC plot; the further up and to the left, the better the model. An LMR model furthermore has parameters that can be tuned, e.g. the move number after which we test for pruning. Plotting TP/RP against FP/RN for a range of move numbers gives an ROC curve and provides very good insight into the effect of move number on accuracy (FP/RN) vs. efficiency (TP/RP). (Sorry, I deviate from standard classifier-performance jargon, but I find this more understandable.)

In my first experiments, I have collected features that could be used in an LMR model for many different positions, plus of course the real outcome (score <= alpha or not); see https://docs.google.com/file/d/0B0l5kw6 ... sp=sharing
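To make the bookkeeping concrete, here is a minimal sketch of how such an ROC curve could be computed from the logged data. The Sample fields, the "prune if move number > m and the reduced search failed low" rule, and the threshold range are illustrative assumptions, not gavanewijk's actual tooling.

    #include <cstdio>
    #include <vector>

    struct Sample {
        int  moveNumber;     // order in which the move was searched
        bool reducedFailLow; // reduced-depth search returned score <= alpha
        bool actualFailLow;  // full-depth search returned score <= alpha ("real positive")
    };

    int main() {
        std::vector<Sample> samples; // fill from the logged positions

        // One ROC point per "consider pruning after move m" threshold.
        for (int m = 1; m <= 30; ++m) {
            long tp = 0, fp = 0, rp = 0, rn = 0;
            for (const Sample& s : samples) {
                bool predictPrune = s.moveNumber > m && s.reducedFailLow;
                if (s.actualFailLow) { ++rp; if (predictPrune) ++tp; }
                else                 { ++rn; if (predictPrune) ++fp; }
            }
            double tpr = rp ? (double)tp / rp : 0.0; // efficiency: TP/RP
            double fpr = rn ? (double)fp / rn : 0.0; // accuracy:   FP/RN
            std::printf("m=%2d TP/RP=%.4f FP/RN=%.6f\n", m, tpr, fpr);
        }
        return 0;
    }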
Some learnings I got from this exercise: applying LMR only in ALL nodes is pointless in my program; it hardly improves accuracy yet severely reduces efficiency. The same goes for move type (capture or not): no added value, so I now prune captures just like any other move. The search result at reduced depth, however, is indispensable; without it, the result is very inaccurate.
The cool thing about this method is that you have to collect data only once to investigate different models and different settings; there is no need to play millions of fast games for each possible setting. The drawback is that you have to make a guess about what accuracy is required. I have used an FP/RN threshold of about 0.1% as the limit, an arbitrary choice... Another drawback is that there is no direct result on the increase in playing strength. Despite the drawbacks, I plan to use ROC analysis for extensions and futility pruning too.
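Given such a curve, the "guess about what accuracy is required" becomes a simple filter: keep the most efficient setting whose false-positive rate stays under the budget. A hypothetical helper, assuming the (threshold, TP/RP, FP/RN) points from the sketch above:

    #include <vector>

    struct RocPoint { int threshold; double tpr, fpr; };

    // Pick the move-number threshold with the best efficiency (TP/RP)
    // among those that respect the accuracy budget (FP/RN <= maxFpr).
    int pickThreshold(const std::vector<RocPoint>& curve, double maxFpr = 0.001) {
        int best = -1;
        double bestTpr = -1.0;
        for (const RocPoint& p : curve)
            if (p.fpr <= maxFpr && p.tpr > bestTpr) {
                bestTpr = p.tpr;
                best = p.threshold;
            }
        return best; // -1 if no setting meets the budget
    }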
Anyone else looked at statistics of search strategies this way? Other uses for ROC analysis? And what accuracy is OK (and how to find out)? I cannot get a grip on that last point so far.

I did something similar a year or two ago with Komodo. I turned off LMR and then just kept statistics, by depth and move number, of how often a move turned out to be best or created a cutoff. Most of the time it's the cutoff you are looking for (in non-PV nodes). It was very enlightening and helped me understand the power of LMR and why it works so well. There is a huge drop-off in the number of cutoffs produced by each successive move in the list. I don't have the numbers in front of me, but something like 95% plus came from the first 2 or 3 moves, the next move accounted for some tiny percentage of the remainder, the one after that was just a fraction of the previous, and so on.
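For illustration, that kind of instrumentation might look like the following; the counters, array bounds, and hook name are assumptions, not Komodo's actual code.

    constexpr int MAX_DEPTH = 64;
    constexpr int MAX_MOVES = 64;

    static long long tried [MAX_DEPTH][MAX_MOVES];
    static long long cutoff[MAX_DEPTH][MAX_MOVES];

    // Call from the move loop in search(), with LMR switched off,
    // after each move has been searched.
    void recordMoveStat(int depth, int moveNumber, bool causedCutoff) {
        if (depth >= MAX_DEPTH || moveNumber >= MAX_MOVES) return;
        tried[depth][moveNumber]++;
        if (causedCutoff)
            cutoff[depth][moveNumber]++;
    }

    // Afterwards, cutoff[d][n] divided by the total cutoffs at depth d gives
    // the cutoff distribution: the heavy concentration in the first few moves
    // is exactly what makes reducing late moves so cheap.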
What is more difficult is measuring the impact of this and interpreting these numbers. It's that last 1% that causes all the problems; otherwise there would be no need to even reduce those moves, you could just throw them out. So you still want to pick up the few exceptions, and now the question is: how often does the LMR search actually "salvage" the move?
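Counting those salvages is straightforward to bolt onto the usual reduce-then-re-search pattern. A simplified sketch, with a placeholder reduction rule and search() signature rather than any particular engine's code:

    extern int search(int depth, int alpha, int beta); // the engine's search

    static long long reducedSearches = 0;
    static long long salvages = 0; // reduced search said "look again"

    // Zero-window search of one late move, LMR style.
    int searchLateMove(int depth, int alpha, int moveNumber, bool quietMove) {
        int reduction = (quietMove && moveNumber > 3 && depth >= 3) ? 1 : 0;
        int score = -search(depth - 1 - reduction, -alpha - 1, -alpha);
        if (reduction) {
            ++reducedSearches;
            if (score > alpha) {
                // The reduced search refused to dismiss the move, so we
                // re-search at full depth. salvages / reducedSearches is
                // the rate asked about above.
                ++salvages;
                score = -search(depth - 1, -alpha - 1, -alpha);
            }
        }
        return score;
    }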
You also have the "quantum effect" when you introduce LMR into the statistics-gathering instrumentation: when you measure it, you change it. Or more precisely, with LMR on, the statistics change too.
So I think there may be something useful here, especially for gathering insights into how the search works, but there is no substitute for testing.