gavanewijk wrote:
Hi all, I have recently used ROC curves to analyze different implementations of LMR. I wanted to share the method here because I found it very valuable, not only for LMR but for any kind of pruning, reduction, extension, etc. It might also spark some new ideas, which of course I would like to hear about.
Wikipedia is a good place to learn about ROC. Basically it is a graphical representation of the performance of binary classifiers. LMR is one such classifier: based on the move count, the result of a search at reduced depth, etc., we decide whether to search a move at full depth or prune it. If the model says we should prune, but a search at normal depth would actually return a score > alpha, we have a False Positive. If the model was right, it is a True Positive. A plot of True Positives/Real Positives against False Positives/Real Negatives is an ROC plot; the further up and to the left, the better the model. An LMR model furthermore has parameters that can be tuned, e.g. the move number after which we test for pruning. Plotting TP/RP against FP/RN for a range of move numbers gives an ROC curve and provides very good insight into the effect of move number on accuracy (FP/RN) vs. efficiency (TP/RP). (Sorry, I deviate from standard classifier-performance jargon, but I find this more understandable.)

In my first experiments, I have collected features that could be used in an LMR model for many different positions, plus of course the real outcome (score <= alpha or not); see https://docs.google.com/file/d/0B0l5kw6 ... sp=sharing
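To make the bookkeeping concrete, here is a minimal sketch of how such an ROC curve could be computed from the logged data. The Sample fields, the "prune if move number > m and the reduced search failed low" rule, and the threshold range are illustrative assumptions, not gavanewijk's actual tooling.

    #include <cstdio>
    #include <vector>

    struct Sample {
        int  moveNumber;     // order in which the move was searched
        bool reducedFailLow; // reduced-depth search returned score <= alpha
        bool actualFailLow;  // full-depth search returned score <= alpha ("real positive")
    };

    int main() {
        std::vector<Sample> samples; // fill from the logged positions

        // One ROC point per "consider pruning after move m" threshold.
        for (int m = 1; m <= 30; ++m) {
            long tp = 0, fp = 0, rp = 0, rn = 0;
            for (const Sample& s : samples) {
                bool predictPrune = s.moveNumber > m && s.reducedFailLow;
                if (s.actualFailLow) { ++rp; if (predictPrune) ++tp; }
                else                 { ++rn; if (predictPrune) ++fp; }
            }
            double tpr = rp ? (double)tp / rp : 0.0; // efficiency: TP/RP
            double fpr = rn ? (double)fp / rn : 0.0; // accuracy:   FP/RN
            std::printf("m=%2d TP/RP=%.4f FP/RN=%.6f\n", m, tpr, fpr);
        }
        return 0;
    }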
Some learnings I got from this exercise: applying LMR only in ALL nodes is pointless in my program; it hardly improves accuracy yet severely reduces efficiency. The same goes for move type (capture or not): no added value, so I now prune captures just like any other move. The search result at reduced depth, however, is indispensable; without it, the result is very inaccurate.
The cool thing about this method is that you have to collect data only once to investigate different models and different settings; there is no need to play millions of fast games for each possible setting. The drawback is that you have to make a guess about what accuracy is required. I have used an FP/RN threshold of about 0.1% as the limit, an arbitrary choice... Another drawback is that there is no direct result on the increase in playing strength. Despite the drawbacks, I plan to use ROC analysis for extensions and futility pruning too.
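Given such a curve, the "guess about what accuracy is required" becomes a simple filter: keep the most efficient setting whose false-positive rate stays under the budget. A hypothetical helper, assuming the (threshold, TP/RP, FP/RN) points from the sketch above:

    #include <vector>

    struct RocPoint { int threshold; double tpr, fpr; };

    // Pick the move-number threshold with the best efficiency (TP/RP)
    // among those that respect the accuracy budget (FP/RN <= maxFpr).
    int pickThreshold(const std::vector<RocPoint>& curve, double maxFpr = 0.001) {
        int best = -1;
        double bestTpr = -1.0;
        for (const RocPoint& p : curve)
            if (p.fpr <= maxFpr && p.tpr > bestTpr) {
                bestTpr = p.tpr;
                best = p.threshold;
            }
        return best; // -1 if no setting meets the budget
    }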
Anyone else looked at statistics of search strategies this way? Other uses for ROC analysis? And what accuracy is OK (and how to find out)? I cannot get a grip on that last point so far.

I did something similar a year or two ago with Komodo. I turned off LMR and then just kept statistics, by depth and move number, of how often a move turned out to be best or created a cutoff. Most of the time it's the cutoff you are looking for (in non-PV nodes). It was very enlightening and helped me understand the power of LMR and why it works so well. There is a huge drop-off in the number of cutoffs produced by each successive move in the list. I don't have the numbers in front of me, but something like 95% plus came from the first 2 or 3 moves, the next move accounted for some tiny percentage of the remainder, the one after that was just a fraction of the previous, and so on.
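For illustration, that kind of instrumentation might look like the following; the counters, array bounds, and hook name are assumptions, not Komodo's actual code.

    constexpr int MAX_DEPTH = 64;
    constexpr int MAX_MOVES = 64;

    static long long tried [MAX_DEPTH][MAX_MOVES];
    static long long cutoff[MAX_DEPTH][MAX_MOVES];

    // Call from the move loop in search(), with LMR switched off,
    // after each move has been searched.
    void recordMoveStat(int depth, int moveNumber, bool causedCutoff) {
        if (depth >= MAX_DEPTH || moveNumber >= MAX_MOVES) return;
        tried[depth][moveNumber]++;
        if (causedCutoff)
            cutoff[depth][moveNumber]++;
    }

    // Afterwards, cutoff[d][n] divided by the total cutoffs at depth d gives
    // the cutoff distribution: the heavy concentration in the first few moves
    // is exactly what makes reducing late moves so cheap.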
What is more difficult is measuring the impact of this and interpreting these numbers. It's that last 1% that causes all the problems; otherwise there would be no need to even reduce those moves, you could just throw them out. So you still want to pick up the few exceptions, and now the question is: how often does the LMR search actually "salvage" the move?
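Counting those salvages is straightforward to bolt onto the usual reduce-then-re-search pattern. A simplified sketch, with a placeholder reduction rule and search() signature rather than any particular engine's code:

    extern int search(int depth, int alpha, int beta); // the engine's search

    static long long reducedSearches = 0;
    static long long salvages = 0; // reduced search said "look again"

    // Zero-window search of one late move, LMR style.
    int searchLateMove(int depth, int alpha, int moveNumber, bool quietMove) {
        int reduction = (quietMove && moveNumber > 3 && depth >= 3) ? 1 : 0;
        int score = -search(depth - 1 - reduction, -alpha - 1, -alpha);
        if (reduction) {
            ++reducedSearches;
            if (score > alpha) {
                // The reduced search refused to dismiss the move, so we
                // re-search at full depth. salvages / reducedSearches is
                // the rate asked about above.
                ++salvages;
                score = -search(depth - 1, -alpha - 1, -alpha);
            }
        }
        return score;
    }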
You also have the "quantum effect" when you introduce LMR into the statistics-gathering instrumentation: when you measure it, you change it. Or more precisely, with LMR on, the statistics change too.
So I think there may be something useful here, especially for gathering insights into how the search works, but there is no substitute for testing.