It's a very elegant definition. I did a quick depth 15 measurement across some positions for SF, and 58% of the non-PV node fail highs from main search (those that make it past null-move, etc.) come in cut nodes, using this definition.

Don wrote: it sounds more complicated than it is, but the definition is simple, if you are an odd distance from the last PV node it's a cut node, otherwise it's an all node.
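For anyone who wants to see the parity rule spelled out, here is a minimal sketch of how that assignment falls out of a PVS search. It is not taken from Stockfish, Komodo or any other engine; Position, Move, generate_moves, do_move, undo_move and qsearch are placeholders for whatever the host engine provides.

Code: Select all
enum NodeType { PV_NODE, CUT_NODE, ALL_NODE };

int search(Position& pos, int alpha, int beta, int depth, NodeType nt) {
    if (depth == 0)
        return qsearch(pos, alpha, beta);

    int moveCount = 0;
    for (Move m : generate_moves(pos)) {
        ++moveCount;
        pos.do_move(m);

        int score;
        if (nt == PV_NODE && moveCount == 1)
            // The first move of a PV node is searched as a PV node.
            score = -search(pos, -beta, -alpha, depth - 1, PV_NODE);
        else {
            // Zero-window search.  The expected type simply alternates:
            // the child of an expected CUT node is an expected ALL node and
            // vice versa, and any zero-window child of a PV node is an
            // expected CUT node (odd distance from the last PV node).
            NodeType childType = (nt == CUT_NODE ? ALL_NODE : CUT_NODE);
            score = -search(pos, -alpha - 1, -alpha, depth - 1, childType);

            // A zero-window fail high at a PV node is re-searched with the
            // full window; the child is promoted to a PV node, and the
            // CUT/ALL labels below it swap accordingly.
            if (nt == PV_NODE && score > alpha && score < beta)
                score = -search(pos, -beta, -alpha, depth - 1, PV_NODE);
        }
        pos.undo_move(m);

        if (score >= beta)
            return score;        // fail high: this node behaved as a CUT node
        if (score > alpha)
            alpha = score;
    }
    return alpha;                // no move exceeded alpha: this node behaved as an ALL node
}

With this convention, the 58% figure above is simply the fraction of non-PV fail highs that occurred at nodes labelled CUT before any of their moves had been searched.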
ROC analysis of Late Move Reductions
-
- Posts: 568
- Joined: Tue Dec 12, 2006 10:10 am
- Full name: Gary Linscott
Re: ROC analysis of Late Move Reductions
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: ROC analysis of Late Move Reductions
Stockfish is one of the few modern programs that don't define CUT and ALL nodes unless that has changed recently. We found it to be a definite benefit to handle them differently. It's not HUGE, but it is a benefit.

gladius wrote: It's a very elegant definition. I did a quick depth 15 measurement across some positions for SF, and 58% of the non-PV node fail highs from main search (those that make it past null-move, etc.) come in cut nodes, using this definition.

Don wrote: it sounds more complicated than it is, but the definition is simple, if you are an odd distance from the last PV node it's a cut node, otherwise it's an all node.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
-
- Posts: 568
- Joined: Tue Dec 12, 2006 10:10 am
- Full name: Gary Linscott
Re: ROC analysis of Late Move Reductions
It's not in master, just testing. I tried a while back with some all/cut tweaks without any luck. Take #2 is here: http://tests.stockfishchess.org/tests/v ... 14ec4dd802

Don wrote: Stockfish is one of the few modern programs that don't define CUT and ALL nodes unless that has changed recently. We found it to be a definite benefit to handle them differently. It's not HUGE, but it is a benefit.

gladius wrote: It's a very elegant definition. I did a quick depth 15 measurement across some positions for SF, and 58% of the non-PV node fail highs from main search (those that make it past null-move, etc.) come in cut nodes, using this definition.

Don wrote: it sounds more complicated than it is, but the definition is simple, if you are an odd distance from the last PV node it's a cut node, otherwise it's an all node.
-
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: ROC analysis of Late Move Reductions
However you define node types, reducing expected CUT nodes less, if anything, is the only thing that makes sense. You expect to fail high there (which is more than just updating alpha, as in PV nodes), so you should be careful. That is very intuitive, and probably one reason why PV nodes are reduced less or not at all.

gladius wrote: It's a very elegant definition. I did a quick depth 15 measurement across some positions for SF, and 58% of the non-PV node fail highs from main search (those that make it past null-move, etc.) come in cut nodes, using this definition.

Don wrote: it sounds more complicated than it is, but the definition is simple, if you are an odd distance from the last PV node it's a cut node, otherwise it's an all node.

As to the definition of node types, clearly the one that switches expected CUT nodes to ALL after searching, say, 3 moves is better than sticking with the original definition. Would you rather look at the evidence (i.e. a fail high within the first 3 moves has >95% likelihood) and adjust your node-type assignment, or stick with a static one? One could measure how many fail highs there are after searching 3 moves in a CUT node. Only if there were more fail highs after the 3rd move in a CUT node would not switching be a good idea. It is much safer to say the node is an ALL node after searching 3 moves. For decisions on IID or singular extensions, which you have to make before searching any move, you have no choice but to use the static assignment. But for LMR you mostly search at least 3 moves first, so by then it is most likely an ALL node, or your move ordering is screwed. It doesn't make sense to trust the static node assignment more than the evidence. It has been proven, mostly in the parallel-search literature, that dynamic assignment gives superior results, e.g. strong YBWC.

Anyway, even if we disagree on the node definition, I can't get around why you would reduce CUT nodes more. I read Don's reply but I still don't get it, and I suspect others who reduce CUT nodes more may give other justifications. For parallel search, for instance, you avoid splitting at CUT nodes however you define them; why should it be different (in fact the opposite) for LMR?
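To make the dynamic re-labelling concrete, here is a small sketch of the kind of rule Daniel describes. It is my illustration, not Scorpio's code; base_lmr stands for whatever depth/move-count reduction table an engine already has, and the switch threshold and offsets are placeholder numbers.

Code: Select all
#include <algorithm>

enum NodeType { PV_NODE, CUT_NODE, ALL_NODE };

// Placeholder for the engine's existing depth/move-count reduction table.
int base_lmr(int depth, int moveCount);

// Moves searched at an expected CUT node before we give up on the cutoff
// and re-label the node as ALL (Daniel suggests about 3).
constexpr int CutToAllSwitch = 3;

int lmr_reduction(NodeType expected, int depth, int moveCount) {
    if (expected == CUT_NODE && moveCount > CutToAllSwitch)
        expected = ALL_NODE;             // dynamic switch: no cutoff so far, treat it as an ALL node

    int r = base_lmr(depth, moveCount);
    if (expected == CUT_NODE) r -= 1;    // be more careful where a fail high is still expected
    if (expected == PV_NODE)  r -= 2;    // PV nodes reduced least of all
    return std::max(r, 0);
}

The point of contention in the rest of the thread is the sign of the CUT-node adjustment: Daniel argues for reducing expected CUT nodes less, as sketched here, while Komodo and Houdini report better results from reducing them more once the first move has failed to cut.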
-
- Posts: 1471
- Joined: Tue Mar 16, 2010 12:00 am
Re: ROC analysis of Late Move Reductions
Don's explanation is correct.

Daniel Shawul wrote: Anyway, even if we disagree on the node definition, I can't get around why you would reduce CUT nodes more. I read Don's reply but I still don't get it, and I suspect others who reduce CUT nodes more may give other justifications.
One can afford to reduce CUT nodes more after the initial move failed to produce a cut, because any error will simply cause a re-search - this time as an ALL node.
The approach requires some special management of the hash table in order to distinguish information coming from CUT and ALL nodes - in view of the different reduction levels the information is not interchangeable.
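Houdini doesn't say how that hash-table separation is implemented; purely as an illustration of the idea (my guess, not his code), one could tag each entry with the expected node type it was stored under and demand extra depth before letting a heavily reduced CUT-node result answer a probe at an ALL node.

Code: Select all
#include <cstdint>

enum NodeType { PV_NODE, CUT_NODE, ALL_NODE };

struct TTEntry {
    uint64_t key;
    int16_t  value;
    int8_t   depth;
    uint8_t  bound;      // lower / upper / exact
    uint8_t  nodeType;   // expected type of the node when the entry was stored
};

// A result obtained under the heavier CUT-node reductions stands for less
// search effort than an ALL-node result of the same nominal depth, so the
// two are not interchangeable.
bool tt_cutoff_allowed(const TTEntry& e, int probeDepth, NodeType probeType) {
    int required = probeDepth;
    if (e.nodeType == CUT_NODE && probeType == ALL_NODE)
        required += 2;                   // illustrative margin, not a tuned value
    return e.depth >= required;
}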
-
- Posts: 4186
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: ROC analysis of Late Move Reductions
Well then, what is the difference from a node that was originally assigned as an ALL node? The question is not whether you can reduce CUT nodes more, but whether you should reduce them more than ALL nodes. After searching 3 moves the node has become an ALL node with >95% probability, so you reduce it as much as you reduce any ALL node. Also, as I said, dynamic node assignment has been proven to be better than static node assignment for other applications, so what is the reason for not using it? I can muster up any excuse to justify an absurd reduction methodology by attaching a 'doesn't give much Elo, but if you do this and only this it works'. The question is whether you had the idea first and then did the implementation, or the reverse.

Houdini wrote: Don's explanation is correct.

Daniel Shawul wrote: Anyway, even if we disagree on the node definition, I can't get around why you would reduce CUT nodes more. I read Don's reply but I still don't get it, and I suspect others who reduce CUT nodes more may give other justifications.
Houdini wrote: One can afford to reduce CUT nodes more after the initial move failed to produce a cut, because any error will simply cause a re-search - this time as an ALL node.
Sounds like gibberish, at least for now..

Houdini wrote: The approach requires some special management of the hash table in order to distinguish information coming from CUT and ALL nodes - in view of the different reduction levels the information is not interchangeable.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: ROC analysis of Late Move Reductions
I saw it, in fact. It is an exact repeat of what I did in the DTS implementation of Cray Blitz. But the concept of "reducing a cut node" is a contradiction in terms if you add "but don't reduce the first move." A true CUT node would only have one move searched.

Don wrote: This is all in the context of how we define a cut node and an all node, which I made clear in my post, which you apparently missed seeing.

bob wrote:

Don wrote: We also reduce cut nodes more. It appears to be safer to do that, and I will explain our reasoning on this. I don't know for sure if this reasoning is correct, but many of our tests show that reducing them more has an unexpected effect which is the opposite of what you might expect. It can actually slow down the search slightly and improve the quality. It's even easy to reason about why this might be happening. Here is how it goes. Keep in mind all of this is from a PVS search framework:

Daniel Shawul wrote: In light of the LMR euphoria, I thought I would bump this thread up, which actually has some substance. I ran some tests regarding LMR and prunings some weeks ago. Reducing ALL nodes more is a loss for me after 10000 games, as you mentioned, but I did it only in the upper parts (depth > 6). If I did it everywhere it is a total loss. As I mentioned before, Rybka and co seem to reduce CUT nodes more, which is weird.
"reduce cut nodes more"? A normal cut node only searches one move, and you never reduce the first move searched. I'm not following here...
My original comment still stands. I believe one reduces moves on the basis of the move, not the position. It is not uncommon for a CUT node to become an ALL node, and vice-versa. If you make decisions on the expected node type, you introduce yet another instability since you would do things differently depending on whether it is really a CUT node or an ALL node..
The problem I saw when doing this in DTS is that once you fail to fail-high on the first move of a CUT node, EVERYTHING along the path is now "in doubt". Every node type could be flipping as you find a new best move, or you find the best move is worse than the aspiration bound.
I won't repeat it in full here but basically we estimate which node type it will be BEFORE it is even searched and we stick with that definition. And our definition has the merits of being much easier to reason about.
The classical definition is useless for a chess program if you actually intend to do anything useful with it.
Also, you don't know a node is a "CUT" node until you get a beta cutoff...
You can guess. You can also flip a coin I suppose??
If you are at a cut node the previous node was an all node. At all nodes you expect all moves to fail low and at cut nodes you expect the first move to fail high and exit. The insight is this: what happens if a cut node does not produce a cutoff? The answer is that the previous move (which is at an ALL node) will fail high and get re-searched. The re-search saves the day! If the heavy reduction at the cut node causes the search to FAIL to produce the expected cutoff (and it would have otherwise), there is NO damage done because of this.
There is no free lunch however. The more aggressive cut node reductions should save time, but that may not necessarily happen. Because it can trigger additional re-searches at the previous node it can actually cause the search to take MORE time. But strangely enough we have seen a slight quality improvement, probably due to these additional re-searches which with LMR will be done at greater depths (because these won't be reduced.)
The way you categorize cut and all nodes could change all of this however. We have a very anal definition because in reality you cannot know until a search is complete, so we decide BEFORE the first move is searched. It cannot change because it always toggles back and forth. If the previous node was a cut node, we consider this an ALL node. The only exception is for PV nodes, which are a special case of ALL nodes. In the PVS framework, the first zero-width window searched from a PV node is by our definition a CUT node, and if you have to do a re-search then it is suddenly promoted to a PV node (as per PVS search) and only then can the cut and all nodes swap positions. In other words, these internal search failures can force the status of every node in the subtree to swap if it propagates back to the last PV node.
it sounds more complicated than it is, but the definition is simple, if you are an odd distance from the last PV node it's a cut node, otherwise it's an all node.
Anyway, I set expected CUT nodes to ALL after a few moves are searched and no cut-offs occur, so most will be ALL nodes anyway. The other test I did was to reduce losing captures by a much smaller amount (1 ply), while other moves could be reduced by as much as 6 plies. That seemed to improve Scorpio a bit, so I have that enabled by default now. I suspect that the captures, checks and other moves that are not reduced form the soul of an LMR-heavy engine, so tampering with those even with small reduction values may be damaging.

Also, I did some tests with aggressive pruning, but this is a loss as always. Move-count pruning never worked for me above depth=2, and it probably never will. I am better off increasing the LMR reduction in the upper part of the tree by one more ply than worrying about pruning in the last plies.
I ended up doing this twice.
Once going down the tree, then again going back up the tree, so that the current CUT node is no longer considered CUT (since we didn't fail high on the first move) and the previous ALL node is now likely a CUT node since we are not failing high, and a fail-low here turns into a fail-high one ply back.
It was complicated, and had a lot of error associated with it.
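For readers trying to follow the "re-search saves the day" argument quoted above, this fragment of the move loop from the earlier PVS sketch shows where the safety net sits. It is only an illustration, using the same placeholder names as before, and the reduction policy shown is the one under discussion, not a statement of what Komodo actually does.

Code: Select all
int r = lmr_reduction(nt, depth, moveCount);   // nt = this node's expected type; larger at expected CUT nodes in the scheme being discussed
int score = -search(pos, -alpha - 1, -alpha, depth - 1 - r, childType);

if (r > 0 && score > alpha)
    // The reduced move unexpectedly beat alpha - typically because the
    // expected cutoff below did not happen - so verify at full depth
    // before believing it.
    score = -search(pos, -alpha - 1, -alpha, depth - 1, childType);

if (nt == PV_NODE && score > alpha && score < beta)
    // At a PV node the verified fail high triggers the usual PVS re-search
    // with the full window, at unreduced depth.
    score = -search(pos, -beta, -alpha, depth - 1, PV_NODE);

An over-aggressive reduction therefore tends to cost re-search time rather than locking in a wrong bound, which is Don's point about the search sometimes taking more time while the quality improves.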
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: ROC analysis of Late Move Reductions
My concern would be that reducing a CUT node more will likely hide the reason it is going to fail high with a normal search. Shallower != better in terms of results, just in terms of performance.

Daniel Shawul wrote: Well then, what is the difference from a node that was originally assigned as an ALL node? The question is not whether you can reduce CUT nodes more, but whether you should reduce them more than ALL nodes. After searching 3 moves the node has become an ALL node with >95% probability, so you reduce it as much as you reduce any ALL node. Also, as I said, dynamic node assignment has been proven to be better than static node assignment for other applications, so what is the reason for not using it? I can muster up any excuse to justify an absurd reduction methodology by attaching a 'doesn't give much Elo, but if you do this and only this it works'. The question is whether you had the idea first and then did the implementation, or the reverse.

Houdini wrote: Don's explanation is correct.

Daniel Shawul wrote: Anyway, even if we disagree on the node definition, I can't get around why you would reduce CUT nodes more. I read Don's reply but I still don't get it, and I suspect others who reduce CUT nodes more may give other justifications.
Houdini wrote: One can afford to reduce CUT nodes more after the initial move failed to produce a cut, because any error will simply cause a re-search - this time as an ALL node.

Sounds like gibberish, at least for now..

Houdini wrote: The approach requires some special management of the hash table in order to distinguish information coming from CUT and ALL nodes - in view of the different reduction levels the information is not interchangeable.
-
- Posts: 5106
- Joined: Tue Apr 29, 2008 4:27 pm
Re: ROC analysis of Late Move Reductions
What if the second move gives the cutoff? Do you consider that an all node just because the moves were ordered wrong?

bob wrote: I saw it, in fact. It is an exact repeat of what I did in the DTS implementation of Cray Blitz. But the concept of "reducing a cut node" is a contradiction in terms if you add "but don't reduce the first move." A true CUT node would only have one move searched.

Don wrote: This is all in the context of how we define a cut node and an all node, which I made clear in my post, which you apparently missed seeing.

bob wrote:

Don wrote: We also reduce cut nodes more. It appears to be safer to do that, and I will explain our reasoning on this. I don't know for sure if this reasoning is correct, but many of our tests show that reducing them more has an unexpected effect which is the opposite of what you might expect. It can actually slow down the search slightly and improve the quality. It's even easy to reason about why this might be happening. Here is how it goes. Keep in mind all of this is from a PVS search framework:

Daniel Shawul wrote: In light of the LMR euphoria, I thought I would bump this thread up, which actually has some substance. I ran some tests regarding LMR and prunings some weeks ago. Reducing ALL nodes more is a loss for me after 10000 games, as you mentioned, but I did it only in the upper parts (depth > 6). If I did it everywhere it is a total loss. As I mentioned before, Rybka and co seem to reduce CUT nodes more, which is weird.
"reduce cut nodes more"? A normal cut node only searches one move, and you never reduce the first move searched. I'm not following here...
We feel differently. We don't do null pruning at PV nodes, for example, and repeated experiments have shown us that this hurts the program. I don't really understand why, I just know that the node type matters for Komodo. Komodo seems to be a relatively strong program despite all these blunders I am making.
My original comment still stands. I believe one reduces moves on the basis of the move, not the position. It is not uncommon for a CUT node to become an ALL node, and vice-versa. If you make decisions on the expected node type, you introduce yet another instability since you would do things differently depending on whether it is really a CUT node or an ALL node..
The problem I saw when doing this in DTS is that once you fail to fail-high on the first move of a CUT node, EVERYTHING along the path is now "in doubt". Every node type could be flipping as you find a new best move, or you find the best move is worse than the aspiration bound.
I won't repeat it in full here but basically we estimate which node type it will be BEFORE it is even searched and we stick with that definition. And our definition has the merits of being much easier to reason about.
The classical definition is useless for a chess program if you actually intend to do anything useful with it.
Also, you don't know a node is a "CUT" node until you get a beta cutoff...
You can guess. You can also flip a coin I suppose??
If you are at a cut node the previous node was an all node. At all nodes you expect all moves to fail low and at cut nodes you expect the first move to fail high and exit. The insight is this: what happens if a cut node does not produce a cutoff? The answer is that the previous move (which is at an ALL node) will fail high and get re-searched. The re-search saves the day! If the heavy reduction at the cut node causes the search to FAIL to produce the expected cutoff (and it would have otherwise), there is NO damage done because of this.
There is no free lunch however. The more aggressive cut node reductions should save time, but that may not necessarily happen. Because it can trigger additional re-searches at the previous node it can actually cause the search to take MORE time. But strangely enough we have seen a slight quality improvement, probably due to these additional re-searches which with LMR will be done at greater depths (because these won't be reduced.)
The way you categorize cut and all nodes could change all of this however. We have a very anal definition because in reality you cannot know until a search is complete, so we decide BEFORE the first move is searched. It cannot change because it always toggles back and forth. If the previous node was a cut node, we consider this an ALL node. The only exception is for PV nodes, which are a special case of ALL nodes. In the PVS framework, the first zero-width window searched from a PV node is by our definition a CUT node, and if you have to do a re-search then it is suddenly promoted to a PV node (as per PVS search) and only then can the cut and all nodes swap positions. In other words, these internal search failures can force the status of every node in the subtree to swap if it propagates back to the last PV node.
it sounds more complicated than it is, but the definition is simple, if you are an odd distance from the last PV node it's a cut node, otherwise it's an all node.
Anyway, I set expected CUT nodes to ALL after a few moves are searched and no cut-offs occur, so most will be ALL nodes anyway. The other test I did was to reduce losing captures by a much smaller amount (1 ply), while other moves could be reduced by as much as 6 plies. That seemed to improve Scorpio a bit, so I have that enabled by default now. I suspect that the captures, checks and other moves that are not reduced form the soul of an LMR-heavy engine, so tampering with those even with small reduction values may be damaging.

Also, I did some tests with aggressive pruning, but this is a loss as always. Move-count pruning never worked for me above depth=2, and it probably never will. I am better off increasing the LMR reduction in the upper part of the tree by one more ply than worrying about pruning in the last plies.
I ended up doing this twice.
Once going down the tree, then again going back up the tree, so that the current CUT node is no longer considered CUT (since we didn't fail high on the first move) and the previous ALL node is now likely a CUT node since we are not failing high, and a fail-low here turns into a fail-high one ply back.
It was complicated, and had a lot of error associated with it.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
-
- Posts: 9
- Joined: Tue Mar 12, 2013 6:00 pm
- Location: Netherlands
Re: ROC analysis of Late Move Reductions
It's a guess indeed, but an educated guess. If your guess is right most of the time it can still be very useful. My experience with ALL/CUT guesses is, however, not positive. They are wrong in about 30% of positions, so I do not use them to decide if or how to do LMR.

bob wrote: I don't even follow the concept of doing LMR at just ALL nodes. First, you don't know a node is an ALL node until all moves have been searched without a fail-high. Otherwise you can only guess. LMR applies to moves, not positions. Why would you not reduce a bad move whether it is in an ALL node, a CUT node or a PV node? A bad move is a bad move...

Daniel Shawul wrote: In light of the LMR euphoria, I thought I would bump this thread up, which actually has some substance. I ran some tests regarding LMR and prunings some weeks ago. Reducing ALL nodes more is a loss for me after 10000 games, as you mentioned, but I did it only in the upper parts (depth > 6). If I did it everywhere it is a total loss. As I mentioned before, Rybka and co seem to reduce CUT nodes more, which is weird. Anyway, I set expected CUT nodes to ALL after a few moves are searched and no cut-offs occur, so most will be ALL nodes anyway. The other test I did was to reduce losing captures by a much smaller amount (1 ply), while other moves could be reduced by as much as 6 plies. That seemed to improve Scorpio a bit, so I have that enabled by default now. I suspect that the captures, checks and other moves that are not reduced form the soul of an LMR-heavy engine, so tampering with those even with small reduction values may be damaging.

Also, I did some tests with aggressive pruning, but this is a loss as always. Move-count pruning never worked for me above depth=2, and it probably never will. I am better off increasing the LMR reduction in the upper part of the tree by one more ply than worrying about pruning in the last plies.

Move number is a much better predictor of whether a node is an ALL node. The move number two plies lower is also helpful: if the first move two plies lower did not give a cutoff, that increases the likelihood that this node is an ALL node.
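As an illustration of that last point (my own sketch, not the poster's code), an LMR rule driven purely by move counts might look like this, where grandParentMoveCount is the move number currently being searched two plies up and all thresholds are made-up numbers.

Code: Select all
// Drive LMR from move counts instead of a static CUT/ALL guess: the later we
// are in this node's move list, and the later the node two plies up is in its
// own move list, the more this node looks like an ALL node.
int lmr_reduction(int depth, int moveCount, int grandParentMoveCount) {
    int r = 0;
    if (moveCount > 3)              r += 1;  // this node has not cut off quickly
    if (grandParentMoveCount > 1)   r += 1;  // two plies up, the first move did not cut off either
    if (depth > 8 && moveCount > 8) r += 1;  // deep node, far down the move list: reduce more
    return r;                                // illustrative numbers only
}

With a rule like this, the roughly 30% error rate of the ALL/CUT guess mentioned above never enters the decision, because no node-type guess is consulted at all.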