LMP in PV nodes

bob · Post by **bob** » Tue Jan 06, 2015 6:58 pm

petero2 wrote:
Michel wrote:
Can theory explain this difference between LMR and LMP behavior?
I assume you mean this as a rethorical question
I did not, but I did suspect that the answer would be "no". Maybe the question should have been "Can hand-waving arguments be made to explain this difference between LMR and LMP behavior?"

Another question is if the behavior in texel can be reproduced in other engines.

Your LMR number seems high compared to my past results. In Crafty, when this issue was raised a couple of years back, null-move removal cost -80 Elo, LMR cost -80, both together costs -120, and when you extrapolate, removing either will reduce elo by 80, then removing the other will lose the other 40. Of course I am now a bit more aggressive in both LMR and null-move R, so the numbers might be higher.

So far I have not produced any significant gain with LMP although I have only fiddled with it for a couple of days here and there. It would seem to me that a more carefully tuned LMR would reduce the benefit of LMP since the ideas are complementary.

mvk · Post by **mvk** » Tue Jan 06, 2015 7:51 pm

Michel wrote:It is possible to deal with the hashing inconsistency by recording the node type in the hash table (this is not an outlandish idea as somebody once wrote on the SF forum that this is what Ippolit does).

The main two mechanisms that already avoid most hashing inconsistencies are that reductions increase from left to right through the tree, and that researches are done with lower reductions.

By the way, in my program I also use pv_distance as one index for the reduction tables, and i started to tune with that.

I want to thank you and ronald for the insightful contributions in this thread. It has certainly helped me understand better why the things work like they do, and gives inspiration to try a lot of new ideas. I was aware of your simulation results, and i saw the even/odd effects that don reported, which is why i started to experiment with pv_distance in the first place. But it didn't all really 'click' fully, until now.

mjlef · Post by **mjlef** » Wed Jan 07, 2015 3:31 pm

bob wrote:
mjlef wrote:
bob wrote:
jdart wrote:Suppose the best move is a sacrifice, and you take the reasonable move ordering approach of trying losing captures last.

Then, with LMP in the root node (we are talking about pruning here, not reductions), you will never play the sacrifice, no matter what depth you achieve. It will keep getting pruned, every time.

--Jon
OK, suppose the 15th move in the root list is a capture that loses. Do you reduce it or not? What about the 15th move at ply=2 positions?

This STILL makes absolutely no sense to me. If it is OK to reduce moves that you HOPE will become a new best move, then it should be just as OK to reduce moves that are already part of a best move.
The discussion is about LMP (Late Move PruningO, not Reductions. Pruned moves are not searched, so pruning at the root would just never search those moves at all.
LMP/LMR do exactly the same thing, they hide parts of the tree. They just do it in different ways. Back to the question, what theoretical basis is there for treating a PV node differently from a non-PV node, since the reason we are searching those non-PV nodes is to find a BETTER PV in the first place?

Huh? Pruning a whole branch of a tree is much riskier than reducing a branch, so I do not consider them as doing "the same thing". This difference is really critical in making selective search work.

mjlef · Post by **mjlef** » Wed Jan 07, 2015 3:39 pm

bob wrote:
Rein Halbersma wrote:
bob wrote: First, LMP does not apply to "expected cut nodes". You only search one move at such nodes any.

No, LMP applies to expected CUT-nodes. With perfect move ordering, you expect to search only 1 move to get the expected fail-high. So if by the last moves, you haven't failed high yet, you simply prune them, faill low here, and fail high one ply upwards. If that fail high is at an expected PV-node, you research with a full window and without LMP (and at an ALL-node, the fail-high will eventually bubble up towards a PV-node, causing a research higher-up in the tree)

If your move ordering at CUT nodes is any good, you should save effort and find a better move over your current PV quicker than without LMP. On occasion, the research might discover a very late move that would have produced the initial fail-high you were looking for, hadn't you done the LMP on the previous null-window search. That's the tradeoff.
"expected" is meaningless. A node is simply PV, CUT or ALL, based on the result of the search. If you search every move, it is an ALL node, regardless of what it was expected to be. Therefore introducing the "expected" modifier to CUT/ALL doesn't really help at all.

Take a traditional search. From measurements, Crafty fails high on the first move searched at any ply about 92% of the time. Talking about any sort of pruning or reduction on a CUT node means you are only talking about 8% of the CUT nodes. And pruning/reducing there is quite dangerous. Why do we reduce later moves more than earlier moves? Because we begin to give up any hope that any of those later moves will fail high because this is likely an ALL node and everything has to be searched without a cutoff happening, so we reduce the effort to finish quicker. But at a CUT node where the best move is not searched first (or a "good enough" move is not searched first) any pruning or reducing might well cause the actual best move to either be reduced to the point the goodness can not be seen, or the move might actually be pruned and we don't get the fail high we want here.

Which takes us full-circle back to my ORIGINAL question. Why treat PV and non-PV nodes differently regarding pruning and reducing decisions? Particularly when you can't say whether a particular root move will become the new best move (and hence the successor node from that move will be a PV node) until after the move has been searched. So you make pruning/reduction decisions based on nothing more than a guess.

I became convinced a long time back that not doing null-move on PV nodes was reasonable. And there is a rational explanation why, since the null-move is not a legal move and can never be best at any node. On a PV node you must identify a best move, which means a null-move can not be backed up. If the null-search fails low, you learn nothing. If it fails high or returns a true score you still learn nothing because it is an illegal move. It only makes sense at CUT nodes where you get the fail-high at a reduced cost. And to be theoretically correct, you should not do a null-search at ALL nodes because everything is expected to fail low anyway. But in the absence of any viable way to identify an ALL node, other than by searching it and noticing that the alpha bound was not improved on (the search was actually completed with no best move) there is no way to implement this and most end up doing null-searches everywhere but at PV nodes. And some do them on PV nodes as well. The difference between doing null-searchst at PV vs not doing them is NOT very large, because there are not very many PV nodes in the tree anyway.

You might want to rethink the consequences of missing a move at a given level. One example. Take the node just under a PV node. I am talking about the node below the second or greater move played at the PV node which is searched by a scout. Just below a PV node, scout search assumes this node will fail high (be an expected cut node). Now, what happens if you hard prune a move at this node, and it misses the cutoff? (since all the other moves are bad). It fails low at this node, which means the scout search at the PV node above will fail high. And research the node as a PV node. So nothing bad happens other than a little search inefficiency and a research. Not taking account what a node is expected to be means you do not consider the effects of bad pruning. We find the "expected" node type very helpful in making all kinds of decisions. If there is a real secret to mastering search, this is it. I will leave it to the reader to work out other cases.

bob · Post by **bob** » Wed Jan 07, 2015 5:50 pm

mjlef wrote:
bob wrote:
Rein Halbersma wrote:
bob wrote: First, LMP does not apply to "expected cut nodes". You only search one move at such nodes any.

No, LMP applies to expected CUT-nodes. With perfect move ordering, you expect to search only 1 move to get the expected fail-high. So if by the last moves, you haven't failed high yet, you simply prune them, faill low here, and fail high one ply upwards. If that fail high is at an expected PV-node, you research with a full window and without LMP (and at an ALL-node, the fail-high will eventually bubble up towards a PV-node, causing a research higher-up in the tree)

If your move ordering at CUT nodes is any good, you should save effort and find a better move over your current PV quicker than without LMP. On occasion, the research might discover a very late move that would have produced the initial fail-high you were looking for, hadn't you done the LMP on the previous null-window search. That's the tradeoff.
"expected" is meaningless. A node is simply PV, CUT or ALL, based on the result of the search. If you search every move, it is an ALL node, regardless of what it was expected to be. Therefore introducing the "expected" modifier to CUT/ALL doesn't really help at all.

Take a traditional search. From measurements, Crafty fails high on the first move searched at any ply about 92% of the time. Talking about any sort of pruning or reduction on a CUT node means you are only talking about 8% of the CUT nodes. And pruning/reducing there is quite dangerous. Why do we reduce later moves more than earlier moves? Because we begin to give up any hope that any of those later moves will fail high because this is likely an ALL node and everything has to be searched without a cutoff happening, so we reduce the effort to finish quicker. But at a CUT node where the best move is not searched first (or a "good enough" move is not searched first) any pruning or reducing might well cause the actual best move to either be reduced to the point the goodness can not be seen, or the move might actually be pruned and we don't get the fail high we want here.

Which takes us full-circle back to my ORIGINAL question. Why treat PV and non-PV nodes differently regarding pruning and reducing decisions? Particularly when you can't say whether a particular root move will become the new best move (and hence the successor node from that move will be a PV node) until after the move has been searched. So you make pruning/reduction decisions based on nothing more than a guess.

I became convinced a long time back that not doing null-move on PV nodes was reasonable. And there is a rational explanation why, since the null-move is not a legal move and can never be best at any node. On a PV node you must identify a best move, which means a null-move can not be backed up. If the null-search fails low, you learn nothing. If it fails high or returns a true score you still learn nothing because it is an illegal move. It only makes sense at CUT nodes where you get the fail-high at a reduced cost. And to be theoretically correct, you should not do a null-search at ALL nodes because everything is expected to fail low anyway. But in the absence of any viable way to identify an ALL node, other than by searching it and noticing that the alpha bound was not improved on (the search was actually completed with no best move) there is no way to implement this and most end up doing null-searches everywhere but at PV nodes. And some do them on PV nodes as well. The difference between doing null-searchst at PV vs not doing them is NOT very large, because there are not very many PV nodes in the tree anyway.
You might want to rethink the consequences of missing a move at a given level. One example. Take the node just under a PV node. I am talking about the node below the second or greater move played at the PV node which is searched by a scout. Just below a PV node, scout search assumes this node will fail high (be an expected cut node). Now, what happens if you hard prune a move at this node, and it misses the cutoff? (since all the other moves are bad). It fails low at this node, which means the scout search at the PV node above will fail high. And research the node as a PV node. So nothing bad happens other than a little search inefficiency and a research. Not taking account what a node is expected to be means you do not consider the effects of bad pruning. We find the "expected" node type very helpful in making all kinds of decisions. If there is a real secret to mastering search, this is it. I will leave it to the reader to work out other cases.

Now move on to the 2nd and later moves at ply=1. There should be a refutation for each one since every other move here should fail low. And in each of those sub-trees you reduce/prune more aggressively, and you do NOT find the critical path that would cause one of those remaining moves to fail high and replace the PV. That is the case I am worrying about, not the one you mentioned. The search along non-PV (forget ALL/CUT, just non-PV) pathways is less rigorous, because the sub-trees are more heavily pruned/reduced, and you simply overlook the tactical issue that might be important.

BTW, in your example, things won't quite work out as you expect. We are talking about pruning decisions at PV vs non-PV nodes. The example you gave was at a PV node, which we already know is going to work OK. It is the OTHER moves at the root that are more interesting due to more aggressive pruning (none of the remaining root moves lead to a PV node so everything gets reduced/pruned more aggressively).

bob · Post by **bob** » Wed Jan 07, 2015 11:14 pm

mjlef wrote:
bob wrote:
mjlef wrote:
bob wrote:
jdart wrote:Suppose the best move is a sacrifice, and you take the reasonable move ordering approach of trying losing captures last.

Then, with LMP in the root node (we are talking about pruning here, not reductions), you will never play the sacrifice, no matter what depth you achieve. It will keep getting pruned, every time.

--Jon
OK, suppose the 15th move in the root list is a capture that loses. Do you reduce it or not? What about the 15th move at ply=2 positions?

This STILL makes absolutely no sense to me. If it is OK to reduce moves that you HOPE will become a new best move, then it should be just as OK to reduce moves that are already part of a best move.
The discussion is about LMP (Late Move PruningO, not Reductions. Pruned moves are not searched, so pruning at the root would just never search those moves at all.
LMP/LMR do exactly the same thing, they hide parts of the tree. They just do it in different ways. Back to the question, what theoretical basis is there for treating a PV node differently from a non-PV node, since the reason we are searching those non-PV nodes is to find a BETTER PV in the first place?
Huh? Pruning a whole branch of a tree is much riskier than reducing a branch, so I do not consider them as doing "the same thing". This difference is really critical in making selective search work.

Only if you want to quibble about details. Either one reduces the tactical horizon, which causes the search to overlook things. Reducing is less dangerous than outright pruning, searching a move but dropping directly to q-search is less dangerous than pruning, but ALL of those are dangerous. I've been doing both pruning and reducing for years, so the ideas are not new. Nor are they exactly the same. But BOTH work by searching only a subset of the normal alpha/beta minimax tree. Hence my statement, "they do exactly the same thing - they HIDE PARTS of the game tree." Yes they do it in different ways. Yes one is more risky than the other (perhaps as one can reduce near the root which is right dangerous when you think about it). But I am still waiting for some sort of technical justification for treating PV and non-PV nodes differently with respect to pruning and reduction decisions.

I simply do not see one. And I am not exactly a beginner at tree searching. Just because ippolit does it is NOT a justification for me... Many programs have done things that are wrong over the years, only to discover that the idea was sub-optimal later. If it works for you, good. If it doesn't, bad. What I am looking for are the REASONS that it makes sense, because this has been discussed by many, tested by many, and there is no consensus. Logic says treating them differently is wrong. I have dumped so many of these trees I am sick of looking at them. There are lots of ways to measure this effect, starting with positions with a very singular solution that has a precise depth it much reach to spot the key point. Or positions where one move appears best until a specific depth is reached where the search exposes that a new move is better. The interesting question becomes, with a specific program, how many EXTRA plies does it take and why?

Pio · Post by **Pio** » Fri Jan 09, 2015 9:46 pm

Hej Peter!

I think it is a huge difference between LMR and LMP. I am pretty sure that LMR will scale much better since it effects the entire tree. LMP will at best be a constant speed up and will not scale, but LMR on the other hand will gain depth at least proportional to search depth for the most vital parts of the tree.

I look at LMP as a special case of LMR trying to cut off those moves that seems so bad that there is not enough depth left to make them look good. I think of LMP as a sort of futility pruning. I think reducing more close to the root in LMR is logical only for the "non tt-move" moves so that the difference between the reductions of the "non tt-move" moves should be the same. I see the tt-move as being extended the closer to the root as you come with the simple reasoning that the closer to the root you are the higher confidence you have on that move being the best.

I cannot see a reason of treating PV nodes differently than other nodes when it comes to reductions. But I can see the point of Michel Van den Bergh's reasoning and I agree that it might be beneficial to treat them differently close to the leaves. From your tests it looks like maybe you could earn some ELO by not reducing PV-nodes close to the leafs. Maybe you could try to not reducing PV-nodes in LMR for maybe the 1/4 of the depth closest to the leafs.

Lycka till!!

lucasart · Post by **lucasart** » Fri Jan 16, 2015 10:55 am

petero2 wrote: Since the previous measurements I have made some other changes in texel (including singular extensions), and decided to test again. Now I get:
Code: Select all
LMP except in PV :   0 elo
LMP everywhere   :  -8 elo
No LMP           : -50 elo
I also ran the corresponding tests for LMR:
Code: Select all
LMR except in PV :  -48 elo
LMR everywhere   :    0 elo
No LMR           : -140 elo
Qualitatively LMP behaves the same as in the previous test even though the actual numbers are different. It can also be seen that LMR is more important than LMP in texel.

It is interesting to note that while LMP hurts in PV nodes (-8 elo), LMR is actually a big win in PV nodes (+48 elo). Can theory explain this difference between LMR and LMP behavior?

I've done quite a bit of testing on the subject recently. Of course, Stockfish is different, so your mileage may vary.

What I've seen is that LMP at PV nodes only hurts at low depth. This can easily be verified by testing it at depth=2,4,6... You will see that it's an enormous elo regression depth=2, and as depth increases it goes back to 0 elo eventually.

There is even a hint that at very long TC it could be beneficial, perhaps due to the reduced search inconsistencies.

Anyway, enough with hand waving arguments. Let's show some concrete results:
http://tests.stockfishchess.org/tests/v ... 71d1d8a750
http://tests.stockfishchess.org/tests/v ... 71d1d8a74e

Problem is that you don't have enough CPU resources to be able to test at LTC with a good elo resolution. So you either take a leap of faith, and enable pruning at PV nodes, assuming that Texel will behave like SF at LTC, or go by your very fast TC results. Up to you!

lucasart · Post by **lucasart** » Fri Jan 16, 2015 11:23 am

More generally, there's every evidence to show that treating PV nodes differently makes no sense. That's what Bob Hyatt told me once, and I tested every possible node type trick in DiscoCheck, they always failed. I am more and more convinced that he is right.

The evidence is overwhelming. Just some recent tests:

Razoring at PV nodes
http://tests.stockfishchess.org/tests/v ... 5c4de1c0b9

Futility + LMP + SEE pruning at PV nodes in search()
http://tests.stockfishchess.org/tests/v ... 28c6bc66fd
http://tests.stockfishchess.org/tests/v ... 26a5666bc6

Futility pruning + SEE pruning at PV nodes in qsearch()
http://tests.stockfishchess.org/tests/v ... 17fe3fc6db
http://tests.stockfishchess.org/tests/v ... 28c6bc6704

Probcut at PV nodes
http://tests.stockfishchess.org/tests/v ... 6d8243c853
http://tests.stockfishchess.org/tests/v ... 5c4de1c0be

Null move pruning at PV nodes
http://tests.stockfishchess.org/tests/v ... 64db6b6d6f
http://tests.stockfishchess.org/tests/v ... 763e07564f

lucasart · Post by **lucasart** » Fri Jan 16, 2015 1:17 pm

LMP in PV nodes

Re: LMP in PV nodes

Re: LMP in PV nodes

Re: LMP in PV nodes

Re: LMP in PV nodes

Re: LMP in PV nodes

Re: LMP in PV nodes

Re: LMP in PV nodes

Re: LMP in PV nodes

Re: LMP in PV nodes

Re: LMP in PV nodes