Questions on Razoring and Code Structure for Pruning

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Pio
Posts: 334
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Questions on Razoring and Code Structure for Pruning

Post by Pio »

hgm wrote: Fri Jul 10, 2020 9:49 am This just seems extra code for no purpose. All non-captures and captures of Pawns, minors and Rooks would have been pruned because of futility anyway, if you are that much below alpha, and only promotion to Queen or Queen capture would not have been futile. So the node would have done nothing different from QS anyway.
I am just guessing that they might want to look for checks in QS first with razoring, so that they can detect a check that simultaneously threatens the queen.
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Questions on Razoring and Code Structure for Pruning

Post by hgm »

Well, one would assume that at d=1 checks are also exempted from FP.

What I want to point out is that this is not a pruning method at all, but just an implementation quirk. Perhaps Stockfish is such that QS handles the same search more efficiently than the routine for the full-width search, when the non-captures are futile. (So that the two logically should do the same thing.)

If that is the case, it is strange that it would matter whether eval is hugely below alpha, or just by the normal futility margin. They would still do the same thing in both cases.
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Questions on Razoring and Code Structure for Pruning

Post by Desperado »

Cheney wrote: Thu Jul 09, 2020 4:00 pm Hi,

I have recently added extended futility pruning (up to 3 plies) and am seeing some positive success.

I am reading about Razoring and have tried implementing some flavors of it, but my tests are not positive. I also decided to look at some other code, and now I wonder not only about razoring but also about code structure.

In reading about Razoring on CPW (https://www.chessprogramming.org/Razoring), the original proposal discusses obtaining the evaluation for each move, sorting that list of moves in decreasing order, and once a move's eval score cannot beat alpha, prune that move and all following moves. I imagine this can happen during/after move generation and the decision to prune happens within the move playing loop.

However, the approaches discussed further down on the page involve either reducing the depth by 1 or dropping into qsearch. It also appears that this happens before the loop that iterates over and plays moves.

My first question here is how are the two models the same? What is the true definition of razoring?

The next thought I have is how this works with futility pruning (FP). Before my move loop, I set a futile flag for the node I am in (if the static eval score + margin < alpha). Then, within the move loop for that node, certain moves can be pruned. These might be moves with or without a history value used for move ordering. But when I think about it, a lot of moves are potentially pruned at any given node. So when I think of razoring, I have to imagine it needs to follow the same rules as FP: do not razor when in check, when delivering check, when alpha is near mate, etc.

My question here is about the implementation. If FP is pruning all those moves at depth <= 3, then why would I want to run a qsearch on each move or "sort and prune" a contiguous set of ordered moves? Should I maybe only razor when a node is not flagged for pruning, and make the razoring margin less than the futility margin?

Last, when I review code from other engines, I see razoring and futility pruning performed when entering a node to search (before generating and looping over moves), and I also see futility pruning pre-move-make (within the move loop).

What is the advantage here? I believe performing the pruning method at the top of the node is the same as performing it after making the move; the catch is that if I do it at the top of the node, I have to return a score. Is this accurate? Is doing this more efficient?

Thank you :)
1. First, the difference lies in the semantics:

Outside the loop you test if it is useless to investigate the position further.
Inside the loop you test if a move (the branch) is useless.

2. In the following it is about confirming the uselessness.

a.
Outside of the loop, the current state of the art is to test a position whose evaluation is so bad that it will most likely not get above alpha, the lower bound. For this purpose, a quiescence search is performed, since the idea is that "only" capture moves can achieve a big difference in evaluation.

Well, that's a good heuristic, but of course it doesn't include all moves that are able to do so.
Everything else is an implementation detail that can differ in every engine.
You may even decide not to do a confirmation at all, e.g. because the current value is 1500 points below alpha, or because you use static scoring features, and so on...

Starting with the standard technique makes sense, because it is based on the experience of many programmers over the last decades.
This experience also includes the fact that the idea can be applied over several depths.

How the technique is named, "Razoring", "Futility Pruning", "Delta Pruning(QS)" is completely meaningless.
Ultimately, it's all about the idea behind it.
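The "outside the loop" confirmation from point a. might be sketched roughly as follows. This is a minimal illustration only, not any particular engine's code: the names `tryRazor` and `RAZOR_MARGIN` are invented here, and the `qsearch` is a trivial stand-in for a real captures-only search.

```cpp
#include <algorithm>

// Sketch of the outside-the-loop test: when the static evaluation is far
// below alpha at shallow depth, confirm the node's uselessness with a
// captures-only (quiescence) search instead of the full-width search.
constexpr int RAZOR_MARGIN = 300;  // illustrative; tuned by testing in practice

// Trivial stand-in for a real quiescence search over captures:
// just bound the static eval into the search window.
int qsearch(int staticEval, int alpha, int beta) {
    return std::clamp(staticEval, alpha, beta);
}

// Returns true (and fills 'score') when the node can be handed to QS.
bool tryRazor(int depth, int staticEval, int alpha, int beta, int& score) {
    if (depth <= 1 && staticEval + RAZOR_MARGIN < alpha) {
        score = qsearch(staticEval, alpha, beta);  // the confirmation step
        return true;                               // skip the full-width search
    }
    return false;
}
```

In a real engine the confirmed score would simply be returned from the node instead of entering the move loop.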


b.
Of course you can choose completely different criteria within the loop. For example, attributes of a move like "statistically relevant", "gives check", "connected" and many more. It is about finding candidates and the corresponding conditions.
Again, at this point it is important that you test a branch, not the position.

c.
At the end of a node, depending on the confirmation details and the parameters for testing a move, you may of course end up with all branches being useless; the position is therefore useless, and the effort in the loop was just overhead.

d.
I recently wanted to extend my "outside the loop" implementation and check more moves in the loop. So there were nodes that had already been tested (outside the loop), and the quiescence search had found a move that reached the alpha bound.
Interestingly, this was not confirmed in the loop until the quiet moves came. (Thinking time :-))
However, there are enough quiet moves which include some kind of threat and lead to a big score gain.

e.
Conclusion: both techniques could lead to the same result but are clearly different in content and can complement each other.

Just my 2 cents.
Michael
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Questions on Razoring and Code Structure for Pruning

Post by hgm »

Desperado wrote: Sat Jul 11, 2020 12:59 pm Outside of the loop, the current state of the art is to test a position whose evaluation is so bad that it will most likely not get above alpha, the lower bound. For this purpose, a quiescence search is performed, since the idea is that "only" capture moves can achieve a big difference in evaluation.
It seems to make no difference, though. QS would still loop through the captures, and decide which of those are futile within the loop.

It seems you must have a very special reason to do this, and an even more special reason to do it sometimes, but not other times (i.e. to have an independent margin for it).

A reason to do it always could be that QS generates only captures, while a d=1 search would also generate non-captures. (Which then would never be searched because, as was known in advance, they are all futile.) You would not have that when you do staged move generation anyway; you would then razor before getting to generation of non-captures, or even during generation of captures, when you reach the futile victims.

A reason to have an independent margin for it could be that you want to exempt some moves from futility pruning that would not be searched in QS. E.g. when your evaluation is such that King safety could get you up to 5 Pawns, and you are only 1.5 Pawn below alpha, non-captures that end in the 'King neighborhood' (however you define that) are not necessarily futile. So then it makes sense to loop through all non-captures, and judge them individually for ending in the King neighborhood. But if you don't have any such moves that you want to use a larger margin for, it would be wasteful to put that larger margin as a condition for not using QS (when you know QS to be more efficient).
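As a sketch of this last point, one might give quiet moves that land in the enemy King's neighborhood their own, larger futility margin inside the loop. The zone test, the `Move` struct and both margins below are invented purely for illustration.

```cpp
// In-loop futility test with an exemption: quiets landing near the enemy
// King get a larger margin, since King-safety terms in the evaluation
// could reward them heavily. All names and values are hypothetical.
struct Move { int to; bool isCapture; };   // 'to' is a 0..63 square index

constexpr int FUTILITY_MARGIN  = 150;  // ordinary quiet moves
constexpr int KING_ZONE_MARGIN = 500;  // quiets landing in the King zone

// Chebyshev-distance-1 neighborhood of the enemy King.
bool inKingZone(int sq, int enemyKingSq) {
    int df = (sq & 7) - (enemyKingSq & 7);
    int dr = (sq >> 3) - (enemyKingSq >> 3);
    return df >= -1 && df <= 1 && dr >= -1 && dr <= 1;
}

// true = skip this move as futile.
bool isFutile(const Move& m, int staticEval, int alpha, int enemyKingSq) {
    if (m.isCapture) return false;  // captures judged separately (SEE/delta)
    int margin = inKingZone(m.to, enemyKingSq) ? KING_ZONE_MARGIN
                                               : FUTILITY_MARGIN;
    return staticEval + margin <= alpha;
}
```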
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Questions on Razoring and Code Structure for Pruning

Post by Desperado »

hgm wrote: Sat Jul 11, 2020 1:20 pm
Desperado wrote: Sat Jul 11, 2020 12:59 pm Outside of the loop, the current state of the art is to test a position whose evaluation is so bad that it will most likely not get above alpha, the lower bound. For this purpose, a quiescence search is performed, since the idea is that "only" capture moves can achieve a big difference in evaluation.
It seems to make no difference, though. QS would still loop through the captures, and decide which of those are futile within the loop.

It seems you must have a very special reason to do this, and an even more special reason to do it sometimes, but not other times (i.e. to have an independent margin for it).

A reason to do it always could be that QS generates only captures, while a d=1 search would also generate non-captures. (Which then would never be searched because, as was known in advance, they are all futile.) You would not have that when you do staged move generation anyway; you would then razor before getting to generation of non-captures, or even during generation of captures, when you reach the futile victims.

A reason to have an independent margin for it could be that you want to exempt some moves from futility pruning that would not be searched in QS. E.g. when your evaluation is such that King safety could get you up to 5 Pawns, and you are only 1.5 Pawn below alpha, non-captures that end in the 'King neighborhood' (however you define that) are not necessarily futile. So then it makes sense to loop through all non-captures, and judge them individually for ending in the King neighborhood. But if you don't have any such moves that you want to use a larger margin for, it would be wasteful to put that larger margin as a condition for not using QS (when you know QS to be more efficient).
Well, it is very simple. It is not about different margins; it is about the fact that quiet moves can change the static score drastically.
E.g.: if a pawn push from d2 to d4 attacks a Knight on c5, and that frees a bishop on c1 to attack the opponent queen on g5... that's it.
To ignore them in the "outside" test is an implementation detail and a free decision, not an "I must ignore them".

It is not a waste of time to identify these moves while others are still futile. And as I pointed out at the beginning:
to confirm that a position is useless and to confirm that some branches of a position are useless are two completely different things.
It can happen that the answers to both questions are the same, namely exactly when all branches are confirmed as useless.
(Indeed, these nodes result simply in overhead.)
...
A reason to do it always could be that QS generates only captures, while a d=1 search would also generate non-captures. (Which then would never be searched because, as was known in advance, they are all futile.) You would not have that when you do staged move generation anyway; you would then razor before getting to generation of non-captures, or even during generation of captures, when you reach the futile victims.
...
I agree, there are some practical issues involved for the case of d==1. If the qs reports a good move, that means it should be >= alpha again
within the loop, which leads to a cut in a non-PV node. But this is not always the case, because qs also produces quiet checks as intermediate moves (state of the art). There are more "practical" issues that can happen. A bad-SEE capture will be sorted after quiets, but in qs it is executed because there are no quiets. There are many things you can think of...
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Questions on Razoring and Code Structure for Pruning

Post by hgm »

Well, it is very simple. It is not about different margins; it is about the fact that quiet moves can change the static score drastically.
E.g.: if a pawn push from d2 to d4 attacks a Knight on c5, and that frees a bishop on c1 to attack the opponent queen on g5... that's it.
To ignore them in the "outside" test is an implementation detail and a free decision, not an "I must ignore them".
To attack the Knight in most evaluations would not give you any eval points, and in those that do likely only a very small bonus. That it gives a Bishop a bit of extra mobility, and delivers a discovered attack on a Queen would be things you only see in a full evaluation, normally made in the daughter node when you decide to make (i.e. not prune) the move. You would have to rely on a static algorithm to recognize a move delivers a fork / double attack (and realize that it is not easily solvable by e.g. trading the Queen and moving away the Knight) even before the evaluation is done. I am not sure any engine (other than NN-based) can do that.

Anyway, it is good to explicitly point out that there is no need to put a very large margin on pruning the node before the loop unless you do have some pre-evaluation judgement of threats delivered by the move. If you would just go into the loop and FP all non-captures based on a small margin, using the large margin before the loop makes no sense.
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Questions on Razoring and Code Structure for Pruning

Post by Desperado »

hgm wrote: Sat Jul 11, 2020 2:23 pm
Well, it is very simple. It is not about different margins; it is about the fact that quiet moves can change the static score drastically.
E.g.: if a pawn push from d2 to d4 attacks a Knight on c5, and that frees a bishop on c1 to attack the opponent queen on g5... that's it.
To ignore them in the "outside" test is an implementation detail and a free decision, not an "I must ignore them".
To attack the Knight in most evaluations would not give you any eval points, and in those that do likely only a very small bonus. That it gives a Bishop a bit of extra mobility, and delivers a discovered attack on a Queen would be things you only see in a full evaluation, normally made in the daughter node when you decide to make (i.e. not prune) the move. You would have to rely on a static algorithm to recognize a move delivers a fork / double attack (and realize that it is not easily solvable by e.g. trading the Queen and moving away the Knight) even before the evaluation is done. I am not sure any engine (other than NN-based) can do that.

Anyway, it is good to explicitly point out that there is no need to put a very large margin on pruning the node before the loop unless you do have some pre-evaluation judgement of threats delivered by the move. If you would just go into the loop and FP all non-captures based on a small margin, using the large margin before the loop makes no sense.
I think you do not get my point. If you attack two pieces, it is likely that you will capture one when it is your turn again.
It is not about the static evaluation: if you execute the move, the subtree will detect the capture (remaining depth or qs).
Let's say after d2-d4 the opponent even captures the pawn (qs); then you capture the queen... If the queen captures a pawn
on g2, then you can capture the knight on c5... Such examples can lead to a big win in score although the initial move is a quiet move.
All this can happen within a capture sequence, or via intermediate moves like checking moves.
All this can happen when you are 200 or 300 points below alpha and you decide to test for pruning.
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Questions on Razoring and Code Structure for Pruning

Post by hgm »

The problem is, how do you know you attack two pieces? What you propose seems at odds with the principle of 'stand pat', which you do (in QS) when the current evaluation is above beta. So you can analyze moves however you want at d=1, and decide they are totally winning because you attack two Queens at once and they cannot escape, and not prune the move but play it... And then in the daughter at d=0 the opponent will say "hey, my score is still above beta, let's fail high". It only makes sense to do the move if your evaluation recognizes unavoidable losses (because of forks, skewers and such) and incorporates them in the score, or you have some other code to recognize them and skip stand-pat when it does.
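For reference, the stand-pat principle being discussed looks roughly like this in a quiescence search. This is a bare-bones sketch: `captureGains` is a stand-in for recursively searching each capture, and the function name is invented here.

```cpp
#include <vector>
#include <algorithm>

// Minimal quiescence-search skeleton illustrating 'stand pat': the side to
// move may simply accept the static evaluation and fail high, so a threat
// created one ply earlier never pays off at this horizon.
int quiesce(int staticEval, int alpha, int beta,
            const std::vector<int>& captureGains) {
    if (staticEval >= beta) return beta;   // stand pat: fail high immediately
    alpha = std::max(alpha, staticEval);   // static eval is a lower bound
    for (int gain : captureGains) {        // captures only
        int score = staticEval + gain;     // crude stand-in for recursion
        if (score >= beta) return beta;
        alpha = std::max(alpha, score);
    }
    return alpha;
}
```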

I have tried that in several engines, and it always turned out a dismal failure (Elo-wise). I could think of several explanations for that:
1) leaf nodes of search-trees using null-move are unnatural. There are almost always multiple hanging pieces on both sides. So you think you have a double attack, but the opponent has an even more valuable 'hostage', which he didn't bother to execute so far because he preferred null-moving, but now he does and you fail low anyway. Unlike with positions from games, the tree leaves are very hard to judge statically.
2) true unavoidable losses for the side to move are rare, and doing a complex analysis just to identify some of them just isn't worth it.
3) if you do identify a true double attack, incorporating the expected loss is too pessimistic, and just leads to errors that are as large but of opposite sign. It is very possible that the double attack was solvable through a move that is not searched in QS. The attacked Knight might interpose on the attack of the Bishop on the Queen, or the Queen might move away, pinning the Pawn attacking the Knight; the Knight might move away attacking a Queen as well, etc.

Having to judge all that statically is a nightmare.
User avatar
Desperado
Posts: 879
Joined: Mon Dec 15, 2008 11:45 am

Re: Questions on Razoring and Code Structure for Pruning

Post by Desperado »

The problem is, how do you know you attack two pieces?...
You do not need to know that! The reason why a move is strong and the reason why you decide to search the subtree might be different.
If you exempt a killer move because you expect a killer to be strong, the reason for its strength might be a double attack or anything else.
Of course the killer move is only one ad hoc example.
...What you propose seems at odds with the principle of 'stand pat'...
No.

Ok, I see the discussion is now mainly about d=1. The stand-pat situation is a good point (for discussion), but you can easily produce situations where our imagination fails, and you find moves where some evaluation criteria sum together and do exactly what you described (avoiding stand pat). Just an example...

Code: Select all


[static_eval: -70 static_eval-alpha: -88 score_from_qs: 31 : alpha:18]
move: e1g1

CTM : WHITE
woo : 1
wooo: 1
boo : 1
booo: 1
R50 : 0
EPT : --
br -- bb bq bk bb -- br
bp bp bp -- bp bp bp bp
-- -- -- -- -- bn -- --
-- -- -- -- -- -- -- --
-- -- -- bn -- -- -- --
-- -- wn wb -- wn -- --
wp wp wp wp -- wp wp wp
wr -- wb wq wk -- -- wr
Especially in advanced engines, positional criteria easily sum up to the value of a pawn and more.
Looks like getting the king out of the center and some shield bonuses are enough in that case.
... It seems you must have a very special reason to do this, and an even more special reason to do it sometimes, but not other times (i.e. to have an independent margin for it). ...
Thinking about it again, I would say it is absolutely valid to use a different margin within the loop, because I test two different things. Another reason why it would be ok to use a smaller margin is: if one of X moves is futile, there are X-1 other moves you can check for an improvement.
Again, checking a branch (move) and checking the position are very different things (like two different tasks). If you err, the effect might be much softer.
Outside the loop, on the position level, you will ignore the whole position, not just a branch.

I think you would agree that for d > 1 all this stuff becomes more complicated, but the frontier nodes (whatever frontier means today :-)) include quiets. (Guess:) In most positions quiet moves turn out to be better than captures, otherwise we would never examine quiets at all.
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Questions on Razoring and Code Structure for Pruning

Post by hgm »

Desperado wrote: Sun Jul 12, 2020 1:59 pm You do not need to know that! The reason why a move is strong and the reason why you decide to search the subtree might be different.
If you exempt a killer move because you expect a killer to be strong, the reason for its strength might be a double attack or anything else.
Of course the killer move is only one ad hoc example.
The problem is that it doesn't help to know that the move is winning due to what it does at larger depth (where it could very well have become a killer, because the sibling, searched earlier, suffered less LMR). At d=1, threats that gain Queens will not be able to fail high unless they actually capture that Queen. Any gain that comes later, even if it was verified by a deeper search, will remain invisible. It would only make sense to search a futile killer when you extend it. The maximum it could raise the eval by itself (as opposed to what followed from the threat it made) should have been taken care of by the margin for the in-loop FP. An extra large margin on the killers seems useless. The same would hold for any other move that for some reason you know to be very good. At d=1 they just don't stand a chance.
Ok, i see the discussion is now mainly for d=1.
Well, it was said that Stockfish only does this at d=1.
The stand-pat situation is a good point (for discussion), but you can easily produce situations where our imagination fails, and you find moves where some evaluation criteria sum together and do exactly what you described (avoiding stand pat). Just an example...

Code: Select all


[static_eval: -70 static_eval-alpha: -88 score_from_qs: 31 : alpha:18]
move: e1g1

CTM : WHITE
woo : 1
wooo: 1
boo : 1
booo: 1
R50 : 0
EPT : --
br -- bb bq bk bb -- br
bp bp bp -- bp bp bp bp
-- -- -- -- -- bn -- --
-- -- -- -- -- -- -- --
-- -- -- bn -- -- -- --
-- -- wn wb -- wn -- --
wp wp wp wp -- wp wp wp
wr -- wb wq wk -- -- wr
I don't get this. The move is castling, is that searched in QS? Or is this from a d=1 node, where castling was searched with a QS reply? There doesn't seem to be any good capture for black after white castles in the given position, so the score must have been obtained from stand-pat after the castling. It is 101 larger than the eval before the castling. So castling increases eval by 101. Is that centi-Pawn? That seems a gross over-estimate for the value of castling. It would for instance sacrifice a Pawn to push it over the horizon. If castling gets you more than gaining a Pawn, then castling should be searched in QS...
Especially in advanced engines, positional criteria easily sum up to the value of a pawn and more.
Looks like getting the king out of the center and some shield bonuses are enough in that case.
You argue from the assumption that the in-loop futility pruning is done on the basis of the full evaluation after the move. Because that is where these positional criteria are summed up. But this is not how it is usually done: one just adds a cheap estimate (such as the victim value, and possibly the change in PST score) of what the move gains to the full evaluation before the move. Subtle effects due to relation with other pieces, such as mobility or pawn-structure changes are supposed to be taken care of by the margin.
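That cheap in-loop estimate can be sketched like this: the eval before the move, plus an optimistic gain (victim value, possibly a PST delta), plus the margin, is compared to alpha. The function name and margin value are illustrative only.

```cpp
// Sketch of the usual in-loop futility test: instead of fully evaluating
// the daughter node, add a cheap estimate of the move's gain to the full
// evaluation before the move, and prune when even that optimistic sum
// cannot reach alpha. Subtle positional effects (mobility, pawn-structure
// changes) are supposed to be covered by the margin.
constexpr int FUTILITY_MARGIN = 150;  // illustrative value

// victimValue: 0 for quiet moves; pstDelta: change in piece-square score.
bool futileByEstimate(int evalBeforeMove, int victimValue, int pstDelta,
                      int alpha) {
    int optimistic = evalBeforeMove + victimValue + pstDelta + FUTILITY_MARGIN;
    return optimistic <= alpha;  // true = prune this move
}
```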

If most moves gain or lose you more than a Pawn, in-loop FP only makes sense if you use a margin larger than a Pawn, because you don't want to habitually prune a lot of stuff that scores above alpha. If castling is the only non-capture that earns such an exceptionally high score, it is silly to test all moves individually for raising the score that much (in fact, testing whether they are castlings). If you see before the loop that alpha - eval is large enough to prune all non-captures except castlings, you should just try the castlings. It wouldn't be that hard to have a dedicated move generator that only generates castlings. In fact, it is likely that this is already part of your regular move generator.
I think you would agree that for d > 1 all this stuff becomes more complicated, but the frontier nodes (whatever frontier means today :-)) include quiets.(Guess:) In most positions quiet moves turn out to be better than captures, otherwise we would never examine quiets at all.
Yes, at d > 1 the situation is quite different. But you would use large margins in the first place, for that reason. And the 'dropping into QS' stuff seems to be done only at d=1 (in Stockfish). At d > 1 it would make even less sense; by reducing the depth you get yourself back into the d <= 1 situation. Reducing depth because of futility is totally pointless in general: if you don't believe there is a reasonable chance to pump up the score to alpha in a number of moves, allowing fewer moves for it will certainly not get you there. You should just fail low in that case.