Houdini 3 reducing the depth feature

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Dragan
Posts: 108
Joined: Mon Aug 06, 2012 1:55 pm

Re: Houdini 3 reducing the depth feature

Post by Dragan »

This approach makes more sense to me for the fail-low situations, at least from the point of the ELO increase.
Did you try it for fail-lows?
lkaufman
Posts: 6258
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: Houdini 3 reducing the depth feature

Post by lkaufman »

Houdini wrote:Allow me to reply to different questions/posts in one go.

@Larry: Not everything in an engine is about Elo gain.

Have a nice day,
Robert
True, but this idea would seem to allow an iteration to be completed in less time (given the trigger conditions) while usually showing the same move, so some elo gain is to be expected if the idea is sound. I guess your answer implies that the elo gain was minimal.

Larry
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Houdini 3 reducing the depth feature

Post by bob »

syzygy wrote:
bob wrote:If your goal is to ONLY be able to supply a PV/score quicker, without any Elo change at all, more power to you. My point was about improving play. The basic search we all use is alpha/beta, which has the known characteristic that it provides the SAME best move and score as minimax, but by searching a much smaller tree, allowing us to go much deeper and get more accurate information on later iterations. If you treat PV and non-PV moves differently, you break that characteristic. Because you make it much harder to replace an existing PV (searched with one set of rules) by a different ply-1 move (non-PV) searched by a different set of rules.
Are you implying that crafty treats PV moves and non-PV moves the same? Think about nullmove, LMR...
Yep. The only thing I don't do is null-move along the non-null window branches, of which there are very few. Why? They can't fail high, else I have an instant problem. But LMR? I do it the same EVERYWHERE, yes... I don't see any rational reason to alter extensions, reductions, etc. Assuming you are talking about the ENTIRE PV-move tree, that is, everything below the first (best) move at ply=1. If you are talking about non-null-window moves, of which there are practically none, I do not do null-moves, but I do everything else including LMR and forward pruning.

You used the term "think about it." I did, many times. Why do you bother to search all those other moves at the root, after searching the first? You hope to find something better? How if you short-change the search on those other moves? Seems like a form of "I believe my first move is best, and I am going to put all my eggs in that basket..."
syzygy
Posts: 5713
Joined: Tue Feb 28, 2012 11:56 pm

Re: Houdini 3 reducing the depth feature

Post by syzygy »

bob wrote:
syzygy wrote:
bob wrote:If your goal is to ONLY be able to supply a PV/score quicker, without any Elo change at all, more power to you. My point was about improving play. The basic search we all use is alpha/beta, which has the known characteristic that it provides the SAME best move and score as minimax, but by searching a much smaller tree, allowing us to go much deeper and get more accurate information on later iterations. If you treat PV and non-PV moves differently, you break that characteristic. Because you make it much harder to replace an existing PV (searched with one set of rules) by a different ply-1 move (non-PV) searched by a different set of rules.
Are you implying that crafty treats PV moves and non-PV moves the same? Think about nullmove, LMR...
Yep. The only thing I don't do is null-move along the non-null window branches, of which there are very few. Why? They can't fail high, else I have an instant problem.
Ok, if we just look at a non-PV move turning into a PV move because of a fail high, then a null move following this move should fail low. So I agree we can ignore null move.
But LMR? I do it the same EVERYWHERE, yes... I don't see any rational reason to alter extensions, reductions, etc. Assuming you are talking about the ENTIRE PV-move tree, that is, everything below the first (best) move at ply=1. If you are talking about non-null-window moves, of which there are practically none, I do not do null-moves, but I do everything else including LMR and forward pruning.
Yes, you reduce at PV nodes, but you don't reduce the PV move itself.
However, what I overlooked for a moment is that if a reduced non-PV move (at a PV node or at any other node) fails high, this move is researched without reduction. Only if it fails high again, the move is treated as a fail high, and at PV nodes may become the PV move. So if at the root a move fails high, it has already been searched at full depth.

So I would say that when using LMR at PV nodes, there is certainly a difference between how the PV move and non-PV moves are treated: some of the latter moves are reduced, while the PV move is never reduced. However, this only affects moves failing low. It does not affect how non-PV moves have been searched that are going to replace the PV move.

I guess the Houdini "trick" uses the fact that a non-PV move that failed high at the root, failed high both at its reduced depth and at full depth. Resolving the fail high first at its reduced depth can therefore be expected to behave nicely.
You used the term "think about it." I did, many times. Why do you bother to search all those other moves at the root, after searching the first? You hope to find something better? How if you short-change the search on those other moves? Seems like a form of "I believe my first move is best, and I am going to put all my eggs in that basket..."
Well, crafty and other programs using LMR at the root are short-changing the search on those other moves: as long as those other moves look bad, they are searched only with reduced depth (not all, but most of them). But this is not very different from how humans play chess.

Btw, I don't think we have any real disagreement here. We might have a different understanding of what Houdini does (and of course I don't know what Houdini does exactly).
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Houdini 3 reducing the depth feature

Post by bob »

syzygy wrote:
bob wrote:
syzygy wrote:
bob wrote:If your goal is to ONLY be able to supply a PV/score quicker, without any Elo change at all, more power to you. My point was about improving play. The basic search we all use is alpha/beta, which has the known characteristic that it provides the SAME best move and score as minimax, but by searching a much smaller tree, allowing us to go much deeper and get more accurate information on later iterations. If you treat PV and non-PV moves differently, you break that characteristic. Because you make it much harder to replace an existing PV (searched with one set of rules) by a different ply-1 move (non-PV) searched by a different set of rules.
Are you implying that crafty treats PV moves and non-PV moves the same? Think about nullmove, LMR...
Yep. The only thing I don't do is null-move along the non-null window branches, of which there are very few. Why? They can't fail high, else I have an instant problem.
Ok, if we just look at a non-PV move turning into a PV move because of a fail high, then a null move following this move should fail low. So I agree we can ignore null move.
But LMR? I do it the same EVERYWHERE, yes... I don't see any rational reason to alter extensions, reductions, etc. Assuming you are talking about the ENTIRE PV-move tree, that is, everything below the first (best) move at ply=1. If you are talking about non-null-window moves, of which there are practically none, I do not do null-moves, but I do everything else including LMR and forward pruning.
Yes, you reduce at PV nodes, but you don't reduce the PV move itself.
However, what I overlooked for a moment is that if a reduced non-PV move (at a PV node or at any other node) fails high, this move is researched without reduction. Only if it fails high again, the move is treated as a fail high, and at PV nodes may become the PV move. So if at the root a move fails high, it has already been searched at full depth.

So I would say that when using LMR at PV nodes, there is certainly a difference between how the PV move and non-PV moves are treated: some of the latter moves are reduced, while the PV move is never reduced. However, this only affects moves failing low. It does not affect how non-PV moves have been searched that are going to replace the PV move.

I guess the Houdini "trick" uses the fact that a non-PV move that failed high at the root, failed high both at its reduced depth and at full depth. Resolving the fail high first at its reduced depth can therefore be expected to behave nicely.
You used the term "think about it." I did, many times. Why do you bother to search all those other moves at the root, after searching the first? You hope to find something better? How if you short-change the search on those other moves? Seems like a form of "I believe my first move is best, and I am going to put all my eggs in that basket..."
Well, crafty and other programs using LMR at the root are short-changing the search on those other moves: as long as those other moves look bad, they are searched only with reduced depth (not all, but most of them). But this is not very different from how humans play chess.

Btw, I don't think we have any real disagreement here. We might have a different understanding of what Houdini does (and of course I don't know what Houdini does exactly).
The issue seemed to revolve around the definition of "PV node". For example, after making the first move, we reach ply=2. One way of looking at this is that this position is a PV node and NOTHING unusual happens to ANY move here. That seems wrong. The other way, not the way I was initially thinking, was that just the move itself that is searched with an open window is not reduced, or not reduced as much.

So I suppose the definition of "PV node" might need a slight modification. In normal discussion, a PV node, as defined in Knuth/Moore (An analysis of alpha/beta pruning) was a node on the left-hand-side of the graph, one PV node per ply, so the "left-most" nodes. I "think", however, that we are talking about "pv moves" instead, which is a different thing.

Do you disagree with any of that? And do you agree that we "seem" to be talking about PV moves rather than the classic PV nodes? Because I certainly reduce lots of moves at PV nodes. Such as doing LMR at the root, which was better in my testing...

I don't think the last chapter of what to do on a fail low or a fail high has been written. And I am almost certain that they should not be treated exactly the same way, even though this is exactly what I do at the present...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Houdini 3 reducing the depth feature

Post by bob »

Rebel wrote:
bob wrote: I understand the idea, although I do not buy "all in" that it is correct. Why would one do things so differently in the PV, that you make it very hard for a non-PV move to fail high in the first place??? I don't personally assume that this is the way things have to be done, it's just the way some do it at the moment. From a theoretical perspective, it seems wrong, however.
I am not sure how an idea that produces elo can be wrong. About a year of ten ago I added (what I called) the PVS extension as described here. It's (still) good for 20-30 elo. When I would reduce the depth at a fail-high it still could produce the good score and mainline since the hash-table has all the best moves right resulting in several to many extensions.
This is similar to the tt-singular idea in robolito and friends. I've tried several variations, but not your "4 tt moves in a row to trigger an extension." Won't work for me, period, because I use the speed optimization of whenever I have no best move to store (ALL node, for example) I store the FIRST move I searched. Since I search the TT move without generating moves, it can occasionally save time. It is a performance enhancement, rather than a search improvement (I search the same tree if I don't do this, but I have to generate moves first, and occasionally an ALL node becomes a CUT node and this "pseudo-best-move" will cause a fail high with less effort (no move generation). But this means I would be extending way too much using your idea, since every hash entry has a "best move" even though they are often not a "best move" at all, just "a legal move".
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Houdini 3 reducing the depth feature

Post by Don »

bob wrote: The issue seemed to revolve around the definition of "PV node". For example, after making the first move, we reach ply=2. One way of looking at this is that this position is a PV node and NOTHING unusual happens to ANY move here. That seems wrong. The other way, not the way I was initially thinking, was that just the move itself that is searched with an open window is not reduced, or not reduced as much.

So I suppose the definition of "PV node" might need a slight modification. In normal discussion, a PV node, as defined in Knuth/Moore (An analysis of alpha/beta pruning) was a node on the left-hand-side of the graph, one PV node per ply, so the "left-most" nodes. I "think", however, that we are talking about "pv moves" instead, which is a different thing.

Do you disagree with any of that? And do you agree that we "seem" to be talking about PV moves rather than the classic PV nodes? Because I certainly reduce lots of moves at PV nodes. Such as doing LMR at the root, which was better in my testing...

I don't think the last chapter of what to do on a fail low or a fail high has been written. And I am almost certain that they should not be treated exactly the same way, even though this is exactly what I do at the present...
The root position is a pv node. A node is a position of course, not a move.

From any pv node the first move played takes us to another pv node. The second move searched and beyond do not lead to pv nodes. If they are done in the PVS framework where you search them with a zero width window then if one of those nodes fails high, it is searched again AS a pv node, i.e. the move is treated as if it's the first move leading to a new PV node.

Another way to look at this is that all pv nodes have some alpha and beta window, but the zero width window searches are always "scout" searches, designed to always fail. A scout search is never a pv node of course.

For clarity it is useful to have 2 searches. One is searchPV and the other is searchNonPV. searchNonPV could be called searchNullWindow if you wanted. SearchPv often calls searchNonPV but searchNonPV never calls searchPV.

I am not sure that conforms to Knuth/Moore without looking at it again. If you assume a perfectly ordered tree it would be always be the left-node or the first move played, but in practice a non-pv sometimes has to be "promoted" to a pv node which means it "should have" been the first move searched.

The confusion arises when you mix up the term "move" with "node" and this gave Larry a lot of trouble when we talked about it and I was trying to get him to understand. If you have 36 moves that can be played from a PV node Larry made the mistake of thinking of them as "pv moves" because they were searched inside the PV search. But except for 1 (or any that got promoted) they were mostly moves that led to non-pv nodes.

In a program such as yours which does not explicitly make the distinction, it's difficult to identify whether a given node is a pv node or not. If you use PVS search you could say that if the window is zero width it's probably not a pv node but I think it could still be a pv node in some circumstances although I believe that is a very temporary situation.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Houdini 3 reducing the depth feature

Post by bob »

Don wrote:
bob wrote: The issue seemed to revolve around the definition of "PV node". For example, after making the first move, we reach ply=2. One way of looking at this is that this position is a PV node and NOTHING unusual happens to ANY move here. That seems wrong. The other way, not the way I was initially thinking, was that just the move itself that is searched with an open window is not reduced, or not reduced as much.

So I suppose the definition of "PV node" might need a slight modification. In normal discussion, a PV node, as defined in Knuth/Moore (An analysis of alpha/beta pruning) was a node on the left-hand-side of the graph, one PV node per ply, so the "left-most" nodes. I "think", however, that we are talking about "pv moves" instead, which is a different thing.

Do you disagree with any of that? And do you agree that we "seem" to be talking about PV moves rather than the classic PV nodes? Because I certainly reduce lots of moves at PV nodes. Such as doing LMR at the root, which was better in my testing...

I don't think the last chapter of what to do on a fail low or a fail high has been written. And I am almost certain that they should not be treated exactly the same way, even though this is exactly what I do at the present...
The root position is a pv node. A node is a position of course, not a move.

From any pv node the first move played takes us to another pv node. The second move searched and beyond do not lead to pv nodes. If they are done in the PVS framework where you search them with a zero width window then if one of those nodes fails high, it is searched again AS a pv node, i.e. the move is treated as if it's the first move leading to a new PV node.

Another way to look at this is that all pv nodes have some alpha and beta window, but the zero width window searches are always "scout" searches, designed to always fail. A scout search is never a pv node of course.

For clarity it is useful to have 2 searches. One is searchPV and the other is searchNonPV. searchNonPV could be called searchNullWindow if you wanted. SearchPv often calls searchNonPV but searchNonPV never calls searchPV.

I am not sure that conforms to Knuth/Moore without looking at it again. If you assume a perfectly ordered tree it would be always be the left-node or the first move played, but in practice a non-pv sometimes has to be "promoted" to a pv node which means it "should have" been the first move searched.

The confusion arises when you mix up the term "move" with "node" and this gave Larry a lot of trouble when we talked about it and I was trying to get him to understand. If you have 36 moves that can be played from a PV node Larry made the mistake of thinking of them as "pv moves" because they were searched inside the PV search. But except for 1 (or any that got promoted) they were mostly moves that led to non-pv nodes.

In a program such as yours which does not explicitly make the distinction, it's difficult to identify whether a given node is a pv node or not. If you use PVS search you could say that if the window is zero width it's probably not a pv node but I think it could still be a pv node in some circumstances although I believe that is a very temporary situation.
Nothing out of the way with what you wrote. The issue is "What do you do or not do at a PV "node". The root position is ALWAYS a PV node. If you treat it as such, that could mean that at THAT node, you do everything the same way. Which means treating all moves equally. But after making the first move at the root, and going to another PV node, when you come back to the root and try a different move, you end up at a non-PV node unless move ordering is broken.

The issue is, then, are we talking about treating NODES differently (PV vs ALL/CUT) or are we talking about treating MOVES differently? That is where the confusion seems to come in.

What you describe (search PV vs search non-PV) is not about nodes. It is about moves. Because you are treating moves at a PV node differently depending on whether they are searched first, or later.

In Crafty, I treat everything equally, with the only exception being that I don't do null-move at a PV "node", because (a) it can't fail high and (b) I can't allow the result to become the "best move/score" here since a null-move is illegal. But everything else that I do is done uniformly across all moves...

I don't think in ANY search you can accurately say "this is a PV node or this is not." Because we don't have perfect move ordering, and non-PV nodes morph into PV nodes occasionally, when move ordering is wrong. And, of course, PV nodes become non-PV nodes for the same reason.
User avatar
Rebel
Posts: 7381
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Houdini 3 reducing the depth feature

Post by Rebel »

bob wrote:
Rebel wrote:
bob wrote: I understand the idea, although I do not buy "all in" that it is correct. Why would one do things so differently in the PV, that you make it very hard for a non-PV move to fail high in the first place??? I don't personally assume that this is the way things have to be done, it's just the way some do it at the moment. From a theoretical perspective, it seems wrong, however.
I am not sure how an idea that produces elo can be wrong. About a year of ten ago I added (what I called) the PVS extension as described here. It's (still) good for 20-30 elo. When I would reduce the depth at a fail-high it still could produce the good score and mainline since the hash-table has all the best moves right resulting in several to many extensions.
This is similar to the tt-singular idea in robolito and friends. I've tried several variations, but not your "4 tt moves in a row to trigger an extension." Won't work for me, period, because I use the speed optimization of whenever I have no best move to store (ALL node, for example) I store the FIRST move I searched. Since I search the TT move without generating moves, it can occasionally save time. It is a performance enhancement, rather than a search improvement (I search the same tree if I don't do this, but I have to generate moves first, and occasionally an ALL node becomes a CUT node and this "pseudo-best-move" will cause a fail high with less effort (no move generation). But this means I would be extending way too much using your idea, since every hash entry has a "best move" even though they are often not a "best move" at all, just "a legal move".
The question was if reducing the depth at a fail-high can produce the real score and the answer is yes. I noticed Richard told you the same thing.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Houdini 3 reducing the depth feature

Post by Don »

bob wrote:
Don wrote:
bob wrote: The issue seemed to revolve around the definition of "PV node". For example, after making the first move, we reach ply=2. One way of looking at this is that this position is a PV node and NOTHING unusual happens to ANY move here. That seems wrong. The other way, not the way I was initially thinking, was that just the move itself that is searched with an open window is not reduced, or not reduced as much.

So I suppose the definition of "PV node" might need a slight modification. In normal discussion, a PV node, as defined in Knuth/Moore (An analysis of alpha/beta pruning) was a node on the left-hand-side of the graph, one PV node per ply, so the "left-most" nodes. I "think", however, that we are talking about "pv moves" instead, which is a different thing.

Do you disagree with any of that? And do you agree that we "seem" to be talking about PV moves rather than the classic PV nodes? Because I certainly reduce lots of moves at PV nodes. Such as doing LMR at the root, which was better in my testing...

I don't think the last chapter of what to do on a fail low or a fail high has been written. And I am almost certain that they should not be treated exactly the same way, even though this is exactly what I do at the present...
The root position is a pv node. A node is a position of course, not a move.

From any pv node the first move played takes us to another pv node. The second move searched and beyond do not lead to pv nodes. If they are done in the PVS framework where you search them with a zero width window then if one of those nodes fails high, it is searched again AS a pv node, i.e. the move is treated as if it's the first move leading to a new PV node.

Another way to look at this is that all pv nodes have some alpha and beta window, but the zero width window searches are always "scout" searches, designed to always fail. A scout search is never a pv node of course.

For clarity it is useful to have 2 searches. One is searchPV and the other is searchNonPV. searchNonPV could be called searchNullWindow if you wanted. SearchPv often calls searchNonPV but searchNonPV never calls searchPV.

I am not sure that conforms to Knuth/Moore without looking at it again. If you assume a perfectly ordered tree it would be always be the left-node or the first move played, but in practice a non-pv sometimes has to be "promoted" to a pv node which means it "should have" been the first move searched.

The confusion arises when you mix up the term "move" with "node" and this gave Larry a lot of trouble when we talked about it and I was trying to get him to understand. If you have 36 moves that can be played from a PV node Larry made the mistake of thinking of them as "pv moves" because they were searched inside the PV search. But except for 1 (or any that got promoted) they were mostly moves that led to non-pv nodes.

In a program such as yours which does not explicitly make the distinction, it's difficult to identify whether a given node is a pv node or not. If you use PVS search you could say that if the window is zero width it's probably not a pv node but I think it could still be a pv node in some circumstances although I believe that is a very temporary situation.
Nothing out of the way with what you wrote. The issue is "What do you do or not do at a PV "node". The root position is ALWAYS a PV node. If you treat it as such, that could mean that at THAT node, you do everything the same way. Which means treating all moves equally. But after making the first move at the root, and going to another PV node, when you come back to the root and try a different move, you end up at a non-PV node unless move ordering is broken.

The issue is, then, are we talking about treating NODES differently (PV vs ALL/CUT) or are we talking about treating MOVES differently? That is where the confusion seems to come in.

What you describe (search PV vs search non-PV) is not about nodes. It is about moves. Because you are treating moves at a PV node differently depending on whether they are searched first, or later.

In Crafty, I treat everything equally, with the only exception being that I don't do null-move at a PV "node", because (a) it can't fail high and (b) I can't allow the result to become the "best move/score" here since a null-move is illegal. But everything else that I do is done uniformly across all moves...

I don't think in ANY search you can accurately say "this is a PV node or this is not." Because we don't have perfect move ordering, and non-PV nodes morph into PV nodes occasionally, when move ordering is wrong. And, of course, PV nodes become non-PV nodes for the same reason.
The general answer is that the PV node is handled much more conservatively with respect to pruning and reductions. And you can be a bit more aggressive with respect to extensions in the PV nodes too.

In PV nodes we do not do null move, futility pruning or any of those sorts of things.

In some sense the conceptual model is that the PV search is the "real" search and the only one that matters. That makes it ridiculously selective. Everything else can be viewed as black box to determine if a move should be included in this highly selective PV search. In other words, think of everything else as a candidate move generator for a highly selective program. To be a candidate move generator one would expect it to "throw out" moves faster than they would normally be searched and so the non-pv search does this by means of using null move pruning, futility pruning, highly aggressive LMR, and some surprisingly aggressive forward pruning. It does not select moves, it only decides whether a move is worthy of being included in the "real" search.

This also explains why the branching factor of modern chess programs is 2 or even less. We are essentially just doing a PV search where we are looking (usually) at just 1 or 2 moves and throwing out everything else.

If you think of the non-pv nodes as part of the "real" search you are missing the point. Of course it's semantics, but I see these nodes as simply part of an elaborate recursive candidate move generator in a highly selective search.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.