Questions on SMP search

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Questions on SMP search

Post by wgarvin »

leafs(depth) as the number of nodes encountered at the maximal ply level during the current depth iteration
A quibble with this one: the leaves might occur at an earlier ply, not necessarily the maximal one. Any node with no children is a leaf (qsearch, pruning, etc.). [edit: sorry, I see you addressed that in your post.]

A leaf at depth=X might or might not still be a leaf at depth=Y, where Y > X. I can't see any easy way to calculate the number of leaves (or the number of interior nodes) other than to just count them during the search.
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Questions on SMP search

Post by Zach Wegner »

Laskos wrote:Try telling that to Bob; he doesn't even know how to read a plot. The slopes (first derivatives) give EBF(target depth, ply) on the tree, not the absolute values, and those slopes are both larger and smaller at different plies than the overall EBF(target depth), yielding a total EBF(target depth) that is pretty constant going from target depth N to N+1 (different trees). I am really eager to see someone plot the tree shape of Stockfish for different target depths; there is no way it's not widening compared to the purely exponential. His "widening" is not a widening at all (he seems to be confused about what widening of the tree is). Bob's theoretical tree shape is the purely exponential, non-widening W^D; how silly is that? I don't think he knows the tree shape even of his own Crafty.

Kai
I did plot some Stockfish trees and, as expected since it uses logarithmic reductions, the trees have a huge amount of widening. In fact, the widening accounts for far more of the tree growth than deepening.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Questions on SMP search

Post by Daniel Shawul »

I haven't been following this thread, but I want to add to the discussion a similar widening method used in _best first_ search, where all nodes are stored in memory. It is called progressive widening (used in UCT search). It is a very selective kind of search in which only the best move is expanded (in depth) on every visit. An equivalent depth D can be estimated from the total visits n at a node as a function of log(n). So after a fixed number of visits to a node, the "width", i.e. the number of moves considered at the node, is increased. The log curve becomes flat after a certain number of nodes have been searched, at which point all moves will be considered.

Thus towards the root it considers all moves and towards the leaves only a few are considered, very similar to LMR I suppose. The depth of search of the best move is extended after a certain number of visits, but more significantly it is the width that gets widened after a certain number of visits. The same happens in the LMR case, but there I think the depth increase has a more significant effect, since the branching factor is low. For Go, where the branching factor stays around 300 consistently, it is very hard to increase the depth of all moves, so the widening is very efficient. What it does is form an artificially reduced branching factor which is increased progressively as time goes on. It is an important method for games with a large branching factor.
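
For illustration, here is a minimal sketch of such a logarithmic widening schedule; the base constant and function names are assumptions for the example, not taken from any particular program:

Code: Select all

#include <algorithm>
#include <cmath>
#include <cstdio>

// Progressive widening schedule: the number of moves eligible at a node grows
// logarithmically with its visit count and flattens out once all moves are in.
int eligible_width(int visits, int total_moves)
{
    const double base = 1.4;  // assumed widening base; smaller widens faster
    int width = 1 + (int)(std::log(visits + 1.0) / std::log(base));
    return std::min(width, total_moves);
}

int main()
{
    for (int v : {1, 10, 100, 1000, 10000})
        std::printf("visits=%5d  width=%d\n", v, eligible_width(v, 30));
}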

Iterative widening vs. iterative deepening: I guess if the branching factor is very low, as in chess, both may have the same effect...
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Questions on SMP search

Post by Daniel Shawul »

I think there should be at least some widening. It is very clear especially in games with a huge branching factor, such as 19x19 Go. There I use a very aggressive LMR (depth -= move_count), so at depth=2 only the first two moves are considered. Using this method you can iteratively deepen and widen at the same time. For high branching factor games the widening helps more than the deepening, but it is not so clear if BF <= 2, for instance.
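
A minimal sketch of that reduction rule, assuming moves reduced to zero depth are simply skipped:

Code: Select all

// Aggressive LMR as described above: each later move is searched one ply
// shallower than the previous one, so at depth=2 only the first two moves
// get a nonzero search depth.
int reduced_depth(int depth, int move_count)  // move_count is 0 for the first move
{
    int d = depth - move_count;  // depth -= move_count
    return d > 0 ? d : 0;        // assumed: moves reduced to depth <= 0 are skipped
}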
Rein Halbersma
Posts: 741
Joined: Tue May 22, 2007 11:13 am

Re: Questions on SMP search

Post by Rein Halbersma »

Daniel Shawul wrote:I haven't been following this thread, but I want to add to the discussion a similar widening method used in _best first_ search, where all nodes are stored in memory. It is called progressive widening (used in UCT search). It is a very selective kind of search in which only the best move is expanded (in depth) on every visit. An equivalent depth D can be estimated from the total visits n at a node as a function of log(n). So after a fixed number of visits to a node, the "width", i.e. the number of moves considered at the node, is increased. The log curve becomes flat after a certain number of nodes have been searched, at which point all moves will be considered.

Thus towards the root it considers all moves and towards the leaves only a few are considered, very similar to LMR I suppose. The depth of search of the best move is extended after a certain number of visits, but more significantly it is the width that gets widened after a certain number of visits. The same happens in the LMR case, but there I think the depth increase has a more significant effect, since the branching factor is low. For Go, where the branching factor stays around 300 consistently, it is very hard to increase the depth of all moves, so the widening is very efficient. What it does is form an artificially reduced branching factor which is increased progressively as time goes on. It is an important method for games with a large branching factor.

Iterative widening vs. iterative deepening: I guess if the branching factor is very low, as in chess, both may have the same effect...
There is an old post by Gian-Carlo Pascutto about the similarity between alpha-beta and UCT with respect to the trees that they search in practice (even though they are formulated completely differently). [Edit: see here http://talkchess.com/forum/viewtopic.php?p=254434 ]
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Questions on SMP search

Post by Daniel Shawul »

Yes, he is right. The simplest parallel search approach of starting from the root is very effective, especially when each searcher is very selective. It is like merging the widths of the different processors to get one wide tree. The catch is that the narrow trees searched by each processor should be significantly different. To achieve that in UCT, different seeds for the random number generators are used. But for alpha-beta there is no such thing, so it could be difficult to conclude the same about an alpha-beta tree with or without LMR. Other than that, I agree UCT is very similar to LMR. Before I knew of UCT, I used a very aggressive LMR which reduces by 1 unit every other move... and it worked very well even for games with astronomical branching factors.
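
A rough sketch of that root-parallel scheme; the worker below is a stub standing in for a real selective search, and all names and constants are illustrative assumptions:

Code: Select all

#include <functional>
#include <random>
#include <thread>
#include <vector>

void search_worker(unsigned seed, std::vector<long>& visits)
{
    std::mt19937 rng(seed);  // distinct seed per thread => a different narrow tree
    std::uniform_int_distribution<int> pick(0, (int)visits.size() - 1);
    for (int i = 0; i < 100000; ++i)  // stub: a real worker would run a selective search
        ++visits[pick(rng)];
}

std::vector<long> merged_root_search(int n_threads, int n_root_moves)
{
    std::vector<std::vector<long>> parts(n_threads, std::vector<long>(n_root_moves, 0));
    std::vector<std::thread> pool;
    for (int t = 0; t < n_threads; ++t)
        pool.emplace_back(search_worker, 1000u + t, std::ref(parts[t]));
    for (auto& th : pool)
        th.join();
    std::vector<long> total(n_root_moves, 0);  // merge the narrow trees at the root
    for (const auto& p : parts)
        for (int m = 0; m < n_root_moves; ++m)
            total[m] += p[m];
    return total;
}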
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Questions on SMP search

Post by Gian-Carlo Pascutto »

I want to clarify some things Daniel said. When he talks about only expanding the best move, this is really the most "urgent" move. The most urgent move is either the best one (= the PV) or a move which has not been explored much and has a score not too far from the best (= a move searched shallowly but that is not near the end of the move list). Experience says that the latter part is less important (it's more important to search the PV).
Thus towards the root it considers all moves and towards leaves only a few are considered,very similar to LMR I suppose.
I think "considered" should be more correctly "eligible". If the score for the PV is a lot higher than the alternatives, they will almost not be considered at all, regardless of progressive widening. This decision is independent of the depth. (Compare: usually good moves like killers and so are not reduced, but everything else is)

Progressive widening boils down to the normal UCT formula if you give the nodes a big prior based on the score you'd normally use for move ordering. (So initially, their urgency depends mostly on the static eval and not on the search result.) So I do not see it as a technique distinct from UCT but more as a performance optimization: calculating the UCT formula on 300 moves for each node deep in the tree is slow, and storing them eats memory.
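
For illustration, a sketch of a UCT urgency with the ordering score folded in as a prior; the blending rule and constants are assumptions, not taken from any particular engine:

Code: Select all

#include <cmath>

// UCT urgency with a move-ordering score folded in as a prior: with few visits
// the urgency is dominated by the prior (e.g. static-eval-based ordering); as
// visits accumulate, the actual search result takes over. Constants assumed.
double urgency(double prior, double mean_result, int child_visits, int parent_visits)
{
    const double k = 50.0;  // assumed prior weight, in "virtual visits"
    const double c = 1.0;   // assumed exploration constant
    double value = (k * prior + child_visits * mean_result) / (k + child_visits);
    return value + c * std::sqrt(std::log(parent_visits + 1.0) / (child_visits + 1));
}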
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions on SMP search

Post by bob »

Rein Halbersma wrote:
Sven Schüle wrote: If I assume your "depth" is the maximum search depth of an iteration and "ply" the distance of a node to the root node, then I can follow you, except for one question: what does "overall EBF of a tree" mean for you?
Yes, depth = maximal remaining depth, i.e. the iterative deepening parameter, and ply = distance to root. With "overall EBF" I simply mean EBF(depth) = nodes(depth) / nodes(depth-1), i.e. the overall growth rate in nodes going from one depth iteration to the next. You can also define share(depth-1, ply) = nodes(depth-1, ply) / nodes(depth-1) as the share of nodes at a given ply level for the previous depth iteration, and leafs(depth) as the number of nodes encountered at the maximal ply level during the current depth iteration. It's then a simple exercise to show that the overall EBF decomposes into

Code: Select all

EBF(depth) = leafs(depth) / nodes(depth-1) + sum_{ply = 0 to depth-1} EBF(depth, ply) * share(depth-1, ply)
This means that the overall growth rate in nodes (EBF) is the sum of the ratio of leaf nodes to previous total nodes, plus the weighted EBF at all ply levels above the leafs, where the weight per ply is the share of each ply level in the previous total nodes. For a completely regular tree, the EBF per ply is equal to 1 (nothing changes from one depth iteration to the next), and the shares all add to one. So for a regular tree, the overall EBF is simply 1 plus the ratio of leaf nodes to previous total nodes.
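
As a concrete check, consider a uniform binary tree (W = 2), where nodes(d) = 2^(d+1) - 1 and leafs(d) = 2^d, EBF(depth, ply) = 1 at every inner ply, and the shares sum to one; a quick computation confirms both sides agree:

Code: Select all

#include <cstdio>

// Verifies the decomposition on a uniform binary tree, where EBF(d, ply) = 1
// for all inner plies, so the right-hand side reduces to 1 + leafs/nodes(d-1).
int main()
{
    for (int d = 2; d <= 6; ++d) {
        double nodes_d  = (1 << (d + 1)) - 1;   // nodes(depth)   = 2^(d+1) - 1
        double nodes_d1 = (1 << d) - 1;         // nodes(depth-1) = 2^d - 1
        double leafs_d  = 1 << d;               // leafs(depth)   = 2^d
        double lhs = nodes_d / nodes_d1;        // overall EBF(depth)
        double rhs = leafs_d / nodes_d1 + 1.0;  // leaf term + weighted inner growth
        std::printf("d=%d  EBF=%.4f  decomposition=%.4f\n", d, lhs, rhs);
    }
}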

Obviously qsearch and search extensions complicate this formula a bit, since e.g. leaf nodes occur at different ply levels, but the principle remains the same: EBF decomposes into growth of leafs and "inner growth", aka "widening" or "thickening" depending on your taste ;-)
Furthermore, I repeat a proposal made by Gerd Isenberg a few months ago, to use an improved definition of EBF that tries to address the odd/even-ply properties of an alpha-beta search tree and to avoid EBF numbers jumping up and down between subsequent iterations:

Code: Select all

EBF(iterationNo) := sqrt(nodes(iterationNo) / nodes(iterationNo - 2))
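
To see the effect, here is a synthetic node-count series whose growth alternates between x3 and x7/3 (a typical odd/even pattern); the numbers are made up for illustration. The plain ratio oscillates while the two-iteration square root stays at sqrt(7), about 2.65:

Code: Select all

#include <cmath>
#include <cstdio>

// Synthetic illustration of the odd/even smoothing: node counts that grow
// alternately by x3 and x7/3, so consecutive ratios jump between 3.00 and
// 2.33 while sqrt(nodes(i) / nodes(i-2)) stays constant at sqrt(7).
int main()
{
    double nodes[8] = {1e3, 3e3, 7e3, 21e3, 49e3, 147e3, 343e3, 1029e3};
    for (int i = 2; i < 8; ++i) {
        double plain    = nodes[i] / nodes[i - 1];
        double adjusted = std::sqrt(nodes[i] / nodes[i - 2]);
        std::printf("iter %d: plain=%.2f adjusted=%.2f\n", i, plain, adjusted);
    }
}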
Another recent interesting thread about EBF was this one.
The previous formula for the decomposition of EBF has an economics analogue in the GDP growth index from one year to the next, which decomposes into the weighted growth indices of goods that already existed in the previous year, plus a term accounting for the availability of new goods (i.e. the leafs in tree search). Gerd's square root correction is an example of seasonal adjustment: GDP growth indices are often computed against the same quarter or month (analogous to the depth-2 comparison that corrects for the same side to move). This makes them more reliable but also complicates their decomposition a bit.
Back to your summary of arguments, you wrote:
The claim is that for fixed ply, EBF(depth, ply) is increasing in depth, simply because LMR and NMR are reducing more heavily at lower remaining depth.
Are you sure about NMR in this case? Don't we reduce null moves more at higher remaining depth? Basic Stockfish formula (taken from 2.0.1), for instance, is

Code: Select all

        // Null move dynamic reduction based on depth
        int R = 3 + (depth >= 5 * ONE_PLY ? depth / 8 : 0);
which means R=3 for "remaining depth < 5" and R=3+"remaining depth" / 8 otherwise, which is more reduction at higher remaining depth.

Maybe I misunderstood your point?

Sven
Ah but no! You have to look at remaining depth-R, not just R.

Code: Select all

nullValue = -search<NonPV>(pos, ss+1, -beta, -alpha, depth-R*ONE_PLY);
If you look carefully at the Stockfish formula, then going from, say, remaining depth = 7 to depth = 8, the null move search actually stays the same! However, there is a strong effect from the verification search, which gets skipped for remaining depth < 6, so this makes NMR reduce more heavily at lower remaining depth.
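
Tabulating the formula makes this concrete (taking ONE_PLY = 1 for readability, an assumption for the example):

Code: Select all

#include <cstdio>

// Effective null-move search depth (depth - R) under the Stockfish 2.0.1
// formula, taking ONE_PLY = 1. From remaining depth 7 to 8, R grows from
// 3 to 4, so depth - R stays at 4: the null-move search does not deepen.
int main()
{
    for (int depth = 4; depth <= 16; ++depth) {
        int R = 3 + (depth >= 5 ? depth / 8 : 0);
        std::printf("depth=%2d  R=%d  depth-R=%2d\n", depth, R, depth - R);
    }
}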

Anyway, my point is that I agree with Bob that EBF is a simple statistic, and I find it strange that people can get confused about it. Where I disagree with Bob is that I think EBF is not a useful statistic, because of the high degree of irregularity of search trees, and that having EBF(depth, ply) available at all ply levels in the tree is much more informative, because it shows how the effects of reductions such as LMR differ across different parts of the tree.

Rein
I am not sure whether EBF is useful or not. But it is a measure, and it offers some interesting data. For example, suppose you search plies 1 to 24 and for each succeeding ply your EBF is 2.0, but at 25 it goes beyond 2.0 before you even finish. What does this suggest? Something pathological is happening in the tree. Move ordering is breaking down, perhaps, because suddenly you are deep enough to expose a tactical issue that is (a) difficult to defend against and (b) invisible at the previous depth.

Hsu mentioned something similar that he used to trigger "panic time" in Deep Thought / Deep Blue. I never had a chance to ask him about the idea in more detail, but any sudden shift in the shape of the tree suggests that something is up...

And yes, the tree is _highly_ irregular. Perhaps a couple of descriptions will help.

(1) take a position where pawns are locked (all 8 for both sides). All white pawns are on white squares, black pawns are on black squares. White has a king and 3 white-squared bishops. Black has a king and 3 black-squared bishops. So no checks. No extensions. No captures possible. The search tree will end precisely at the advertised search depth. If you do a 40 ply search in that position, no branches will extend beyond 40, as there are no extensions that apply, and no q-search as no captures are possible. If you draw that tree, root at the bottom, you get an inverted pyramid shape. But the density falls off near the tips, because at the last N plies (4 for Crafty) futility pruning outright throws moves away that look hopeless. In this position, however, there is no futility pruning possible. So why does the density drop off? The reductions shorten many branches to well below nominal depth.

What happens if you go a ply deeper? Some of the branches go deeper as well, some do not. The top edge of the pyramid gets wider because it gets taller and this is a tree.

(2) A more typical tree looks like two pyramids grafted together base-to-base. The root is at the bottom; the tree widens up to the typical nominal depth, then narrows sharply, since there are far fewer captures/checks than regular moves. A few branches go way out, a few stop at the nominal depth (no captures are possible), and some go a few plies beyond. Each distinct path from root to tip can have a different length and comprise different numbers of normal nodes and capture-only nodes.

I don't like the "widening" term as it is being used, because ANY tree with a branching factor > 1 is going to get wider as it goes deeper. This is an exponential process. The normal use of "wider" refers to width, which is the W part of W^D. Going wider implies that you somehow look at more moves than before, _NOT_ because you are just adding another exponential layer of growth on to the tree, but because somehow you are looking at more moves at each ply. That is not happening in any program I have seen. Yes, when you go from depth 20 to depth 21, one ply in the tree becomes "wider" because you push the forward pruning one ply deeper, making the ply before where the pruning kicks in switch from selective to full-width.

That's not traditional "widening" in any AI / tree-search context I am aware of; that is just a property of growing a tree, where, as the tree grows taller, it also grows wider as a result. That holds until you get the real BF (not EBF) down to exactly 1 (or even less). We are not there in general, although I have certainly found positions where the EBF goes below 1.0. Fine 70 is an example, since you can eventually trace every path to an EGTB hit, and at that point the tree stops growing no matter how many additional plies you add.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Questions on SMP search

Post by bob »

Daniel Shawul wrote:I haven't been following this thread, but I want to add to the discussion a similar widening method used in _best first_ search, where all nodes are stored in memory. It is called progressive widening (used in UCT search). It is a very selective kind of search in which only the best move is expanded (in depth) on every visit. An equivalent depth D can be estimated from the total visits n at a node as a function of log(n). So after a fixed number of visits to a node, the "width", i.e. the number of moves considered at the node, is increased. The log curve becomes flat after a certain number of nodes have been searched, at which point all moves will be considered.

Thus towards the root it considers all moves and towards the leaves only a few are considered, very similar to LMR I suppose. The depth of search of the best move is extended after a certain number of visits, but more significantly it is the width that gets widened after a certain number of visits. The same happens in the LMR case, but there I think the depth increase has a more significant effect, since the branching factor is low. For Go, where the branching factor stays around 300 consistently, it is very hard to increase the depth of all moves, so the widening is very efficient. What it does is form an artificially reduced branching factor which is increased progressively as time goes on. It is an important method for games with a large branching factor.

Iterative widening vs. iterative deepening: I guess if the branching factor is very low, as in chess, both may have the same effect...
Best-first search is a good example of the correct use of the term "widening" as you can expand any node you want, at any time you want. Depth-first doesn't fit this description very well.
Gian-Carlo Pascutto
Posts: 1243
Joined: Sat Dec 13, 2008 7:00 pm

Re: Questions on SMP search

Post by Gian-Carlo Pascutto »

bob wrote: Best-first search is a good example of the correct use of the term "widening" as you can expand any node you want, at any time you want. Depth-first doesn't fit this description very well.
So, are modern chess programs best-first or depth-first searchers? Does the distinction even make sense, given that we all use the equation better=deeper when doing LMR?

I think it was already proven in 1996 (by Aske Plaat) that the distinction doesn't really exist...