All and Cut nodes

Discussion of chess software programming and technical issues.

Moderator: Ras

Karlo Bala
Posts: 373
Joined: Wed Mar 22, 2006 10:17 am
Location: Novi Sad, Serbia
Full name: Karlo Balla

Re: All and Cut nodes

Post by Karlo Bala »

bob wrote:
lucasart wrote:OK, now with my corrected code, I did some counting:
- 81% of my predicted All nodes fail low. So 19% error there.
- Also to be noted that 81% of my null searches fail high, and the child of a null search is a predicted All node (so coincidentally the percentage of All nodes failing low, overall or restricted to null children, is the same). From that number it is clear that DiscoCheck spends most of its time in the null search.

You guys have more experience than me on this, so what do you reckon? Do these numbers sound reasonable? Or should the prediction be much more accurate (indicating that my move ordering sucks)?
Here is the common metric to see how your ordering works. At any node in the tree where you fail high, at any point in time, first move or not, increment a counter failed_high. At those positions where you increment that counter, if you have just searched one move, increment a second counter failed_high_1st_move.

After the search is done, print this:

percentage = 100 * failed_high_1st_move / failed_high.

You want to be at 90% or above, otherwise your basic move ordering needs work...
IMO, it is better to count FH% based on depth, otherwise we measure (mostly) the effectiveness of the move ordering in QS.
Best Regards,
Karlo Balla Jr.
hgm
Posts: 28451
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: All and Cut nodes

Post by hgm »

Exactly! I always split out all search statistics by depth, and it turns out that the effectiveness of the various moves (hash, killers) depends very strongly on depth at small depths.
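A minimal sketch of such depth-split counters (all names hypothetical, not taken from any engine mentioned in this thread):

```c
#define MAX_DEPTH 64

/* Fail-high statistics split by remaining depth. fh[d] counts
   fail-highs at depth d in the full-width search; fh_first[d]
   counts those that came on the first move searched. */
static unsigned long fh[MAX_DEPTH], fh_first[MAX_DEPTH];

/* Call at every fail-high in the full-width search. */
void record_fail_high(int depth, int move_number)
{
    if (depth < 0 || depth >= MAX_DEPTH)
        return;
    fh[depth]++;
    if (move_number == 1)
        fh_first[depth]++;
}

/* First-move fail-high percentage at a given depth, or -1.0 if no
   fail-high has been recorded there yet. */
double fh_first_pct(int depth)
{
    if (fh[depth] == 0)
        return -1.0;
    return 100.0 * (double)fh_first[depth] / (double)fh[depth];
}
```

Printing `fh_first_pct` per depth at the end of a search makes it obvious whether a poor overall percentage is coming from the shallow plies (cheap to ignore) or the deep ones (worth fixing).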
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: All and Cut nodes

Post by bob »

lucasart wrote:
bob wrote:
lucasart wrote:OK, now with my corrected code, I did some counting:
- 81% of my predicted All nodes fail low. So 19% error there.
- Also to be noted that 81% of my null searches fail high, and the child of a null search is a predicted All node (so coincidentally the percentage of All nodes failing low, overall or restricted to null children, is the same). From that number it is clear that DiscoCheck spends most of its time in the null search.

You guys have more experience than me on this, so what do you reckon? Do these numbers sound reasonable? Or should the prediction be much more accurate (indicating that my move ordering sucks)?
Here is the common metric to see how your ordering works. At any node in the tree where you fail high, at any point in time, first move or not, increment a counter failed_high. At those positions where you increment that counter, if you have just searched one move, increment a second counter failed_high_1st_move.

After the search is done, print this:

percentage = 100 * failed_high_1st_move / failed_high.

You want to be at 90% or above, otherwise your basic move ordering needs work...
But what if I fail high, or fail low, *before* searching the first move? Reasons for this are:
- TT pruning
- eval pruning (when eval is significantly above beta, at low depth, and under certain stability conditions, I return a fail-high score = eval + tempo_bonus)
- razoring
- null move pruning: don't forget that 81% of the time my null move search fails high, and I return without searching anything.

Think about this for a second. We are trying to measure the effectiveness of move ordering. Simplest measure is, on average, how many moves do I have to search before I get a fail high. TT cutoffs have nothing to do with move ordering. Null-move search has nothing to do with move ordering. Only thing relevant is that once you start generating moves and searching them, if you get a cutoff, did it happen on the first move (optimal) or on a later move (less optimal). Schaeffer wrote a paper on this many years ago (I think it was him, anyway) where he defined the term "strongly ordered game tree". It was defined as a tree where on average, 90% of the fail-highs happen on the first move. That's what we are measuring.


So, because of the above, in the vast majority of cases my search exits without even generating any moves, let alone searching any...

Of course I could do the counting you suggest on the subpopulation of nodes that actually *reach* the move loop. But that subpopulation is relatively small, and could introduce a selection bias, no?
You are not thinking this through. If you do a null-move search and it fails high, what happened BELOW that null-move? You had to generate real moves and search them, and at the ply below the null-move you obviously want to fail low, to make the null-search fail high. But what about 2 plies down? You want to fail high. On the first move if possible. So only the tt cutoffs prevent any searching and let you ignore move ordering.


That's why the only thing I can reliably measure is the %age of nodes that fail low or fail high, by predicted node type (I have to increment my counters at every possible exit point of the search, and there are quite a few).
Nope, read above. If you fail high 90% of the time (or higher) on the first move searched (if you fail high at all) then your ordering is considered to be good. Otherwise it needs work.
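A minimal sketch of the counting bob describes (the counter names follow his post; the placement detail, per his replies: increment only when a cutoff comes from a real move searched in the move loop, only in the full-width search, never on TT or null-move exits, and never in qsearch):

```c
/* Counters from the metric above. count_fail_high() is called at
   the single point in the move loop where a searched move returns
   a score >= beta; nodes that exit earlier never reach it. */
static unsigned long failed_high, failed_high_1st_move;

void count_fail_high(int moves_searched)
{
    failed_high++;
    if (moves_searched == 1)   /* cutoff on the first move searched */
        failed_high_1st_move++;
}

/* percentage = 100 * failed_high_1st_move / failed_high */
double move_ordering_pct(void)
{
    return failed_high ? 100.0 * (double)failed_high_1st_move
                               / (double)failed_high
                       : 0.0;
}
```

Because the counters live only on the move-loop exit path, TT cutoffs, razoring, eval pruning, and null-move fail-highs are excluded by construction rather than by special-casing.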
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: All and Cut nodes

Post by bob »

Karlo Bala wrote:
bob wrote:
lucasart wrote:OK, now with my corrected code, I did some counting:
- 81% of my predicted All nodes fail low. So 19% error there.
- Also to be noted that 81% of my null searches fail high, and the child of a null search is a predicted All node (so coincidentally the percentage of All nodes failing low, overall or restricted to null children, is the same). From that number it is clear that DiscoCheck spends most of its time in the null search.

You guys have more experience than me on this, so what do you reckon? Do these numbers sound reasonable? Or should the prediction be much more accurate (indicating that my move ordering sucks)?
Here is the common metric to see how your ordering works. At any node in the tree where you fail high, at any point in time, first move or not, increment a counter failed_high. At those positions where you increment that counter, if you have just searched one move, increment a second counter failed_high_1st_move.

After the search is done, print this:

percentage = 100 * failed_high_1st_move / failed_high.

You want to be at 90% or above, otherwise your basic move ordering needs work...
IMO, it is better to count FH% based on depth, otherwise we measure (mostly) the effectiveness of the move ordering in QS.
I don't count 'em in q-search. Move ordering there is pretty accurate already, and the only rule deals with which capture is searched first. That's far different from the normal search with captures, killers, tt-move, history moves, history ordering, etc...

In Crafty, the fh% in the statistics is purely within the full-width part of the search, not from the q-search which would just inflate the value.
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: All and Cut nodes

Post by lucasart »

Robert,

I counted exactly like you said (search, not qsearch; % fail high on the 1st move, out of nodes that fail high and reach the move loop), and got 88.1%.

Note that this was done on a test suite of 21 positions searched at depth 12 (about 7.5 seconds in total). So the results may fluctuate depending on the conditions, but the ballpark is right (close to 90%).

Lucas
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: All and Cut nodes

Post by bob »

lucasart wrote:Robert,

I counted exactly like you said (search, not qsearch; % fail high on the 1st move, out of nodes that fail high and reach the move loop), and got 88.1%.

Note that this was done on a test suite of 21 positions searched at depth 12 (about 7.5 seconds in total). So the results may fluctuate depending on the conditions, but the ballpark is right (close to 90%).

Lucas
It might also go up on deeper searches, depending on history and such.
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: All and Cut nodes

Post by Evert »

bob wrote: Here is the common metric to see how your ordering works. At any node in the tree where you fail high, at any point in time, first move or not, increment a counter failed_high. At those positions where you increment that counter, if you have just searched one move, increment a second counter failed_high_1st_move.

After the search is done, print this:

percentage = 100 * failed_high_1st_move / failed_high.

You want to be at 90% or above, otherwise your basic move ordering needs work...
Surely having 90% of the cutoffs caused by the first move is not good enough to judge whether the move ordering is really good?

I've been looking at the move ordering in Jazz, and it does quite well by this metric (90% or better cutoffs caused by the first move), but I can get significant differences in time-to-depth (factors of 2-3 or worse) by doing different orderings at the tail of the move list. I suspect my move ordering there can be significantly improved, but so far nothing seems to be a clear improvement. It's possible that I'm stuck at some local optimum, but it may also be that the evaluation is just not very good at judging which positions are promising so the move isn't actually judged "good" until the search blunders into a branch where the advantage has become obvious to the evaluation function...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: All and Cut nodes

Post by bob »

Evert wrote:
bob wrote: Here is the common metric to see how your ordering works. At any node in the tree where you fail high, at any point in time, first move or not, increment a counter failed_high. At those positions where you increment that counter, if you have just searched one move, increment a second counter failed_high_1st_move.

After the search is done, print this:

percentage = 100 * failed_high_1st_move / failed_high.

You want to be at 90% or above, otherwise your basic move ordering needs work...
Surely having 90% of the cutoffs caused by the first move is not good enough to judge whether the move ordering is really good?

I've been looking at the move ordering in Jazz, and it does quite well by this metric (90% or better cutoffs caused by the first move), but I can get significant differences in time-to-depth (factors of 2-3 or worse) by doing different orderings at the tail of the move list. I suspect my move ordering there can be significantly improved, but so far nothing seems to be a clear improvement. It's possible that I'm stuck at some local optimum, but it may also be that the evaluation is just not very good at judging which positions are promising so the move isn't actually judged "good" until the search blunders into a branch where the advantage has become obvious to the evaluation function...
There is a cost for better move ordering. Computation effort. There's a gain, a smaller tree. There are trade-offs. 90% is a good target. Going beyond that is very hard, because you are getting smart enough that you could almost use your move ordering to REPLACE the search if you get much higher.

At the tail of the list (I assume you mean you have searched the first part of the list already) I don't see how ordering is going to help. The chances of getting a fail-high after the TT, good captures, killers, and early history moves REALLY goes down, as this is likely an ALL node where ordering is completely irrelevant.
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: All and Cut nodes

Post by Evert »

bob wrote: At the tail of the list (I assume you mean you have searched the first part of the list already) I don't see how ordering is going to help. The chances of getting a fail-high after the TT, good captures, killers, and early history moves REALLY goes down, as this is likely an ALL node where ordering is completely irrelevant.
Yes, you're probably right.
I was thinking that the one or two cases where the cutoff comes from move 40 hurt pretty badly in terms of search time, but making a general rule that bumps moves from move 40 up to move 10 will be hard because 1) it doesn't matter for ALL nodes (which is what most of them are, at these high move numbers), 2) it will hurt badly if it gets applied when it shouldn't, and 3) even when it does apply, part of the benefit is lost to the extra computation time.

Optimising the worst case seems like such an obviously good idea...

On the topic of ALL nodes: is it sensible to store the move with the largest subtree in the TT (so it will be tried first when the node is re-searched)? The idea is that, just as when sorting root moves by node count, the move with the highest node count is the hardest to refute and has the best chance of raising alpha. I don't see how it can hurt to do this (if the node stays an ALL node it doesn't matter, and if it doesn't, this move is as good a candidate as any), but any benefit to the idea is well below the accuracy of the test I could run for it. I can do a longer one later, but I'm curious if the answer isn't known already.
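The bookkeeping this would take is small; a sketch, with hypothetical names (in a real engine, `subtree_nodes[i]` would come from snapshotting the global node counter before and after searching `moves[i]`):

```c
/* Given the moves searched at an ALL node and the size of each
   move's subtree, return the move that was hardest to refute,
   i.e. the candidate to store in the hash entry and try first
   on a re-search. Assumes nmoves >= 1. */
int largest_subtree_move(const int *moves,
                         const unsigned long *subtree_nodes,
                         int nmoves)
{
    int best = moves[0];
    unsigned long biggest = subtree_nodes[0];

    for (int i = 1; i < nmoves; i++) {
        if (subtree_nodes[i] > biggest) {
            biggest = subtree_nodes[i];
            best = moves[i];
        }
    }
    return best;
}
```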
lucasart
Posts: 3243
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: All and Cut nodes

Post by lucasart »

Sorry for resuscitating this old thread, but I still didn't get to the bottom of this, and I feel some information is missing in this discussion.

In short
* I predict node types the usual way (and do some counting when in doubt). In particular, I use a dynamic prediction where a predicted Cut node becomes a predicted All node when it doesn't fail high on move #1
* Some people reported that doing things differently at All and Cut nodes was beneficial. For example using a different formula for reduction = f(depth, move_count) in All and Cut nodes worked. No matter how hard I try, this never worked in DiscoCheck

Could it simply be that the hash table is pulling the rug out from under me? Is it worth storing in each TT entry:
* the actual node type
* and the initially predicted node type
so that entries distinguish whether the node was searched under the initial assumption that it would be All or Cut, and an All entry is not used for pruning in a predicted Cut node, and vice versa?

Or am I just going to make a mess of my code, waste my time, and gain nothing anyway?
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
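For concreteness, the layout lucasart is asking about might look like this (a sketch with hypothetical names, not DiscoCheck's actual entry format): two 2-bit fields for the predicted and actual node types, plus a gate that refuses a hash cutoff when the entry was searched under a different prediction than the probing node's.

```c
enum node_type { NODE_PV, NODE_CUT, NODE_ALL };

typedef struct {
    /* ... key, move, score, depth, bound type ... */
    unsigned predicted : 2;  /* node type assumed when the entry was searched */
    unsigned actual    : 2;  /* node type the search actually produced       */
} tt_entry;

/* Allow a hash cutoff only when the stored prediction matches the
   prediction at the probing node. */
int tt_cutoff_allowed(const tt_entry *e, enum node_type current_predicted)
{
    return e->predicted == (unsigned)current_predicted;
}
```

The two bit-fields cost nothing if the entry has spare padding, so the experiment is cheap to try; whether the stricter gate gains anything is exactly the open question in the post above.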