When does a cut-node become an all-node?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: When does a cut-node become an all-node?

Post by Rebel »

diep wrote:
BubbaTough wrote:
lkaufman wrote:At least, if this is not so, then Rybka, Ippo, Ivanhoe, Houdini, and Critter are all doing something silly. I don't believe this.
Your conclusion may (or may not) be right, but I abhor your reasoning. You give far too much respect to the techniques used in the top couple of engines, in my opinion. I think it's more useful, and more accurate, to assume almost everything in the top engines can be improved until you prove to your satisfaction otherwise. If you look at the best programs of 10 years ago and identify the things they got perfect, you will find very few. There is no reason the same won't be true 10 years from now.

-Sam
The reason why today's top engines basically stick to trivial things, like toying with how much to reduce, is a matter of time: such tweaks are trivial to experiment with, and as I understand it many of the experiments carried out never come close to a mathematical optimum.

It's easy, braindead experimentation. Of any other algorithm that's promising yet that no one really got to work well, such as multi-cut, you hear nothing. I'm not sure I saw it correctly, but Stockfish could be using it.

If you have a new algorithm no one has ever tried before, and I really mean an ALGORITHM, not merely modifying a few parameters, which is what happens here and what the discussion was about, such research eats massive amounts of time. Full-time work, I'd argue.

As for the engines of 10 years ago, you should give them a tad more credit. It was difficult to test anything back then, as very few had enough hardware to do so. There was a split between the engines of that era: a few focused upon searching deeper by cutting the branching factor, and a number combined all of this while trying to find the maximum amount of tactics.

If one thing was really well optimized in most engines, it was finding the maximum amount of tactics. Today's engines need 2 plies to find what some engines back then saw within 1 ply, especially near the leaves.

An engine really well optimized for finding tactics back then was Rebel. Its entire pruning system was based upon not missing tactics; maybe Ed wants to show some gems there, showing how efficiently Rebel searched for tactics.

What most here don't realize is that Ed invented his own searching algorithm for this. Maybe Ed wants to comment on that, as his homepage of a few years ago shows the correct tables yet doesn't clearly describe the algorithm, at least the last time I checked (which is well over a year ago).

When I analyzed his algorithm a decade ago, it soon showed the brilliance of Ed at the end of the 80s and start of the 90s, as it prunes far more than any of today's reduction systems while not missing tactics. No reduction system of today can do that for you. Today's reduction systems are very easy to experiment with; no high IQ needed.

This is in contrast to what Ed has been doing. It's not a trivial algorithm to invent. The fact that we are over 25 years further on and no one has ever posted something similar to it is self-explanatory.

However, it has two Achilles' heels: it doesn't work very well together with the hash table, and though nullmove is possible, it isn't as efficient as in a normal depth-limited search. Basically, today's searching systems are totally based upon hash table storage; without an efficient hash table you won't get anywhere close to 30 plies.

We are 20 years past the time when Ed's algorithm dominated computer chess, however. Maybe Ed should ask someone who is good at scientific publications to write down the algorithm carefully, as Rebel dominated computer chess from the end of the 80s well up to 1998.

In Diep, the hash table has always played a crucial role, so it can't do without one.

You cannot tactically rival Ed's algorithm in the first 100k nodes you search.

If you implemented it in Stockfish and played superbullet games on a single core, it would probably win everything.

Only when you use a bunch of cores and a somewhat more serious time control will it lose, of course.

This tells you more about the superbullet testing than anything else.

Vincent
Thanks for all the praise, Vince; a bit over the top, but nice to read.

What was in Rebel back then is what is now called "static null-move reductions", exploited to the extreme. Then it became known that dynamic recursive null move was more powerful. Frans was the first to use it, immediately took the lead with Fritz 5, and stayed on top for a couple of years. Ever since then (although only in retrospect) I have considered my engine outdated.

Surely you can mix static null move with modern LMR. In December last year the old virus hit me again, while I thought I was immune to it, and in one month I was able to add 30-40 Elo doing exactly that, mixing old stuff with new. Obviously the new stuff is much more powerful. A normal evolutionary process.
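For concreteness, here is a minimal sketch of how static null move and LMR can be mixed in one search. This is not Rebel's actual code: the margin and thresholds are illustrative guesses, and the engine plumbing (Position, Move, generate_moves, evaluate, qsearch, make_move, unmake_move, in_check, is_quiet) is assumed to exist.

Code:

/* Minimal sketch: static null-move pruning plus LMR in a fail-soft
   negamax. All margins/thresholds are illustrative, and the helper
   functions are hypothetical engine plumbing. */
int search(Position *pos, int alpha, int beta, int depth)
{
    if (depth <= 0)
        return qsearch(pos, alpha, beta);

    /* Static null move: if the static eval already beats beta by a
       depth-scaled margin, assume the node fails high without searching. */
    if (depth <= 3 && !in_check(pos) && evaluate(pos) - 120 * depth >= beta)
        return beta;

    Move moves[256];
    int n = generate_moves(pos, moves);
    int best = -32000;

    for (int i = 0; i < n; i++) {
        make_move(pos, moves[i]);

        /* LMR: reduce late quiet moves, more so the later they come. */
        int r = (i >= 3 && depth >= 3 && is_quiet(moves[i])) ? 1 + i / 8 : 0;

        int score = -search(pos, -beta, -alpha, depth - 1 - r);
        if (r && score > alpha)   /* reduced search looked good: re-search */
            score = -search(pos, -beta, -alpha, depth - 1);

        unmake_move(pos, moves[i]);

        if (score > best) best = score;
        if (best > alpha) alpha = best;
        if (alpha >= beta) break; /* cutoff */
    }
    return best;
}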
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: When does a cut-node become an all-node?

Post by Daniel Shawul »

I'm not sure whether you are proposing making these changes to our own program or to Ivanhoe. In Komodo we make only a minor distinction, and undoing this would just produce a minor elo loss that would take many thousands of games to prove. In Ivanhoe the distinction is a large one, so it would be interesting to change the CUT reductions to match the ALL reductions. I'm not sure that I have the technical skills to do this myself as I've never worked on any code other than Komodo; if anyone wants to run this test it would be interesting. It will still require some thousands of games so a fast time control is necessary. Note that the ALL reductions should be left alone, as changing them has far greater consequences than changing the CUT reductions. This alone does suggest that the distinction makes sense.

Larry
Well, yes, that is what I am suggesting. Someone with the testing power (maybe Hyatt) can do it. In the meantime I did some tests and checked the YBWC paper. Here are the numbers of PV, CUT and ALL nodes at the start position for my engine, when I switch node type after 3 moves are searched.

Code:

node     count      percentage
pv          9746     0.122691571
cut      1323239    16.65814397
all      6610511    83.21916446
total    7943496
So the lion's share (83%) is ALL nodes. PV nodes clearly have a very low percentage, which makes any possible gain from the PV/non-PV distinction a gamble that may actually pay off, since it is the move that will be played. Any gain from the ALL/CUT distinction should come from applying LMR and pruning within the first 3 moves. Otherwise it is 99% likely the node is an ALL node, whether you guessed it as ALL (correct) or CUT (incorrect). I have also checked the YBW situation I mentioned before, and it turns out the advantage there comes from doing things differently in the first 3 moves too.
Standard YBW - does a parallel search after 1 move at both CUT and ALL nodes (for LMR this means reducing after the 1st move).
Modified YBW - does a parallel search after 3 moves at a CUT node but after 0 moves at an ALL node.
So all variants do the same thing after 3 moves are searched, because the node's fate is determined by then. But you can take more risk at ALL nodes. I now remember that I tried using LMR at ALL nodes right away, without searching any move first, with no improvement.
I think the theory is that the expected node type matters even if it is wrong, because the consequences of being wrong differ in the two cases. At expected CUT nodes missing a good move is a speed issue, while at expected ALL nodes it is a quality issue.
The Knuth node classification is just a guess that can't be relied on to that extent, IMO. For a better classification, the node type should be adjusted according to the evidence, as I just did. With a static classification you may get a much larger percentage of CUT nodes. I agree that a reduction of more than 1 ply depending on move order may help, because move quality progressively decreases with good ordering, but I do not believe you gain more from reducing late moves (>= 3) of ALL nodes than those of CUT nodes. What I hoped would gain me more, and actually tested, was to reduce more within the first 3 moves (including the first move) at an expected ALL node.
YMMV
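For concreteness, a minimal sketch of the bookkeeping described above (all names hypothetical): Knuth/Moore typing plus a demotion rule that turns an expected CUT node into an ALL node after some number of fail-lows. SWITCH_AFTER = 3 gives the adjusted classification measured above.

Code:

/* Hypothetical sketch of expected-node-type bookkeeping.
   Knuth/Moore rules: the first child of a PV node is PV and later
   children start as CUT; children of a CUT node are ALL; children of
   an ALL node are CUT. */
enum node_type { PV_NODE, CUT_NODE, ALL_NODE };

#define SWITCH_AFTER 3   /* fail-lows before a CUT node is demoted to ALL */

enum node_type child_type(enum node_type parent, int moves_searched)
{
    if (parent == PV_NODE)
        return moves_searched == 0 ? PV_NODE : CUT_NODE;
    return parent == CUT_NODE ? ALL_NODE : CUT_NODE;
}

enum node_type adjust_type(enum node_type expected, int moves_searched)
{
    /* The expected cutoff never came: demote CUT to ALL. */
    if (expected == CUT_NODE && moves_searched >= SWITCH_AFTER)
        return ALL_NODE;
    return expected;
}

With SWITCH_AFTER effectively infinite, every node keeps the alternating CUT-ALL-CUT-ALL guess, which is why the static count below lands near 50-50.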
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: When does a cut-node become an all-node?

Post by Daniel Shawul »

I did the static node classification that does not adjust its guess, and unsurprisingly the CUT/ALL node ratio is now almost 50-50.

Code:

node     count      percentage
pv          8125     0.15175462
cut      2563155    47.87330609
all      2782758    51.97493929
total    5354038
If you want to take even more risk, switch from CUT to ALL after 1 move:

Code:

node     count      percentage
pv          7361     0.146613957
cut       622274    12.39424714
all      4391033    87.4591389
total    5020668
Now the ALL node ratio is up to 87%. Does Komodo use a classification system like the one that gave me a 50-50 CUT/ALL ratio, in which case reducing them differently may give different performance? In the methods where more than 80% of nodes are ALL nodes, the ALL-node LMR formula is being used most of the time.
Last edited by Daniel Shawul on Thu Mar 29, 2012 6:29 pm, edited 1 time in total.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: When does a cut-node become an all-node?

Post by lkaufman »

Daniel Shawul wrote:
I'm not sure whether you are proposing making these changes to our own program or to Ivanhoe. In Komodo we make only a minor distinction, and undoing this would just produce a minor elo loss that would take many thousands of games to prove. In Ivanhoe the distinction is a large one, so it would be interesting to change the CUT reductions to match the ALL reductions. I'm not sure that I have the technical skills to do this myself as I've never worked on any code other than Komodo; if anyone wants to run this test it would be interesting. It will still require some thousands of games so a fast time control is necessary. Note that the ALL reductions should be left alone, as changing them has far greater consequences than changing the CUT reductions. This alone does suggest that the distinction makes sense.

Larry
Well, yes, that is what I am suggesting. Someone with the testing power (maybe Hyatt) can do it. In the meantime I did some tests and checked the YBWC paper. Here are the numbers of PV, CUT and ALL nodes at the start position for my engine, when I switch node type after 3 moves are searched.

Code:

node     count      percentage
pv          9746     0.122691571
cut      1323239    16.65814397
all      6610511    83.21916446
total    7943496
So the lion's share (83%) is ALL nodes. PV nodes clearly have a very low percentage, which makes any possible gain from the PV/non-PV distinction a gamble that may actually pay off, since it is the move that will be played. Any gain from the ALL/CUT distinction should come from applying LMR and pruning within the first 3 moves. Otherwise it is 99% likely the node is an ALL node, whether you guessed it as ALL (correct) or CUT (incorrect). I have also checked the YBW situation I mentioned before, and it turns out the advantage there comes from doing things differently in the first 3 moves too.
Standard YBW - does a parallel search after 1 move at both CUT and ALL nodes (for LMR this means reducing after the 1st move).
Modified YBW - does a parallel search after 3 moves at a CUT node but after 0 moves at an ALL node.
So all variants do the same thing after 3 moves are searched, because the node's fate is determined by then. But you can take more risk at ALL nodes. I now remember that I tried using LMR at ALL nodes right away, without searching any move first, with no improvement.
I think the theory is that the expected node type matters even if it is wrong, because the consequences of being wrong differ in the two cases. At expected CUT nodes missing a good move is a speed issue, while at expected ALL nodes it is a quality issue.
The Knuth node classification is just a guess that can't be relied on to that extent, IMO. For a better classification, the node type should be adjusted according to the evidence, as I just did. With a static classification you may get a much larger percentage of CUT nodes. I agree that a reduction of more than 1 ply depending on move order may help, because move quality progressively decreases with good ordering, but I do not believe you gain more from reducing late moves (>= 3) of ALL nodes than those of CUT nodes. What I hoped would gain me more, and actually tested, was to reduce more within the first 3 moves (including the first move) at an expected ALL node.
YMMV
It seems that what you are saying is the exact OPPOSITE of what all the programs which make the CUT/ALL distinction do. They all reduce more and sooner at CUT nodes. Rybka (2 or 3) reduced just one ply starting with the fourth move at ALL nodes (an extra half ply after a dozen moves), while at CUT nodes it reduced even the very first move (if not capture, hash, or killer) by 3 full plies!
I realized it is very easy for me to run your test using Rybka 2.3.2, since I worked on it. So I set the CUT node reduction down to ALL node levels. I'm running super-fast games (7 through 9 plies) as I can't tie up the computer too long for this. After 20,000 games at 7 and 8 plies, the version with no distinction is actually 2.8 Elo ahead on a time-adjusted basis. Perhaps the benefit requires more depth to show. In any case it doesn't look like a big deal. It is also possible that the idea is sound but was carried too far.
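Based purely on Larry's description (a paraphrase, not actual Rybka code), the asymmetry would look something like the sketch below. Reductions are in half-plies so the "extra half ply" is representable; all names and thresholds are illustrative.

Code:

/* Sketch of the asymmetric LMR scheme described above. Hypothetical
   names; reductions are returned in half-plies. */
enum node_type { PV_NODE, CUT_NODE, ALL_NODE };

int lmr_halfplies(enum node_type expected, int move_index,
                  int is_capture, int is_hash_move, int is_killer)
{
    if (is_capture || is_hash_move || is_killer)
        return 0;                       /* ordered/tactical moves: no LMR */

    switch (expected) {
    case CUT_NODE:
        return 6;                       /* 3 full plies, even the 1st move */
    case ALL_NODE:
        if (move_index < 3) return 0;   /* first three moves at full depth */
        return move_index < 12 ? 2 : 3; /* 1 ply; 1.5 plies after a dozen */
    default:
        return 0;                       /* PV nodes handled elsewhere */
    }
}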
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: When does a cut-node become an all-node?

Post by Daniel Shawul »

It seems that what you are saying is the exact OPPOSITE of what all the programs which make the CUT/ALL distinction do. They all reduce more and sooner at CUT nodes. Rybka (2 or 3) reduced just one ply starting with the fourth move at ALL nodes (an extra half ply after a dozen moves), while at CUT nodes it reduced even the very first move (if not capture, hash, or killer) by 3 full plies!
That is astounding if it is really the case. Why would you want to reduce earlier and even more at a node where you expect a fail high (i.e. a CUT node)? At least it is clear you should wait a while before starting reductions... Delaying reductions, prunings, parallel search, etc. at a CUT node has been the norm. I think you may have mixed it up with the node type at the previous ply or something.

Also, please address my other post, which shows the significance of the node classification system used. With a static method that does not adjust its classification based on observation, you can have as high as a 50-50 CUT/ALL ratio. That is expected if you just assign consecutive CUT-ALL-CUT-ALL types and never change them. However, if you take move ordering into consideration you will have more than 80% ALL nodes, which means you are using the ALL-node formulae most of the time.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: When does a cut-node become an all-node?

Post by lkaufman »

Daniel Shawul wrote:I did the static node classification that does not adjust its guess, and unsurprisingly the CUT/ALL node ratio is now almost 50-50.

Code:

node     count      percentage
pv          8125     0.15175462
cut      2563155    47.87330609
all      2782758    51.97493929
total    5354038
If you want to take even more risk, switch from CUT to ALL after 1 move:

Code:

node     count      percentage
pv          7361     0.146613957
cut       622274    12.39424714
all      4391033    87.4591389
total    5020668
Now the ALL node ratio is up to 87%. Does Komodo use a classification system like the one that gave me a 50-50 CUT/ALL ratio, in which case reducing them differently may give different performance? In the methods where more than 80% of nodes are ALL nodes, the ALL-node LMR formula is being used most of the time.
We never change the node type, so it would be around 50-50 by your estimate.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: When does a cut-node become an all-node?

Post by lkaufman »

Daniel Shawul wrote:
It seems that what you are saying is the exact OPPOSITE of what all the programs which make the CUT/ALL distinction do. They all reduce more and sooner at CUT nodes. Rybka (2 or 3) reduced just one ply starting with the fourth move at ALL nodes (an extra half ply after a dozen moves), while at CUT nodes it reduced even the very first move (if not capture, hash, or killer) by 3 full plies!
That is astounding if it is really the case. Why would you want to reduce earlier and even more at a node where you expect a fail high (i.e. a CUT node)? At least it is clear you should wait a while before starting reductions... Delaying reductions, prunings, parallel search, etc. at a CUT node has been the norm. I think you may have mixed it up with the node type at the previous ply or something.

Also, please address my other post, which shows the significance of the node classification system used. With a static method that does not adjust its classification based on observation, you can have as high as a 50-50 CUT/ALL ratio. That is expected if you just assign consecutive CUT-ALL-CUT-ALL types and never change them. However, if you take move ordering into consideration you will have more than 80% ALL nodes, which means you are using the ALL-node formulae most of the time.
Look at the Ivanhoe code yourself to confirm: it uses a much higher reduction at CUT nodes and starts much earlier. If you know of any program that does the opposite, please name it. The theory seems to be that even a reduced search will probably find the cutting move, whereas at an ALL node only a surprise move will fail high, so we try harder to find one at ALL nodes.

As far as I know, none of the top programs changes the node type once it is determined, so it's always close to 50-50.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: When does a cut-node become an all-node?

Post by Daniel Shawul »

I just checked Ippolit and here is what it has; clearly it reduces more at an ALL node. I guess there is no point in me saying again that more moves should be searched at CUT nodes, so can someone else say what they do? It really doesn't make sense, at least to me.

ALL nodes

Code:

novel_deepness = deepness - 2 + stretches - (4 + BSR (4 + sch));
if (novel_deepness <= 1)
    itog = -white_qsearching (1 - notch, 0);
else if (novel_deepness <= 7)
    itog = -white_slide (1 - notch, novel_deepness);
else
    itog = -white_all (1 - notch, novel_deepness);
CUT node

Code:

novel_deepness = deepness - 2 + stretches - BSR (1 + sch);
if (novel_deepness <= 1)
    itog = -white_qsearching (1 - notch, 0);
else if (novel_deepness <= 7)
    itog = -white_slide (1 - notch, novel_deepness);
else
    itog = -white_cut (1 - notch, novel_deepness);
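Assuming sch is the number of moves already searched and BSR(x) is bit-scan-reverse (the index of the highest set bit, an integer log2), a hedged reading of the two fragments is that they differ only in the extra depth subtracted, which can be tabulated:

Code:

/* Hedged reading of the Ippolit fragments above: print the extra depth
   subtracted in each branch, assuming sch = moves already searched and
   BSR = index of the highest set bit. */
#include <stdio.h>

static int BSR(unsigned x)
{
    int i = -1;
    while (x) { x >>= 1; i++; }
    return i;
}

int main(void)
{
    for (int sch = 0; sch <= 16; sch += 4)
        printf("sch=%2d  all-branch: %d  cut-branch: %d\n",
               sch, 4 + BSR(4 + sch), BSR(1 + sch));
    return 0;
}

If that reading is right, the fragment labeled "ALL nodes" subtracts roughly four to six more units of depth than the "CUT node" fragment, with the gap narrowing only logarithmically as sch grows. One caveat: in a search built from per-node-type functions, the children of a CUT node are ALL nodes, so a fragment that calls white_all on its children would live inside the CUT-node searcher; if the labels refer to the child function being called rather than the node doing the reducing, the code would in fact match Larry's description.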
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: When does a cut-node become an all-node?

Post by Daniel Shawul »

We never change the node type, so it would be around 50-50 by your estimate.
Well, in that case using different formulae for CUT and ALL nodes could give different results (NOT that it is better). The strictly consecutive CUT-ALL-CUT-ALL classification that doesn't adjust its type is clearly inferior, so pruning based on it seems a lot riskier. You should have far more ALL nodes than CUT nodes with decent move ordering, so I would say your node classification is inferior to begin with.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: When does a cut-node become an all-node?

Post by bob »

lkaufman wrote:
bob wrote:
sedicla wrote:
bob wrote:
sedicla wrote:According to the chess programming wiki it is when all candidate moves have been tried, but Fruit does it after the first move is tried.

I'm thinking of trying to do things differently according to node type, for example:
cut-node - use static null move (like stockfish) and don't do razoring (eval + margin < beta).
all-node - no static null move, try razor, no ordering for quiet moves...

What do you guys think is a good time to switch a cut-node to an all-node?

Thanks.
First, why does it matter? Are you somehow trying to type a node as PV, CUT or ALL, and then do things differently if it is CUT as opposed to ALL? A "CUT" node becomes an ALL node when there is no cutoff to be found. The most common reason is poor move ordering: this move is not the best one here, although we thought it was, so another move will cause a cutoff later. On occasion it is not ordering but a depth issue: suddenly you search deep enough to see that the score is worse than you expected, no matter which move you try, so none causes a cutoff...
Well, that's what I'm trying to figure out, whether it matters or not. I can see that your opinion is that it does not matter.
My idea is exactly that: do different things at ALL and CUT nodes.
Here's the BIG question:

Why?

More commonly, people seem to be doing different things at PV vs. non-PV nodes. I don't follow that reasoning either... the idea of the search is to find the best move. If you spend less effort on non-PV nodes, you will miss better moves. But for ALL vs. CUT, the only advantage I see is the one I used in DTS, that is, you want to split at an ALL node, never at a CUT node. Beyond that, I can't think of any reason why I might prune differently, reduce differently, or extend differently just because a node was ALL rather than CUT or vice versa...
Bob, I understand why you don't see the logic behind treating PV nodes differently than non-PV nodes. But it is quite obvious now that it does work. Every strong program makes this distinction now, and we would all drop a ton of elo points if we didn't. I'm sure Crafty would go up a lot if you made the distinction. I can't explain why it works, but it does, that's a fact. It's clearly a probabilistic thing; the odds of a reduced or pruned move being important are higher at PV nodes, so the rules should be different.
But the questioner asked about CUT vs. ALL nodes. Here too there could be probabilistic reasons why one type should be reduced or pruned more than the other, but in this case the empirical evidence is much less clear, and the top programs do not all agree. I believe that in the specific case of SINGULAR EXTENSIONS the distinction is considered important, even according to the Deep Blue originators of the idea. But then you don't believe in those either.
My opinion is that making the CUT/ALL distinction does have some value if you have the resources to detect small gains over many thousands of games. It's not worthwhile for most hobbyists I think.
Hsu didn't distinguish between PV and CUT nodes with regard to SE. He simply needed an efficient way to test for singularity. But if you look at his paper, in EITHER type of node, if you can prove that a move M is better than all other moves by a quantity S, then it gets extended 1 ply. The only thing that is done differently is the way the test is done, so that you don't do the test in positions where it will simply waste time. He found no way to define "singular" at an ALL node since everything is supposed to fail low. All you could do there is artificially lower the window by S, and see if one happens to fail high. An expensive proposition on ALL nodes.

In any case, I don't see his treating PV and non-PV nodes any differently in terms of what is done and what is not done, just in how the decision is made most efficiently.

The idea of the tree search is to find the best path with the least effort. We know that most programs change their mind at depth D+1 about once for every 6 searches done, based on a lot of empirical data presented by Newborn and then Heinz. That means that 5/6 of the time, if you spend less effort on the non-PV nodes, you win. But what if one of those non-PV nodes is really better? If you search it differently than the PV nodes, you can either (a) overlook something entirely if you are trying to spend less effort on the non-PV nodes; or (b) introduce a ridiculous search instability if you search PV nodes less rigorously than non-PV nodes. The non-PV node would fail high, but then fail low on the re-search since it is now being searched as a PV node with different rules. Which move would you play if this fail-high now fails low?

It may work. It may not work. It might be that many are simply copying the idea along with other good things found in robolito and friends, and the other things are what is really helping. I have not tried to go through robolito carefully as I don't consider that to be much fun...

BTW, the robolito/rybka/etc. "singular extensions" have absolutely nothing to do with what Hsu and Campbell used. They are not even remotely similar in concept...
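For readers who want the mechanics, here is a minimal sketch of a singularity probe along the lines described above (offset, reduced-depth, zero-window searches of the alternatives). The margin, probe depth and helper names are illustrative assumptions, not Hsu's or anyone's actual scheme; search, generate_moves, make_move and unmake_move are the hypothetical plumbing from the earlier sketch.

Code:

/* Sketch of a singularity test. Move m is deemed singular if every
   alternative fails low against a window lowered by margin S, in which
   case m would be extended one ply. Assumes Move is an integer encoding
   so moves can be compared with ==. */
int is_singular(Position *pos, Move m, int best_score, int depth, int S)
{
    int target = best_score - S;  /* every other move must stay below this */
    Move moves[256];
    int n = generate_moves(pos, moves);

    for (int i = 0; i < n; i++) {
        if (moves[i] == m)
            continue;             /* skip the candidate move itself */
        make_move(pos, moves[i]);
        /* zero-window, reduced-depth probe around the offset bound */
        int score = -search(pos, -target, -(target - 1), depth / 2);
        unmake_move(pos, moves[i]);
        if (score >= target)
            return 0;             /* an alternative comes close: not singular */
    }
    return 1;                     /* m beats everything else by at least S */
}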