Expected gain from delta pruning

Post by **marcelk** » Sat Jul 16, 2011 12:56 pm

Some time ago I promised to post results on two-sided delta pruning (or inverse delta pruning as I called it then).

Before doing so, I first wanted to quantify, in Elo, the contribution of regular delta pruning. I didn't have this information yet because I generally verify search changes with test sets rather than game play. So I set up a match: 6000 opening positions (my own set), each played from both sides, 120+1 time control, single core, no pondering. The result is less than spectacular:

Rank Name          Elo    +    - games score oppo. draws 
   1 rookie3.3       3    5    5 12000   51%    -3   45% <-- delta pruning
   2 rookie3.3.1    -3    5    5 12000   49%     3   45% <-- no delta pruning

With such a small difference, even after magnification through the self-play lens, it doesn't make sense to me to implement, tune and qualify the two-sided delta pruning idea at all, because I expect even smaller gains from that and I have much bigger things to work on first.
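As a rough cross-check of the table, the sketch below converts an Elo gap into an expected score fraction using the standard logistic Elo formula. The function names are hypothetical, and the rating tool that produced the table may use a different model (for example one that treats draws explicitly), so this is only an approximation. Under that assumption, the 6 Elo gap between the two versions corresponds to a score of roughly 50.9%, consistent with the rounded 51% shown.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score fraction for a side rated elo_diff above its opponent
    (standard logistic Elo model; an assumption, not the tool's exact model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score: float) -> float:
    """Inverse of expected_score: Elo gap implied by a score fraction in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# The table lists +3 and -3, i.e. a 6 Elo gap between the two versions.
gap = 6.0
print(f"expected score at {gap:+.0f} Elo: {expected_score(gap):.4f}")

# Round trip: the score fraction maps back to the same Elo gap.
print(f"Elo recovered from that score: {elo_from_score(expected_score(gap)):.2f}")
```

The point of the conversion is scale: a gap of a few Elo moves the score by less than one percentage point, which is why so many games are needed to see it.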

But before I shelve the idea I would like to know if this is indeed what I can expect from delta pruning, or if others have observed a larger difference. (There is one implementation detail related to lazy evaluation that might be a troublemaker in my case.)

Any feedback is appreciated.

Post by **bob** » Sat Jul 16, 2011 5:25 pm

marcelk wrote: Some time ago I promised to post results on two-sided delta pruning (or inverse delta pruning as I called it then).

Before doing so, I first wanted to quantify, in Elo, the contribution of regular delta pruning. I didn't have this information yet because I generally verify search changes with test sets rather than game play. So I set up a match: 6000 opening positions (my own set), each played from both sides, 120+1 time control, single core, no pondering. The result is less than spectacular:
Rank Name          Elo    +    - games score oppo. draws 
   1 rookie3.3       3    5    5 12000   51%    -3   45% <-- delta pruning
   2 rookie3.3.1    -3    5    5 12000   49%     3   45% <-- no delta pruning
With such a small difference, even after magnification through the self-play lens, it doesn't make sense to me to implement, tune and qualify the two-sided delta pruning idea at all, because I expect even smaller gains from that and I have much bigger things to work on first.

But before I shelve the idea I would like to know if this is indeed what I can expect from delta pruning, or if others have observed a larger difference. (There is one implementation detail related to lazy evaluation that might be a troublemaker in my case.)

Any feedback is appreciated.

If you feel your testing approach is valid, then you don't have any convincing evidence to say the idea is good or bad. We see this countless times in our testing. We've developed a strong trust in our results, and react accordingly. With that result, either drive the error bar down to +/- 1, which I do on occasion by pushing the games up to 100K or so, or else use the KISS idea and don't add code that has no apparent gain...
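To illustrate why shrinking the error bar takes so many games, here is a minimal sketch that estimates a 95% margin, in Elo, for a measured rating difference. It assumes independent games, a normal approximation, and the logistic Elo curve; `elo_error_95` is a hypothetical helper, not the method used by the rating tool in the table, so its numbers will not match the reported +/- columns exactly. What it does show reliably is the scaling: the margin shrinks with the square root of the number of games, so going from 12,000 to 100,000 games narrows it by a factor of about 2.9.

```python
import math


def elo_error_95(games: int, draw_ratio: float, score: float = 0.5) -> float:
    """Approximate 95% half-width, in Elo, of a measured rating difference.

    Assumes independent games and a normal approximation (a simplification).
    """
    win = score - draw_ratio / 2.0
    # Per-game variance of the result, where a game yields 1, 0.5 or 0 points.
    variance = win + draw_ratio / 4.0 - score * score
    std_err = math.sqrt(variance / games)
    # Slope of the logistic Elo curve at this score (Elo per unit of score).
    slope = 400.0 / (math.log(10.0) * score * (1.0 - score))
    return 1.96 * std_err * slope


# 45% draws as in the table; compare the match size used with a 100K-game run.
for n in (12_000, 100_000):
    print(f"{n:>7} games: about +/- {elo_error_95(n, draw_ratio=0.45):.1f} Elo")
```

A higher draw ratio lowers the per-game variance, so matches with many draws need somewhat fewer games for the same margin.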

Post by **marcelk** » Sat Jul 16, 2011 5:42 pm

Maybe I should have explained my point better.

These are the results of single-sided delta pruning (similar to how it used to be done in Crafty), vs. no delta pruning. It tells me the difference between these two is essentially worthless, and longer testing at this point in the development is not the most economical thing to do.

However, I was expecting a much larger contribution from single-sided delta pruning, say 10 Elo or more. I agree with you: I'm not seeing an effect.

Now back to the plan: I ran this test run only as a prelude for attacking the original question: to have a baseline before trying to add double-sided delta pruning. I'm now inclined to not pursue that idea at all anymore. Unless someone can convince me that one-sided delta pruning by itself is helpful in his program and that my implementation must be to blame, not the concept.

Post by **michiguel** » Sat Jul 16, 2011 6:28 pm

marcelk wrote: Maybe I should have explained my point better.

These are the results of single-sided delta pruning (similar to how it used to be done in Crafty), vs. no delta pruning. It tells me the difference between these two is essentially worthless, and longer testing at this point in the development is not the most economical thing to do.

However, I was expecting a much larger contribution for single-sided delta pruning, say 10 elo or more. I agree with you, I'm not seeing an effect.

Now back to the plan: I ran this test run only as a prelude for attacking the original question: to have a baseline before trying to add double-sided delta pruning. I'm now inclined to not pursue that idea at all anymore. Unless someone can convince me that one-sided delta pruning by itself is helpful in his program and that my implementation must be to blame, not the concept.

I do not see this as any different from futility pruning, and I needed to play with the margins to find a significant gain. I do not think I tested delta pruning separately from futility, though.

Miguel

Post by **bob** » Sat Jul 16, 2011 10:03 pm

michiguel wrote:
marcelk wrote: Maybe I should have explained my point better.

These are the results of single-sided delta pruning (similar to how it used to be done in Crafty), vs. no delta pruning. It tells me the difference between these two is essentially worthless, and longer testing at this point in the development is not the most economical thing to do.

However, I was expecting a much larger contribution for single-sided delta pruning, say 10 elo or more. I agree with you, I'm not seeing an effect.

Now back to the plan: I ran this test run only as a prelude for attacking the original question: to have a baseline before trying to add double-sided delta pruning. I'm now inclined to not pursue that idea at all anymore. Unless someone can convince me that one-sided delta pruning by itself is helpful in his program and that my implementation must be to blame, not the concept.
I do not see this as any different from futility pruning, and I needed to play with the margins to find a significant gain. I do not think I tested delta pruning separately from futility, though.

Miguel

The original "delta pruning" (and now I wish I had chosen a different variable name than delta so that this would be called something else) was nothing more than an optimization. I can try a capture, flip sides, and if the quick call to Eval at the next ply causes a cutoff, good. Or I can check to see if it appears we are so far behind that this capture can't get us back close enough to alpha that the positional eval might get us "over the hump". The trees searched are subtly different, because capturing a pawn when you are down a queen, even if the pawn is free, is not going to help. But I think the thing is a very small optimization overall. If you set the window too wide, you never do it and may as well throw it out anyway. If you set it too narrow, you introduce excessive error. But no matter what you set it at, you are going to see a loss from "perfection". Either you try some captures that are pointless, or you prune away some that are worth searching. You get a speed gain, and an accuracy loss. And the tuning job is to maximize one and minimize the other.
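The test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Crafty's actual code: the piece values, the function names and the `margin` parameter are all hypothetical. A capture is skipped when the static evaluation plus the value of the captured piece plus a positional margin still cannot reach alpha.

```python
# Piece values in centipawns; illustrative numbers, not taken from any engine.
PIECE_VALUE = {"P": 100, "N": 325, "B": 325, "R": 500, "Q": 975}


def capture_is_futile(stand_pat: int, victim: str, alpha: int, margin: int) -> bool:
    """True if winning `victim` plus a positional `margin` still cannot reach alpha."""
    return stand_pat + PIECE_VALUE[victim] + margin <= alpha


def captures_to_search(stand_pat: int, victims: list[str], alpha: int, margin: int) -> list[str]:
    """Keep only the captures that survive the delta test, preserving their order."""
    return [v for v in victims if not capture_is_futile(stand_pat, v, alpha, margin)]


# Static eval is 900 below alpha, i.e. roughly a queen behind.
# Winning a pawn or a rook cannot close the gap; only the queen capture is kept.
print(captures_to_search(-900, ["P", "R", "Q"], alpha=0, margin=200))   # ['Q']

# With a very wide margin the test never fires and nothing is pruned.
print(captures_to_search(-900, ["P", "R", "Q"], alpha=0, margin=1000))  # ['P', 'R', 'Q']
```

The two calls show both ends of the trade-off: a narrow margin prunes more captures at the risk of discarding ones worth searching, while a wide margin makes the test a no-op.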

I have it on my to-do list to go back and look at this again when I have time, as some things were dumped when the MVV/LVA move ordering was added. I wrote it in a very quick way, without the "delta pruning", and it was better anyway. So I kept the new code but did not try to re-optimize with the delta stuff. I will before long...

I'll report numbers and whether or not I keep the changes when I get to it.
