scaling property of hacks below a fixed depth

lucasart · Post by **lucasart** » Thu Aug 08, 2013 4:56 pm

Such hacks include:
* razoring
* eval pruning or futility pruning (for me it's the same, basically eval pruning is reversed futility pruning)
* move count pruning

I wonder if any research has been done to measure the scaling property of such techniques.

Let's take an example of feature X that does some speculative pruning when depth <= 5 (+other conditions). If b is the branching factor without X, then the time to depth goes from

Code: Select all

T(d) ~ b^d

to

Code: Select all

T'(d) ~ b'^5 * (b-5)^n

for some b'<b (b' branching factor of the 5 depth trees where X is applied).

This is of course a really sketchy and simplified equation, but it gives an idea. When n goes to infinity, we have

Code: Select all

T'(n)/T(n) ~ (b'/b)^5

some asymptotically we get a constant speedup. That is for the time to depth, which is the reward.

Now, like most things, it's a risk-reward: there is a cost in neglecting some tactics. If this cost is constant, increasing, or decreasing with depth, we are left with a different conclusion as to the asymptotical risk-reward of feature X.

Experiment: I am running a test in DiscoCheck of removing Razoring and Eval pruning, at different time controls:
* 5000 games in 2.5"+0.025"
* 5000 games in 5"+0.05"
* 5000 games in 10"+0.1"
(I don't have the computational ressources to go much further in a reasonable amount of time).

I really hope to see that these featurea are useless at long TC and I can remove them. I may never have the CPU ressources to reach a time control where that becomes apparent though, but curious to see how the scaling graph at least begins.

jdart · Post by **jdart** » Thu Aug 08, 2013 5:28 pm

Why would you expect these "hacks" to be useless at long TC?

Most of the nodes you visit are always in the q-search or the few plies before the q-search. And many pruning tricks including futility only operate there. Even at large root depths, if you are semi-safely pruning away a lot of the deep nodes near the q-search you will have see a gain from that, I think.

--Jon

lucasart · Post by **lucasart** » Thu Aug 08, 2013 5:30 pm

I forgot to say:
* a constant speed up presents diminishing returns (in terms of ELO)
* so depending on how the ELO cost at fixed depth of feature X scales, we get a different conclusion.

Results so far:
* 5000 games in 2.5"+0.025": -45.7 elo
* 5000 games in 5"+0.05" (running)
* 5000 games in 10"+0.1"

jdart · Post by **jdart** » Thu Aug 08, 2013 5:36 pm

But I expect you are not getting a constant speedup.

You are probably getting a more or less fixed percentage of nodes pruned. More depth = more nodes but if you prune the same percentage you get the same advantage from the pruning technique. Just IMO and of course you still should measure.

--Jon

lucasart · Post by **lucasart** » Thu Aug 08, 2013 5:45 pm

jdart wrote:But I expect you are not getting a constant speedup.

You are probably getting a more or less fixed percentage of nodes pruned. More depth = more nodes but if you prune the same percentage you get the same advantage from the pruning technique. Just IMO and of course you still should measure.

--Jon

That's exactly what I'm saying: constant percentage of nodes pruned equals constant speedup (NPS equal). And if NPS is improved to to feature X removed, again it's by a constant factor. Anyway, I'm waiting for the test to finish, hopefully tomorrow.

lucasart · Post by **lucasart** » Fri Aug 09, 2013 6:43 am

Results:
* 5000 games in 2.5"+0.025": -46 elo
* 5000 games in 5"+0.05": -50 elo
* 5000 games in 10"+0.1": -61 elo

So it scales well, at least for the time controls I was able to test, contrary to my initial intuition.

This confirms (indirectly) that the ELO cost of tactical mistakes (ELO cost at fixed depth or razoring and eval pruning) diminishes when the time control increases. So at long TC, you still get the big speedup and the cost of neglected tactics becomes smaller and smaller.

This probably explains why Stockfish scales so well (it is extremely aggressive with speculative pruning technique, which has a cost at very fast tc where SF is not so strong, but pays at long tc).

Don · Post by **Don** » Mon Aug 12, 2013 2:06 pm

jdart wrote:Why would you expect these "hacks" to be useless at long TC?

Most of the nodes you visit are always in the q-search or the few plies before the q-search. And many pruning tricks including futility only operate there. Even at large root depths, if you are semi-safely pruning away a lot of the deep nodes near the q-search you will have see a gain from that, I think.

--Jon

Scalability is a fascinating topic. Every once in while there is a post where someone calls into question some well known technique and claims it is not scalable. I remember Vincent Diepeveen claiming all the modern programs were broken because of LMR. But the vast majority of things that are working work at all levels. Unless you have a good reason to believe otherwise it's best to assume that if it helps at one level it will help at another.

A trap is that as longer time controls the ratings compress. If you do a good test of scalability you cannot assume a loss of scalability just because the improvement is less at a higher depth. If you have a 20 ELO improvement at game in 5 seconds it may only be 10 or 15 at game in 15 seconds. That is not a scalability issue, that is just a side-effect on longer matches where draws become more and more common with depth.

lucasart · Post by **lucasart** » Mon Aug 12, 2013 2:28 pm

Don wrote: Scalability is a fascinating topic. Every once in while there is a post where someone calls into question some well known technique and claims it is not scalable. I remember Vincent Diepeveen claiming all the modern programs were broken because of LMR. But the vast majority of things that are working work at all levels. Unless you have a good reason to believe otherwise it's best to assume that if it helps at one level it will help at another.

Vincent claimed that LMR doesn't work. He also stated that razoring and the likes do not work at long TC. And all that with nothing more than his "intuition" to back up his claim. He also claimed that he invented everything in Diep, and everyone pillaged it (although it's never been open source). Clearly, he lives in coocoo-land, and I don't think anyone serious listens to his hand-waving anymore.

But I wanted to play the devil's advocate and put these razoring and futility pruning techniques into question. If I can remove something without inducing an ELO regression, I will gladly do it. Probably most people already knew the answer, but I just had to see for myself.

Don wrote: A trap is that as longer time controls the ratings compress. If you do a good test of scalability you cannot assume a loss of scalability just because the improvement is less at a higher depth. If you have a 20 ELO improvement at game in 5 seconds it may only be 10 or 15 at game in 15 seconds. That is not a scalability issue, that is just a side-effect on longer matches where draws become more and more common with depth.

Very true. And that fact, combined with my results tend to prove that razoring and futility pruning actually scale positively, if anything.

Also, it made me realize that the Stockfish team are wasting a lot of good patches, because of their ultra-conservative scaling test. They assume a worse result at long TC means the patch doesn't scale, which is a mistake as it neglects the compression effect.

Henk · Post by **Henk** » Mon Aug 12, 2013 2:32 pm

lucasart wrote:Such hacks include:
* razoring
* eval pruning or futility pruning (for me it's the same, basically eval pruning is reversed futility pruning)
* move count pruning

I wonder if any research has been done to measure the scaling property of such techniques.

Let's take an example of feature X that does some speculative pruning when depth <= 5 (+other conditions). If b is the branching factor without X, then the time to depth goes from
Code: Select all
T(d) ~ b^d
to
Code: Select all
T'(d) ~ b'^5 * (b-5)^n
for some b'<b (b' branching factor of the 5 depth trees where X is applied).

This is of course a really sketchy and simplified equation, but it gives an idea. When n goes to infinity, we have
Code: Select all
T'(n)/T(n) ~ (b'/b)^5
some asymptotically we get a constant speedup. That is for the time to depth, which is the reward.

Now, like most things, it's a risk-reward: there is a cost in neglecting some tactics. If this cost is constant, increasing, or decreasing with depth, we are left with a different conclusion as to the asymptotical risk-reward of feature X.

Experiment: I am running a test in DiscoCheck of removing Razoring and Eval pruning, at different time controls:
* 5000 games in 2.5"+0.025"
* 5000 games in 5"+0.05"
* 5000 games in 10"+0.1"
(I don't have the computational ressources to go much further in a reasonable amount of time).

I really hope to see that these featurea are useless at long TC and I can remove them. I may never have the CPU ressources to reach a time control where that becomes apparent though, but curious to see how the scaling graph at least begins.

Can anyone tell me what is move count pruning ?

My futility pruning does not give much, if anything:

if (depth <= 200 && i > 0 && !(move is Promotion))
{
int margin = 0;
if (depth <= 100)
margin = capture != null ? capture.Points + 30000 : 30000;
else margin = capture != null ? capture.Points + 50000 : 50000;
if (pos.MaterialCount * pos.CurPlayer + margin <= lb)
{
continue;
}
}

Don · Post by **Don** » Mon Aug 12, 2013 2:53 pm

lucasart wrote:
Don wrote: Scalability is a fascinating topic. Every once in while there is a post where someone calls into question some well known technique and claims it is not scalable. I remember Vincent Diepeveen claiming all the modern programs were broken because of LMR. But the vast majority of things that are working work at all levels. Unless you have a good reason to believe otherwise it's best to assume that if it helps at one level it will help at another.
Vincent claimed that LMR doesn't work. He also stated that razoring and the likes do not work at long TC. And all that with nothing more than his "intuition" to back up his claim. He also claimed that he invented everything in Diep, and everyone pillaged it (although it's never been open source). Clearly, he lives in coocoo-land, and I don't think anyone serious listens to his hand-waving anymore.

But I wanted to play the devil's advocate and put these razoring and futility pruning techniques into question. If I can remove something without inducing an ELO regression, I will gladly do it. Probably most people already knew the answer, but I just had to see for myself.

Don wrote: A trap is that as longer time controls the ratings compress. If you do a good test of scalability you cannot assume a loss of scalability just because the improvement is less at a higher depth. If you have a 20 ELO improvement at game in 5 seconds it may only be 10 or 15 at game in 15 seconds. That is not a scalability issue, that is just a side-effect on longer matches where draws become more and more common with depth.
Very true. And that fact, combined with my results tend to prove that razoring and futility pruning actually scale positively, if anything.

Also, it made me realize that the Stockfish team are wasting a lot of good patches, because of their ultra-conservative scaling test. They assume a worse result at long TC means the patch doesn't scale, which is a mistake as it neglects the compression effect.

As I said, scaling is a fascinating subject and one that interests me. We are still trying to figure out what the general principles of scaling are.

scaling property of hacks below a fixed depth

scaling property of hacks below a fixed depth

Re: scaling property of hacks below a fixed depth

Re: scaling property of hacks below a fixed depth

Re: scaling property of hacks below a fixed depth

Re: scaling property of hacks below a fixed depth

Re: scaling property of hacks below a fixed depth

Re: scaling property of hacks below a fixed depth

Re: scaling property of hacks below a fixed depth

Re: scaling property of hacks below a fixed depth

Re: scaling property of hacks below a fixed depth