Diminishing returns of increasing search depth

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

User avatar
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Diminishing returns of increasing search depth

Post by hgm »

I would suggest to also play N vs N+2 and N vs N+3 versions: if the gain is about 100 Elo per ply, playing engines with ~300 Elo difference gives the most important contribution to the reliability of the over-all scale. Playing only N vs N+1 makes the chain connecting the top and the bottom engine very long, with a disastrous effect on error acumulation.

I would furthermore advise to do this not with a single engine, but select a group of about 6 engines that is about equally strong when limited to 12 ply. And then just play a number of round-robins with them inclucing 4 'generations' (N, N+1, N+2, N+3) for every N that is feasible.

Perhaps a small pilot study is in order to do this with a larger number of engines, limited to 10, 11, 12, and 13 ply, to see which are suitable and if any of them would have to be systematically offset. (e.g. an N-ply Rybka might be as strong, and use about as much time as an N+2-ply other engine.)
User avatar
rvida
Posts: 481
Joined: Thu Apr 16, 2009 12:00 pm
Location: Slovakia, EU

Re: Diminishing returns of increasing search depth

Post by rvida »

hgm wrote: I would furthermore advise to do this not with a single engine, but select a group of about 6 engines that is about equally strong when limited to 12 ply. And then just play a number of round-robins with them inclucing 4 'generations' (N, N+1, N+2, N+3) for every N that is feasible.

Perhaps a small pilot study is in order to do this with a larger number of engines, limited to 10, 11, 12, and 13 ply, to see which are suitable and if any of them would have to be systematically offset. (e.g. an N-ply Rybka might be as strong, and use about as much time as an N+2-ply other engine.)
Remember that nominal ply depth has different meaning for different engines (due to different pruning / reductions / extensions)
User avatar
Mike S.
Posts: 1480
Joined: Thu Mar 09, 2006 5:33 am

Re: Diminishing returns of increasing search depth

Post by Mike S. »

Also, we are obviously facing the problem that (at least one) engine is not reporting the true internal nominal depth, but something else.

But that may not be such a big problem in this context, as performance comparisons will always be "depth x to depth y", per engine.
Regards, Mike
User avatar
Eelco de Groot
Posts: 4565
Joined: Sun Mar 12, 2006 2:40 am
Full name:   

Re: Diminishing returns of increasing search depth

Post by Eelco de Groot »

rvida wrote:
hgm wrote: I would furthermore advise to do this not with a single engine, but select a group of about 6 engines that is about equally strong when limited to 12 ply. And then just play a number of round-robins with them inclucing 4 'generations' (N, N+1, N+2, N+3) for every N that is feasible.

Perhaps a small pilot study is in order to do this with a larger number of engines, limited to 10, 11, 12, and 13 ply, to see which are suitable and if any of them would have to be systematically offset. (e.g. an N-ply Rybka might be as strong, and use about as much time as an N+2-ply other engine.)
Remember that nominal ply depth has different meaning for different engines (due to different pruning / reductions / extensions)
Mike S. wrote:Also, we are obviously facing the problem that (at least one) engine is not reporting the true internal nominal depth, but something else.

But that may not be such a big problem in this context, as performance comparisons will always be "depth x to depth y", per engine.
That is true but I think that both these case are examples of what Harm-Geert is saying about a systematic offset. Maybe a more pure experiment would be to just do a T vs. 2T experiment, with a double allotment of time for the whole game, with engine of choice. It would introduce another factor, how good is the timemanagement compared to your plydepths, but it would eliminate some of the plydepth reporting. You could vary T over as many points as you would like to get more datapoints and a better idea of the "jaggedness", I have no idea what the mathematically correct term here would be, the shape of the slope of the curve probably does not have the same derivatives locally everywhere. This tells you at what timecontrols the engine could probably improve Late Move Reductions in whatever form this is implemented, in combination with the timing algorithm; if too much time is spent on moves late in the list, you may finish N plies in Time per Move but it would probably be better to get one or two moves in of ply N+1 to see if there is a change and if yes take a bit more time.

Regards, Eelco
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
User avatar
hgm
Posts: 27796
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Diminishing returns of increasing search depth

Post by hgm »

I agree: time-odds would be better than fixed ply-depth, because at fixed ply depth most programs will start to play like an idiot in the end-game, and different amont of end-game knowledge might to a lare extent start to mask the effect of the search depth.
User avatar
towforce
Posts: 11575
Joined: Thu Mar 09, 2006 12:57 am
Location: Birmingham UK

Re: Diminishing returns of increasing search depth

Post by towforce »

hgm wrote:I agree: time-odds would be better than fixed ply-depth, because at fixed ply depth most programs will start to play like an idiot in the end-game, and different amont of end-game knowledge might to a lare extent start to mask the effect of the search depth.
I thought that the object of the exercise was for a single program to play itself. After all, we wish to isolate the effect of one thing, and one thing only - an extra ply of search depth.
Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".
Your brain doesn't work the way you want, so train it!
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Diminishing returns of increasing search depth

Post by bob »

Eelco de Groot wrote:
rvida wrote:
hgm wrote: I would furthermore advise to do this not with a single engine, but select a group of about 6 engines that is about equally strong when limited to 12 ply. And then just play a number of round-robins with them inclucing 4 'generations' (N, N+1, N+2, N+3) for every N that is feasible.

Perhaps a small pilot study is in order to do this with a larger number of engines, limited to 10, 11, 12, and 13 ply, to see which are suitable and if any of them would have to be systematically offset. (e.g. an N-ply Rybka might be as strong, and use about as much time as an N+2-ply other engine.)
Remember that nominal ply depth has different meaning for different engines (due to different pruning / reductions / extensions)
Mike S. wrote:Also, we are obviously facing the problem that (at least one) engine is not reporting the true internal nominal depth, but something else.

But that may not be such a big problem in this context, as performance comparisons will always be "depth x to depth y", per engine.
That is true but I think that both these case are examples of what Harm-Geert is saying about a systematic offset. Maybe a more pure experiment would be to just do a T vs. 2T experiment, with a double allotment of time for the whole game, with engine of choice. It would introduce another factor, how good is the timemanagement compared to your plydepths, but it would eliminate some of the plydepth reporting. You could vary T over as many points as you would like to get more datapoints and a better idea of the "jaggedness", I have no idea what the mathematically correct term here would be, the shape of the slope of the curve probably does not have the same derivatives locally everywhere. This tells you at what timecontrols the engine could probably improve Late Move Reductions in whatever form this is implemented, in combination with the timing algorithm; if too much time is spent on moves late in the list, you may finish N plies in Time per Move but it would probably be better to get one or two moves in of ply N+1 to see if there is a change and if yes take a bit more time.

Regards, Eelco
Actually a good point. I see EBF values of way under 2.0 in endgames, which means one ply is not the same in endgames as it is in middlegames. I like the 2x faster approach, which simply gives one program 2x the time of the other which is then uniform throughout the test.

The "gain of another ply" is pretty misleading today when the branching factor goes way down in endgames due to reductions and pruning... It is possible that one more ply won't search much that is new, but then it will only take a little longer than the previous ply did to complete also...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Diminishing returns of increasing search depth

Post by bob »

towforce wrote:
hgm wrote:I agree: time-odds would be better than fixed ply-depth, because at fixed ply depth most programs will start to play like an idiot in the end-game, and different amont of end-game knowledge might to a lare extent start to mask the effect of the search depth.
I thought that the object of the exercise was for a single program to play itself. After all, we wish to isolate the effect of one thing, and one thing only - an extra ply of search depth.
The problem, in its base form, is "not all plies are created equal."

One ply in the middlegame might be significant, while one ply in the endgame might be almost meaningless. Two different programs can use two different approaches to reductions, extensions and pruning, and see a different gain from an extra ply. Chessmaster, for example, will likely see a huge advantage as the depth increases, but each additional ply will take a huge amount of time longer due to the way his search works.
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Diminishing returns of increasing search depth

Post by Laskos »

bob wrote:
towforce wrote:
hgm wrote:I agree: time-odds would be better than fixed ply-depth, because at fixed ply depth most programs will start to play like an idiot in the end-game, and different amont of end-game knowledge might to a lare extent start to mask the effect of the search depth.
I thought that the object of the exercise was for a single program to play itself. After all, we wish to isolate the effect of one thing, and one thing only - an extra ply of search depth.
The problem, in its base form, is "not all plies are created equal."

One ply in the middlegame might be significant, while one ply in the endgame might be almost meaningless. Two different programs can use two different approaches to reductions, extensions and pruning, and see a different gain from an extra ply. Chessmaster, for example, will likely see a huge advantage as the depth increases, but each additional ply will take a huge amount of time longer due to the way his search works.
Bob, can you make a test with the doubling of time, games played 1*t with 2*t, 2*t with 4*t, and so on, from a game in 0.1:0.2 seconds to a game in 51.2:102.4 seconds? That would give 10 results, and if it would be with +/- 5 Elo points error 2 sigma, would be excellent. Then I could fit the points easily. I don't think that HGM idea of playing also 1*t with 16*t, for example, is productive, as we do not need the cumulative score, only the differences from the doubling of time. Also, the error bars in Elo points would become large for very skewed 99:1 matches. A round robin with different engines would take too much time, I guess, so do it just with Crafty.

Thanks,
Kai
Uri Blass
Posts: 10282
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Diminishing returns of increasing search depth

Post by Uri Blass »

Laskos wrote:
bob wrote:
towforce wrote:
hgm wrote:I agree: time-odds would be better than fixed ply-depth, because at fixed ply depth most programs will start to play like an idiot in the end-game, and different amont of end-game knowledge might to a lare extent start to mask the effect of the search depth.
I thought that the object of the exercise was for a single program to play itself. After all, we wish to isolate the effect of one thing, and one thing only - an extra ply of search depth.
The problem, in its base form, is "not all plies are created equal."

One ply in the middlegame might be significant, while one ply in the endgame might be almost meaningless. Two different programs can use two different approaches to reductions, extensions and pruning, and see a different gain from an extra ply. Chessmaster, for example, will likely see a huge advantage as the depth increases, but each additional ply will take a huge amount of time longer due to the way his search works.
Bob, can you make a test with the doubling of time, games played 1*t with 2*t, 2*t with 4*t, and so on, from a game in 0.1:0.2 seconds to a game in 51.2:102.4 seconds? That would give 10 results, and if it would be with +/- 5 Elo points error 2 sigma, would be excellent. Then I could fit the points easily. I don't think that HGM idea of playing also 1*t with 16*t, for example, is productive, as we do not need the cumulative score, only the differences from the doubling of time. Also, the error bars in Elo points would become large for very skewed 99:1 matches. A round robin with different engines would take too much time, I guess, so do it just with Crafty.

Thanks,
Kai
I do not agree that the other data is not useful.

rating may be dependent on the opponents and it may be interesting if 16*t against 1*t perform as expected and if you get the same rating difference if you play with different engines.

In any case you need to start from some point so playing
Crafty X(mean X seconds per move) against Crafty 2X may be useful.

later you can see if the rating is
changed if you add more games like CraftyX against Crafty 4X or CraftyX against Stockfish X and Crafty 2X against stockfish X.

Uri