I would suggest also playing N vs N+2 and N vs N+3 versions: if the gain is about 100 Elo per ply, playing engines with ~300 Elo difference gives the most important contribution to the reliability of the overall scale. Playing only N vs N+1 makes the chain connecting the top and the bottom engine very long, with a disastrous effect on error accumulation.
I would furthermore advise to do this not with a single engine, but to select a group of about 6 engines that are about equally strong when limited to 12 ply. And then just play a number of round-robins with them, including 4 'generations' (N, N+1, N+2, N+3) for every N that is feasible.
Perhaps a small pilot study is in order to do this with a larger number of engines, limited to 10, 11, 12, and 13 ply, to see which are suitable and if any of them would have to be systematically offset. (e.g. an N-ply Rybka might be as strong, and use about as much time as an N+2-ply other engine.)
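The error-accumulation point can be illustrated with a quick sketch. Assuming, purely hypothetically, that each pairwise match estimate carries an independent standard error of about 10 Elo, the errors add in quadrature along the chain:

```python
import math

def chain_error(per_link_sigma, num_links):
    """Standard error of a rating difference obtained by summing
    num_links independent pairwise estimates (errors add in quadrature)."""
    return per_link_sigma * math.sqrt(num_links)

# Connecting depth 3 to depth 12 through N vs N+1 links only: 9 links.
print(chain_error(10.0, 9))  # 30.0 Elo

# Covering the same span with N vs N+3 links: only 3 links.
print(chain_error(10.0, 3))  # ~17.3 Elo
```

So under these assumed per-match errors, using the longer N vs N+2 and N vs N+3 links roughly halves the accumulated error over the full scale.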
Diminishing returns of increasing search depth
Moderators: hgm, Rebel, chrisw
- Posts: 27822
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
- Posts: 481
- Joined: Thu Apr 16, 2009 12:00 pm
- Location: Slovakia, EU
Re: Diminishing returns of increasing search depth
Remember that nominal ply depth has a different meaning for different engines (due to different pruning / reductions / extensions).
hgm wrote: I would furthermore advise to do this not with a single engine, but to select a group of about 6 engines that are about equally strong when limited to 12 ply. And then just play a number of round-robins with them, including 4 'generations' (N, N+1, N+2, N+3) for every N that is feasible.
Perhaps a small pilot study is in order to do this with a larger number of engines, limited to 10, 11, 12, and 13 ply, to see which are suitable and if any of them would have to be systematically offset. (e.g. an N-ply Rybka might be as strong, and use about as much time as an N+2-ply other engine.)
- Posts: 1480
- Joined: Thu Mar 09, 2006 5:33 am
Re: Diminishing returns of increasing search depth
Also, we are obviously facing the problem that (at least one) engine is not reporting the true internal nominal depth, but something else.
But that may not be such a big problem in this context, as performance comparisons will always be "depth x to depth y", per engine.
Regards, Mike
- Posts: 4567
- Joined: Sun Mar 12, 2006 2:40 am
Re: Diminishing returns of increasing search depth
rvida wrote: Remember that nominal ply depth has a different meaning for different engines (due to different pruning / reductions / extensions).
hgm wrote: I would furthermore advise to do this not with a single engine, but to select a group of about 6 engines that are about equally strong when limited to 12 ply. And then just play a number of round-robins with them, including 4 'generations' (N, N+1, N+2, N+3) for every N that is feasible.
Perhaps a small pilot study is in order to do this with a larger number of engines, limited to 10, 11, 12, and 13 ply, to see which are suitable and if any of them would have to be systematically offset. (e.g. an N-ply Rybka might be as strong, and use about as much time as an N+2-ply other engine.)
That is true, but I think both of these cases are examples of what Harm-Geert is saying about a systematic offset. Maybe a purer experiment would be to just do a T vs. 2T experiment, with a double allotment of time for the whole game, with the engine of choice. It would introduce another factor, how good the time management is compared to your ply depths, but it would eliminate some of the ply-depth reporting issues. You could vary T over as many points as you like, to get more data points and a better idea of the "jaggedness" (I have no idea what the mathematically correct term would be; the slope of the curve probably does not have the same derivatives everywhere). This tells you at what time controls the engine could probably improve Late Move Reductions, in whatever form they are implemented, in combination with the timing algorithm: if too much time is spent on moves late in the list, you may finish N plies in the time per move, but it would probably be better to get one or two moves of ply N+1 in, to see if there is a change, and if so take a bit more time.
Mike S. wrote: Also, we are obviously facing the problem that (at least one) engine is not reporting the true internal nominal depth, but something else.
But that may not be such a big problem in this context, as performance comparisons will always be "depth x to depth y", per engine.
Regards, Eelco
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
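Eelco's suggestion of varying T over many points amounts to a simple doubling schedule; a minimal sketch (the base time and number of points are just an example):

```python
def doubling_schedule(t0, num_points):
    """T vs. 2T match points, with T itself doubling at each step.
    Returns (T, 2T) pairs, times in seconds per game."""
    return [(t0 * 2 ** i, t0 * 2 ** (i + 1)) for i in range(num_points)]

# Ten doubling matches from 0.1 vs 0.2 s up to 51.2 vs 102.4 s per game.
for t, t2 in doubling_schedule(0.1, 10):
    print(f"{t:g} s vs {t2:g} s")
```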
- Posts: 27822
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Diminishing returns of increasing search depth
I agree: time-odds would be better than fixed ply-depth, because at fixed ply depth most programs will start to play like an idiot in the end-game, and different amounts of end-game knowledge might to a large extent start to mask the effect of the search depth.
- Posts: 11590
- Joined: Thu Mar 09, 2006 12:57 am
- Location: Birmingham UK
Re: Diminishing returns of increasing search depth
I thought that the object of the exercise was for a single program to play itself. After all, we wish to isolate the effect of one thing, and one thing only: an extra ply of search depth.
hgm wrote: I agree: time-odds would be better than fixed ply-depth, because at fixed ply depth most programs will start to play like an idiot in the end-game, and different amounts of end-game knowledge might to a large extent start to mask the effect of the search depth.
Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".
Your brain doesn't work the way you want, so train it!
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Diminishing returns of increasing search depth
Actually a good point. I see EBF values of way under 2.0 in endgames, which means one ply is not the same in endgames as it is in middlegames. I like the 2x faster approach, which simply gives one program 2x the time of the other, which is then uniform throughout the test.
Eelco de Groot wrote:
rvida wrote: Remember that nominal ply depth has a different meaning for different engines (due to different pruning / reductions / extensions).
hgm wrote: I would furthermore advise to do this not with a single engine, but to select a group of about 6 engines that are about equally strong when limited to 12 ply. And then just play a number of round-robins with them, including 4 'generations' (N, N+1, N+2, N+3) for every N that is feasible.
Perhaps a small pilot study is in order to do this with a larger number of engines, limited to 10, 11, 12, and 13 ply, to see which are suitable and if any of them would have to be systematically offset. (e.g. an N-ply Rybka might be as strong, and use about as much time as an N+2-ply other engine.)
That is true, but I think both of these cases are examples of what Harm-Geert is saying about a systematic offset. Maybe a purer experiment would be to just do a T vs. 2T experiment, with a double allotment of time for the whole game, with the engine of choice. It would introduce another factor, how good the time management is compared to your ply depths, but it would eliminate some of the ply-depth reporting issues. You could vary T over as many points as you like, to get more data points and a better idea of the "jaggedness" (I have no idea what the mathematically correct term would be; the slope of the curve probably does not have the same derivatives everywhere). This tells you at what time controls the engine could probably improve Late Move Reductions, in whatever form they are implemented, in combination with the timing algorithm: if too much time is spent on moves late in the list, you may finish N plies in the time per move, but it would probably be better to get one or two moves of ply N+1 in, to see if there is a change, and if so take a bit more time.
Mike S. wrote: Also, we are obviously facing the problem that (at least one) engine is not reporting the true internal nominal depth, but something else.
But that may not be such a big problem in this context, as performance comparisons will always be "depth x to depth y", per engine.
Regards, Eelco
The "gain of another ply" is pretty misleading today when the branching factor goes way down in endgames due to reductions and pruning... It is possible that one more ply won't search much that is new, but then it will only take a little longer than the previous ply did to complete also...
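The EBF observation above can be seen directly from consecutive iteration node counts; a minimal sketch (the node counts here are invented purely for illustration):

```python
def ebf(prev_nodes, cur_nodes):
    """Effective branching factor between two consecutive search depths:
    the ratio of the node count at depth d to that at depth d-1."""
    return cur_nodes / prev_nodes

# Invented middlegame-like counts: the tree roughly doubles per ply.
middlegame = [1000, 2100, 4400, 9200]
# Invented endgame-like counts: heavy reductions, EBF well under 2.0.
endgame = [1000, 1400, 1950, 2700]

for label, nodes in (("middlegame", middlegame), ("endgame", endgame)):
    factors = [round(ebf(a, b), 2) for a, b in zip(nodes, nodes[1:])]
    print(label, factors)
```

With an EBF well under 2.0, each extra endgame ply costs comparatively little time, which is exactly why "one more ply" means something different there.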
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Diminishing returns of increasing search depth
The problem, in its base form, is "not all plies are created equal."
towforce wrote: I thought that the object of the exercise was for a single program to play itself. After all, we wish to isolate the effect of one thing, and one thing only: an extra ply of search depth.
hgm wrote: I agree: time-odds would be better than fixed ply-depth, because at fixed ply depth most programs will start to play like an idiot in the end-game, and different amounts of end-game knowledge might to a large extent start to mask the effect of the search depth.
One ply in the middlegame might be significant, while one ply in the endgame might be almost meaningless. Two different programs can use two different approaches to reductions, extensions and pruning, and see a different gain from an extra ply. Chessmaster, for example, will likely see a huge advantage as the depth increases, but each additional ply will take a huge amount of time longer due to the way his search works.
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Diminishing returns of increasing search depth
Bob, can you make a test with the doubling of time, games played 1*t with 2*t, 2*t with 4*t, and so on, from a game in 0.1:0.2 seconds to a game in 51.2:102.4 seconds? That would give 10 results, and if they came with a +/- 5 Elo points error at 2 sigma, that would be excellent. Then I could fit the points easily. I don't think that HGM's idea of also playing 1*t with 16*t, for example, is productive, as we do not need the cumulative score, only the differences from the doubling of time. Also, the error bars in Elo points would become large for very skewed 99:1 matches. A round robin with different engines would take too much time, I guess, so just do it with Crafty.
bob wrote: The problem, in its base form, is "not all plies are created equal."
towforce wrote: I thought that the object of the exercise was for a single program to play itself. After all, we wish to isolate the effect of one thing, and one thing only: an extra ply of search depth.
hgm wrote: I agree: time-odds would be better than fixed ply-depth, because at fixed ply depth most programs will start to play like an idiot in the end-game, and different amounts of end-game knowledge might to a large extent start to mask the effect of the search depth.
One ply in the middlegame might be significant, while one ply in the endgame might be almost meaningless. Two different programs can use two different approaches to reductions, extensions and pruning, and see a different gain from an extra ply. Chessmaster, for example, will likely see a huge advantage as the depth increases, but each additional ply will take a huge amount of time longer due to the way his search works.
Thanks,
Kai
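Each such doubling match can be turned into an Elo point with a 2-sigma error bar using the usual logistic Elo model and simple error propagation; a minimal sketch (the win/draw/loss counts below are invented for illustration):

```python
import math

def elo_diff(score):
    """Elo difference implied by a match score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_2sigma(wins, draws, losses):
    """Elo estimate and 2-sigma error bar from a match result,
    propagating the standard error of the mean per-game score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Sample variance of the per-game result (1, 0.5, or 0).
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    se = math.sqrt(var / n)
    # Error propagation through the logistic Elo curve.
    deriv = 400.0 / (math.log(10) * score * (1.0 - score))
    return elo_diff(score), 2.0 * se * deriv

elo, bar = elo_2sigma(wins=420, draws=300, losses=280)
print(f"{elo:+.1f} Elo, +/- {bar:.1f} (2 sigma)")
```

With about 1000 games per point, the 2-sigma bar is still well above 5 Elo, which gives a feel for how many games the requested precision would actually take.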
- Posts: 10316
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Diminishing returns of increasing search depth
I do not agree that the other data is not useful.
Laskos wrote: Bob, can you make a test with the doubling of time, games played 1*t with 2*t, 2*t with 4*t, and so on, from a game in 0.1:0.2 seconds to a game in 51.2:102.4 seconds? That would give 10 results, and if they came with a +/- 5 Elo points error at 2 sigma, that would be excellent. Then I could fit the points easily. I don't think that HGM's idea of also playing 1*t with 16*t, for example, is productive, as we do not need the cumulative score, only the differences from the doubling of time. Also, the error bars in Elo points would become large for very skewed 99:1 matches. A round robin with different engines would take too much time, I guess, so just do it with Crafty.
bob wrote: The problem, in its base form, is "not all plies are created equal."
towforce wrote: I thought that the object of the exercise was for a single program to play itself. After all, we wish to isolate the effect of one thing, and one thing only: an extra ply of search depth.
hgm wrote: I agree: time-odds would be better than fixed ply-depth, because at fixed ply depth most programs will start to play like an idiot in the end-game, and different amounts of end-game knowledge might to a large extent start to mask the effect of the search depth.
One ply in the middlegame might be significant, while one ply in the endgame might be almost meaningless. Two different programs can use two different approaches to reductions, extensions and pruning, and see a different gain from an extra ply. Chessmaster, for example, will likely see a huge advantage as the depth increases, but each additional ply will take a huge amount of time longer due to the way his search works.
Thanks,
Kai
The rating may be dependent on the opponents, and it may be interesting to see if 16*t against 1*t performs as expected, and if you get the same rating difference when you play with different engines.
In any case you need to start from some point, so playing Crafty X (mean X seconds per move) against Crafty 2X may be useful.
Later you can see if the rating changes when you add more games, like Crafty X against Crafty 4X, or Crafty X against Stockfish X and Crafty 2X against Stockfish X.
Uri