I would suggest also playing N vs N+2 and N vs N+3 versions: if the gain is about 100 Elo per ply, playing engines with ~300 Elo difference gives the most important contribution to the reliability of the overall scale. Playing only N vs N+1 makes the chain connecting the top and the bottom engine very long, with a disastrous effect on error accumulation.
I would furthermore advise to do this not with a single engine, but to select a group of about 6 engines that are about equally strong when limited to 12 ply. And then just play a number of round-robins with them, including 4 'generations' (N, N+1, N+2, N+3) for every N that is feasible.
Perhaps a small pilot study is in order to do this with a larger number of engines, limited to 10, 11, 12, and 13 ply, to see which are suitable and if any of them would have to be systematically offset. (e.g. an N-ply Rybka might be as strong, and use about as much time as an N+2-ply other engine.)
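The error-accumulation point can be illustrated with a quick sketch. Assuming, purely hypothetically, that each pairwise match estimate carries an independent standard error of about 10 Elo, the errors add in quadrature along the chain:

```python
import math

def chain_error(per_link_sigma, num_links):
    """Standard error of a rating difference obtained by summing
    num_links independent pairwise estimates (errors add in quadrature)."""
    return per_link_sigma * math.sqrt(num_links)

# Connecting depth 3 to depth 12 through N vs N+1 links only: 9 links.
print(chain_error(10.0, 9))  # 30.0 Elo

# Covering the same span with N vs N+3 links: only 3 links.
print(chain_error(10.0, 3))  # ~17.3 Elo
```

So under these assumed per-match errors, using the longer N vs N+2 and N vs N+3 links roughly halves the accumulated error over the full scale.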
Diminishing returns of increasing search depth
Moderators: hgm, Rebel, chrisw
- Posts: 27822
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
- Posts: 481
- Joined: Thu Apr 16, 2009 12:00 pm
- Location: Slovakia, EU
Re: Diminishing returns of increasing search depth
Remember that nominal ply depth has a different meaning for different engines (due to different pruning / reductions / extensions).
hgm wrote: I would furthermore advise to do this not with a single engine, but to select a group of about 6 engines that are about equally strong when limited to 12 ply. And then just play a number of round-robins with them, including 4 'generations' (N, N+1, N+2, N+3) for every N that is feasible.
Perhaps a small pilot study is in order to do this with a larger number of engines, limited to 10, 11, 12, and 13 ply, to see which are suitable and if any of them would have to be systematically offset. (e.g. an N-ply Rybka might be as strong, and use about as much time as an N+2-ply other engine.)
- Posts: 1480
- Joined: Thu Mar 09, 2006 5:33 am
Re: Diminishing returns of increasing search depth
Also, we are obviously facing the problem that (at least one) engine is not reporting the true internal nominal depth, but something else.
But that may not be such a big problem in this context, as performance comparisons will always be "depth x to depth y", per engine.
Regards, Mike
- Posts: 4567
- Joined: Sun Mar 12, 2006 2:40 am
Re: Diminishing returns of increasing search depth
rvida wrote: Remember that nominal ply depth has a different meaning for different engines (due to different pruning / reductions / extensions).
hgm wrote: I would furthermore advise to do this not with a single engine, but to select a group of about 6 engines that are about equally strong when limited to 12 ply. And then just play a number of round-robins with them, including 4 'generations' (N, N+1, N+2, N+3) for every N that is feasible.
Perhaps a small pilot study is in order to do this with a larger number of engines, limited to 10, 11, 12, and 13 ply, to see which are suitable and if any of them would have to be systematically offset. (e.g. an N-ply Rybka might be as strong, and use about as much time as an N+2-ply other engine.)
That is true, but I think both of these cases are examples of what Harm-Geert is saying about a systematic offset. Maybe a purer experiment would be to just do a T vs. 2T experiment, with a double allotment of time for the whole game, with the engine of choice. It would introduce another factor, how good the time management is compared to your ply depths, but it would eliminate some of the ply-depth reporting issues. You could vary T over as many points as you like, to get more data points and a better idea of the "jaggedness" (I have no idea what the mathematically correct term would be; the slope of the curve probably does not have the same derivatives everywhere). This tells you at what time controls the engine could probably improve Late Move Reductions, in whatever form they are implemented, in combination with the timing algorithm: if too much time is spent on moves late in the list, you may finish N plies in the time per move, but it would probably be better to get one or two moves of ply N+1 in, to see if there is a change, and if so take a bit more time.
Mike S. wrote: Also, we are obviously facing the problem that (at least one) engine is not reporting the true internal nominal depth, but something else.
But that may not be such a big problem in this context, as performance comparisons will always be "depth x to depth y", per engine.
Regards, Eelco
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
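Eelco's suggestion of varying T over many points amounts to a simple doubling schedule; a minimal sketch (the base time and number of points are just an example):

```python
def doubling_schedule(t0, num_points):
    """T vs. 2T match points, with T itself doubling at each step.
    Returns (T, 2T) pairs, times in seconds per game."""
    return [(t0 * 2 ** i, t0 * 2 ** (i + 1)) for i in range(num_points)]

# Ten doubling matches from 0.1 vs 0.2 s up to 51.2 vs 102.4 s per game.
for t, t2 in doubling_schedule(0.1, 10):
    print(f"{t:g} s vs {t2:g} s")
```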
- Posts: 27822
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Diminishing returns of increasing search depth
I agree: time-odds would be better than fixed ply-depth, because at fixed ply depth most programs will start to play like an idiot in the end-game, and different amounts of end-game knowledge might to a large extent start to mask the effect of the search depth.
- Posts: 11590
- Joined: Thu Mar 09, 2006 12:57 am
- Location: Birmingham UK
Re: Diminishing returns of increasing search depth
I thought that the object of the exercise was for a single program to play itself. After all, we wish to isolate the effect of one thing, and one thing only: an extra ply of search depth.
hgm wrote: I agree: time-odds would be better than fixed ply-depth, because at fixed ply depth most programs will start to play like an idiot in the end-game, and different amounts of end-game knowledge might to a large extent start to mask the effect of the search depth.
Writing is the antidote to confusion.
It's not "how smart you are", it's "how are you smart".
Your brain doesn't work the way you want, so train it!
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Diminishing returns of increasing search depth
Actually a good point. I see EBF values of way under 2.0 in endgames, which means one ply is not the same in endgames as it is in middlegames. I like the 2x faster approach, which simply gives one program 2x the time of the other, which is then uniform throughout the test.
Eelco de Groot wrote:
rvida wrote: Remember that nominal ply depth has a different meaning for different engines (due to different pruning / reductions / extensions).
hgm wrote: I would furthermore advise to do this not with a single engine, but to select a group of about 6 engines that are about equally strong when limited to 12 ply. And then just play a number of round-robins with them, including 4 'generations' (N, N+1, N+2, N+3) for every N that is feasible.
Perhaps a small pilot study is in order to do this with a larger number of engines, limited to 10, 11, 12, and 13 ply, to see which are suitable and if any of them would have to be systematically offset. (e.g. an N-ply Rybka might be as strong, and use about as much time as an N+2-ply other engine.)
That is true, but I think both of these cases are examples of what Harm-Geert is saying about a systematic offset. Maybe a purer experiment would be to just do a T vs. 2T experiment, with a double allotment of time for the whole game, with the engine of choice. It would introduce another factor, how good the time management is compared to your ply depths, but it would eliminate some of the ply-depth reporting issues. You could vary T over as many points as you like, to get more data points and a better idea of the "jaggedness" (I have no idea what the mathematically correct term would be; the slope of the curve probably does not have the same derivatives everywhere). This tells you at what time controls the engine could probably improve Late Move Reductions, in whatever form they are implemented, in combination with the timing algorithm: if too much time is spent on moves late in the list, you may finish N plies in the time per move, but it would probably be better to get one or two moves of ply N+1 in, to see if there is a change, and if so take a bit more time.
Mike S. wrote: Also, we are obviously facing the problem that (at least one) engine is not reporting the true internal nominal depth, but something else.
But that may not be such a big problem in this context, as performance comparisons will always be "depth x to depth y", per engine.
Regards, Eelco
The "gain of another ply" is pretty misleading today when the branching factor goes way down in endgames due to reductions and pruning... It is possible that one more ply won't search much that is new, but then it will only take a little longer than the previous ply did to complete also...
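The EBF observation above can be seen directly from consecutive iteration node counts; a minimal sketch (the node counts here are invented purely for illustration):

```python
def ebf(prev_nodes, cur_nodes):
    """Effective branching factor between two consecutive search depths:
    the ratio of the node count at depth d to that at depth d-1."""
    return cur_nodes / prev_nodes

# Invented middlegame-like counts: the tree roughly doubles per ply.
middlegame = [1000, 2100, 4400, 9200]
# Invented endgame-like counts: heavy reductions, EBF well under 2.0.
endgame = [1000, 1400, 1950, 2700]

for label, nodes in (("middlegame", middlegame), ("endgame", endgame)):
    factors = [round(ebf(a, b), 2) for a, b in zip(nodes, nodes[1:])]
    print(label, factors)
```

With an EBF well under 2.0, each extra endgame ply costs comparatively little time, which is exactly why "one more ply" means something different there.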
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Diminishing returns of increasing search depth
The problem, in its base form, is "not all plies are created equal."
towforce wrote: I thought that the object of the exercise was for a single program to play itself. After all, we wish to isolate the effect of one thing, and one thing only: an extra ply of search depth.
hgm wrote: I agree: time-odds would be better than fixed ply-depth, because at fixed ply depth most programs will start to play like an idiot in the end-game, and different amounts of end-game knowledge might to a large extent start to mask the effect of the search depth.
One ply in the middlegame might be significant, while one ply in the endgame might be almost meaningless. Two different programs can use two different approaches to reductions, extensions and pruning, and see a different gain from an extra ply. Chessmaster, for example, will likely see a huge advantage as the depth increases, but each additional ply will take a huge amount of time longer due to the way his search works.
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Diminishing returns of increasing search depth
Bob, can you make a test with the doubling of time, games played 1*t with 2*t, 2*t with 4*t, and so on, from a game in 0.1:0.2 seconds to a game in 51.2:102.4 seconds? That would give 10 results, and if they came with a +/- 5 Elo points error at 2 sigma, that would be excellent. Then I could fit the points easily. I don't think that HGM's idea of also playing 1*t with 16*t, for example, is productive, as we do not need the cumulative score, only the differences from the doubling of time. Also, the error bars in Elo points would become large for very skewed 99:1 matches. A round robin with different engines would take too much time, I guess, so just do it with Crafty.
bob wrote: The problem, in its base form, is "not all plies are created equal."
towforce wrote: I thought that the object of the exercise was for a single program to play itself. After all, we wish to isolate the effect of one thing, and one thing only: an extra ply of search depth.
hgm wrote: I agree: time-odds would be better than fixed ply-depth, because at fixed ply depth most programs will start to play like an idiot in the end-game, and different amounts of end-game knowledge might to a large extent start to mask the effect of the search depth.
One ply in the middlegame might be significant, while one ply in the endgame might be almost meaningless. Two different programs can use two different approaches to reductions, extensions and pruning, and see a different gain from an extra ply. Chessmaster, for example, will likely see a huge advantage as the depth increases, but each additional ply will take a huge amount of time longer due to the way his search works.
Thanks,
Kai
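Each such doubling match can be turned into an Elo point with a 2-sigma error bar using the usual logistic Elo model and simple error propagation; a minimal sketch (the win/draw/loss counts below are invented for illustration):

```python
import math

def elo_diff(score):
    """Elo difference implied by a match score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_2sigma(wins, draws, losses):
    """Elo estimate and 2-sigma error bar from a match result,
    propagating the standard error of the mean per-game score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Sample variance of the per-game result (1, 0.5, or 0).
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    se = math.sqrt(var / n)
    # Error propagation through the logistic Elo curve.
    deriv = 400.0 / (math.log(10) * score * (1.0 - score))
    return elo_diff(score), 2.0 * se * deriv

elo, bar = elo_2sigma(wins=420, draws=300, losses=280)
print(f"{elo:+.1f} Elo, +/- {bar:.1f} (2 sigma)")
```

With about 1000 games per point, the 2-sigma bar is still well above 5 Elo, which gives a feel for how many games the requested precision would actually take.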
- Posts: 10316
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Diminishing returns of increasing search depth
I do not agree that the other data is not useful.
Laskos wrote: Bob, can you make a test with the doubling of time, games played 1*t with 2*t, 2*t with 4*t, and so on, from a game in 0.1:0.2 seconds to a game in 51.2:102.4 seconds? That would give 10 results, and if they came with a +/- 5 Elo points error at 2 sigma, that would be excellent. Then I could fit the points easily. I don't think that HGM's idea of also playing 1*t with 16*t, for example, is productive, as we do not need the cumulative score, only the differences from the doubling of time. Also, the error bars in Elo points would become large for very skewed 99:1 matches. A round robin with different engines would take too much time, I guess, so just do it with Crafty.
bob wrote: The problem, in its base form, is "not all plies are created equal."
towforce wrote: I thought that the object of the exercise was for a single program to play itself. After all, we wish to isolate the effect of one thing, and one thing only: an extra ply of search depth.
hgm wrote: I agree: time-odds would be better than fixed ply-depth, because at fixed ply depth most programs will start to play like an idiot in the end-game, and different amounts of end-game knowledge might to a large extent start to mask the effect of the search depth.
One ply in the middlegame might be significant, while one ply in the endgame might be almost meaningless. Two different programs can use two different approaches to reductions, extensions and pruning, and see a different gain from an extra ply. Chessmaster, for example, will likely see a huge advantage as the depth increases, but each additional ply will take a huge amount of time longer due to the way his search works.
Thanks,
Kai
The rating may be dependent on the opponents, and it may be interesting to see if 16*t against 1*t performs as expected, and if you get the same rating difference when you play with different engines.
In any case you need to start from some point, so playing Crafty X (mean X seconds per move) against Crafty 2X may be useful.
Later you can see if the rating changes when you add more games, like Crafty X against Crafty 4X, or Crafty X against Stockfish X and Crafty 2X against Stockfish X.
Uri