
The idea is to store the search tree from a pure alpha-beta search (no pruning beyond the alpha-beta cutoffs themselves, but with a normal q-search) up to depth X (say 10) for 1000 different positions.
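Something like this is what I have in mind — a minimal sketch assuming the python-chess library for move generation and a material-only eval (both are just placeholders; your own engine's move generator and eval slot in instead). It records every visited node by its move path, which is what makes the storing and diffing below possible. Note that depth 10 over 1000 positions is only realistic in a compiled engine, so keep the depth small if you try this in Python:

```python
import chess

PIECE_VALUES = {chess.PAWN: 100, chess.KNIGHT: 320, chess.BISHOP: 330,
                chess.ROOK: 500, chess.QUEEN: 900, chess.KING: 0}

def evaluate(board):
    """Material-only eval, from the side to move's point of view."""
    score = 0
    for piece_type, value in PIECE_VALUES.items():
        score += value * len(board.pieces(piece_type, chess.WHITE))
        score -= value * len(board.pieces(piece_type, chess.BLACK))
    return score if board.turn == chess.WHITE else -score

def qsearch(board, alpha, beta, path, tree):
    tree[" ".join(path)] = True              # record the visited node
    stand_pat = evaluate(board)
    if stand_pat >= beta:
        return stand_pat
    alpha = max(alpha, stand_pat)
    for move in board.legal_moves:
        if not board.is_capture(move):       # q-search tries captures only
            continue
        board.push(move)
        score = -qsearch(board, -beta, -alpha, path + [move.uci()], tree)
        board.pop()
        if score >= beta:
            return score
        alpha = max(alpha, score)
    return alpha

def alphabeta(board, depth, alpha, beta, path, tree):
    tree[" ".join(path)] = True
    if depth == 0:
        return qsearch(board, alpha, beta, path, tree)
    moves = list(board.legal_moves)
    if not moves:                            # mate or stalemate
        return -100_000 + len(path) if board.is_check() else 0
    for move in moves:                       # full width: no forward pruning
        board.push(move)
        score = -alphabeta(board, depth - 1, -beta, -alpha,
                           path + [move.uci()], tree)
        board.pop()
        if score >= beta:
            return score
        alpha = max(alpha, score)
    return alpha

def reference_search(fen, depth):
    board = chess.Board(fen)
    tree = {}
    score = alphabeta(board, depth, -10**9, 10**9, [], tree)
    return score, tree                       # store both per position
```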
Then run the search with the change you want to test over the same positions, and measure the resulting node count and evaluation correctness (it's interesting to trade these off: obviously pure alpha-beta has perfect evaluation correctness, but it can't reach nearly the same depth in the same time).
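The measurement step then looks something like the sketch below — `search_under_test` stands for any function with the same signature as `alphabeta` above but with your pruning in it, and defining correctness as an exact score match is just one choice (a score tolerance works too):

```python
def measure(positions, search_under_test, reference_scores, depth=10):
    """positions: list of FENs; reference_scores: fen -> pure alpha-beta score."""
    total_nodes, matches = 0, 0
    for fen in positions:
        board = chess.Board(fen)
        tree = {}
        score = search_under_test(board, depth, -10**9, 10**9, [], tree)
        total_nodes += len(tree)             # nodes this search visited
        matches += (score == reference_scores[fen])
    return total_nodes, matches / len(positions)
```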
Tuning search parameters works the same way, except you should probably tune on a random subset and then measure against the entire set to avoid overfitting.
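A sketch of that split, using `measure()` from above — `search_factory` is a hypothetical callable that builds a search function from a parameter set, and the 10% subset size and 95% correctness floor are arbitrary numbers picked for illustration:

```python
import random

def tune(positions, reference_scores, candidate_params, search_factory):
    """Pick params by node count on a subset, then validate on the full set."""
    train = random.sample(positions, max(1, len(positions) // 10))
    best_params, best_nodes = None, float("inf")
    for params in candidate_params:
        nodes, correctness = measure(train, search_factory(params),
                                     reference_scores)
        # the trade-off from above: cheapest search that stays accurate enough
        if correctness >= 0.95 and nodes < best_nodes:
            best_params, best_nodes = params, nodes
    if best_params is None:
        return None, None                    # nothing met the correctness floor
    # re-measuring on the entire set is the overfitting guard
    return best_params, measure(positions, search_factory(best_params),
                                reference_scores)
```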
If you have a tree browser, you can diff the two searches and see what effect the pruning actually had on the tree. This was the most useful part for me.
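Even without a proper tree browser, the move-path representation makes a crude diff easy: paths present in the reference tree but missing from the test tree are exactly the subtrees your pruning cut off, and the interesting ones are the shallowest cut points. A rough sketch (the `ref_tree`/`test_tree` names are hypothetical, coming from two `reference_search`-style runs):

```python
def diff_trees(reference_tree, test_tree):
    pruned_away = set(reference_tree) - set(test_tree)
    # keep only the root of each pruned subtree (its parent was still visited)
    roots = [p for p in pruned_away
             if " ".join(p.split()[:-1]) not in pruned_away]
    return sorted(roots, key=lambda p: (p.count(" "), p))

# e.g. print the 20 shallowest subtrees the pruning removed:
# for path in diff_trees(ref_tree, test_tree)[:20]:
#     print(path)
```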
Clearly this won't be as good as running 10k games, but it should get you in the right ballpark for parameters, and give a good indicator of how well your search is working.