optimal aspiration window for stockfish question

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: optimal aspiration window for stockfish question

Post by bob »

While I have tests running, here is some more info about my opinion of how this stuff should work.

Point 1. You want to (optimally) do exactly one re-search to resolve any fail-high or fail-low. More than one is inefficient and causes you to search a tree that you would normally not traverse. In the last iteration or two this can be expensive.

Point 2. In doing that one re-search, you don't want to increase the search window beyond what is necessary to capture the true score. Why? Take a case where you are going to win a positional advantage of +0.3 over what you saw at the previous iteration, and your aspiration window is the old score +/- 0.2. You fail high. Where you set the beta bound can be important. For example, if you don't raise it to at least +0.3 over the old score, you will fail high again, which is a waste. If you set it to +0.31, that would be the perfect bound: you would not fail high, nor would you have to deal with any paths in the PV search that produce a score more than 0.01 better than what we know is the real score. The farther you advance beta, the more "crap" works its way into the search and the tree grows. Hence the potential advantage of the null-window search we often mention.

So, we have a conundrum. We want to adjust the bound just enough to "catch" the true score, without going too far. If we don't adjust it enough, we get another fail high which is inefficient. If we adjust it too far, we search a larger tree than optimal (the reason for doing an aspiration search in the first place) which is again inefficient. So we want to do a "Goldilocks and the three bears" approach and get it just right.
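
To make the tradeoff concrete, the basic pattern looks something like this in C (just a sketch, not any particular engine's code; search(), INFINITE, and WINDOW are placeholders):

#define WINDOW    25       /* initial half-width in centipawns (placeholder) */
#define INFINITE  32000

extern int search(int alpha, int beta, int depth);   /* hypothetical */

/* One iteration with an aspiration window.  Re-searching all the way
   to INFINITE guarantees exactly one re-search (point 1), but violates
   point 2: the bound is now far beyond the true score, so the
   re-searched tree is much larger than necessary. */
int aspiration(int prev_score, int depth) {
    int alpha = prev_score - WINDOW;
    int beta  = prev_score + WINDOW;
    int score = search(alpha, beta, depth);
    if (score >= beta)                       /* fail high */
        score = search(alpha, INFINITE, depth);
    else if (score <= alpha)                 /* fail low */
        score = search(-INFINITE, beta, depth);
    return score;
}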

Intuition tells me that the following, tuned a bit, is probably the right answer.

First, we can assume a positional fail-high needs to increase the margin by P, to catch (say) 90% of the positional fail-highs. If we initially increase the bound by this, 90% of the positions where we are only seeing a positional gain will be searched and produce an exact score. 10% of those will require a second re-search. We could increment by P again, or we could drop to the next step.

Second, if we get a fail high and then a second fail high, we know that the true score is P+N better than the previous score, for some unknown N. What can we guess about N? My thought is 1.0: we assume that if we are gaining more than P, we are most likely winning a whole pawn. So the second re-search would be original + P + PAWN, and that will handle most of the cases. What next, assuming we fail high again? One could assume that we should then add P again and re-search, and if that fails, add PAWN again and re-search. Something tells me that the better approach might be what we were doing in Cray Blitz:

assume previous iteration score was S, and that P = 0.25

Next iteration:

alpha=S-P, beta=S+P, search
if fail_high
    alpha=S-P, beta=S+P+P, re-search     (winning a positional advantage)
if fail_high
    beta=S+P+P+PAWN, re-search           (maybe winning a pawn)
if fail_high
    beta=S+P+P+3*PAWN, re-search         (maybe winning a piece, not so common)
if fail_high
    beta=S+P+P+9*PAWN, re-search         (maybe winning a queen)
if fail_high
    beta=infinity, re-search             (must be a mate)

The downside is that there is nothing that says you win exactly an advantage, or a pawn, or a piece, or a queen; you can win a 2.5-pawn advantage by disrupting the opponent's king, for example. In Crafty, I removed that queen option and just went positional, pawn, piece, infinity. Minimal re-searches, minimal waste. The problem is, if you are winning a rook, you now have to search lines where you have an unforced chance to win a queen, and with beta that high you can't prune such branches.

A lot of this is really moot on the fail-high side, because there you are likely simply discovering how quickly you will win the game.

But take this the other direction and it becomes more important, because when a move fails low, you want to quickly discover how bad it is.

BTW there are some more efficient ways to deal with fail-highs, assuming you don't mind dealing with incomplete information.

Belle (Ken Thompson) did this first; I used it for a long while in Cray Blitz...

When you fail high, there is no need to re-search, since you know this is a new best move. Keep searching. If you end the iteration, good; start the next and search normally. The only sticking point is that if you get a second move that fails high, you are in a quandary: which is better? Re-search the first one with a wider window to get an exact score, then re-search the second with alpha = new best score and beta = new-best-score + 1. If that move fails low, the first one is best; keep going. If the second move fails high, remember it but don't re-search unless another move fails high. Repeat as necessary.
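
In outline, that root loop might look like this (my reading of the scheme, not Belle's or Cray Blitz's actual code; all of the names here are hypothetical):

extern int search_move(int move, int alpha, int beta, int depth);  /* null-window test */
extern int resolve(int move, int depth);   /* wide-window re-search, exact score */

/* Belle-style root loop: a fail-high is remembered, not re-searched,
   unless a second move also fails high. */
int root_search(int moves[], int n_root_moves, int alpha, int depth) {
    int best_move = 0, pending = 0;        /* pending = unresolved fail-high */
    int best_score = alpha;
    for (int m = 0; m < n_root_moves; m++) {
        int score = search_move(moves[m], best_score, best_score + 1, depth);
        if (score <= best_score)           /* fail low: current best stands */
            continue;
        if (!pending) {                    /* first fail-high: remember, keep going */
            best_move = moves[m];
            pending = 1;
        } else {                           /* second fail-high: which is better? */
            best_score = resolve(best_move, depth);  /* exact score of old best */
            pending = 0;
            score = search_move(moves[m], best_score, best_score + 1, depth);
            if (score > best_score) {      /* still better: new unresolved best */
                best_move = moves[m];
                pending = 1;
            }
        }
    }
    /* if pending is still set, the iteration ends with a known best move
       but an unknown score -- the hole discussed next */
    return best_move;
}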

The downside is that you don't know how good a move is, since you don't resolve the fail-high immediately. That was a problem for me, because I use the score to control time allocation. Suppose you fail low, losing a pawn, and find another move that fails high. Does it save the pawn? Or does it lose the pawn but gain a tiny bit of positional advantage back? For the former, I can stop using extra time, while for the latter, I want to keep searching to find a way to save the pawn. But if I don't know the true score, I can't make that decision. So the PVS implementation used by Ken and myself way back has a hole in it that I don't like, and I do not use it today for that reason.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: optimal aspiration window for stockfish question

Post by bob »

Karlo Bala wrote:
Uri Blass wrote:
Houdini wrote:
Uri Blass wrote: The more interesting question is which Houdini does not use that formula, because there is more than one Houdini version.
If you're really interested, no version of Houdini has used the formulae that Marco quoted above.
Houdini uses an aspiration window that widens much faster.

Robert
Thanks Robert.

My intuition also tells me that it is better to have fewer re-searches (increasing delta faster, by a different formula, may be a way to do it).
My intuition is somewhat different from yours. If your PV search is very similar to your non-PV search, there should be a small percentage of false PV nodes, so you can use a wider window. If your PV search is very different from your non-PV search (which is common today because of LMR, null move, futility, delta pruning, etc.), there should be more false PV nodes. In that case you do not want to search a lot of false PV nodes with a wide window.
LMR and such do not necessarily mean PV and non-PV searches are different. I'm not convinced they should be different, myself...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: optimal aspiration window for stockfish question (update

Post by bob »

Testing still underway, but one surprise already: narrow aspiration windows might be worthless. I ran a test with several starting windows to see if there was a clear winner. There were clear losers, like +/-1 and +/-2 and such. But +/-16, +/-50, +/-100, and +/-200 all seem to be equivalent in real-game testing. More when the test is completely done...

BTW, this is the "delta" value, which means that with a value of 200 the first search is old +/- 200, and the first re-search is old +/- 400...
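
In loop form, something like this (a sketch of that doubling; the names are hypothetical):

/* Double delta on every fail, moving the failed bound away from the
   old score; in practice delta would be capped at "infinite". */
int delta = 200;
int alpha = old_score - delta, beta = old_score + delta;
int score;
for (;;) {
    score = search(alpha, beta, depth);
    if (score <= alpha)                /* fail low: drop the lower bound */
        alpha = old_score - (delta *= 2);
    else if (score >= beta)            /* fail high: raise the upper bound */
        beta = old_score + (delta *= 2);
    else
        break;                         /* exact score inside the window */
}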

hmmm...
User avatar
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: optimal aspiration window for stockfish question (update

Post by michiguel »

bob wrote: Testing still underway, but one surprise already: narrow aspiration windows might be worthless. I ran a test with several starting windows to see if there was a clear winner. There were clear losers, like +/-1 and +/-2 and such. But +/-16, +/-50, +/-100, and +/-200 all seem to be equivalent in real-game testing. More when the test is completely done...

BTW, this is the "delta" value, which means that with a value of 200 the first search is old +/- 200, and the first re-search is old +/- 400...

hmmm...
I believe this will depend quite a bit on the specific engine. I observed a clearly distinct peak at about 40 cp (IIRC) with a slow decline toward higher deltas and a sharper decline towards lower ones. However, this peak was not very high. IIRC, when I tested this, it may have been ~3 elo points or so (compared to no aspiration). But the curve was clear. Of course I needed to run 80k games each time to have enough precision.

Anyway, there are two effects here, one of which has not been mentioned. The obvious one is the size of the tree and the re-searches, but the ignored one is that fail lows may trigger longer thinking times to catch problems. Too narrow a window will cause the engine to think longer in positions where it is not needed, and too wide a window becomes insensitive to potential problems (it won't fail low so often).

Miguel
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: optimal aspiration window for stockfish question (update

Post by bob »

michiguel wrote:
bob wrote: Testing still underway, but one surprise already: narrow aspiration windows might be worthless. I ran a test with several starting windows to see if there was a clear winner. There were clear losers, like +/-1 and +/-2 and such. But +/-16, +/-50, +/-100, and +/-200 all seem to be equivalent in real-game testing. More when the test is completely done...

BTW, this is the "delta" value, which means that with a value of 200 the first search is old +/- 200, and the first re-search is old +/- 400...

hmmm...
I believe this will depend quite a bit on the specific engine. I observed a clearly distinct peak at about 40 cp (IIRC) with a slow decline toward higher deltas and a sharper decline towards lower ones. However, this peak was not very high. IIRC, when I tested this, it may have been ~3 elo points or so (compared to no aspiration). But the curve was clear. Of course I needed to run 80k games each time to have enough precision.

Anyway, there are two effects here, one of which has not been mentioned. The obvious one is the size of the tree and the re-searches, but the ignored one is that fail lows may trigger longer thinking times to catch problems. Too narrow a window will cause the engine to think longer in positions where it is not needed, and too wide a window becomes insensitive to potential problems (it won't fail low so often).

Miguel
You might test the fail-low hypothesis. :) Why? In Crafty, if the score drops 0.01 from the last iteration, I will use more time, just as if it had dropped 1.0... and a lot of games proved this to be best, and not just by 1 or 2 Elo either. I was surprised, but I have been running with this for a good while now (almost a year). I don't just search longer on a fail low; I search longer on any score drop at all, fail-low or not... It tests significantly better.
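
The check itself is trivial; something like this (a sketch of the idea, not Crafty's actual time-control code, and the extension factor is made up):

/* Extend the time target on any root-score drop from the previous
   iteration, not just on a formal fail low. */
int time_target(int score, int prev_score, int normal_time) {
    if (score < prev_score)        /* any drop at all, even 0.01 */
        return 3 * normal_time;    /* hypothetical extension factor */
    return normal_time;
}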
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: test results

Post by bob »

Here is the output from BayesElo first:

Rank Name                 Elo    +    -  games score  oppo draws
   2 Crafty-23.5R06-200  2650    4    4  30000   64%  2539   22%
   3 Crafty-23.5R06-24   2650    4    4  30000   64%  2539   22%
   4 Crafty-23.5R06-100  2649    4    4  30000   63%  2539   22%
   5 Crafty-23.5R06-30   2648    4    4  30000   63%  2539   22%
   6 Crafty-23.5R06-50   2648    4    4  30000   63%  2539   22%
   7 Crafty-23.5R06-300  2648    4    4  30000   63%  2539   22%
   8 Crafty-23.5R06-20   2648    4    4  30000   63%  2539   22%
   9 Crafty-23.5R06-10   2645    4    4  30000   63%  2539   22%
  10 Crafty-23.5-2       2645    4    4  30000   63%  2539   22%
  11 Crafty-23.5R06-1    2645    4    4  30000   63%  2539   22%
  12 Crafty-23.5R06-8    2644    4    4  30000   63%  2539   22%
  13 Crafty-23.5R06-5    2643    4    4  30000   63%  2539   22%
  14 Crafty-23.5-1       2641    4    4  30000   63%  2539   22%
  15 Crafty-23.5R06-2    2636    4    4  30000   62%  2539   22%


Versions 23.5-1 and 23.5-2 are simply two consecutive runs with the same version, to provide a baseline result. The rest of the tests are version 23.5R06, where the -n suffix is the aspiration window (the delta value in the code posted yesterday); 23.5R06-1 means the aspiration window was +/- 1 with delta=1. 1 and 2 are a bit low, and by the time it gets to 10 it is pretty optimal. Bigger doesn't seem to hurt at all, up to +/- 3.0 pawns... I was expecting a better result in the 20-40 range; the reason I ran the big numbers was to produce some worse results, so that there would be a recognizable curve with a clear optimal value and worse results on either side. Didn't get exactly what I expected, as you can see...
Rémi Coulom
Posts: 438
Joined: Mon Apr 24, 2006 8:06 pm

Re: test results

Post by Rémi Coulom »

bob wrote:Here is the output from BayesElo first:

Rank Name                 Elo    +    -  games score  oppo draws
   2 Crafty-23.5R06-200  2650    4    4  30000   64%  2539   22%
   3 Crafty-23.5R06-24   2650    4    4  30000   64%  2539   22%
   4 Crafty-23.5R06-100  2649    4    4  30000   63%  2539   22%
   5 Crafty-23.5R06-30   2648    4    4  30000   63%  2539   22%
   6 Crafty-23.5R06-50   2648    4    4  30000   63%  2539   22%
   7 Crafty-23.5R06-300  2648    4    4  30000   63%  2539   22%
   8 Crafty-23.5R06-20   2648    4    4  30000   63%  2539   22%
   9 Crafty-23.5R06-10   2645    4    4  30000   63%  2539   22%
  10 Crafty-23.5-2       2645    4    4  30000   63%  2539   22%
  11 Crafty-23.5R06-1    2645    4    4  30000   63%  2539   22%
  12 Crafty-23.5R06-8    2644    4    4  30000   63%  2539   22%
  13 Crafty-23.5R06-5    2643    4    4  30000   63%  2539   22%
  14 Crafty-23.5-1       2641    4    4  30000   63%  2539   22%
  15 Crafty-23.5R06-2    2636    4    4  30000   62%  2539   22%


Versions 23.5-1 and 23.5-2 are simply two consecutive runs with the same version, to provide a baseline result. The rest of the tests are version 23.5R06, where the -n suffix is the aspiration window (the delta value in the code posted yesterday); 23.5R06-1 means the aspiration window was +/- 1 with delta=1. 1 and 2 are a bit low, and by the time it gets to 10 it is pretty optimal. Bigger doesn't seem to hurt at all, up to +/- 3.0 pawns... I was expecting a better result in the 20-40 range; the reason I ran the big numbers was to produce some worse results, so that there would be a recognizable curve with a clear optimal value and worse results on either side. Didn't get exactly what I expected, as you can see...
You should use CLOP :-)

Rémi
Karlo Bala
Posts: 373
Joined: Wed Mar 22, 2006 10:17 am
Location: Novi Sad, Serbia
Full name: Karlo Balla

Re: test results

Post by Karlo Bala »

bob wrote:Here is the output from BayesElo first:

Rank Name                 Elo    +    -  games score  oppo draws
   2 Crafty-23.5R06-200  2650    4    4  30000   64%  2539   22%
   3 Crafty-23.5R06-24   2650    4    4  30000   64%  2539   22%
   4 Crafty-23.5R06-100  2649    4    4  30000   63%  2539   22%
   5 Crafty-23.5R06-30   2648    4    4  30000   63%  2539   22%
   6 Crafty-23.5R06-50   2648    4    4  30000   63%  2539   22%
   7 Crafty-23.5R06-300  2648    4    4  30000   63%  2539   22%
   8 Crafty-23.5R06-20   2648    4    4  30000   63%  2539   22%
   9 Crafty-23.5R06-10   2645    4    4  30000   63%  2539   22%
  10 Crafty-23.5-2       2645    4    4  30000   63%  2539   22%
  11 Crafty-23.5R06-1    2645    4    4  30000   63%  2539   22%
  12 Crafty-23.5R06-8    2644    4    4  30000   63%  2539   22%
  13 Crafty-23.5R06-5    2643    4    4  30000   63%  2539   22%
  14 Crafty-23.5-1       2641    4    4  30000   63%  2539   22%
  15 Crafty-23.5R06-2    2636    4    4  30000   62%  2539   22%


Versions 23.5-1 and 23.5-2 are simply two consecutive runs with the same version, to provide a baseline result. The rest of the tests are version 23.5R06, where the -n suffix is the aspiration window (the delta value in the code posted yesterday); 23.5R06-1 means the aspiration window was +/- 1 with delta=1. 1 and 2 are a bit low, and by the time it gets to 10 it is pretty optimal. Bigger doesn't seem to hurt at all, up to +/- 3.0 pawns... I was expecting a better result in the 20-40 range; the reason I ran the big numbers was to produce some worse results, so that there would be a recognizable curve with a clear optimal value and worse results on either side. Didn't get exactly what I expected, as you can see...
What is the average depth?
Best Regards,
Karlo Balla Jr.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: test results

Post by bob »

Rémi Coulom wrote:
bob wrote:Here is the output from BayesElo first:

Rank Name                 Elo    +    -  games score  oppo draws
   2 Crafty-23.5R06-200  2650    4    4  30000   64%  2539   22%
   3 Crafty-23.5R06-24   2650    4    4  30000   64%  2539   22%
   4 Crafty-23.5R06-100  2649    4    4  30000   63%  2539   22%
   5 Crafty-23.5R06-30   2648    4    4  30000   63%  2539   22%
   6 Crafty-23.5R06-50   2648    4    4  30000   63%  2539   22%
   7 Crafty-23.5R06-300  2648    4    4  30000   63%  2539   22%
   8 Crafty-23.5R06-20   2648    4    4  30000   63%  2539   22%
   9 Crafty-23.5R06-10   2645    4    4  30000   63%  2539   22%
  10 Crafty-23.5-2       2645    4    4  30000   63%  2539   22%
  11 Crafty-23.5R06-1    2645    4    4  30000   63%  2539   22%
  12 Crafty-23.5R06-8    2644    4    4  30000   63%  2539   22%
  13 Crafty-23.5R06-5    2643    4    4  30000   63%  2539   22%
  14 Crafty-23.5-1       2641    4    4  30000   63%  2539   22%
  15 Crafty-23.5R06-2    2636    4    4  30000   62%  2539   22%


Versions 23.5-1 and 23.5-2 are simply two consecutive runs with the same version, to provide a baseline result. The rest of the tests are version 23.5R06, where the -n suffix is the aspiration window (the delta value in the code posted yesterday); 23.5R06-1 means the aspiration window was +/- 1 with delta=1. 1 and 2 are a bit low, and by the time it gets to 10 it is pretty optimal. Bigger doesn't seem to hurt at all, up to +/- 3.0 pawns... I was expecting a better result in the 20-40 range; the reason I ran the big numbers was to produce some worse results, so that there would be a recognizable curve with a clear optimal value and worse results on either side. Didn't get exactly what I expected, as you can see...
You should use CLOP :-)

Rémi
I intend to look at it. But I can test this very simply at present: I just say "runtest" and it runs the test with each parameter change as needed. Of course it is not optimally tuning a parameter, just trying the choices I give...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: test results

Post by bob »

Karlo Bala wrote:
bob wrote:Here is the output from BayesElo first:

Rank Name                 Elo    +    -  games score  oppo draws
   2 Crafty-23.5R06-200  2650    4    4  30000   64%  2539   22%
   3 Crafty-23.5R06-24   2650    4    4  30000   64%  2539   22%
   4 Crafty-23.5R06-100  2649    4    4  30000   63%  2539   22%
   5 Crafty-23.5R06-30   2648    4    4  30000   63%  2539   22%
   6 Crafty-23.5R06-50   2648    4    4  30000   63%  2539   22%
   7 Crafty-23.5R06-300  2648    4    4  30000   63%  2539   22%
   8 Crafty-23.5R06-20   2648    4    4  30000   63%  2539   22%
   9 Crafty-23.5R06-10   2645    4    4  30000   63%  2539   22%
  10 Crafty-23.5-2       2645    4    4  30000   63%  2539   22%
  11 Crafty-23.5R06-1    2645    4    4  30000   63%  2539   22%
  12 Crafty-23.5R06-8    2644    4    4  30000   63%  2539   22%
  13 Crafty-23.5R06-5    2643    4    4  30000   63%  2539   22%
  14 Crafty-23.5-1       2641    4    4  30000   63%  2539   22%
  15 Crafty-23.5R06-2    2636    4    4  30000   62%  2539   22%


Versions 23.5-1 and 23.5-2 are simply two consecutive runs with the same version, to provide a baseline result. The rest of the tests are version 23.5R06, where the -n suffix is the aspiration window (the delta value in the code posted yesterday); 23.5R06-1 means the aspiration window was +/- 1 with delta=1. 1 and 2 are a bit low, and by the time it gets to 10 it is pretty optimal. Bigger doesn't seem to hurt at all, up to +/- 3.0 pawns... I was expecting a better result in the 20-40 range; the reason I ran the big numbers was to produce some worse results, so that there would be a recognizable curve with a clear optimal value and worse results on either side. Didn't get exactly what I expected, as you can see...
What is the average depth?
A quick look says that the average is in the 13-15 range. I looked at a few logs and there are plenty of re-searches going on, so the aspiration window is having a chance to exert influence.