
Re: optimal aspiration window for stockfish question

Posted: Thu Mar 15, 2012 3:06 pm
by bob
mcostalba wrote:
bob wrote: Absolutely no change, either up or down.
Have you tested at longer TC? It would be interesting to know how it scales at longer TC; with your cluster it should be feasible to test at, say, a 1' TC.
I thought I had said that I tested up to 1 minute + 1 second. I did not go beyond that, and found absolutely no difference, which was surprising. The only thing I didn't like was the excessive re-searches to reach a mate. But at that point, it doesn't affect the game result at all, of course...

When we first started doing this in Cray Blitz, we tried lots of ideas. Our aspiration window was roughly 1/3 of a pawn, so we relaxed alpha/beta to +1.0, then +3.0, then +9.0, and then all the way to infinity. In Crafty, I eliminated that +3.0 step and have been using +1, +9 and +infinity for the longest time...

I am also running a test (since I was not testing anything else) on various aspiration window widths as well. I'll post those results when they finish...

Re: optimal aspiration window for stockfish question

Posted: Thu Mar 15, 2012 6:30 pm
by bob
On this topic, here is another test I ran (not a cluster test).

I took the original Ippolit aspiration re-search window adjustment code as given previously. The only change I made was that once the delta value hit 10 pawns, I set it to +infinity to avoid an extra re-search or two when looking for a mate.

For the second version, I started with Uri's suggestion of delta *= 2, which ramps it up quicker.
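To make the comparison concrete, the two widening rules look roughly like this in C (a sketch only: the initial width, the slower growth rule, and search_root() are illustrative placeholders, not the actual Ippolit or Crafty code):

#define INF       32000
#define TEN_PAWNS  1000            /* centipawns */

extern int search_root(int alpha, int beta, int depth);

int aspiration(int prev_score, int depth, int fast_widening)
{
    int delta = 16;                /* starting half-width (illustrative)  */
    int alpha = prev_score - delta;
    int beta  = prev_score + delta;

    for (;;) {
        int score = search_root(alpha, beta, depth);
        if (score > alpha && score < beta)
            return score;          /* true score inside the window, done  */

        if (fast_widening)
            delta *= 2;            /* Uri's suggestion: ramp up quickly   */
        else
            delta += delta / 2;    /* placeholder for the slower rule     */

        if (delta >= TEN_PAWNS)    /* past 10 pawns, give up on narrow    */
            delta = INF;           /* windows; avoids extra re-searches   */
                                   /* when hunting a mate                 */
        if (score <= alpha)        /* widen only the failing bound        */
            alpha = prev_score - delta;
        else
            beta = prev_score + delta;
    }
}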

The first test showed that too many re-searches can be bad. Fine 70, searched to depth=45. Here are the two summary lines, with Uri's idea last:

log.001: time=9.62 mat=1 n=39906118 fh=92% nps=4.1M
log.002: time=1.19 mat=1 n=5520897 fh=98% nps=4.6M

Ignore the NPS; since the second one ran so much faster, that is likely more rounding noise than actual data. But notice the node counts. The more conservative window widening takes roughly 7x as many nodes as the delta *= 2 approach. Since this is about fail high/fail low rather than normal positions, I tried a couple of others...

WAC 141 (which is pretty easy, of course):

log.001: time=0.40 mat=-1 n=1180277 fh=85% nps=3.0M
log.002: time=0.29 mat=-1 n=855712 fh=84% nps=3.0M

Here's one with no fail highs or lows, just to show identical behavior in that case:

log.001: time=14.30 mat=0 n=35929279 fh=93% nps=2.5M
log.002: time=14.34 mat=0 n=35929279 fh=93% nps=2.5M

Win At Chess #2, which scatters a few fail highs here and there:

log.001: time=4.95 mat=0 n=15058005 fh=93% nps=3.0M
log.002: time=4.59 mat=0 n=13847564 fh=92% nps=3.0M

I will queue this up for a cluster test as well, although I don't expect much difference overall. 30K games might not be enough to show it is clearly better, but it does look better.

There are known positions where you can relax a bound too far and get into trouble. For example, suppose you fail high in a complex tactical position but are only going to gain a positional edge: if you relax the beta bound too far, you suddenly stop pruning mates that are not forced, which can explode the tree. But these appear not to be common in real games...

Re: optimal aspiration window for stockfish question

Posted: Thu Mar 15, 2012 9:11 pm
by bob
While I have tests running, here is some more info about my opinion of how this stuff should work.

Point 1. You want to (optimally) do exactly one re-search to resolve any fail-high or fail-low. More than one is inefficient and causes you to search a tree that you would normally not traverse. In the last iteration or two this can be expensive.

Point 2. In doing that one re-search, you don't want to increase the search window beyond what is necessary to capture the true score. Why? Take a case where you are going to win a positional advantage of +0.3 over what you saw at the previous iteration, and your aspiration window is the old score +/- 0.2... You fail high. Where you set the beta bound can be important. For example, if you don't raise it past old + 0.3, you will fail high again, which is a waste. If you set it to old + 0.31, that would be the perfect bound, because you would not fail high, nor would you have to deal with any paths in the PV search that produce a score more than 0.01 better than what we know is the real score. The farther you advance beta, the more "crap" works its way into the search and the tree grows. Hence the potential advantage of the null-window search we often mention.

So, we have a conundrum. We want to adjust the bound just enough to "catch" the true score, without going too far. If we don't adjust it enough, we get another fail high which is inefficient. If we adjust it too far, we search a larger tree than optimal (the reason for doing an aspiration search in the first place) which is again inefficient. So we want to do a "Goldilocks and the three bears" approach and get it just right.

Intuition tells me that the following, tuned a bit, is probably the right answer.

First, we can assume that increasing the margin by P is enough to catch (say) 90% of the positional fail-highs. If we initially increase the bound by P, 90% of the positions where we are only seeing a positional gain will be searched and produce an exact score; the other 10% will require a second re-search. We could increment by P again, or we could drop to the next step.

Second, if we get a fail high and then a second fail high, we know that the true score is more than P better than the previous score; call the excess N. What can we guess about N? My thought is 1.0. We assume that if we are gaining more than P, we are most likely winning a whole pawn. So the second re-search would be original + P + PAWN, and that will handle most of the cases. What next, assuming we fail high again? One could add P again and re-search, and if that fails, add PAWN again and re-search. Something tells me that the better approach might be what we were doing in Cray Blitz:

assume previous iteration score was S, and that P = 0.25

Next iteration:

alpha=S-P, beta=S+P, search
if fail_high
alpha=S-P, beta=S+P+P (winning a positional advantage)
if fail_high
beta=S+P+P+PAWN (maybe winning a pawn)
if fail_high
beta=S+P+P+3*PAWN (maybe winning a piece, not so common)
if fail_high
beta=S+P+P+9*PAWN (maybe winning a queen)
if fail_high
beta=infinity (must be a mate)
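In rough C, the ladder above might look like this (fail-high side only; a fail low would widen alpha symmetrically). The margin values and the search_root() interface are illustrative, not actual Cray Blitz or Crafty code:

#define P     25                   /* positional margin, in centipawns */
#define PAWN 100
#define INF  32000

extern int search_root(int alpha, int beta, int depth);

int staged_aspiration(int s, int depth)          /* s = previous iteration score */
{
    /* successive beta bounds, one per fail high */
    const int betas[6] = { s + P,                /* initial window               */
                           s + P + P,            /* winning a positional edge    */
                           s + P + P + PAWN,     /* maybe winning a pawn         */
                           s + P + P + 3 * PAWN, /* maybe winning a piece        */
                           s + P + P + 9 * PAWN, /* maybe winning a queen        */
                           INF };                /* must be a mate               */
    int alpha = s - P;
    int score = 0;
    int i;

    for (i = 0; i < 6; i++) {
        score = search_root(alpha, betas[i], depth);
        if (score < betas[i])
            break;                 /* no fail high: exact score (a fail low,
                                      score <= alpha, is not handled here)      */
    }
    return score;
}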

The downside is that there is nothing that says you only win a positional advantage, or a pawn, or a piece, or a queen; you can win a 2.5-pawn advantage by disrupting the opponent's king, for example. In Crafty, I removed the queen option and just went positional, pawn, piece, infinity. Minimal re-searches, minimal waste. The problem is, if you are winning a rook, you now have to search lines where you have an unforced chance to win a queen, and with beta that high you can't prune such branches.

A lot of this is really moot on the fail-high side, because there you are likely simply discovering how quickly you will win the game.

But take this the other direction and it becomes more important, because when a move fails low, you want to quickly discover how bad.

BTW there are some more efficient ways to deal with fail-highs, assuming you don't mind dealing with incomplete information.

Belle (Ken Thompson) did this first; I used it for a long while in Cray Blitz...

When you fail high, there is no need to re-search, since you know this is a new best move. Keep searching. If you end the iteration, good, start the next and search normally. The only sticking point is that if you get a second move that fails high, you are in a quandary: which is better? Re-search the first one with a wider window to get an exact score, then re-search the second with alpha = new best score and beta = new best score + 1. If that move fails low, the first one is best. Keep going. If the second move fails high, remember it but don't re-search unless yet another move fails high. Repeat as necessary.
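In rough C, the root loop might look something like this (my sketch of the scheme just described, not Ken's or Cray Blitz code; the search() interface, the -25 starting bound, and testing every move with a null window are simplifying assumptions):

#define INF 32000

extern int search(int move, int alpha, int beta, int depth);

/* Returns the best root move; *resolved is set to 1 if its exact score is
 * known, 0 if it is only known to be a fail high.
 */
int belle_root(const int *moves, int n, int prev_score, int depth, int *resolved)
{
    int bound      = prev_score - 25;   /* lower bound for the null-window tests */
    int best_move  = moves[0];
    int unresolved = 0;                 /* best move failed high, score unknown  */
    int i;

    for (i = 0; i < n; i++) {
        int score = search(moves[i], bound, bound + 1, depth);
        if (score <= bound)
            continue;                   /* fails low: current best move stands   */

        if (!unresolved) {
            best_move  = moves[i];      /* first fail high: new best move,       */
            unresolved = 1;             /* but do NOT re-search it               */
        } else {
            /* second fail high: resolve the pending one with a wide window,     */
            int exact = search(best_move, bound, INF, depth);
            /* then retest this move against that exact score                    */
            score = search(moves[i], exact, exact + 1, depth);
            bound = exact;
            if (score > exact) {
                best_move  = moves[i];  /* better again, and again unresolved    */
                unresolved = 1;
            } else {
                unresolved = 0;         /* the previous best is confirmed best   */
            }
        }
    }
    *resolved = !unresolved;
    return best_move;
}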

The downside is that you don't know how good a move is, since you don't resolve the fail-high immediately. That was a problem for me, because I use the score to control time allocation. Suppose you fail low, losing a pawn, and then find another move that fails high. Does it save the pawn? Or does it lose the pawn but gain a tiny bit of positional advantage back? For the former, I can stop using extra time, while for the latter, I want to keep searching to find a way to save the pawn. But if I don't know the true score, I can't make that decision. So the PVS implementation used by Ken and me way back has a hole in it that I don't like, and I do not use it today for that reason.

Re: optimal aspiration window for stockfish question

Posted: Thu Mar 15, 2012 10:49 pm
by bob
Karlo Bala wrote:
Uri Blass wrote:
Houdini wrote:
Uri Blass wrote:The more interesting question is which houdini does not use that formula because there are more than one houdini version.
If you're really interested, no version of Houdini has used the formulae that Marco quoted above.
Houdini uses an aspiration window that widens much faster.

Robert
Thanks Robert.

My intuition also tells me that it is better to have fewer re-searches (increasing delta faster with a different formula may be a way to do it).
My intuition is somewhat different from yours. If your PV search is very similar to your non-PV search, there should be a small percentage of false PV nodes, so you can use a wider window. If your PV search is very different from your non-PV search (which is common today because of LMR, null move, futility, delta pruning, etc.), then there will be more false PV nodes. In that case you do not want to search a lot of false PV nodes with a wide window.
LMR and the like do not necessarily mean PV and non-PV searches are different. I'm not convinced they should be different, myself...

Re: optimal aspiration window for stockfish question (update

Posted: Thu Mar 15, 2012 10:55 pm
by bob
Testing is still underway, but there is one surprise already. It turns out that narrow aspiration windows might be worthless. I ran a test with several starting windows to see if there was a clear winner. There were clear losers like +/- 1 and +/- 2 and such. But +/- 16, +/- 50, +/- 100, and +/- 200 all seem to be equivalent in real-game testing. More when the test is completely done...

BTW this is the "delta" value, which means that with a value of 200 the first search is old +/- 200, and the first re-search is old +/- 400...

hmmm...

Re: optimal aspiration window for stockfish question (update

Posted: Thu Mar 15, 2012 11:23 pm
by michiguel
bob wrote:testing still underway, but one surprise already. Turns out that narrow aspiration windows might be worthless. I ran a test with several starting windows to see if there was a clear winner. There were clear losers like +/-1 and +/- 2 and such. But +/-16, +/-50, +/-100, +/-200 all seem to be equivalent in real-game testing. More when the test is completely done...

BTW this is the "delta" value. Which means with a value of 200 the first search is old+/- 200, the first research is old+/-400...

hmmm...
I believe this will depend quite a bit on the specific engine. I observed a clearly distinct peak at about 40 cp (IIRC) with a slow decline toward higher deltas and a sharper decline towards lower ones. However, this peak was not very high. IIRC, when I tested this, it may have been ~3 elo points or so (compared to no aspiration). But the curve was clear. Of course I needed to run 80k games each time to have enough precision.

Anyway, there are two effects here, one of which has not been mentioned. The obvious one is the size of the tree and the re-searches, but the ignored one is that fail lows may trigger longer thinking times to catch problems. Too-narrow windows will cause the engine to think longer in positions where it is not needed, and too-wide windows will become insensitive to potential problems (they won't fail low as often).

Miguel

Re: optimal aspiration window for stockfish question (update

Posted: Fri Mar 16, 2012 4:29 am
by bob
michiguel wrote:
bob wrote:testing still underway, but one surprise already. Turns out that narrow aspiration windows might be worthless. I ran a test with several starting windows to see if there was a clear winner. There were clear losers like +/-1 and +/- 2 and such. But +/-16, +/-50, +/-100, +/-200 all seem to be equivalent in real-game testing. More when the test is completely done...

BTW this is the "delta" value. Which means with a value of 200 the first search is old+/- 200, the first research is old+/-400...

hmmm...
I believe this will depend quite a bit on the specific engine. I observed a clearly distinct peak at about 40 cp (IIRC) with a slow decline toward higher deltas and a sharper decline towards lower ones. However, this peak was not very high. IIRC, when I tested this, it may have been ~3 elo points or so (compared to no aspiration). But the curve was clear. Of course I needed to run 80k games each time to have enough precision.

Anyway, there are two effects here. One of them has not been mentioned. The obvious one is the size of the tree and researches, but the ignored one is that fail lows may trigger longer thinking times to catch problems. Too narrow windows will cause the engine to think longer in positions that are not needed, and too wide windows, will become insensitive to potential problems (they won't fail low so often).

Miguel
You might test the fail-low hypothesis. :) Why? In Crafty, if the score drops 0.01 from the last iteration, I will use more time, just as if it had dropped 1.0... and a lot of games proved this to be best, and not just by 1 or 2 Elo either. I was surprised, but have been running with this for a good while now (almost a year). I don't just search longer on a fail low; I search longer on any score drop at all, fail-low or not... It tests significantly better.
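Conceptually something like this (an illustrative sketch only, not Crafty's actual time-allocation code; the 3x multiplier and the names are made up):

/* Extend the time target on ANY score drop from the previous iteration,
 * not just on a formal fail low.  The 3x multiplier is a placeholder.
 */
double adjusted_time_target(double normal_target,
                            int last_iteration_score,
                            int current_score)
{
    if (current_score < last_iteration_score)   /* even a 1 cp drop counts */
        return normal_target * 3.0;
    return normal_target;
}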

Re: test results

Posted: Fri Mar 16, 2012 4:41 am
by bob
Here is the output from BayesElo first:

Rank Name                 Elo    +    - games score  oppo draws
   2 Crafty-23.5R06-200  2650    4    4 30000   64%  2539   22%
   3 Crafty-23.5R06-24   2650    4    4 30000   64%  2539   22%
   4 Crafty-23.5R06-100  2649    4    4 30000   63%  2539   22%
   5 Crafty-23.5R06-30   2648    4    4 30000   63%  2539   22%
   6 Crafty-23.5R06-50   2648    4    4 30000   63%  2539   22%
   7 Crafty-23.5R06-300  2648    4    4 30000   63%  2539   22%
   8 Crafty-23.5R06-20   2648    4    4 30000   63%  2539   22%
   9 Crafty-23.5R06-10   2645    4    4 30000   63%  2539   22%
  10 Crafty-23.5-2       2645    4    4 30000   63%  2539   22%
  11 Crafty-23.5R06-1    2645    4    4 30000   63%  2539   22%
  12 Crafty-23.5R06-8    2644    4    4 30000   63%  2539   22%
  13 Crafty-23.5R06-5    2643    4    4 30000   63%  2539   22%
  14 Crafty-23.5-1       2641    4    4 30000   63%  2539   22%
  15 Crafty-23.5R06-2    2636    4    4 30000   62%  2539   22%


Versions 23.5-1 and 23.5-2 are simply two consecutive runs with the same version, to provide a baseline result. The rest of the tests are version 23.5R06, where the -n suffix is the aspiration window (the delta value in the code posted yesterday); 23.5R06-1 means the aspiration window was +/- 1 with delta=1. 1 and 2 are a bit low, and by the time it gets to 10, it is pretty much optimal. Bigger doesn't seem to hurt at all, up to +/- 3.0 pawns... I was expecting a better result in the 20-40 range; the reason I ran the big numbers was to produce some worse results so that there would be a recognizable curve with a clear optimal value and worse results on either side. I didn't get exactly what I expected, as you can see...

Re: test results

Posted: Fri Mar 16, 2012 9:38 am
by Rémi Coulom
bob wrote:Here is the output from BayesElo first:

2 Crafty-23.5R06-200 2650 4 4 30000 64% 2539 22%
3 Crafty-23.5R06-24 2650 4 4 30000 64% 2539 22%
4 Crafty-23.5R06-100 2649 4 4 30000 63% 2539 22%
5 Crafty-23.5R06-30 2648 4 4 30000 63% 2539 22%
6 Crafty-23.5R06-50 2648 4 4 30000 63% 2539 22%
7 Crafty-23.5R06-300 2648 4 4 30000 63% 2539 22%
8 Crafty-23.5R06-20 2648 4 4 30000 63% 2539 22%
9 Crafty-23.5R06-10 2645 4 4 30000 63% 2539 22%
10 Crafty-23.5-2 2645 4 4 30000 63% 2539 22%
11 Crafty-23.5R06-1 2645 4 4 30000 63% 2539 22%
12 Crafty-23.5R06-8 2644 4 4 30000 63% 2539 22%
13 Crafty-23.5R06-5 2643 4 4 30000 63% 2539 22%
14 Crafty-23.5-1 2641 4 4 30000 63% 2539 22%
15 Crafty-23.5R06-2 2636 4 4 30000 62% 2539 22%


version 23.5-1 and 23.5-2 are simply two consecutive runs with the same version to provide a normal result. The rest of the tests are version 23.5R06 and were tested where the -n is the aspiration window (delta value in the code posted yesterday). 23.5R06-1 means the aspiration window was +/- 1 with delta=1. 1 and 2 are a bit low, and by the time it gets to 10, it is pretty optimal. Bigger doesn't seem to hurt at all up to +/- 3.0 pawns... I was expecting a better result in the 20-40 range, the reason I ran the big numbers was to produce some worse results so that there is a recognizable curve with a clear optimal value, and worse results on either side. Didn't get exactly what I expected, as you can see...
You should use CLOP :-)

Rémi

Re: test results

Posted: Fri Mar 16, 2012 12:30 pm
by Karlo Bala
bob wrote:Here is the output from BayesElo first:

2 Crafty-23.5R06-200 2650 4 4 30000 64% 2539 22%
3 Crafty-23.5R06-24 2650 4 4 30000 64% 2539 22%
4 Crafty-23.5R06-100 2649 4 4 30000 63% 2539 22%
5 Crafty-23.5R06-30 2648 4 4 30000 63% 2539 22%
6 Crafty-23.5R06-50 2648 4 4 30000 63% 2539 22%
7 Crafty-23.5R06-300 2648 4 4 30000 63% 2539 22%
8 Crafty-23.5R06-20 2648 4 4 30000 63% 2539 22%
9 Crafty-23.5R06-10 2645 4 4 30000 63% 2539 22%
10 Crafty-23.5-2 2645 4 4 30000 63% 2539 22%
11 Crafty-23.5R06-1 2645 4 4 30000 63% 2539 22%
12 Crafty-23.5R06-8 2644 4 4 30000 63% 2539 22%
13 Crafty-23.5R06-5 2643 4 4 30000 63% 2539 22%
14 Crafty-23.5-1 2641 4 4 30000 63% 2539 22%
15 Crafty-23.5R06-2 2636 4 4 30000 62% 2539 22%


version 23.5-1 and 23.5-2 are simply two consecutive runs with the same version to provide a normal result. The rest of the tests are version 23.5R06 and were tested where the -n is the aspiration window (delta value in the code posted yesterday). 23.5R06-1 means the aspiration window was +/- 1 with delta=1. 1 and 2 are a bit low, and by the time it gets to 10, it is pretty optimal. Bigger doesn't seem to hurt at all up to +/- 3.0 pawns... I was expecting a better result in the 20-40 range, the reason I ran the big numbers was to produce some worse results so that there is a recognizable curve with a clear optimal value, and worse results on either side. Didn't get exactly what I expected, as you can see...
What is the average depth?