optimal aspiration window for stockfish question

Discussion of chess software programming and technical issues.

Moderators: hgm, Harvey Williamson, bob

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
bob
Posts: 20344
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: optimal aspiration window for stockfish question

Post by bob » Wed Mar 14, 2012 10:18 pm

Don wrote:
mcostalba wrote:
Rebel wrote:I am sure the SF team have done considerable testing here.
Yes, we have.

At the end of endless trials with differnt formulas and values we end up in starting with a small window value:

Code: Select all

delta = Value(16);
alpha = RootMoves[PVIdx].prevScore - delta;
beta  = RootMoves[PVIdx].prevScore + delta;
That is increased after a fail low/high with the following formula:

Code: Select all

if (bestValue >= beta)
{
    beta += delta;
    delta += delta / 2;
}
else if &#40;bestValue <= alpha&#41;
&#123;   
    alpha -= delta;
    delta += delta / 2;
&#125;

I want to add that this is the Ippo* formula and I think that very probably it is what is used in _all_ the top engines from Rybka 3 to Houdini. Although we knew the Ippo formula since when sources were published we moved to that only one year later, after having tried all the possible different combinations: some are weaker, some are equivalent ELO wise, but more complex, so this is the simplest formula (we know) that guarantees top performance.

Answering to Uri: I am not interested in tweaking the engine on a sample position. I only use real games to validate a change.
That's what we do. We will run a few hundred games at various fixed depth levels and have the instrumentation that gives us the average time per move up to some arbitrary move number. I think we currently use move 70. We think this is superior to just running 100 positions and timing them. We only do this when we want data on something that is supposed to be a speedup - normally we just run time games to measure general improvements and things that involve trade-offs.
Since it was a trivial change, I took the above and rewrote it to fit Crafty's search. Absolutely no change, either up or down. A couple of things I do not like about it however. 1. Several re-searches to win a pawn or more, and 2. way too many re-searches to find a forced mate.

We've used several different approaches. In Cray Blitz we did more "bumps". In Crafty I settled on fewer. But it actually doesn't seem to matter one bit based on the testing I just did, which was run at fast games and at 1min+1sec games as well. Absolutely no +/- change when switching from what I did to this, and back...

User avatar
lucasart
Posts: 2957
Joined: Mon May 31, 2010 11:29 am
Contact:

Re: optimal aspiration window for stockfish question

Post by lucasart » Thu Mar 15, 2012 2:03 am

Eelco de Groot wrote:
lucasart wrote:
mcostalba wrote:

Code: Select all

delta = Value&#40;16&#41;;
your initial delta is only 1/16-th of a pawn ?
Well no, because of

Code: Select all

const Value PawnValueMidgame   = Value&#40;0x0C6&#41;;
const Value PawnValueEndgame   = Value&#40;0x102&#41;;
in types.h you can see it is actually a lot less than that :) The smaller you make it the more a PV search begins to resemble the nullwindow search, apart from the reductions missing from PV nodes of course like null move pruning etc. Compare that with Fruit 2.1 that has an infinite window, and I don't think it had aspiration windows? Not sure without looking it up.

Eelco
I meant end game pawn of course. this is the base score for engines typically. i use endgame pawn = 100, but SF uses 0x102 = 258, so 16 ~= 0x102 / 16.

mcostalba
Posts: 2679
Joined: Sat Jun 14, 2008 7:17 pm

Re: optimal aspiration window for stockfish question

Post by mcostalba » Thu Mar 15, 2012 6:13 am

lucasart wrote:
Eelco de Groot wrote:
lucasart wrote:
mcostalba wrote:

Code: Select all

delta = Value&#40;16&#41;;
your initial delta is only 1/16-th of a pawn ?
Well no, because of

Code: Select all

const Value PawnValueMidgame   = Value&#40;0x0C6&#41;;
const Value PawnValueEndgame   = Value&#40;0x102&#41;;
in types.h you can see it is actually a lot less than that :) The smaller you make it the more a PV search begins to resemble the nullwindow search, apart from the reductions missing from PV nodes of course like null move pruning etc. Compare that with Fruit 2.1 that has an infinite window, and I don't think it had aspiration windows? Not sure without looking it up.

Eelco
I meant end game pawn of course. this is the base score for engines typically. i use endgame pawn = 100, but SF uses 0x102 = 258, so 16 ~= 0x102 / 16.
Window size is 2*delta

mcostalba
Posts: 2679
Joined: Sat Jun 14, 2008 7:17 pm

Re: optimal aspiration window for stockfish question

Post by mcostalba » Thu Mar 15, 2012 6:15 am

bob wrote: Absolutely no change, either up or down.
Have you tested at longer TC ? It would be interesting to know how scales at longer TC, with your cluster should be feasible to test say at 1' TC.

bob
Posts: 20344
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: optimal aspiration window for stockfish question

Post by bob » Thu Mar 15, 2012 2:06 pm

mcostalba wrote:
bob wrote: Absolutely no change, either up or down.
Have you tested at longer TC ? It would be interesting to know how scales at longer TC, with your cluster should be feasible to test say at 1' TC.
I thought I had said that I tested up to 1 minute + 1 second. I did not go beyond that, and found absolutely no difference, which was surprising. The only thing I didnt like was the excessive re-searches to reach a mate. But at that point, it doesn't affect the game result at all of course...

When we first started doing this in Cray Blitz, we tried lots of ideas. Our aspiration window was roughly 1/3 of a pawn, so we relaxed alpha/beta to +1.0, then +3.0, then +9.0 and then all the way to infinite. In Crafty, I eliminated that +3.0 and have been using +1, +9 and +infinite for the longest...

I am also running a test (since I was not testing anything else) on various aspiration window widths as well. I'll post those results when they finish...

bob
Posts: 20344
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: optimal aspiration window for stockfish question

Post by bob » Thu Mar 15, 2012 5:30 pm

On this topic, here is another test I ran (not a cluster test.)

I took the original ippolit aspiration re-search window adjustment code as given previously. Only change I did make was that once the delta value hit 10 pawns, I set it to + infinity to avoid an extra re-search or two when looking for a mate.

For the second version, I started with Uri's suggestion of delta *= 2, which ramps it up quicker.

The first test showed that too many re-searches can be bad. Fine 70, searched to depth=45. Here's the two summary lines, with Uri's idea last:

log.001: time=9.62 mat=1 n=39906118 fh=92% nps=4.1M
log.002: time=1.19 mat=1 n=5520897 fh=98% nps=4.6M

Ignore the NPS since the second one ran so much faster, that is likely more rounding noise than actual data. But notice the node counts. The more conservative window widening takes 8x more nodes than the delta *= 2 approach. Since this is about fail high/fail low rather than normal positions, I tried a couple of others...

wac 141 (which is pretty easy of course):

log.001: time=0.40 mat=-1 n=1180277 fh=85% nps=3.0M
log.002: time=0.29 mat=-1 n=855712 fh=84% nps=3.0M

Here's one with no fail highs or lows, just to show identical behavior in that case:

log.001: time=14.30 mat=0 n=35929279 fh=93% nps=2.5M
log.002: time=14.34 mat=0 n=35929279 fh=93% nps=2.5M

Win at chess 2 which scatters a few fail highs here and there:

log.001: time=4.95 mat=0 n=15058005 fh=93% nps=3.0M
log.002: time=4.59 mat=0 n=13847564 fh=92% nps=3.0M

I will queue this up for a cluster test as well, although I don't expect much difference overall. 30K games might not be enough to show it is clearly better, but it does look better.

There are known positions where you can relax a bound too far and get into trouble. For example, if you fail high in a complex tactical position, but you are just going to gain a positional edge, if you relax the beta bound too far, suddenly you quit pruning mates that are not forced, which can explode the tree. But these appear to not be common in real games...

bob
Posts: 20344
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: optimal aspiration window for stockfish question

Post by bob » Thu Mar 15, 2012 8:11 pm

While I have tests running, here is some more info about my opinion of how this stuff should work.

Point 1. You want to (optimally) do exactly one re-search to resolve any fail-high or fail-low. More than one is inefficient and causes you to search a tree that you would normally not traverse. In the last iteration or two this can be expensive.

Point 2. In doing that one re-search, you don't want to increase the search window beyond what is necessary to capture the true score. Why? take a case where you are going to win a positional advantage of +.3 over what you saw at the previous iteration, and your aspiration is +/- old score + .2... You fail high. Where you set the beta bound can be important. For example, if you don't get it to +.3, you will fail high again, which is a waste. If you set it to +.31, that would be the perfect bound because you would not fail high, nor would you have to deal with any paths in the PV search that produce a score that is more than 0.01 better than what we know is the real score. The farther you advance beta, the more "crap" works its way into the search and the tree grows. Hence the potential advantage of a null-window search we often mention.

So, we have a conundrum. We want to adjust the bound just enough to "catch" the true score, without going too far. If we don't adjust it enough, we get another fail high which is inefficient. If we adjust it too far, we search a larger tree than optimal (the reason for doing an aspiration search in the first place) which is again inefficient. So we want to do a "Goldilocks and the three bears" approach and get it just right.

Intuition tells me that the following, tuned a bit, is probably the right answer.

First, we can assume a positional fail-high needs to increase the margin by P, to catch (say) 90% of the positional fail-highs. If we initially increase the bound by this, 90% of the positions where we are only seeing a positional gain will be searched and produce an exact score. 10% of those will require a second re-search. We could increment by P again, or we could drop to the next step.

Second, if we get a fail high, and then a second fail high, we know that the true score is P+N better than the previous score. What can we guess about N. My thought is 1.0. We assume that if we are gaining more than P centipawns, that we are most likely winning a whole pawn. So the second re-search would be original + P + PAWN. And that will handle most of the cases. What next, assuming we fail high again. One could assume that maybe we should then add P again and re-search, and if that fails, add PAWN again and re-search. Something tells me that the better approach might be what we were doing in Cray Blitz:

assume previous iteration score was S, and that P = 0.25

Next iteration:

alpha=S-P, beta=S+P, search
if fail_high
beta=S-P, beta=S+P+P (winning a positional advantage)
if fail_high
beta=S+P+P+PAWN (maybe winning a pawn)
if fail_high
beta=S+P+P+3*PAWN (maybe winning a piece, not so common)
if fail_high
beta=S+P+P+9*PAWN (maybe winning a queen)
if fail_high
beta=infinity (must be a mate)

The downside is that there is nothing that says you just win an advantage, or a pawn, or a piece, or a queen, you can win a 2.5 pawn advantage by disrupting the opponent king, for example. In Crafty, I removed that queen option and just went positional, pawn, piece, infinity. Minimal re-searches. Minimal waste. Problem is, if you are winning a rook, you now have to search lines where you have the unforced chance to win a queen and you can't prune such branches with beta too high.

A lot of this is really moot on the fail-high side, because there you are likely simply discovering how quickly you will win the game.

But take this the other direction and it becomes more important, because when a move fails low, you want to quickly discover how bad.

BTW there are some more efficient ways to deal with fail-highs, assuming you don't mind dealing with incomplete information.

Belle (Ken Thompson) did this first, I used it for a long while in Cray Blitz...

When you fail high, no need to re-search since you know this is a new best move. Keep searching. If you end the iteration, good, start the next and search normally. Only sticking point is that if you get a second move that fails high, you are in a quandry, which is better. Re-search one with a wider window to get an exact score, then re-search the second with alpha=new best score, and beta = new-best-score + 1. If that move fails low, the first one is best. Keep going. If the second move fails high, remember but don't research unless another move fails high. Repeat as necessary.

The down side is that you don't know how good a move is since you don't resolve the fail-high immediately. That was a problem for me, because I use the score to control time allocation. Suppose you fail low, losing a pawn. And find another move that fails high. Does it save the pawn? Or does it lose the pawn but gain a tiny bit of positional advantage back? For the former, I can stop using extra time, while for the latter, I want to keep searching to find a way to save the pawn. But if I don't know the true score, I can't make that decision. So the PVS implementation used by Ken and myself way back has a hole in it that I don't like, and do not use today for that reason.

bob
Posts: 20344
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: optimal aspiration window for stockfish question

Post by bob » Thu Mar 15, 2012 9:49 pm

Karlo Bala wrote:
Uri Blass wrote:
Houdini wrote:
Uri Blass wrote:The more interesting question is which houdini does not use that formula because there are more than one houdini version.
If you're really interested, no version of Houdini has used the formulae that Marco quoted above.
Houdini uses an aspiration window that widens much faster.

Robert
Thanks Robert.

My intuition also tells me that it is better to have less researches(increasing delta faster by a different formula may be a way to do it)
My intuition is somewhat different than yours. If your PV search is very similar to non-PV search, it should be a small percentage of false PV nodes, so that you can use a wider window. If your PV search is very differen from non-PV search (which is common today because of the LMR, NM, futility, delta, etc.) then it should be more false PV nodes. In that case you do not want to search a lot of false PV nodes with a wide window.
LMR and such does not necessarily mean PV and non-PV searches are different. I'm not convinced they should be different, myself..

bob
Posts: 20344
Joined: Mon Feb 27, 2006 6:30 pm
Location: Birmingham, AL

Re: optimal aspiration window for stockfish question (update

Post by bob » Thu Mar 15, 2012 9:55 pm

testing still underway, but one surprise already. Turns out that narrow aspiration windows might be worthless. I ran a test with several starting windows to see if there was a clear winner. There were clear losers like +/-1 and +/- 2 and such. But +/-16, +/-50, +/-100, +/-200 all seem to be equivalent in real-game testing. More when the test is completely done...

BTW this is the "delta" value. Which means with a value of 200 the first search is old+/- 200, the first research is old+/-400...

hmmm...

User avatar
michiguel
Posts: 6266
Joined: Thu Mar 09, 2006 7:30 pm
Location: Chicago, Illinois, USA
Contact:

Re: optimal aspiration window for stockfish question (update

Post by michiguel » Thu Mar 15, 2012 10:23 pm

bob wrote:testing still underway, but one surprise already. Turns out that narrow aspiration windows might be worthless. I ran a test with several starting windows to see if there was a clear winner. There were clear losers like +/-1 and +/- 2 and such. But +/-16, +/-50, +/-100, +/-200 all seem to be equivalent in real-game testing. More when the test is completely done...

BTW this is the "delta" value. Which means with a value of 200 the first search is old+/- 200, the first research is old+/-400...

hmmm...
I believe this will depend quite a bit on the specific engine. I observed a clearly distinct peak at about 40 cp (IIRC) with a slow decline toward higher deltas and a sharper decline towards lower ones. However, this peak was not very high. IIRC, when I tested this, it may have been ~3 elo points or so (compared to no aspiration). But the curve was clear. Of course I needed to run 80k games each time to have enough precision.

Anyway, there are two effects here. One of them has not been mentioned. The obvious one is the size of the tree and researches, but the ignored one is that fail lows may trigger longer thinking times to catch problems. Too narrow windows will cause the engine to think longer in positions that are not needed, and too wide windows, will become insensitive to potential problems (they won't fail low so often).

Miguel

Post Reply