bob wrote: Testing is still underway, but there is one surprise already. It turns out that narrow aspiration windows might be worthless. I ran a test with several starting windows to see if there was a clear winner. There were clear losers like +/-1 and +/-2 and such. But +/-16, +/-50, +/-100, +/-200 all seem to be equivalent in real-game testing. More when the test is completely done...
BTW this is the "delta" value. Which means with a value of 200 the first search is old +/-200, the first re-search is old +/-400...
hmmm...
I believe this will depend quite a bit on the specific engine. I observed a clearly distinct peak at about 40 cp (IIRC) with a slow decline toward higher deltas and a sharper decline towards lower ones. However, this peak was not very high. IIRC, when I tested this, it may have been ~3 elo points or so (compared to no aspiration). But the curve was clear. Of course I needed to run 80k games each time to have enough precision.
Anyway, there are two effects here, and one of them has not been mentioned. The obvious one is the size of the tree and the re-searches. The ignored one is that fail lows may trigger longer thinking times to catch problems. Windows that are too narrow will make the engine think longer in positions where it is not needed, and windows that are too wide will become insensitive to potential problems (they won't fail low as often).
Miguel
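To put the 80k-games remark in perspective, here is a rough back-of-the-envelope sketch of how the error margin shrinks with the number of games. It assumes two equal-strength engines and independent games, and the function name `elo_margin` and its parameters are only illustrative. Real testers (BayesElo and the like) do this more carefully.

```python
import math

def elo_margin(games, draw_rate=0.0, z=1.96):
    """Approximate +/- Elo error (z sigmas) for a match between equal engines."""
    # Per-game score is 1, 0.5 or 0. For equal engines the mean is 0.5 and
    # the variance is 0.25 * (1 - draw_rate).
    std_err = math.sqrt(0.25 * (1.0 - draw_rate) / games)
    # Slope of the logistic Elo curve at a 50% score: 1600 / ln(10).
    elo_per_score = 1600.0 / math.log(10)
    return z * std_err * elo_per_score

for n in (1000, 10000, 80000):
    print(n, round(elo_margin(n), 1))
```

Under these assumptions the 95% margin at 80k games is a bit over 2 Elo with no draws, and somewhat less once draws are counted, which is why an effect of ~3 Elo needs that many games to show up at all.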
You might test the fail-low hypothesis. Why? In Crafty, if the score drops 0.01 from the last iteration, I will use more time, just as if it had dropped 1.0... and a lot of games proved this to be best. And not just by 1 or 2 Elo either. I was surprised, but I have been running with this for a good while now (almost a year). I don't just search longer on a fail low, I search longer on any score drop at all, fail-low or not... It tests significantly better.
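The "any score drop at all" rule can be sketched roughly like this. It is only an illustration of the idea, not Crafty's time-management code: the function name `target_time` and the `extension` multiplier are made up, and scores are assumed to be in centipawns so that 0.01 of a pawn is 1.

```python
def target_time(base_time, prev_score, cur_score, extension=2.0):
    """Return the time budget for this move, in the same units as base_time.

    Any drop from the previous iteration's score, however small, earns the
    same extension. `extension` is a hypothetical value, not Crafty's.
    """
    if cur_score < prev_score:
        return base_time * extension
    return base_time

print(target_time(10.0, 35, 34))   # dropped by 0.01 pawn
print(target_time(10.0, 35, -65))  # dropped by 1.00 pawn
print(target_time(10.0, 35, 35))   # no drop
```

The point of the sketch is that the size of the drop does not enter the decision, which also makes the result independent of whether the drop showed up as a fail low or inside the window.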
Versions 23.5-1 and 23.5-2 are simply two consecutive runs with the same version to provide a normal result. The rest of the tests are version 23.5R06, where the -n suffix is the aspiration window (the delta value in the code posted yesterday). 23.5R06-1 means the aspiration window was +/-1 with delta=1. 1 and 2 are a bit low, and by the time it gets to 10, it is pretty optimal. Bigger doesn't seem to hurt at all up to +/-3.0 pawns... I was expecting a better result in the 20-40 range. The reason I ran the big numbers was to produce some worse results, so that there is a recognizable curve with a clear optimal value and worse results on either side. Didn't get exactly what I expected, as you can see...
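For anyone who wants to experiment with the same parameter, here is a minimal sketch of an aspiration loop driven by a single delta value. It is not the Crafty code referred to above. It assumes that the margin doubles on each failure and that only the failing bound is relaxed, and `search` is a hypothetical fail-hard root search. The fake search exists only so the loop can be run on its own.

```python
def aspiration_search(search, prev_score, delta):
    """Search prev_score +/- delta, widening on every fail high or fail low.

    `search(alpha, beta)` is a hypothetical fail-hard root search that
    returns a value <= alpha on a fail low and >= beta on a fail high.
    Returns (score, number_of_searches).
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    alpha, beta = prev_score - delta, prev_score + delta
    searches = 0
    while True:
        searches += 1
        score = search(alpha, beta)
        if score <= alpha:          # fail low: relax the lower bound only
            delta *= 2              # assumption: the margin doubles each time
            alpha = prev_score - delta
        elif score >= beta:         # fail high: relax the upper bound only
            delta *= 2
            beta = prev_score + delta
        else:
            return score, searches  # strictly inside the window


def make_fake_search(true_score):
    """Stand-in for a real search: clamps a fixed 'true' score to the window."""
    return lambda alpha, beta: max(alpha, min(beta, true_score))


# Old score 30 cp, new score 250 cp: count the searches for several deltas.
for d in (1, 16, 50, 200):
    print(d, aspiration_search(make_fake_search(250), 30, d))
```

With a toy like this, small deltas simply cost more re-searches when the score moves a lot. Whether that costs Elo in a real engine is exactly what the test above is measuring.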
You should use CLOP
Rémi
I intend to look at it. But I can test this so simply at present: I just say "runtest" and it runs the test with each parameter change as needed. Of course it is not optimally tuning a parameter, just using the choices I give...
What is the average depth?
A quick look says that the average is in the 13-14-15 range. I looked at a few logs and there are plenty of re-searches going on, so it is having a chance to exert influence.
What should not be forgotten is the saturation of the hash table. With an almost full hash table, re-searches will become very expensive, and it would make sense to widen the window.
That's one problem I really don't have to deal with. I don't hash the q-search, so I really don't see much in terms of saturation. When I was testing this stuff, I found that hashing the q-search reduced the total tree size by about 10%, but it slowed the program down by almost exactly the same amount. A wash. But by not hashing the q-search, the stress on the ttable is greatly reduced, which tends to make this pay off (not hashing qsearch) in really long games without sufficient memory for the ttable. 8 gigs gives 512 million entries, which is a lot even for a long 40/2hr type game...
I run fast games with a modest hash on the cluster, just to try to keep things within perspective...
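As a quick sanity check on the "8 gigs gives 512 million entries" figure, the arithmetic works out to 16 bytes per entry if both numbers are read in binary units (an assumption on my part, the post does not say). The helper name is just for illustration.

```python
def bytes_per_entry(table_bytes, entries):
    """Size of one transposition-table entry implied by the totals."""
    return table_bytes // entries

GIB = 2**30   # assuming "gigs" means GiB
MI = 2**20    # assuming "million" means 2**20 here

print(bytes_per_entry(8 * GIB, 512 * MI))  # 16
```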
bob wrote: Absolutely no change, either up or down.
Have you tested at a longer TC? It would be interesting to know how it scales at longer TC. With your cluster it should be feasible to test at, say, 1' TC.
I thought I had said that I tested up to 1 minute + 1 second. I did not go beyond that, and found absolutely no difference, which was surprising. The only thing I didn't like was the excessive re-searches to reach a mate. But at that point, it doesn't affect the game result at all of course...
When we first started doing this in Cray Blitz, we tried lots of ideas. Our aspiration window was roughly 1/3 of a pawn, so we relaxed alpha/beta to +1.0, then +3.0, then +9.0 and then all the way to infinite. In Crafty, I eliminated that +3.0 and have been using +1, +9 and +infinite for the longest time...
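The fixed relaxation schedule described above could be sketched as follows for the fail-high side. This is only a reading of the post, not code from Cray Blitz or Crafty: it assumes the steps are offsets from the previous score, that only beta is relaxed, and all names are made up. Scores are in pawns.

```python
import math

CRAY_BLITZ_STEPS = [1.0, 3.0, 9.0, math.inf]  # as listed in the post
CRAFTY_STEPS = [1.0, 9.0, math.inf]           # same schedule without +3.0

def fail_high_windows(prev_score, initial, steps):
    """Yield the (alpha, beta) windows tried after repeated fail highs."""
    alpha = prev_score - initial
    yield alpha, prev_score + initial       # first attempt: narrow window
    for step in steps:
        yield alpha, prev_score + step      # each fail high relaxes beta

for window in fail_high_windows(0.0, 0.5, CRAFTY_STEPS):
    print(window)
```

Compared with a doubling delta, a schedule like this caps the number of re-searches at the length of the list, at the price of jumping to very wide windows quickly.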
I am also running a test (since I was not testing anything else) on various aspiration window widths as well. I'll post those results when they finish...
Cray Blitz still alive?
In Diep I'm starting the search with (-inf,+inf) for a simple reason: the hashtable will directly give back a great bound anyway, and you null-window around that.
Only when your PV is a total mess do I assume that aspiration search is a great idea, which shouldn't happen for a proper YBW search.
Maybe that's why Crafty doesn't suffer from the same problem there.