search pv moves first

ashort · Post by **ashort** » Thu Sep 17, 2020 6:11 pm

Like many, I came across Bruce Moreland's excellent write-ups and Gerbil code, which I found instructive for chess programming. In scanning his Gerbil code, in THINK.C, he has a long routine that handles the iterative deepening, using an aspiration window centered around the previous iterations's score (the first iteration being an infinite window). If one of the iterations fails low or high, he re-searches with an infinite window (infinite because of search instability). So far so good. I understand this.

During a search, he collects the PV, and after the search, the final PV is sitting in the root element of his state collection. Before embarking on the next iteration's search, he calls a routine called VSetPv (code is in ENGINE.c), which copies the collected PV, move by move, into another area of his stack. He uses this other area to ensure that the next iteration's search uses the PV from the previous iteration. Again, so far so good. I understand this.

Where I am getting confused is what happens when one of his iterations fails high or low. Before re-searching with an infinite window, he explicitly calls VSetPv again. Why? Why would there be a PV in that case? The search failed high or low, which means there wouldn't be a PV, right? Seems to me it will just be the same PV from before, since the search that failed low or high never backed up the PV all the way to the root.

maksimKorzh · Post by **maksimKorzh** » Thu Sep 17, 2020 9:46 pm

Doesn't re-search in full alpha beta infinite window mean that instead of proving the moves to be bad within aspiration window we have eventually found an 'interesting' move. By saying this I mean the move actually satisfying the condition

Code: Select all

score > alpha

which is actually a PV move so this seems to be the reason to retrieve new PV line.

ashort · Post by **ashort** » Thu Sep 17, 2020 10:21 pm

the context of my question is entirely within one iterative deepening loop, searching at ever increasing depths.

the iteration's search (let's call it iteration X, in iterative deepening) is failing high or low. Then, BEFORE re-searching with an infinite window (to try again at iteration X), he tries to use the PV of the just failed search (from iteration X) to make the re-search go faster. This does not make sense to me, because if the search of iteration X failed high or low, then it means the score was not between alpha and beta. Rather, the score was <=alpha or >=beta. So why would there be a new PV in that case? I thought the PV line is only updated if it's not a fail low and it's not a fail high....

Here is the code from Gerbil: (it's his 2nd call to VSetPv which is confusing me. seems unnecessary )

Code: Select all

		...
		valCur = 0;
		valLow = valMIN;			// Initial window is wide open.
		valHigh = valMAX;
		for (i = 0; i < plyMAX; i++)  {
			pste->valAlpha = valLow;
			pste->valBeta = valHigh;
			pste->plyRem = i;
			pcon->ss.plyDepth = i + 1;
			VSetPv(pcon);	// This remembers the current PV so that the
							//  search can search its elements first.
			valCur = ValSearch(pcon, pste);
			...
			//
			//	This checks for a result outside the window, and if it finds
			//	one it re-searches with the window wide open.
			//
			if ((valCur <= valLow) || (valCur >= valHigh)) {
				...
				pste->valAlpha = valMIN;
				pste->valBeta = valMAX;
				VSetPv(pcon);
				valCur = ValSearch(pcon, pste);
				...
			}
			...
			//	Set up for next iteration, which will use a window centered
			//	around the current position value.
			//
			valLow = valCur - 50;
			valHigh = valCur + 50;
		}

maksimKorzh · Post by **maksimKorzh** » Thu Sep 17, 2020 10:42 pm

ashort wrote: ↑Thu Sep 17, 2020 10:21 pm the context of my question is entirely within one iterative deepening loop, searching at ever increasing depths.

the iteration's search (let's call it iteration X, in iterative deepening) is failing high or low. Then, BEFORE re-searching with an infinite window (to try again at iteration X), he tries to use the PV of the just failed search (from iteration X) to make the re-search go faster. This does not make sense to me, because if the search of iteration X failed high or low, then it means the score was not between alpha and beta. Rather, the score was <=alpha or >=beta. So why would there be a new PV in that case? I thought the PV line is only updated if it's not a fail low and it's not a fail high....

IMO the search of iteration X failed neither high

Code: Select all

score >= beta

nor low

Code: Select all

score <= alpha

but instead

Code: Select all

score > alpha && score < beta

failed into an alpha beta window (not sure how to call it), so e.g. assuming the aspiration window we have alpha equals to 25 and beta equals to 27 and when re-search occurs we already have score equals exactly 26. So as far as this particular move by fact has increased alpha by one then technically it might be considered as PV move hence the need to order this move with score = 26 first in full window search to replace the previous one which was scored at least less then 26.

Now I want to add this trick into my engine to see the difference.

ashort · Post by **ashort** » Thu Sep 17, 2020 11:13 pm

I am still confused. He searches the very first iteration with an infinite window:

Code: Select all

pste->valAlpha = valLow;
pste->valBeta = valHigh;
pste->plyRem = i;
pcon->ss.plyDepth = i + 1;
VSetPv(pcon);	// This remembers the current PV so that the
			//  search can search its elements first.
valCur = ValSearch(pcon, pste);

After the search, he checks to see if that search failed high or failed low (from a search that did not have an infinity window, obviously),
and if so, he re-searches with an infinite window:

Code: Select all

if ((valCur <= valLow) || (valCur >= valHigh)) {
	...
	pste->valAlpha = valMIN;
	pste->valBeta = valMAX;
	VSetPv(pcon);
	valCur = ValSearch(pcon, pste);
	...
}

right?

maksimKorzh · Post by **maksimKorzh** » Thu Sep 17, 2020 11:20 pm

Also valLow and valHigh are not the actual alpha beta values but valCurrent - 50 and valCurrent + 50 which IMO means that even with score <= valLow it still can be greater than actual alpha before narrowing the window hence PV move to order first in full window search.

I'm tempted to read expert's opinion on this point.

ashort · Post by **ashort** » Thu Sep 17, 2020 11:29 pm

when one of the searches faills high or low, his test for failure is:

Code: Select all

if ((valCur <= valLow) || (valCur >= valHigh)) {

that makes sense, because valCur is the result of the search. Note that valLow and valHigh are updated before the search by +/- 50.

Iteration 1 uses alpha beta window of [-INFINIFY, INFINITY]. This obviously cannot fail low or high.
Suppose it returns a score of 10.
Iteration 2 will use an alpha beta window of [-40, 60] (this is what valLow and valHigh are set to)
Iteration 2 might fail low or fail high (with respect to -40 and 60). If it does, he re-searches iteration 2 with [[-INFINIFY, INFINITY].

maksimKorzh · Post by **maksimKorzh** » Thu Sep 17, 2020 11:45 pm

ashort wrote: ↑Thu Sep 17, 2020 11:29 pm when one of the searches faills high or low, his test for failure is:
Code: Select all
if ((valCur <= valLow) || (valCur >= valHigh)) {
that makes sense, because valCur is the result of the search. Note that valLow and valHigh are updated before the search by +/- 50.

Iteration 1 uses alpha beta window of [-INFINIFY, INFINITY]. This obviously cannot fail low or high.
Suppose it returns a score of 10.
Iteration 2 will use an alpha beta window of [-40, 60] (this is what valLow and valHigh are set to)
Iteration 2 might fail low or fail high (with respect to -40 and 60). If it does, he re-searches iteration 2 with [[-INFINIFY, INFINITY].

Yes. Exactly. But what is the PV move for iteration 2 that would get researched with full window?

May be it restores PV from iteration 1?
May be it was lost during narrow window search?
I now got your confusion...

But assuming Bruce Moreland is infinitely smarter than I his code should make sense. Otherwise someone would reveal this bug long time ago.

Yeah. Now I'm tempted to know the reason behind resetting PV. Hope someone would explain.
What I initially meant was related to PVS explai explained by Bruce as well. My assumption was that principle working with PVS works with aspiration window in the same manner.
It would be great to clarify it.

chrisw · Post by **chrisw** » Thu Sep 17, 2020 11:46 pm

ashort wrote: ↑Thu Sep 17, 2020 6:11 pm Like many, I came across Bruce Moreland's excellent write-ups and Gerbil code, which I found instructive for chess programming. In scanning his Gerbil code, in THINK.C, he has a long routine that handles the iterative deepening, using an aspiration window centered around the previous iterations's score (the first iteration being an infinite window). If one of the iterations fails low or high, he re-searches with an infinite window (infinite because of search instability). So far so good. I understand this.

During a search, he collects the PV, and after the search, the final PV is sitting in the root element of his state collection. Before embarking on the next iteration's search, he calls a routine called VSetPv (code is in ENGINE.c), which copies the collected PV, move by move, into another area of his stack. He uses this other area to ensure that the next iteration's search uses the PV from the previous iteration. Again, so far so good. I understand this.

Where I am getting confused is what happens when one of his iterations fails high or low. Before re-searching with an infinite window, he explicitly calls VSetPv again. Why? Why would there be a PV in that case? The search failed high or low, which means there wouldn't be a PV, right? Seems to me it will just be the same PV from before, since the search that failed low or high never backed up the PV all the way to the root.

From general principles, I’m not commenting his specific implementation.

Principle of AB is to search best move first. The PV, by definition contains the best (known) line and thus the best (known) moves, therefore, he is opening the tree search by following up the known PV first.

If there is subsequently a fail high or low, as you observe, he then follows whatever is his best line again. Either he found some new move(s) to substitute the prior PV of if not, the prior PV. If he hasn’t found a new best line, then it’s fine to use the prior, what else?

In real, following the PV is redundant if you have a hash table, for the best continuation will be in the hash and it’s normal to search that first. For people who want to get risky there’s no need to store a PV in the first place. You can discover it by asking the hash table and moving up the tree that way. A bit dangerous because if unlucky an important hash entry might be overwritten and there may be issues if repetition moves are stored.

ashort · Post by **ashort** » Thu Sep 17, 2020 11:58 pm

thank you. well, it seems to me that based on his code, the 2nd call to VSetPv is useless (and harmless, except for an extremely minor slow-down), as there is no way a search that failed high or failed low could have come up with a new PV. At least that is my understanding.

[I am writing for a 16-bit imaginary machine in a language that does not support XOR, so I am not using hash tables at all, nor am I using bitboards. I'm using 0x88. The language is from a course I took on how to write a compiler, and it's a simple but strong object oriented language, but does not support XOR because the point was to keep the language simple. It has AND, OR, and NOT, but not XOR. Also the heap size of the imaginary machine is too small for a reasonable hash table anyway).

search pv moves first

search pv moves first

Re: search pv moves first

Re: search pv moves first

Re: search pv moves first

Re: search pv moves first

Re: search pv moves first

Re: search pv moves first

Re: search pv moves first

Re: search pv moves first

Re: search pv moves first