Uri Blass wrote:If I understand correctly you accept only one move at every ply after ply 3 and multiply by the number of legal move.
You may be right that it gives an unbiased estimate but we need some proof for it and it does not seem obvious like the case of fixed probability.
Here is a proof based on induction over the tree depth.
Define the random variable perfTS(pos,N) = the return value of perfTSampled(N, 0) when
called with initial position pos. Also define pos(m) as the position you get when playing
move m in position pos.
Let perfTS(pos,N) have probability distribution (x(pos,N,i),p(pos,N,i)), meaning that it
can take values x(pos,N,i) with corresponding probabilities p(pos,N,i), where
1<=i<=R(pos,N) and R(pos,N) is the number of possible values of perfTS(pos,N).
According to the definition of expected value, we have:
E(perfTS(pos,N)) = sum(i, x(pos,N,i)*p(pos,N,i))
The claim is that perfTS(pos,N) is an unbiased estimate of perfT(pos,N) for all possible
positions pos and all N > 0, i.e. that E(perfTS(pos,N)) = perfT(pos,N).
First note that for N=1, we have:
E(perfTS(pos,1)) = 1 * s = perfT(pos,1),
where s is the number of legal moves in position pos, because perfTS(pos,1) returns s with
probability 1 in that case. So the claim is true for N=1.
Next, assume that the claim is true for a given N and all positions pos, that is:
E(perfTS(pos,N)) = perfT(pos,N)
Then, for N+1, and a given position pos, let the legal moves be m1,m2,...,ms, where s is
the number of legal moves.
The possible return values of perfTS(pos,N+1) are
s * x(pos(mj),N,i), for 1<=j<=s, and 1<=i<=R(pos(mj),N)
and the corresponding probabilities are
P(selecting move j) * p(pos(mj),N,i))
because the two events are independent if "selecting move j" is performed using a "good
enough" random number generator. Also P(selecting move j)=1/s if the random number
generator is "good enough".
Therefore, according to the definition of expected value, we have:
Code: Select all
E(perfTS(pos,N+1)) = sum({i,j}, s * x(pos(mj),N,i) * 1/s * p(pos(mj),N,i))
= sum({i,j}, x(pos(mj),N,i) * p(pos(mj),N,i)) (s*1/s=1)
= sum(j, E(perfTS(pos(mj),N))) (sum over i first, use definition of expected value)
= sum(j, perfT(pos(mj),N)) (induction hypothesis)
= perfT(pos,N+1)) (definition of perfT)
Since pos was arbitrary, it follows that the claim is true also for N+1, and the proof now
follows from the principle of induction.