Uri Blass wrote:The reason is that I thought wrongly that it is a biased estimate.
see my last post in the first page of the thread of perft(20).
Note that I thought about this idea and even generated some example to check if I get an unbiased estimate before thinking about proving that I get an unbiased estimate
but did the mistake of assuming that every game has an equal probability when by choosing random moves not every game has an equal probability(unlike choosing random moves that may be illegal in part of the cases if the number in the sequence is bigger than the number of moves).
I understood my error only later when I read that people did exactly what you suggest to get a better estimate and only posted that we need a proof
for it and understood the proof.
I looked at your example and it seems that it will be the exact method that is used by Peter here, because you do not use the proportion of legal moves anymore. Note that in your example you said perft(3) = 11 but that is only leaf node counts, if we add the internal nodes too it will be 18 i.e (11 + 5 + 2). The multiplication by the number of legal moves at each ply estimates this number. And from your calculation = 176/11 = 16 , so this estimate is close. They will equal each other with more number of games I think.More number of games than the number of leaf nodes is required whenever the tree is selective.
I now understand that your method does not actually estimate BF for each ply. We have an upper bound of perft(13) that we try to factor by monte-carlo simulation to give us a perft estimate. This is I think much better than what I am proposing, averaging BFs at each ply. I tested this idea and here is what I got.
First using the idea in your example (similar to Peter's I think):
After 1000 games, I get an estimate of 1.94 * e18 and and after 1 million games it got pretty close to 1.981e18 so that is a confirmation I think..
This method does not fall under those method which estimate BFs because those give very bad estimates as I explained below and I am not sure if they are unbiased at all.
Then I tried averaging branching factors at each ply 1000 times and then
multiplying those 13 branching factors once. This gave a bad estimate even after i increase the simulation to 10^7 which gave me around 1.3e18 and it remained there even after many more games. I think this method is biased at least practically.
BFs for 1st game (x1,x2,x3.....x13)
BFs for 2nd game (y1,y2,y3.....y13)
.
BFs for kth game (z1,z2,z3.....z13)
-----------------------------------------------------------------
If I do arithmetic mean of the products of BFs (nodes count approximation) then
perft(13) = 1/k(x1*x2...x13 + y1*y2*...y13 + z1*z2*...z13))
This gives very good estimates as I reported above.
---------------------------------------------------------------------
BF averaging:
BF1 = (x1 + y1 + .... z1) / k
BF2 = (x2 + y2 + .... z2) / k
..
BF13 = (x13 + y13 + .... z13) / k
Now if the product BF1*BF2*....*BF13 is used as an estimate will it be biased ? There are clearly some co-variance terms in there which make it
difficult to compare with the first method.
I found practically it is since it remains around 1.3e18 even after 10^7 random games, first 3 or 4 significant digits already converged.
--------------------------------------------------------------------
Then I tried using the geometric mean of BFs at each ply instead. The idea
is since the nodes count is a multiplication of BFs may be this is a better estimate.
BF1 = kth_root(x1*y1*....z1)
BF2 = kth_root(x2*y2*....z2)
...
BF13 = kth_root(x13*y13*....z13)
Then multiplying and rearanging BFs
BF1 * BF2* ... BF13 = kth_root(x1 * x2 *... x13) * kth_root(y1 * y2 * ... y13) *...* kth_root(z1 * z2 * .... z13)
This gives the geometric mean of the nodes count of the k samples. This is consistently lower than the arithmetic mean of nodes count used in the first case. Despite my expectation, it gave around 1e18 significantly lower than the correct value. But this may be unbiased estimate of the geometric mean of tree size. I know this is not useful though.
-------------------------------------------------------------------------
The method you are using now must fall under those methods which estimate perft by taking arithmetic mean of estimated tree sizes, since it gives very good approximations. May be you can estimate the BFs at each ply from the number of legal games that reached that ply, but it is clear it doesn't estimate BFs directly.
Edit:
I estimated perft of your example using the arithmetic and geometric averages.
arithmetic:
ply 1: (2 * 11) / 11 = 2
ply 2: (2 * 3 + 3 * 8) / 11 = 30 / 11
ply 3: (1 * 1 + 2 * 2 + 3 * 3 + 1 * 1 + 4 * 4) / 11 = 31/11
perft(3) = 15.37
geometric:
ply 1: (2 ^ 11) ^ (1/11) = 2
ply 2: (2^3 * 3^8) ^ (1/11) = 2.686
ply 3: (1^1 + 2^2 * 3^3 * 1^1 * 4^4) ^ (1/11) = 2.5339
perft(3) = 13.6