Perft(15) estimates thread
Moderators: hgm, chrisw, Rebel
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Perft(15) estimates thread
This is the perft(15) estimates thread.
Mine: 2e+21
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: Perft(15) estimates thread
I fully expect to come back to this forum in a handful of years and discover that you guys have actually calculated perft(15).
It will probably have taken 8 months using a bunch of then-state-of-the-art monster machines with 512GB of RAM, 256 beefy cores, a petabyte offline storage array, ...
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
MC methods
I have a perft estimator runnable on a cluster now. This perft estimation stuff has proven so useful as a playground for UCT search, so I am planning to write a summary of Monte Carlo perft estimation methods, based on the discussion we had a couple of years ago. Maybe something good enough for the chess wiki, or just a formal document.
-----------------------------------------------------------------
* The sampling distribution is log-normal, which follows from the fact that a perft estimate is a product of branching factors.
* Non-uniform sampling of moves is better than uniform sampling. This becomes clear when the simulations are started from a larger depth, say 3. Each of the 20 root moves will naturally be simulated a different number of times by a depth-first searcher, and in proportion to sub-tree size (number of leaf nodes, to be exact) by a best-first searcher. For the latter, the proportioning of simulations is not so clear. I have about 5 formulas that I tested. I forgot the specific motive for each formula, but in general they proportion by some function of the tree size (assumed proportional to the variance). Which one is best is still unsettled. See below for more.
* Combining depth-first and best-first estimators proved to be better. Adding a selective layer also helped.
* Improvements for the Monte-Carlo part include:
Code: Select all
Dynamic tree (different depth) + K full depth + L selective depth + Monte-carlo (exactly 1 move)
- proportioning by heuristics (captures etc.)
- 'Antithetic' variables for variance reduction
- Hashing
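As an editorial aside, the log-normal claim in the first bullet can be illustrated with a toy model (Python here rather than the thread's MATLAB; the branching-factor range 20-40 is an arbitrary assumption, not real chess data): a Monte-Carlo perft sample is a product of per-ply branching factors, so its logarithm is a sum of roughly independent terms and is approximately normal by the central limit theorem.

```python
import math
import random
import statistics

random.seed(1)

def one_walk(depth):
    # One Monte-Carlo perft sample in a toy tree: multiply the branching
    # factor seen at each ply along a single random root-to-leaf path.
    est = 1
    for _ in range(depth):
        est *= random.randint(20, 40)  # hypothetical per-ply move counts
    return est

samples = [one_walk(10) for _ in range(20000)]
logs = [math.log(s) for s in samples]

# The log of a product is a sum, so by the central limit theorem the log
# of the estimate is approximately normal, i.e. the estimate is log-normal.
print(round(statistics.mean(logs), 2), round(statistics.stdev(logs), 2))
```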
-----------------------------------------------------------------
I have some doubts about proportioning. Let me start from Michel's formula.
Michel wrote: if one wants to estimate x_1+...+x_n and one has unbiased estimators X_i for x_i with variance sigma_i^2, then, if I calculated correctly, one should choose X_i with probability
p_i = mu sqrt(x_i^2 + sigma_i^2)
where mu should be adjusted such that the sum of the p_i is 1.
I do not remember why p_i depended on the mean value x_i, but thinking about it now from scratch, I get p_i = mu sqrt(sigma_i^2). The problem is one of variance reduction. Say the original variances are Vi = sigma_i^2. Given N simulations in total, N_i for each variable, how should they be proportioned to arrive at the minimum variance? With N_i simulations the variance of term i is reduced to Vi/Ni, so:
Code: Select all
Objective: Minimize(V1/N1 + V2/N2 + ... + Vn/Nn)
Constraint: N = N1 + N2 + ... + Nn
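For reference, here is a short sketch (editorial addition, using the notation above) of the Lagrange-multiplier argument discussed in this thread, applied to exactly this objective and constraint:

```latex
% Minimize \sum_i V_i/N_i subject to \sum_i N_i = N, with V_i = \sigma_i^2.
\mathcal{L}(N_1,\dots,N_n,\lambda)
  = \sum_{i=1}^{n} \frac{V_i}{N_i} + \lambda\Big(\sum_{i=1}^{n} N_i - N\Big)
% Stationarity in each N_i:
\frac{\partial\mathcal{L}}{\partial N_i} = -\frac{V_i}{N_i^2} + \lambda = 0
\quad\Longrightarrow\quad
N_i = \sqrt{V_i/\lambda} \propto \sigma_i
% Enforcing the constraint fixes \lambda, giving
N_i = N\,\frac{\sigma_i}{\sum_j \sigma_j}
\qquad\text{and}\qquad
\min \sum_i \frac{V_i}{N_i} = \frac{\big(\sum_i \sigma_i\big)^2}{N}.
```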
Indeed, the proof becomes simple with the Lagrange-multiplier approach. Anyway, is the formula with the mean used only in the Monte-Carlo part, where we don't save the means? I think for in-tree updates it may not be necessary. Of course the variance may be proportional to the tree size indirectly, so the mean still has an indirect effect in Michel's formula.
Daniel
-
- Posts: 3238
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Perft(15) estimates thread
wgarvin wrote: I fully expect to come back to this forum in a handful of years and discover that you guys have actually calculated perft(15). It will probably have taken 8 months using a bunch of then-state-of-the-art monster machines with 512GB of RAM, 256 beefy cores, a petabyte offline storage array, ...
What a waste of time and electricity. Think about the environment.
Perft is only a unit test. Nothing interesting in itself.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Perft(15) estimates thread
I need this paper from the ICGA (the international chewing gum association ):
"Computing Deep Perft and Divide Numbers for Checkers", Art Bik
@Steven, do you have some kind of article on your work that I can cite?
Thanks
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: MC methods
Alright, I have started writing something now, with an introduction and a description of the hocus-pocus methods. This is basically a summary of what I can find on the net, and hopefully more interesting stuff follows as we go on. You can follow the progress at this public drop box link. I will work through it at a convenient pace while doing some tests in parallel. Let me know if you have comments. I know my writing skills suck, so go easy on that.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: MC methods
Wow, it is hard to digest 2-year-old, 58-page material, but I finally spotted where Michel mentioned the proof by Lagrange multiplier, here at page 55!
For two variables, I can explicitly derive that Ni should be proportional to sqrt(V1). For more than three variables, I had problems deriving the same thing, but some clever mathematician showed me how to do it using a Lagrange multiplier.
I am still trying to find out why the mean is included in the formula sqrt(mean^2 + sigma^2), but I suspect it applies only to the MC part, where statistics are not kept. I wouldn't know because I never used it.
@Michel, if you see this, please do the proofs for both so that I can include them verbatim.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: MC methods
I think I have hit on something golden. It turns out that Michel's best proportioning formula and my best selection algorithm are the same! At first look they appear different, but I can show that they lead to the same proportioning scheme. The proof is not very formal, but I will find a formal mathematical proof of equivalence later. I need to dig deeper into the convergence of infinite series to prove it...
----------------------------
Let's say we are given N simulations to spend on k moves at the root. The variances of the sub-tree sizes under each move are \sigma_i^2. The question Michel tried to answer is: what final proportion should be delivered to each move so that the variance of the perft estimate is minimum? As Michel has already done, and as I have now confirmed, it is possible to prove using Lagrange multipliers that proportioning by standard deviation is best.
Code: Select all
Objective: Minimize(V1/N1 + V2/N2 + ... + Vn/Nn)
Constraint: N = N1 + N2 + ... + Nn
Code: Select all
Answer: Proportion by \sigma_i
-----------------------------
The question that I tried to answer is: which move should be picked for the next simulation, given the variances and the completed number of simulations for each move? Note that my approach is a selection algorithm similar to the UCB formula. As I stated 2 years ago: pick the move which gives the highest variance reduction. When you think about it, this seems a completely different algorithm from Michel's. Here is how I arrived at my formula, as posted in that old thread. At the top of the page you can see Michel trying to convince me to use standard-deviation proportioning, and towards the bottom I am trying to convince him mine is best. It will be really funny if the equivalence I claim now is indeed true.
The derivation is simple. Pick the move which gives the biggest reduction in variance.
Code: Select all
Original variance = sigma^2 / N
After one game    = sigma^2 / (N + 1)
Reduction         = sigma^2 / N - sigma^2 / (N + 1)
                  = sigma^2 * (1 / N - 1 / (N + 1))
                  = sigma^2 / (N * (N + 1))
                  = Original variance / (N + 1)
Code: Select all
pick \argmax_i \frac{\sigma_i^2}{N_i(N_i+1)}
The advantage of this method is that whenever you stop, the simulations will have followed more or less the standard-deviation proportioning. Michel's proportioning formula, by definition, cannot be used for selection, as it only predicts the final proportion, not how it is achieved.
-------------------------------------------
My 'proof' for now is to just simulate my selection rule and see what kind of proportioning it leads to. It turns out it proportions by the square root of the variance, i.e. the standard deviation. Here is MATLAB code to test the idea:
Code: Select all
% Greedy selection: always simulate the move with the largest
% one-step variance reduction sigma^2 / (n * (n + 1)).
sigma = [32 16 8 4 2 1];
n = [1 1 1 1 1 1];
for i = 1:100
    maxr = 0;
    maxj = 1;
    for j = 1:6
        t = (sigma(j)^2) / (n(j) * (n(j) + 1));
        if (t > maxr)
            maxr = t;
            maxj = j;
        end
    end
    n(maxj) = n(maxj) + 1;
    n   % no semicolon: print the counts after each iteration
end
Here is how n varies:
Code: Select all
2 1 1 1 1 1
3 1 1 1 1 1
3 2 1 1 1 1
4 2 1 1 1 1
5 2 1 1 1 1
5 3 1 1 1 1
6 3 1 1 1 1
6 3 2 1 1 1
.....
13 6 3 2 1 1
13 7 3 2 1 1
14 7 3 2 1 1
...
54 27 13 7 3 2
...
5082 2541 1271 635 318 159
Clearly the proportions follow the population standard deviation \sigma_i (doubling in this case). This is my proof for _now_, but I need to formalize it, probably with some math guy, though I will challenge myself to do it without any help.
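As a quick check (an editorial sketch in Python rather than the thread's MATLAB, with an arbitrarily larger budget of 100000 picks), the same greedy rule can be run longer to confirm that the counts settle into the \sigma_i proportions:

```python
# Same greedy rule as the MATLAB snippet above: always simulate the move
# with the largest one-step variance reduction sigma_i^2 / (n_i * (n_i + 1)).
sigma = [32, 16, 8, 4, 2, 1]
n = [1] * len(sigma)

for _ in range(100000):  # larger budget than the MATLAB run (arbitrary)
    j = max(range(len(sigma)), key=lambda i: sigma[i] ** 2 / (n[i] * (n[i] + 1)))
    n[j] += 1

# Counts should come out (nearly) proportional to sigma_i.
print(n, [round(c / n[-1], 2) for c in n])
```

The final counts land very close to the 32:16:8:4:2:1 ratio of the sigmas, matching the proportions seen in the output above.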
Daniel
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: MC methods
OK, the proof by infinite series seems to be really hard, since there is a max term in it. Also, after a certain number of simulations the proportions hang around the best proportion, so I am not sure it was correct to think of it as an infinite series. The problem is:
Code: Select all
pick the move with:  \argmax_i \frac{\sigma_i^2}{N_i(N_i+1)}
update its count:    n(i) = n(i) + 1
Prove that the final proportions are: n(1):n(2):...:n(k) = sigma(1):sigma(2):...:sigma(k)
Trying to prove the equivalence the straightforward way may be impossible, so I try to prove it by other means. Here goes:
Reduction of variance is a monotonically decreasing function, i.e. making one more simulation always decreases the variance. The same is true for the total perft, which adds up the means and variances. Now, if I take the best strategy at each iteration (greedy selection of the largest reduction), it should lead me to the optimum proportion. For a non-monotonic function this is not true, so it is important that the reduction is monotonic. Monotonic functions have one unique optimal point, therefore the proportion we get by following this procedure should be the optimum. Then we go back to Michel's proof that proportioning by standard deviation gives the least variance, so it must be the optimum point for my selection method too, since the function (variance reduction) is monotonic and hence has one optimal point.
That is my proof. Please let me know if you see a problem in the logic.
Daniel
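The appeal to Michel's optimality result can also be checked numerically. Below is a hedged Python sketch (the variances 1024, 256, 64, 16 and the budget of 1000 are arbitrary illustrative values, not data from the thread): the standard-deviation-proportional allocation attains the minimum (sum_i sigma_i)^2 / N of the objective, and no randomly drawn allocation of the same budget beats it.

```python
import random

random.seed(0)
V = [1024.0, 256.0, 64.0, 16.0]  # assumed per-move variances sigma_i^2 (toy values)
N = 1000.0                        # total simulation budget (toy value)

def objective(alloc):
    # Total variance of the sum estimate: sum_i V_i / N_i
    return sum(v / n for v, n in zip(V, alloc))

# Standard-deviation proportioning: N_i proportional to sigma_i = sqrt(V_i)
sd = [v ** 0.5 for v in V]
prop = [N * s / sum(sd) for s in sd]
best = objective(prop)  # equals (sum_i sigma_i)^2 / N = 60^2 / 1000 = 3.6

# No random allocation of the same budget should do better.
for _ in range(10000):
    r = [random.uniform(0.1, 1.0) for _ in V]
    alloc = [N * x / sum(r) for x in r]
    assert objective(alloc) >= best - 1e-9
print(best)
```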
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Perft(15) estimates thread
Daniel Shawul wrote: @Steven, do you have some kind of article on your work that I can cite?
Not really. I started doing movepath enumeration (my name for perft) more than thirty years ago as part of program verification. My early work, and later additions by myself and others, appear at http://oeis.org/A048987 (which also lists a truly ancient email address of mine).