Perft(15) estimates thread
Moderators: hgm, chrisw, Rebel
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Perft(15) estimates thread
This is the perft(15) estimates thread.
Mine: 2e+21
-
- Posts: 838
- Joined: Thu Jul 05, 2007 5:03 pm
- Location: British Columbia, Canada
Re: Perft(15) estimates thread
I fully expect to come back to this forum in a handful of years and discover that you guys have actually calculated perft(15).
It will probably have taken 8 months using a bunch of then-state-of-the-art monster machines with 512GB of RAM, 256 beefy cores, a petabyte offline storage array, ...
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
MC methods
I have a perft estimator runnable on a cluster now. This perft estimation stuff has proven so useful as a playground for UCT search, so I am planning to write a summary of Monte Carlo perft estimation methods, based on the discussion we had a couple of years ago. Maybe something good enough for the chess wiki, or just a formal document.
-----------------------------------------------------------------
* The sampling distribution is log-normal, which follows from the fact that a perft estimate is a product of branching factors.
* Non-uniform sampling of moves is better than uniform sampling. This becomes clear when the simulations are started from a larger depth, say 3. Each of the 20 root moves will naturally be simulated a different number of times by a depth-first searcher, and in proportion to sub-tree size (number of leaf nodes, to be exact) by a best-first searcher. For the latter, the proportioning of simulations is not so clear. I have about 5 formulas that I tested. I forgot the specific motive for each formula, but in general they proportion by some function of the tree size (assumed proportional to the variance). Which one is best is still unsettled. See below for more.
* Combining depth-first and best-first estimators proved to be better. Adding a selective layer also helped.
* Improvements for the Monte-Carlo part include:
Code: Select all
Dynamic tree (different depth) + K full depth + L selective depth + Monte-carlo (exactly 1 move)
- proportioning by heuristics (captures etc.)
- 'Antithetic' variables for variance reduction
- Hashing
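As an editorial aside, the log-normal claim in the first bullet can be illustrated with a toy model (Python here rather than the thread's MATLAB; the branching-factor range 20-40 is an arbitrary assumption, not real chess data): a Monte-Carlo perft sample is a product of per-ply branching factors, so its logarithm is a sum of roughly independent terms and is approximately normal by the central limit theorem.

```python
import math
import random
import statistics

random.seed(1)

def one_walk(depth):
    # One Monte-Carlo perft sample in a toy tree: multiply the branching
    # factor seen at each ply along a single random root-to-leaf path.
    est = 1
    for _ in range(depth):
        est *= random.randint(20, 40)  # hypothetical per-ply move counts
    return est

samples = [one_walk(10) for _ in range(20000)]
logs = [math.log(s) for s in samples]

# The log of a product is a sum, so by the central limit theorem the log
# of the estimate is approximately normal, i.e. the estimate is log-normal.
print(round(statistics.mean(logs), 2), round(statistics.stdev(logs), 2))
```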
-----------------------------------------------------------------
I have some doubts about proportioning. Let me start from Michel's formula.
Michel wrote: if one wants to estimate x_1+...+x_n and one has unbiased estimators X_i for x_i with variance sigma_i^2, then, if I calculated correctly, one should choose X_i with probability
p_i = mu sqrt(x_i^2 + sigma_i^2)
where mu should be adjusted such that the sum of the p_i is 1.
I do not remember why p_i depended on the mean value x_i, but thinking about it now from scratch, I get p_i = mu sqrt(sigma_i^2). The problem is one of variance reduction. Say the original variances are Vi = sigma_i^2. Given N simulations in total, N_i for each variable, how should they be proportioned to arrive at the minimum variance? With N_i simulations the variance of term i is reduced to Vi/Ni, so:
Code: Select all
Objective: Minimize(V1/N1 + V2/N2 + ... + Vn/Nn)
Constraint: N = N1 + N2 + ... + Nn
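For reference, here is a short sketch (editorial addition, using the notation above) of the Lagrange-multiplier argument discussed in this thread, applied to exactly this objective and constraint:

```latex
% Minimize \sum_i V_i/N_i subject to \sum_i N_i = N, with V_i = \sigma_i^2.
\mathcal{L}(N_1,\dots,N_n,\lambda)
  = \sum_{i=1}^{n} \frac{V_i}{N_i} + \lambda\Big(\sum_{i=1}^{n} N_i - N\Big)
% Stationarity in each N_i:
\frac{\partial\mathcal{L}}{\partial N_i} = -\frac{V_i}{N_i^2} + \lambda = 0
\quad\Longrightarrow\quad
N_i = \sqrt{V_i/\lambda} \propto \sigma_i
% Enforcing the constraint fixes \lambda, giving
N_i = N\,\frac{\sigma_i}{\sum_j \sigma_j}
\qquad\text{and}\qquad
\min \sum_i \frac{V_i}{N_i} = \frac{\big(\sum_i \sigma_i\big)^2}{N}.
```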
Indeed, the proof becomes simple with the Lagrange-multiplier approach. Anyway, is the formula with the mean used only in the Monte-Carlo part, where we don't save the means? I think for in-tree updates it may not be necessary. Of course the variance may be proportional to the tree size indirectly, so the mean still has an indirect effect in Michel's formula.
Daniel
-
- Posts: 3238
- Joined: Mon May 31, 2010 1:29 pm
- Full name: lucasart
Re: Perft(15) estimates thread
wgarvin wrote: I fully expect to come back to this forum in a handful of years and discover that you guys have actually calculated perft(15). It will probably have taken 8 months using a bunch of then-state-of-the-art monster machines with 512GB of RAM, 256 beefy cores, a petabyte offline storage array, ...
What a waste of time and electricity. Think about the environment.
Perft is only a unit test. Nothing interesting in itself.
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: Perft(15) estimates thread
I need this paper from the ICGA (the international chewing gum association ):
"Computing Deep Perft and Divide Numbers for Checkers", Art Bik
@Steven, do you have some kind of article on your work that I can cite?
Thanks
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: MC methods
Alright, I have started writing something now, with an introduction and a description of the hocus-pocus methods. This is basically a summary of what I can find on the net, and hopefully more interesting stuff follows as we go on. You can follow the progress at this public drop box link. I will work through it at a convenient pace while doing some tests in parallel. Let me know if you have comments. I know my writing skills suck, so go easy on that.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: MC methods
Wow, it is hard to digest 2-year-old, 58-page material, but I finally spotted where Michel mentioned the proof by Lagrange multiplier, here at page 55!
For two variables, I can explicitly derive that Ni should be proportional to sqrt(V1). For more than three variables, I had problems deriving the same thing, but some clever mathematician showed me how to do it using a Lagrange multiplier.
I am still trying to find out why the mean is included in the formula sqrt(mean^2 + sigma^2), but I suspect it applies only to the MC part, where statistics are not kept. I wouldn't know because I never used it.
@Michel, if you see this, please do the proofs for both so that I can include them verbatim.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: MC methods
I think I have hit on something golden. It turns out that Michel's best proportioning formula and my best selection algorithm are the same! At first look they appear different, but I can show that they lead to the same proportioning scheme. The proof is not very formal, but I will find a formal mathematical proof of equivalence later. I need to dig deeper into the convergence of infinite series to prove it...
----------------------------
Let's say we are given N simulations to spend on k moves at the root. The variances of the sub-tree sizes under each move are \sigma_i^2. The question Michel tried to answer is: what final proportion should be delivered to each move so that the variance of the perft estimate is minimum? As Michel has already done, and as I have now confirmed, it is possible to prove using Lagrange multipliers that proportioning by standard deviation is best.
Code: Select all
Objective: Minimize(V1/N1 + V2/N2 + ... + Vn/Nn)
Constraint: N = N1 + N2 + ... + Nn
Code: Select all
Answer: Proportion by \sigma_i
-----------------------------
The question that I tried to answer is: which move should be picked for the next simulation, given the variances and the completed number of simulations for each move? Note that my approach is a selection algorithm similar to the UCB formula. As I stated 2 years ago: pick the move which gives the highest variance reduction. When you think about it, this seems a completely different algorithm from Michel's. Here is how I arrived at my formula, as posted in that old thread. At the top of the page you can see Michel trying to convince me to use standard-deviation proportioning, and towards the bottom I am trying to convince him mine is best. It will be really funny if the equivalence I claim now is indeed true.
The derivation is simple. Pick the move which gives the biggest reduction in variance.
Code: Select all
Original variance = sigma^2 / N
After one game    = sigma^2 / (N + 1)
Reduction         = sigma^2 / N - sigma^2 / (N + 1)
                  = sigma^2 * (1 / N - 1 / (N + 1))
                  = sigma^2 / (N * (N + 1))
                  = Original variance / (N + 1)
Code: Select all
pick \argmax_i \frac{\sigma_i^2}{N_i(N_i+1)}
The advantage of this method is that whenever you stop, the simulations will have followed more or less the standard-deviation proportioning. Michel's proportioning formula, by definition, cannot be used for selection, as it only predicts the final proportion, not how it is achieved.
-------------------------------------------
My 'proof' for now is to just simulate my selection rule and see what kind of proportioning it leads to. It turns out it proportions by the square root of the variance, i.e. the standard deviation. Here is MATLAB code to test the idea:
Code: Select all
% Greedy selection: always simulate the move with the largest
% one-step variance reduction sigma^2 / (n * (n + 1)).
sigma = [32 16 8 4 2 1];
n = [1 1 1 1 1 1];
for i = 1:100
    maxr = 0;
    maxj = 1;
    for j = 1:6
        t = (sigma(j)^2) / (n(j) * (n(j) + 1));
        if (t > maxr)
            maxr = t;
            maxj = j;
        end
    end
    n(maxj) = n(maxj) + 1;
    n   % no semicolon: print the counts after each iteration
end
Here is how n varies:
Code: Select all
2 1 1 1 1 1
3 1 1 1 1 1
3 2 1 1 1 1
4 2 1 1 1 1
5 2 1 1 1 1
5 3 1 1 1 1
6 3 1 1 1 1
6 3 2 1 1 1
.....
13 6 3 2 1 1
13 7 3 2 1 1
14 7 3 2 1 1
...
54 27 13 7 3 2
...
5082 2541 1271 635 318 159
Clearly the proportions follow the population standard deviation \sigma_i (doubling in this case). This is my proof for _now_, but I need to formalize it, probably with some math guy, though I will challenge myself to do it without any help.
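As a quick check (an editorial sketch in Python rather than the thread's MATLAB, with an arbitrarily larger budget of 100000 picks), the same greedy rule can be run longer to confirm that the counts settle into the \sigma_i proportions:

```python
# Same greedy rule as the MATLAB snippet above: always simulate the move
# with the largest one-step variance reduction sigma_i^2 / (n_i * (n_i + 1)).
sigma = [32, 16, 8, 4, 2, 1]
n = [1] * len(sigma)

for _ in range(100000):  # larger budget than the MATLAB run (arbitrary)
    j = max(range(len(sigma)), key=lambda i: sigma[i] ** 2 / (n[i] * (n[i] + 1)))
    n[j] += 1

# Counts should come out (nearly) proportional to sigma_i.
print(n, [round(c / n[-1], 2) for c in n])
```

The final counts land very close to the 32:16:8:4:2:1 ratio of the sigmas, matching the proportions seen in the output above.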
Daniel
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: MC methods
OK, the proof by infinite series seems to be really hard, since there is a max term in it. Also, after a certain number of simulations the proportions hang around the best proportion, so I am not sure it was correct to think of it as an infinite series. The problem is:
Code: Select all
pick the move with:  \argmax_i \frac{\sigma_i^2}{N_i(N_i+1)}
update its count:    n(i) = n(i) + 1
Prove that the final proportions are: n(1):n(2):...:n(k) = sigma(1):sigma(2):...:sigma(k)
Trying to prove the equivalence the straightforward way may be impossible, so I try to prove it by other means. Here goes:
Reduction of variance is a monotonically decreasing function, i.e. making one more simulation always decreases the variance. The same is true for the total perft, which adds up the means and variances. Now, if I take the best strategy at each iteration (greedy selection of the largest reduction), it should lead me to the optimum proportion. For a non-monotonic function this is not true, so it is important that the reduction is monotonic. Monotonic functions have one unique optimal point, therefore the proportion we get by following this procedure should be the optimum. Then we go back to Michel's proof that proportioning by standard deviation gives the least variance, so it must be the optimum point for my selection method too, since the function (variance reduction) is monotonic and hence has one optimal point.
That is my proof. Please let me know if you see a problem in the logic.
Daniel
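The appeal to Michel's optimality result can also be checked numerically. Below is a hedged Python sketch (the variances 1024, 256, 64, 16 and the budget of 1000 are arbitrary illustrative values, not data from the thread): the standard-deviation-proportional allocation attains the minimum (sum_i sigma_i)^2 / N of the objective, and no randomly drawn allocation of the same budget beats it.

```python
import random

random.seed(0)
V = [1024.0, 256.0, 64.0, 16.0]  # assumed per-move variances sigma_i^2 (toy values)
N = 1000.0                        # total simulation budget (toy value)

def objective(alloc):
    # Total variance of the sum estimate: sum_i V_i / N_i
    return sum(v / n for v, n in zip(V, alloc))

# Standard-deviation proportioning: N_i proportional to sigma_i = sqrt(V_i)
sd = [v ** 0.5 for v in V]
prop = [N * s / sum(sd) for s in sd]
best = objective(prop)  # equals (sum_i sigma_i)^2 / N = 60^2 / 1000 = 3.6

# No random allocation of the same budget should do better.
for _ in range(10000):
    r = [random.uniform(0.1, 1.0) for _ in V]
    alloc = [N * x / sum(r) for x in r]
    assert objective(alloc) >= best - 1e-9
print(best)
```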
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Perft(15) estimates thread
Daniel Shawul wrote: @Steven, do you have some kind of article on your work that I can cite?
Not really. I started doing movepath enumeration (my name for perft) more than thirty years ago as part of program verification. My early work, and later additions by myself and others, appear at http://oeis.org/A048987 (which also lists a truly ancient email address of mine).