Perft(13) betting pool

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: A few more records

Post by Daniel Shawul »

@Steven,
Did you compute perft(12) by breaking it down into perft(8) computations? If so, can I have the perft(8) results? I want to use them for my perft estimations.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: A few more records

Post by Sven »

Sven Schüle wrote:
Sven Schüle wrote:
Michel wrote:GNU Chess 5.07.172b http://emis.uhasselt.be/cgi-bin/gitweb. ... ;a=summary Peter1 method with fw=4, 501e6 nodes.

sdn=sd*sqrt(nodes) is the normalized standard deviation
mean=1.540774e+14 sd=8.298100e+09 [ 1.540564e+14 <= Perft(10) <= 1.540985e+14 (99% conf.) ] nodes=501262105 sdn=1.857852e+14 time=1931.75s
Hi Michel,

In the latest snapshot of GNU Chess I see this code in cmd.c (function cmd_perftmc) for the incremental update of mean and variance:

Code: Select all

n++;
delta=perft-mean;
mean=mean+delta/n;
M2+=delta*(perft-mean);
variance=M2/((n-1)*n);
While I fully agree with the mean calculation, I am not sure about the variance; could you please explain how it works? One point is that I don't see how you get around the division by 0 in the "variance=..." line for the first sample, i.e. when "n++" sets n to 1. But I am also asking because I get different (significantly higher) values for the same "Peter1" method in my program. I thought that the variance of a set X of N samples was defined by a formula like this (note the division by N-1, not N):

Code: Select all

variance(X, N) := sum(i=1..N) ((X(i) - mean(X, N))^2) / (N-1)
and therefore I use the following code for incremental update of mean and variance (translated to GNU Chess variable names):

Code: Select all

mean = (n * mean + perft) / (n + 1);
variance = (n == 0) ? 0.0 : ((n - 1) * variance + pow(perft - mean, 2)) / n;
++n;
So far I have been unable to establish an equivalence between your implementation and mine. So may I ask which one you think is right, and which is wrong?

Sven
Hmm, it seems it is close to impossible to establish that equivalence, since my implementation is wrong :oops: I ignored the fact that variance(n) refers to mean(n), while variance(n+1) must refer entirely to mean(n+1). Therefore it does not work the way I did it.

I also have an idea why your code does not crash with a "division by 0": the first value of M2 is 0, and the compiler might optimize 0/0 simply into 0 (which is mathematically incorrect, though).

I am still struggling to understand why your formula works ...

Sven
I think the line for the variance must be like this:

Code: Select all

variance=M2/(n-1);
Can you confirm this? If so, then I think that all variance and SD values you have published so far would have been wrong (too low) if they were based on that implementation.

Your algorithm looks very much like the one under "On-line algorithm" in Wikipedia, which refers to D. Knuth and Welford.

With the above correction I get almost identical results for your implementation and for the "perfect" variance that I get if I store all sample values in a vector and calculate mean and variance once at the end.
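
For anyone who wants to reproduce that comparison, here is a minimal standalone sketch (my own variable names, not the GNU Chess code) that runs the Welford on-line update next to the naive two-pass computation on the same data:

Code: Select all

#include <stdio.h>

int main(void) {
    const double x[] = {3.0, 7.0, 7.0, 19.0, 24.0, 1.0, 6.0};
    const int N = sizeof(x) / sizeof(x[0]);

    /* on-line (Welford) update */
    double mean = 0.0, M2 = 0.0;
    for (int n = 0; n < N; n++) {
        double delta = x[n] - mean;
        mean += delta / (n + 1);
        M2 += delta * (x[n] - mean);
    }

    /* two-pass ("perfect") computation for comparison */
    double sum = 0.0, sq = 0.0;
    for (int i = 0; i < N; i++) sum += x[i];
    double mean2 = sum / N;
    for (int i = 0; i < N; i++) sq += (x[i] - mean2) * (x[i] - mean2);

    printf("on-line:  mean=%g variance=%g\n", mean, M2 / (N - 1));
    printf("two-pass: mean=%g variance=%g\n", mean2, sq / (N - 1));
    return 0;
}
Both print the same sample variance M2/(N-1); dividing once more by N gives the variance of the mean discussed in the following posts.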

Sven
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: A few more records

Post by Michel »

variance=M2/(n-1);
This would be the variance of a single measurement (that is, an estimator for the variance of the probability distribution from which each measurement is drawn). To get the variance of the mean we have to make an extra division by n.

You are right about the division by zero. I thought that floating point divisions by zero were always silently ignored. But apparently this is not guaranteed. So I put the test for n>1 a bit earlier in the code to avoid this.

And yes the code comes from Wikipedia!
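
In code, moving the guard earlier might look like this (a sketch of the idea only, not the actual cmd.c change):

Code: Select all

n++;
delta = perft - mean;
mean = mean + delta/n;
M2 += delta*(perft - mean);
if (n > 1)
    variance = M2/((double)(n - 1)*n);  /* variance of the mean */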
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: A few more records

Post by sje »

Daniel Shawul wrote:@Steven,
Did you compute perft(12) by breaking it down into perft(8) computations? If so, can I have the perft(8) results? I want to use them for my perft estimations.
Alas, I tossed the draft 8 records for that run; there were over 70 thousand of them. I do have all of the draft 10 and 11 records, though.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: A few more records

Post by Sven »

Michel wrote:
variance=M2/(n-1);
This would be the variance of a single measurement (that is, an estimator for the variance of the probability distribution from which each measurement is drawn). To get the variance of the mean we have to make an extra division by n.

You are right about the division by zero. I thought that floating point divisions by zero were always silently ignored. But apparently this is not guaranteed. So I put the test for n>1 a bit earlier in the code to avoid this.

And yes the code comes from Wikipedia!
I did not find your version of that algorithm in Wikipedia; can you give me a link?

I also did not find a source that explains that "variance of the mean" := "variance of a single measurement" / "number of measurements". I would like to understand it, and also why we need the "variance of the mean" in our case, so could you please also give me a link for that part?

Sven
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: A few more records

Post by Michel »

I did not find your version of that algorithm in Wikipedia; can you give me a link?
http://en.wikipedia.org/wiki/Algorithms ... g_variance
why we need the "variance for the mean"
The mean of a sample is an unbiased estimator for the mean (also called
expectation value) of a probability distribution. This is basically

E((X1+...+Xn)/n)=(E(X1)+...+E(Xn))/n=(n*E(X1))/n=E(X1)

To know how good an estimator it is we have to compute its variance.
To this end we use the following rules:

Var(Z/n)=Var(Z)/n^2

and if U and V are independent

Var(U+V)=Var(U)+Var(V)

With these rules we can compute the variance of the mean of a sample:

Var((X1+...+Xn)/n)=Var(X1+...+Xn)/n^2=n*Var(X1)/n^2=Var(X1)/n.

EDIT:

To clear up some possible confusion. In prior discussions the term "sample"
has sometimes been used for a single measurement. This usage
is correct in digital signal processing but not in statistics where
a sample means a set of measurements. For this reason Peter refers
to a single run of his algorithm as a "cycle".

EDIT2:

Another possible confusion is the formula

Var((X1+...+Xn)/n)=Var(X1)/n.

We don't know Var(X1) but we may estimate it by taking the variance
of the same sample. This involves a division by n-1. So in total we
divide by n*(n-1).
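
A quick numerical check of Var(mean)=Var(X1)/n (a standalone sketch, not taken from any of the programs discussed here; it draws uniform random numbers, whose variance is exactly 1/12):

Code: Select all

#include <stdio.h>
#include <stdlib.h>

#define N      1000    /* measurements per sample */
#define TRIALS 10000   /* independent sample means */

/* one measurement: uniform on [0,1), variance 1/12 */
static double draw(void) { return rand() / (RAND_MAX + 1.0); }

int main(void) {
    double sum = 0.0, sumsq = 0.0;
    for (int t = 0; t < TRIALS; t++) {
        double s = 0.0;
        for (int i = 0; i < N; i++) s += draw();
        double m = s / N;        /* mean of one sample of N measurements */
        sum += m;
        sumsq += m * m;
    }
    double mm = sum / TRIALS;
    printf("empirical Var(mean) = %g\n", sumsq / TRIALS - mm * mm);
    printf("Var(X1)/n           = %g\n", (1.0 / 12.0) / N);
    return 0;
}
The two printed numbers agree to within a few percent.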
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: A few more records

Post by Sven »

Michel wrote:
I did not find your version of that algorithm in Wikipedia; can you give me a link?
http://en.wikipedia.org/wiki/Algorithms ... g_variance
why we need the "variance for the mean"
The mean of a sample is an unbiased estimator for the mean (also called
expectation value) of a probability distribution. This is basically

E((X1+...+Xn)/n)=(E(X1)+...+E(Xn))/n=(n*E(X1))/n=E(X1)

To know how good an estimator it is we have to compute its variance.
To this end we use the following rules:

Var(Z/n)=Var(Z)/n^2

and if U and V are independent

Var(U+V)=Var(U)+Var(V)

With these rules we can compute the variance of the mean of a sample:

Var((X1+...+Xn)/n)=Var(X1+...+Xn)/n^2=n*Var(X1)/n^2=Var(X1)/n.

EDIT:

To clear up some possible confusion. In prior discussions the term "sample"
has sometimes been used for a single measurement. This usage
is correct in digital signal processing but not in statistics where
a sample means a set of measurements. For this reason Peter refers
to a single run of his algorithm as a "cycle".

EDIT2:

Another possible confusion is the formula

Var((X1+...+Xn)/n)=Var(X1)/n.

We don't know Var(X1) but we may estimate it by taking the variance
of the same sample. This involves a division by n-1. So in total we
divide by n*(n-1).
Thanks Michel, now I understand. But the Wikipedia link you gave is the same one from which I quoted a few posts above, and there the algorithm only divides by (n-1), not by (n-1)*n. However, after your explanation it is now roughly clear to me where the additional division by n comes from.

Sven
petero2
Posts: 684
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Summary of Monte Carlo results for perft(12) so far

Post by petero2 »

Daniel Shawul wrote:I think the reason for the sudden jump in performance of full_width+MC is that up to now, for depth <= 3, it was the same as _uniform_ allocation. But for depth = 5 and more it becomes proportioning by sub-tree node count all the way down to the leaf nodes. It was sub-tree count proportioning all along, but we didn't notice because the sub-trees are uniform for depth <= 3. So with that in mind it can be beaten, for instance, if I do full-width expansion to depth = 5. Depth = 6 would require too much memory, though, which is a disadvantage for any breadth-first searcher. Having said that, it can perform badly for a selective tree where the shape of the upper portion of the tree is not related to the sub-tree size. Another idea, applicable to the perft(13) competition, would be to take the exact perft(12) results for 2 plies from the root and use those to determine the proportions for the Monte Carlo simulations. The depth-first search does a static allocation, so a static allocation based on the real sub-tree sizes may perform better.
I agree with this. In retrospect, it would have been better to measure the effect of increased full-width search in the HGM and Peter1 methods before spending too much time on the UCT/weighted sampling methods.
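
As an illustration of that static-allocation idea, here is a minimal sketch (made-up numbers and function names, not code from any of the engines discussed): distribute the total Monte Carlo budget over the root subtrees in proportion to their known exact counts.

Code: Select all

#include <stdio.h>

/* Split total_cycles over nmoves subtrees in proportion to
   their exact node counts. */
static void allocate(const unsigned long long counts[], int nmoves,
                     long total_cycles, long out[]) {
    unsigned long long sum = 0;
    for (int i = 0; i < nmoves; i++) sum += counts[i];
    for (int i = 0; i < nmoves; i++)
        out[i] = (long)((double)counts[i] / (double)sum * total_cycles);
}

int main(void) {
    /* made-up exact counts for three root moves */
    const unsigned long long counts[] = {1546ULL, 4066ULL, 1979ULL};
    long budget[3];
    allocate(counts, 3, 1000000L, budget);
    for (int i = 0; i < 3; i++)
        printf("move %d: %ld cycles\n", i, budget[i]);
    return 0;
}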
petero2
Posts: 684
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: A few more records

Post by petero2 »

sje wrote:The perft(13) run has been going for just over 12 days and has produced 210 draft ten records. Here are the most recent ten of these:

Code: Select all

rnbqkbnr/1ppppppp/8/p7/1P6/2P5/P2PPPPP/RNBQKBNR b KQkq - 0 2 10 154068837107809
rnbqkbnr/1ppppppp/8/p7/2P5/7N/PP1PPPPP/RNBQKB1R b KQkq - 0 2 10 126452584569476
rnbqkbnr/1ppppppp/8/p7/8/1P1P4/P1P1PPPP/RNBQKBNR b KQkq - 0 2 10 197935747435700
rnbqkbnr/1ppppppp/8/p7/8/P2P4/1PP1PPPP/RNBQKBNR b KQkq - 0 2 10 163777451315240
rnbqkbnr/1ppppppp/8/p7/2P5/5N2/PP1PPPPP/RNBQKB1R b KQkq - 0 2 10 161971799868266
rnbqkbnr/1ppppppp/8/p7/P7/3P4/1PP1PPPP/RNBQKBNR b KQkq - 0 2 10 166336655168294
rnbqkbnr/1ppppppp/8/p7/8/N3P3/PPPP1PPP/R1BQKBNR b KQkq - 0 2 10 290223899535804
rnbqkbnr/1ppppppp/8/p7/1PP5/8/P2PPPPP/RNBQKBNR b KQkq - 0 2 10 185603264146369
rnbqkbnr/1ppppppp/8/p7/8/3P3N/PPP1PPPP/RNBQKB1R b KQkq - 0 2 10 181649457773953
rnbqkbnr/1ppppppp/8/p7/8/2N1P3/PPPP1PPP/R1BQKBNR b KQkq - 0 2 10 406608149434105
How well do your estimation algorithms perform on the above?
sje wrote:The average time for calculating a draft 10 record in the current run is about 5,013 seconds. It would be interesting to see the quality of estimations that can be computed in, say, one percent of the actual full calculation time.
Using "Peter1" with fw=4:

Code: Select all

pos               m     relerr  movGens cycles      t
  1 154093378234301  1.593e-04 31099978     17 53.073
  2 126467517170116  1.181e-04 33530832     21 50.747
  3 197915769847456 -1.009e-04 34568046     15 52.250
  4 163820486561991  2.628e-04 34776633     17 52.521
  5 161965379089192 -3.964e-05 35094578     19 51.665
  6 166368914541236  1.939e-04 34560150     17 52.240
  7 290318396860821  3.256e-04 32154093     12 52.511
  8 185597417904519 -3.150e-05 33747973     17 52.674
  9 181579829695344 -3.833e-04 34778534     16 52.431
 10 406543544328447 -1.589e-04 31748598     10 52.491
There are too few cycles to compute accurate estimates of the standard deviation, so I just report the actual relative errors instead.
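
As a sanity check on the table: relerr is simply (m - exact)/exact. For position 1 this gives (154093378234301 - 154068837107809) / 154068837107809 = 24541126492 / 154068837107809 ≈ 1.593e-04, matching the first row.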
ethanara
Posts: 134
Joined: Mon May 16, 2011 6:58 pm
Location: Denmark

Re: Summary of Monte Carlo results for perft(12) so far

Post by ethanara »

Here is an idea to create an upper limit:
Make an engine using only AB, and then search to depth 13; call the node count x.
Because AB can prune down to sqrt(n), where n is the number of nodes searched in minimax,
the upper limit is x*x !!
Now I only need to find a lower limit, and then I will place my bet.
I don't know if this idea was posted before, I didn't look through the 51 pages :)
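
To make the proposed arithmetic concrete (with a purely made-up number): if the depth-13 AB search visited x = 10^9 nodes, the claimed bound would be n <= x^2 = 10^18 positions. Note that this relies on the best-case sqrt(n) behaviour of alpha-beta.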