Perft(13) betting pool
Moderators: hgm, Harvey Williamson, bob

 Posts: 3429
 Joined: Tue Mar 14, 2006 10:34 am
 Location: Ethiopia
 Contact:
Re: A few more records
@Steven,
Did you compute perft(12) by breaking them down to perft(8)? Can I have the perft(8) results if that is the case? I want to use them for my perft estimations.
Did you compute perft(12) by breaking them down to perft(8)? Can I have the perft(8) results if that is the case? I want to use them for my perft estimations.
Re: A few more records
I think the line for the variance must be like this:Sven Schüle wrote:Hmm, it seems it is close to impossible to detect that equivalence since my implementation is wrong I ignored the fact that variance(n) refers to mean(n) but variance(n+1) must refer completely to mean(n+1). Therefore it does not work the way I did it.Sven Schüle wrote:Hi Michel,Michel wrote:GNU Chess 5.07.172b http://emis.uhasselt.be/cgibin/gitweb. ... ;a=summary Peter1 method with fw=4, 501e6 nodes.
sdn=sd*sqrt(nodes) is the normalized standard deviation
mean=1.540774e+14 sd=8.298100e+09 [ 1.540564e+14 <= Perft(10) <= 1.540985e+14 (99% conf.) ] nodes=501262105 sdn=1.857852e+14 time=1931.75s
in the latest snapshot of GNU Chess I see this code in cmd.c (function cmd_perftmc) for incremental update of mean + variance:While I perfectly agree to the mean calculation I am not sure about the variance, could you please explain how that works? One point is that I don't know how you get around the division by 0 in the "variance=..." line for the first sample, i.e. when "n++" sets n to 1. But I am also asking since I get different (significantly higher) values for the same "Peter1" method in my program. I thought that the variance of a set X of N samples were defined by a formula like this (note the division by N1, not N):Code: Select all
n++; delta=perftmean; mean=mean+delta/n; M2+=delta*(perftmean); variance=M2/((n1)*n);
and therefore I use the following code for incremental update of mean and variance (translated to GNU Chess variable names):Code: Select all
variance(X, N) := sum(i=1..N) ((X(i)  mean(X, N))^2) / (N1)
I have been unable so far to detect an equivalence between your implementation and mine. So may I ask you what you think is right, and what is wrong?Code: Select all
mean = (n * mean + perft) / (n + 1); variance = (n == 0) ? 0.0 : ((n  1) * variance + pow(perft  mean, 2)) / n; ++n;
Sven
I also have an idea why your code will not crash with "division by 0". It is due to the first value of M2 being 0, and the compiler might optimize 0/0 simply into 0 (which is mathematically incorrect, though).
I am still struggling to understand why your formula works ...
Sven
Code: Select all
variance=M2/(n1);
Your algorithm looks very much like the one under "Online algorithm" in Wikipedia which refers to D. Knuth and Welford.
With the above correction I get almost identical results for your implementation and for the "perfect" variance that I get if I store all sample values in a vector and calculate mean and variance once in the end.
Sven
Re: A few more records
This would be the variance for a single sample (that is an estimator for thevariance=M2/(n1);
variance of the probability distribution of a sample). To get the variance for the mean we have to make an extra division by n.
You are right about the division by zero. I thought that floating point divisions by zero were always silently ignored. But apparently this is not guaranteed. So I put the test for n>1 a bit earlier in the code to avoid this.
And yes the code comes from Wikipedia!
Re: A few more records
Alas, I tossed the draft 8 records for that run; there were over 70 thousand of them. I do have all of the draft 10 and 11 records, though.Daniel Shawul wrote:@Steven,
Did you compute perft(12) by breaking them down to perft(8)? Can I have the perft(8) results if that is the case? I want to use them for my perft estimations.
Re: A few more records
I did not find your version of that algorithm in Wikipedia, can you give me a link?Michel wrote:This would be the variance for a single sample (that is an estimator for thevariance=M2/(n1);
variance of the probability distribution of a sample). To get the variance for the mean we have to make an extra division by n.
You are right about the division by zero. I thought that floating point divisions by zero were always silently ignored. But apparently this is not guaranteed. So I put the test for n>1 a bit earlier in the code to avoid this.
And yes the code comes from Wikipedia!
I also did not find a source that explains that "variance for the mean" := "variance for a single sample" / "number of samples". I would like to understand it, and also why we need the "variance for the mean" in our case, so could you please give me also a link for that part?
Sven
Re: A few more records
http://en.wikipedia.org/wiki/Algorithms ... g_varianceI did not find your version of that algorithm in Wikipedia, can you give me a link?
The mean of a sample is an unbiased estimator for the mean (also calledwhy we need the "variance for the mean"
expectation value) of a probability distribution. This is basically
E((X1+...+Xn)/n)=(E(X1)+...+E(Xn))/n=(n*E(X1))/n=E(X1)
To know how good an estimator it is we have to compute its variance.
To this end we use the following rules:
Var(Z/n)=Var(Z)/n^2
and if U V are independent
Var(U+V)=Var(U)+Var(V)
With these rules we can compute the variance of the mean of a sample
Var((X1+...+Xn)/n)=Var(X1+....+Xn)/n^2=n*Var(X1)/n^2=Var(X1)/n.
EDIT:
To clear up some possible confusion. In prior discussions the term "sample"
has sometimes been used for a single measurement. This usage
is correct in digital signal processing but not in statistics where
a sample means a set of measurements. For this reason Peter refers
to a single run of his algorithm as a "cycle".
EDIT2:
Another possible confusion is the formula
Var((X1+...+Xn)/n)=Var(X1)/n.
We don't know Var(X1) but we may estimate it by taking the variance
of the same sample. This involves a division by n1. So in total we
divide by n*(n1).
Re: A few more records
Thanks Michel, now I understand. But the Wikipedia link you gave is the same from which I quoted some posts above, and there the algorithm only divides by (n1), not by (n1)*n. However, after your explanation it is roughly clear for me where the additional division by n comes from.Michel wrote:http://en.wikipedia.org/wiki/Algorithms ... g_varianceI did not find your version of that algorithm in Wikipedia, can you give me a link?
The mean of a sample is an unbiased estimator for the mean (also calledwhy we need the "variance for the mean"
expectation value) of a probability distribution. This is basically
E((X1+...+Xn)/n)=(E(X1)+...+E(Xn))/n=(n*E(X1))/n=E(X1)
To know how good an estimator it is we have to compute its variance.
To this end we use the following rules:
Var(Z/n)=Var(Z)/n^2
and if U V are independent
Var(U+V)=Var(U)+Var(V)
With these rules we can compute the variance of the mean of a sample
Var((X1+...+Xn)/n)=Var(X1+....+Xn)/n^2=n*Var(X1)/n^2=Var(X1)/n.
EDIT:
To clear up some possible confusion. In prior discussions the term "sample"
has sometimes been used for a single measurement. This usage
is correct in digital signal processing but not in statistics where
a sample means a set of measurements. For this reason Peter refers
to a single run of his algorithm as a "cycle".
EDIT2:
Another possible confusion is the formula
Var((X1+...+Xn)/n)=Var(X1)/n.
We don't know Var(X1) but we may estimate it by taking the variance
of the same sample. This involves a division by n1. So in total we
divide by n*(n1).
Sven
Re: Summary of Monte Carlo results for perft(12) so far
I agree with this. In retrospect, it would have been better to measure the effect of increased fullwidth search in the HGM and Peter1 methods before spending too much time on the UCT/weighted sampling methods.Daniel Shawul wrote:I think the reason for the sudden jump in performance of full_width +MC is because up to now for depth <= 3, it was same as _uniform_ allocation. But for depth = 5 and more it becomes proportioning by subtree nodes count up to leaf nodes. It was a subtree count proportioning all along but we didn't notice it because subtrees are uniform for depth<=3. So with that in mind it can be beaten f.i if I do fullwidth expansion to depth=5. Depth=6 would require too much memory though , which is a disadvantage for any breadth first searcher. Having said that it can perform bad for a selective tree where the shape of the upper portion of the tree is not related with the subtree size. Another idea applicable for the perft(13) competition would be to take the exact perft(12) results for 2plies from the root and use that to determine the proportions for the montecarlo simulations. The depthfirst search does a static allocation, so a static allocation based on the real subtree size may perform better.
Re: A few more records
sje wrote:The perft(13) run has been going for just over 12 days and has produced 210 draft ten records. Here are the most recent ten of these:How well do your estimation algorithms perform on the above?Code: Select all
rnbqkbnr/1ppppppp/8/p7/1P6/2P5/P2PPPPP/RNBQKBNR b KQkq  0 2 10 154068837107809 rnbqkbnr/1ppppppp/8/p7/2P5/7N/PP1PPPPP/RNBQKB1R b KQkq  0 2 10 126452584569476 rnbqkbnr/1ppppppp/8/p7/8/1P1P4/P1P1PPPP/RNBQKBNR b KQkq  0 2 10 197935747435700 rnbqkbnr/1ppppppp/8/p7/8/P2P4/1PP1PPPP/RNBQKBNR b KQkq  0 2 10 163777451315240 rnbqkbnr/1ppppppp/8/p7/2P5/5N2/PP1PPPPP/RNBQKB1R b KQkq  0 2 10 161971799868266 rnbqkbnr/1ppppppp/8/p7/P7/3P4/1PP1PPPP/RNBQKBNR b KQkq  0 2 10 166336655168294 rnbqkbnr/1ppppppp/8/p7/8/N3P3/PPPP1PPP/R1BQKBNR b KQkq  0 2 10 290223899535804 rnbqkbnr/1ppppppp/8/p7/1PP5/8/P2PPPPP/RNBQKBNR b KQkq  0 2 10 185603264146369 rnbqkbnr/1ppppppp/8/p7/8/3P3N/PPP1PPPP/RNBQKB1R b KQkq  0 2 10 181649457773953 rnbqkbnr/1ppppppp/8/p7/8/2N1P3/PPPP1PPP/R1BQKBNR b KQkq  0 2 10 406608149434105
Using "Peter1" with fw=4:sje wrote:The average time for calculating a draft 10 record in the current run is about 5,013 seconds. It would be interesting to see the quality of estimations that can be computed in, say, one percent of the actual full calculation time.
Code: Select all
pos m relerr movGens cycles t
1 154093378234301 1.593e04 31099978 17 53.073
2 126467517170116 1.181e04 33530832 21 50.747
3 197915769847456 1.009e04 34568046 15 52.250
4 163820486561991 2.628e04 34776633 17 52.521
5 161965379089192 3.964e05 35094578 19 51.665
6 166368914541236 1.939e04 34560150 17 52.240
7 290318396860821 3.256e04 32154093 12 52.511
8 185597417904519 3.150e05 33747973 17 52.674
9 181579829695344 3.833e04 34778534 16 52.431
10 406543544328447 1.589e04 31748598 10 52.491
Re: Summary of Monte Carlo results for perft(12) so far
Here is a idea to create upper limit:
Make an engine only using AB , and then search to depth 13, call it x
Then because AB can prune down to sqrt(n) , where n is the number of nodes searched in minimax
So , the upper limit is x*x !!
Now, i only need to find lower limit, then i will place my bet.
I dont know if this idea was written before, i didnt look through the 51 pages
Make an engine only using AB , and then search to depth 13, call it x
Then because AB can prune down to sqrt(n) , where n is the number of nodes searched in minimax
So , the upper limit is x*x !!
Now, i only need to find lower limit, then i will place my bet.
I dont know if this idea was written before, i didnt look through the 51 pages