Perft(13) betting pool

Discussion of chess software programming and technical issues.


petero2
Posts: 685
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: A test point for the Monte Carlo practitioners

Post by petero2 »

sje wrote:A test point for the Monte Carlo practitioners:

I don't have perft(13) yet, but I've got parts of it.

[D]r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3

For the above ply 4 position, perft(9) is 23,814,128,415,915.

How well do your approximation algorithms do with this?
Using

a Monte Carlo based method as suggested by HGM and Uri,
with my random walk sampling idea http://talkchess.com/forum/viewtopic.ph ... 43&t=39678
with optimal weights as calculated by Michel http://talkchess.com/forum/viewtopic.ph ... 71&t=39678
with my weight estimation idea http://talkchess.com/forum/viewtopic.ph ... 97&t=39678
and a UCT-like tree growing approach as suggested by Daniel,

I got the following just after the tree had grown to its maximum allowed size of 4M nodes.

Code:

23814128415915	    (real value)
23814148433009	    (estimated value)
    2375181106	    (estimated standard deviation)
This was after about 18 million random walks and 21 minutes of CPU time.

The estimate is too good to be typical though, as the error is less than 1% of the standard deviation. Therefore I repeated the measurement and got this:

Code:

23814128415915	    (real value)
23813936012606	    (estimated value)		  
    2372711320	    (estimated standard deviation)
The estimation error this time was about 0.1 standard deviations. Still too low to seem typical; I will have to investigate this more.

I think there is still some more room for improvement, as my current method seems to be learning the optimal sampling weights quite slowly. I have an idea how to make it learn faster, but I'm not sure it will work yet.
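
For reference, here is a minimal sketch of the plain (unweighted) random-walk estimator that these weighted variants build on. The Position type and the genMoves/makeMove interface are hypothetical placeholders, not the code of any engine in this thread; each walk multiplies the branching factors met along one random path, which gives an unbiased estimate of perft(depth), and averaging many walks converges to the true count.

Code:

#include <stdlib.h>

/* Hypothetical move-generation interface -- placeholders to be supplied by an engine. */
typedef struct { unsigned char board[64]; int sideToMove; } Position;
extern int  genMoves(const Position *pos, int moves[]);   /* fills 'moves', returns their count */
extern void makeMove(Position *pos, int move);

/* One random walk of 'depth' plies. The product of the branching factors
   seen along the walk is an unbiased estimate of perft(depth). */
static double randomWalk(Position pos, int depth)
{
    int moves[256];
    double weight = 1.0;
    for (int d = 0; d < depth; d++) {
        int n = genMoves(&pos, moves);
        if (n == 0) return 0.0;                 /* checkmate/stalemate: this walk contributes 0 */
        weight *= n;
        makeMove(&pos, moves[rand() % n]);      /* uniform pick; weighted variants bias this choice */
    }
    return weight;
}

/* Average many walks; the sample standard deviation of the walk weights,
   divided by sqrt(walks), gives an error estimate for the result. */
double estimatePerft(const Position *root, int depth, long walks)
{
    double sum = 0.0;
    for (long i = 0; i < walks; i++)
        sum += randomWalk(*root, depth);        /* pass by value, so nothing needs to be unmade */
    return sum / walks;
}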
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: A test point for the Monte Carlo practitioners

Post by sje »

sje wrote:A test point for the Monte Carlo practitioners:

I don't have perft(13) yet, but I've got parts of it.
[D]r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3
For the above ply 4 position, perft(9) is 23,814,128,415,915.

How well do your approximation algorithms do with this?
And for another try one ply further, in the above ply 4 position, perft(10) is 804,686,983,671,116.
petero2
Posts: 685
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: A test point for the Monte Carlo practitioners

Post by petero2 »

sje wrote:
sje wrote:A test point for the Monte Carlo practitioners:

I don't have perft(13) yet, but I've got parts of it.
[D]r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3
For the above ply 4 position, perft(9) is 23,814,128,415,915.

How well do your approximation algorithms do with this?
And for another try one ply further, in the above ply 4 position, perft(10) is 804,686,983,671,116.
After 36 million random walks, using the same method as before:

Code:

804686983671116	    (real value)		  
804754690759638	    (estimated value)		  
    66565041041	    (estimated standard deviation)
This time the estimation error is about 1.0 standard deviations.
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: A test point for the Monte Carlo practitioners

Post by hgm »

Code:

perft( 1)=    2.700000e+01 ( 0.000 sec)       -nan
perft( 2)=    8.350000e+02 ( 0.000 sec)   0.000000
perft( 3)=    2.407900e+04 ( 0.000 sec)   0.000000
perft( 4)=    7.508620e+05 ( 0.040 sec)   0.000000
perft( 5)=    2.257504e+07 ( 1.370 sec)   0.000000
perft( 6)=    7.223324e+08 (42.460 sec)   0.000000
perft( 7)= ca.2.250791e+10 (108.020 sec)  31.001327
perft( 8)= ca.7.408028e+11 (169.300 sec)  31.003174
perft( 9)= ca.2.381747e+13 (234.710 sec)  31.000982 (+0.014%)
perft(10)= ca.8.047057e+14 (300.980 sec)  30.994018 (+0.0023%)
perft(11)= ca.2.658671e+16 (372.060 sec)  31.008612
perft(12)= ca.9.188215e+17 (445.970 sec)  31.007642
perft(13)= ca.3.109006e+19 (526.130 sec)  31.015900
This corresponds to sampling about 722M/32 moves at the 6-ply level, which would reduce the SD by a factor of about 4,700 compared to a single sample. So if I assume the SD of a single sample is about 20%, this would contribute 0.0042% to the SD. But I would have a similar error (perhaps slightly decreasing) for every other ply from 6 to 9, so that would predict an SD of about 0.008%.
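
Spelling that arithmetic out: 0.20 / sqrt(722e6 / 32) ≈ 0.0042% per pruned ply, and combining the four pruned plies (6 to 9) in quadrature gives sqrt(4) * 0.0042% ≈ 0.008%.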

The perft(10) thus seems a bit lucky here; repeating it with another seed gives:

Code:

perft(10)= ca.8.045037e+14 (302.570 sec)  30.997930 (-0.022%)
perft(10)= ca.8.047191e+14 (300.420 sec)  31.007912 (+0.0041%)
perft(10)= ca.8.046820e+14 (300.580 sec)  31.011086 (-0.0005%)
perft(10)= ca.8.048175e+14 (303.520 sec)  31.021387 (+0.016%)
perft(10)= ca.8.046987e+14 (301.160 sec)  31.009391 (+0.0016%)
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: A test point for the Monte Carlo practitioners

Post by Michel »

This corresponds to sampling about 722M/32 moves at the 6-ply level,
I don't quite understand. Do you also have an acceptance probability
of 1/32 at higher plies?

I am trying to figure out how many leaf nodes you visit.
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: A test point for the Monte Carlo practitioners

Post by hgm »

Indeed, I accept each move with a 1/32 probability on plies 6 to N-1. (For the last ply this would make no sense, so I count them all; counting itself is faster than deciding with a certain probability whether you should count.) I then divide the result by the fraction of moves actually accepted at each ply; the inverse of that fraction should be very close to 32, but it has some variance, which I can eliminate from the result this way.

So the number of end leaves visited for perft(10) is perft(10)/32^4, as I prune on levels 6, 7, 8 and 9. Based on the execution times I had the impression that the EBF happens to be about 32, because every next ply adds approximately equal time (and I was too lazy to really calculate it by doing the divisions), so the number of moves should stay approximately the same on every level.

Basically what I did was take qperft and replace the recursive call

Code:

if(d == 1) count++; else perft(d-1);
by

Code:

if(d==1) count++; else
if( level<THRESHOLD || ((seed *= CONST) & 0x1F000000)==0 ) { /* 1/32 acceptance */
  accepted[d]++;
  perft(d-1);
} else rejected[d]++;
and afterwards, while printing, use the (1+rejected[d]/accepted[d]) factors to correct the count.
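
In other words, the printed estimate is obtained roughly like this (a sketch, not the literal qperft printing code; count, accepted[] and rejected[] are the variables from the snippet above, indexed by remaining depth d, and maxDepth stands for the requested depth):

Code:

/* Scale the raw leaf count by the inverse acceptance fraction of every pruned depth. */
double estimate = (double)count;
for (int d = 2; d <= maxDepth; d++)           /* d==1 moves are always counted, so start at 2 */
    if (accepted[d] > 0)                      /* only depths where the 1/32 test was active */
        estimate *= 1.0 + (double)rejected[d] / (double)accepted[d];
printf("perft(%d)= ca.%e\n", maxDepth, estimate);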
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: A test point for the Monte Carlo practitioners

Post by Michel »

Ok thanks. I was trying to compare your result to Peter's.

Perft(10)/(32^4)=767M nodes.

I assume Peter's "random walks" correspond to parents of leaf nodes.
That would be 36M nodes, for about the same variance.

So if this is correct then it seems the variance reduction techniques _do_ work.
But they do give overhead, so you are still _much_ faster (it does not help
of course that Peter uses Java).
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: A test point for the Monte Carlo practitioners

Post by hgm »

I think the variance reduction techniques do indeed help. But I do think that my method basically does the same in a much simpler way. Because I decide the acceptance of each move in the tree independently, I automatically sample many more paths in big sub-trees than in small sub-trees. Every path has exactly the same acceptance probability for me, so the number of sampled paths in a sub-tree will be strictly proportional to the size of that sub-tree. That will be true at any level. But at levels where the sampling would saturate, I enumerate rather than sample, to make sure they don't contribute to the variance at all.
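
(Concretely: with the 1/32 test active at k plies, every complete path survives with probability (1/32)^k, so a sub-tree containing S leaf paths yields on average S/32^k sampled paths, i.e. a number proportional to S by linearity of expectation.)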

As for raw speed, I had of course the advantage that I started from a super-fast perft already, although keeping track of the rejected/accepted counts did slow it down by some 25%.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: A test point for the Monte Carlo practitioners

Post by Michel »

Every path has exactly the same acceptance probability for me, so the number of sampled paths in a sub-tree will be strictly proportional to the size of that sub-tree.
This is a very nice observation. But am I right in saying that the acceptance probabilities are not independent? Suppose we did the selection at only a single ply. Then all children of a node at that ply are either in or out. This positive correlation between certain leaf nodes would increase the variance compared to pure Bernoulli trials.

Anyway, the assumption that the leaf nodes are pure Bernoulli trials gives a lower bound on the standard deviation that can be achieved.
In the case of Perft(10) this would be

32^4 * sqrt(8.047e14 * (1/32)^4) = 2.905e10, i.e. 0.0036%

(I know this computation is wrong since neighbouring leaf nodes are
definitely correlated so one should account for that).
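
(For reference, the figure above is just the independent-Bernoulli estimate: if each leaf is included with probability p = (1/32)^4, the estimator count/p has standard deviation sqrt(N*p*(1-p))/p ≈ sqrt(N/p) = 32^2 * sqrt(N) ≈ 2.9e10 for N = 8.047e14.)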

Non-uniform MC sampling can in principle bring the variance down to zero, but it requires infinite effort to get the perfect weights. So perhaps it is an illusion that it can be better. This requires more thought.
I think this is really a beautiful problem.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: A test point for the Monte Carlo practitioners

Post by Michel »

I did a quick variance computation for a uniform tree, and it seems that in that case Peter's method with uniform sampling has a higher efficiency (i.e. a smaller effort × variance product) than HGM's method, unless the acceptance probability is 1....

Seeing what goes on for a non-uniform tree is harder of course.