Perft(13) betting pool

Michel · Post by **Michel** » Wed Aug 17, 2011 5:18 pm

I made a new version which updates the threshold dynamically using a moving average. It
did not perform any better than the one with the static threshold.

Probably a better approach is to do something like Daniel. Keep an array of leaf nodes of
the full width tree to preserve node specific data between the cycles. You can use this
data to guide a new cycle.

In a plain Peter1 cycle you do only one random walk in every leaf node. In the modified
version you would do one or more depending on the saved data. Of course you should
not do too many. Otherwise you might just as well search one full ply deeper.

Daniel Shawul · Post by **Daniel Shawul** » Wed Aug 17, 2011 5:43 pm

I think that it is good to store the most important nodes as much as you can, that will be ply <= 5 for the opening position. Depth=6 requires too much memory. You can't do better than a method which have information to adjust its visits from cycle to cycle. But for experimenting,it limits the amount of full width and selective search you can do from the leaf onwards. So I am setting the tree-growth off (or grow only ply<=2 tree). It seems I can gain improvement by doing a "tapered" monte-carlo, I tried 3moves-2moves-1moves and it still improves the result. I can not have that selective part attached at the end of a 6-ply full width, because it goes deep into a 12-ply perft. So it might not improve it for perft(12) at that depth but in principle if you are computing a bigger perft stretching/contracting the different zones can help.

Right now I have no conditions when to test more moves except for depth criterion. The depth criterion is good since reduction of variance at deeper plies brings more improvement than shallower plies. A "conditional" type of MC experiment that you are doing can bring additional improvements.

sje · Post by **sje** » Wed Aug 17, 2011 6:37 pm

Two more draft 11 results:

Code: Select all

rnbqkbnr/1ppppppp/8/p7/8/2N5/PPPPPPPP/R1BQKBNR w KQkq - 0 2 11 3411299116725061
rnbqkbnr/1ppppppp/8/p7/8/P7/1PPPPPPP/RNBQKBNR w KQkq - 0 2 11 2171307455092881

The first of these is the largest subtotal seen so far.

Eleven down, 389 to go.

Daniel Shawul · Post by **Daniel Shawul** » Wed Aug 17, 2011 10:56 pm

There is some additional variability when adding a second layer and I am finding it hard to improve even on a 3-ply full-width. If I do two layer 2-move Monte carlos on top , it means I am investing 4x more than a simple monte carlo in an iteration, so if those plies where we investigate 2 moves don't have big variance in them, it never catches up with the default MC. I waited till 1e9 move gen calls and it didn't compensate enough to catch up...

Michel · Post by **Michel** » Thu Aug 18, 2011 11:56 am

I did a few theoretical computations.

Whether or not it is beneficial to do multiple MC searches in the tips of the full width tree seems like a tricky optimization problem. Even if you have perfect information.

Of course if you increase the average number of searches in the tips (in a single cycle)
the sd*sqrt(moveGen) metric improves (if you proportion in an optimal way).
But then you will get even more improvement by simply doing an extra full width ply (if the
tree is not artificially LMR'ed).

For the moment Peter1 remains an extremely good compromise between results and simplicity of implementation.

Daniel Shawul · Post by **Daniel Shawul** » Thu Aug 18, 2011 5:51 pm

I am not sure if there is something wrong with doing 2 consecutive 2-move monte carlos instead of one 4-move monte carlo. I did a test on a 5-ply search with 4-moves at one ply and it improved the result. I run it again with a different seed and the improvement is still there.

Code: Select all

5-ply&#58; 6.285151e+016 +- 3.597655e+012  6.148286e+016 +- 2.051403e+013 500959908

Compared to this previous best result

Code: Select all

5-ply&#58; 6.284583e+016 +- 3.867508e+012  6.107216e+016 +- 2.145204e+013 501118477  cost 8.63e16

Edit: I did two 2-move MCs at consecutive plies and it bettered the old result but is poorer than the 4-moves at a ply result

Code: Select all

5-ply&#58; 6.286282e+016 +- 3.795925e+012  6.168563e+016 +- 2.104039e+013 502235974

Ajedrecista · Post by **Ajedrecista** » Thu Aug 18, 2011 8:14 pm

Hi everybody:

Just for curiosity I tried the method of Adam Hair of adjusting odd BF's with polynomials:

http://talkchess.com/forum/viewtopic.ph ... 16&t=39678

In my case I adjusted with quartic functions (A·x^4 + B·x³ + C·x² + D·x + E) because I splitted in twenty different polynomials, one for each initial move. Firstly I did the calculations with Excel and the sum was around 1.98316e+18 (a little high from my point of view).

I suspected than roundings of Excel 'hurted' a little the estimations, so I adjusted again the twenty polynomials with the help of Derive 6, for getting more digits. I had reason this time, but the final sum is still 'high' in comparison with other computational methods that I have read here. Here are the numbers I got (hoping no errors when doing all the math):

Code: Select all

Move            Estimate

a3	    54,531,136,234,162,156
a4	    79,063,508,502,158,903
b3	    72,639,688,816,621,455
b4	    73,709,067,003,701,511
c3	    86,056,827,153,858,571
c4	    97,970,589,000,615,670
d3	   141,383,458,087,446,826
d4	   213,018,757,033,778,459
e3	   240,725,195,611,700,379
e4	   246,589,794,037,145,382
f3	    43,493,804,033,203,347
f4	    61,162,666,611,330,980
g3	    76,232,972,134,254,158
g4	    65,814,044,230,043,052
h3	    54,049,205,191,901,014
h4	    80,829,651,636,234,054
Na3	   63,795,543,395,998,863
Nc3	   84,531,400,071,549,975
Nf3	   82,344,973,694,923,533
Nh3	   64,898,200,828,736,569
-------------------------------
SUM = 1,982,840,483,309,364,857

Although the number is a little high IMO, I also think that the relative percentages of this estimate (for each move) are fairly good in view of the trend that each move follows when n grows (I calculated it in two Excel files on July). I wonder what would be the final sum when splitting in the initial moves, using for example Peter1 method with fw = 4 (which seems a star method), and compare the final sum and the counts of each move (also the relative percentages). I encourage everybody to run this, if possible (I do not know if Peter1 method allows splitting in initial moves; I hope yes because all the methods here seem very complete). Good luck.

Regards from Spain.

Ajedrecista.

Daniel Shawul · Post by **Daniel Shawul** » Fri Aug 19, 2011 4:25 pm

Trying 4-moves at a ply gave an improvement. Apparently the result will improve with more number of moves as it becomes closer
to a full width search. It is weird that 2 consecuative 2-move layers does not give an improvement ...

Code: Select all

3-ply&#58; 6.284576e+016 +- 7.247221e+012  6.289475e+016 +- 3.881020e+013 500508916  1.621e17
4-ply&#58; 6.284564e+016 +- 5.381713e+012  6.269848e+016 +- 3.001727e+013 501189049  1.205e17
5-ply&#58; 6.285151e+016 +- 3.597655e+012  6.148286e+016 +- 2.051403e+013 500959908  8.052e16
5-ply&#58; 6.284978e+016 +- 1.971773e+012  6.241545e+016 +- 1.134331e+013 2001119141 8.821e16

Then I switched attention to scoring and sorting moves at the selective ply, so that I will use
the U and 1 - U trick. The scoring is done with piece square tables. The result was a disappointment with a slight improvement. I wouldn't even call it an improvement since the movegen metric is not appropriate for this case since I do more work per movegen call.

Code: Select all

3-ply&#58; 6.285025e+016 +- 7.194573e+012  6.285244e+016 +- 3.853078e+013 500477709  1.610e+017
4-ply&#58; 6.285186e+016 +- 5.332993e+012  6.274193e+016 +- 2.981010e+013 501225494  1.194e+017
5-ply&#58; 6.285195e+016 +- 3.556072e+012  6.146853e+016 +- 2.033971e+013 500920266  7.959e+016
5-ply&#58; 6.285332e+016 +- 1.952886e+012  6.241014e+016 +- 1.126306e+013 2001064140 8.736e+016

I think this idea is tired now so we need fresh ideas. Maybe I will try the static scoring for
the whole MC section if it helps. Also I tested different variance formulas for the upper section
and they all scored worse than the mean. I suspect the reason is because I do the second sequence less
frequently and the variances are not that much correlated.

Edit:

Idea for the full width searchers : Instead of expanding to a fixed depth, why not extend/reduce part of the tree. For example by looking at the perft values at the root e2e4,e2e3 and d2d3 could be extended by one ply. It may work really well for LMR trees f.i.

sje · Post by **sje** » Fri Aug 19, 2011 9:37 pm

After nearly 17 days, the perft(13) run has produced fifteen draft 11 records. I present them here so that estimators may tune their algorithms.

Code: Select all

rnbqkbnr/1ppppppp/p7/8/8/N7/PPPPPPPP/R1BQKBNR w KQkq - 0 2 11 1865423942798492
rnbqkbnr/1ppppppp/p7/8/8/2N5/PPPPPPPP/R1BQKBNR w KQkq - 0 2 11 2458021022756805
rnbqkbnr/1ppppppp/p7/8/8/1P6/P1PPPPPP/RNBQKBNR w KQkq - 0 2 11 2110456650953864
rnbqkbnr/1ppppppp/p7/8/8/P7/1PPPPPPP/RNBQKBNR w KQkq - 0 2 11 1562327606120748
rnbqkbnr/1ppppppp/p7/8/8/7N/PPPPPPPP/RNBQKB1R w KQkq - 0 2 11 1882390155585040
rnbqkbnr/1ppppppp/p7/8/8/5N2/PPPPPPPP/RNBQKB1R w KQkq - 0 2 11 2373558610955814
rnbqkbnr/1ppppppp/p7/8/P7/8/1PPPPPPP/RNBQKBNR w KQkq - 0 2 11 2250225134770081
rnbqkbnr/1ppppppp/p7/8/1P6/8/P1PPPPPP/RNBQKBNR w KQkq - 0 2 11 2178802463490013
rnbqkbnr/1ppppppp/8/p7/8/N7/PPPPPPPP/R1BQKBNR w KQkq - 0 2 11 2670298917876179
rnbqkbnr/1ppppppp/8/p7/P7/8/1PPPPPPP/RNBQKBNR w KQkq - 0 2 11 2373873098604319
rnbqkbnr/1ppppppp/8/p7/8/2N5/PPPPPPPP/R1BQKBNR w KQkq - 0 2 11 3411299116725061
rnbqkbnr/1ppppppp/8/p7/8/P7/1PPPPPPP/RNBQKBNR w KQkq - 0 2 11 2171307455092881
rnbqkbnr/1ppppppp/8/p7/8/1P6/P1PPPPPP/RNBQKBNR w KQkq - 0 2 11 3039243237480582
rnbqkbnr/1ppppppp/8/p7/8/7N/PPPPPPPP/RNBQKB1R w KQkq - 0 2 11 2654960388372645
rnbqkbnr/1ppppppp/8/p7/8/5N2/PPPPPPPP/RNBQKB1R w KQkq - 0 2 11 3354897405349959

Fifteen down, 385 to go. I expect the first draft 12 records in mid October 2011 and the entire run should complete in January 2012.

Daniel Shawul · Post by **Daniel Shawul** » Sat Aug 20, 2011 4:45 pm

Daniel Shawul wrote:I think what is happening is the following. We both do basically the same thing but Peter collects perft estimates ONLY at the root. So he reuses them there but elsewhere only the weights are adjusted.. He does use the same sequence but as I have already mentioned, if you do it only at the root it is unbiased, at least practically. So I think that summing inside the tree is the problem. This maybe the culprit because for a ply-3 tree the second sequence had a less pronounced bias than I ever saw with deeper searches. And of course for ply-0 second sequence is as good as the first.. What do you think ?

I was right about this observation. The use of a tree has a huge impact on the rate of convergence. If you do it only at the root it has a faster rate of convergence but with deeper depths it converges so slow it looks practically biased. It takes too much effort to converge but it will eventually. Also the difference between the actual error and standard error is very big for the second sequence. I did the following experiment.
An artificial tree with a branching factor of 4 is constructed and at each leaf a random number between 0 and 100 inclusive is returned. And I use all the back-propagation methods to find the mean at the root.Expected value for each leaf node is 50. SD for uniform distribution is 50/sqrt(12) = 28.87. Then I did tests with different depth. The biggest depth I did was 9 with 4^9 = 262144 leaf nodes. E(X) = 262144*50 = 13107200. The plot is shown below. You can see if I stopped after 500mil iterations , the margin of error for the second sequence is very very high but it constantly increases and becomes more and more closer to the mean. I run it the whole night and still had an error of -130, while the first sequence had only 10. After 500 mil iterations the numbers were -220 and -2390, so roughly it has a 11X more error at any instant.

I can do similar plots for depths 1,2,3--8 which shows that relatively less amount of effort is required. Btw I wrote a brand new code for this purpose to avoid existing bugs if any but I am now convinced it is the tree which makes it harder and it is not biased even if you _reuse_ the data to guide search as long as you do NOT add it to the perft computation. Peter does additions at the root making life easier for him, but I still need to use the second sequence even though the method is theoretically unbiased. The only difference is the degree to which the convergence is affected and that depends on the depth of the tree. I haven't yet expermented with dynamically growing tree.

Perft(13) betting pool

Re: Summary of Monte Carlo results for perft(12) so far

Re: Summary of Monte Carlo results for perft(12) so far

Two more draft 11

Re: Summary of Monte Carlo results for perft(12) so far

Re: Summary of Monte Carlo results for perft(12) so far

Re: Summary of Monte Carlo results for perft(12) so far

Re: Perft(13) betting pool

Re: Summary of Monte Carlo results for perft(12) so far

After nearly 17 days

Re: Summary of Monte Carlo results for perft(12) so far