Perft(13) betting pool

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Summary of Monte Carlo results for perft(12) so far

Post by Michel »

Now is the time for you to speak, Michel.
I don't see why. Proving that your estimator is unbiased is _your_ responsibility. And I don't call your comments to Peter a description of an algorithm. Sorry.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Summary of Monte Carlo results for perft(12) so far

Post by Daniel Shawul »

Michel wrote:
Now is the time for you to speak, Michel.
I don't see why. Proving that your estimator is unbiased is _your_ responsibility. And I don't call your comments to Peter a description of an algorithm. Sorry.
Well, you just don't get it, do you? If it weren't for me, you guys would still be doing static heuristics right now. Fact! Not arrogance, a simple statement of fact that I have demonstrated for the nth time with the two posts I made! You took tree splitting, dynamic weights and what have you, after bashing me for so long for saying static heuristics are a stupid idea. Implementations are secondary, you know.

About the bug: it is fixed, so sanity is restored. It was just a counting problem that made the result 10x too high.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Summary of Monte Carlo results for perft(12) so far

Post by Michel »

I didn't want to be rude. But if you ask advice from people, you should give them a clear description of your algorithm. I know exactly what HGM and Peter do, but I have no idea what you do.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Summary of Monte Carlo results for perft(12) so far

Post by Daniel Shawul »

I thought we were discussing a result I produced, like we usually do. I don't need advice; well, it is good to have, but if you think like that, no thank you. I just posted a weird result I had. When I did that yesterday, what did we get? You ran the test and confirmed the weird result, didn't you? Well, I will be cautious about what I say/post, because it seems it will be taken against me like in a court of law. That is why you suddenly started saying my method is biased, even after the gazillion sane results I posted for the limited and free splitting versions... I still think caution is required, but since you have made up your mind about what is right/wrong, why bother.
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Summary of Monte Carlo results for perft(12) so far

Post by hgm »

The problem I have with adaptive methods is that there is no information available to adapt on, unless you allow on average more than one move per node to be accepted. And this would explode a deep perft tree very rapidly, using almost all nodes in the final plies, drawing nodes away from the early plies, which seems to defeat the purpose.

So the only place where I can see an application is when, at the first non-full-width level, you do have more than one move per node, but not enough to completely perft out all moves. Then you could use the first move using Peter1 to make a sub-tree size estimate, and if the estimate is significantly above average, do some extra sampling in that move. This assumes the average sizes are approximately known, but they could be known to within, say, 10 percent, which is good enough for this application, from very coarse initial sampling to determine all perft(N) up to the desired depth. If the desire is to do 3 samples per full-width leaf, and to make the number of samples in each node proportional to the sub-tree size, you could refrain from further sampling if the first path predicts a sub-tree value of 0-40% of average size. And otherwise take a second sample, and leave it at that if the average of the two predictions is below 80%. And then take one more to see if you get above 120%, etc.

That would reduce the sampling noise of that particular level. Which would be fine if you calculated the subsequent levels exactly. But it would not do anything for the sampling of those subsequent levels.
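
A minimal sketch of this thresholded resampling schedule, in Python (the sample_once callable, the sample cap, and the exact threshold list are illustrative assumptions, not hgm's actual code):

Code: Select all

# Illustrative thresholds, as fractions of the (approximately known)
# average sub-tree size, following the 40%/80%/120% progression above.
THRESHOLDS = [0.4, 0.8, 1.2]
MAX_SAMPLES = 6  # hypothetical cap on samples per move

def adaptive_samples(sample_once, avg_size):
    """Sample one move's sub-tree size until the running mean falls
    below the k-th threshold (times avg_size), up to MAX_SAMPLES.

    sample_once: callable returning one unbiased sub-tree size
                 estimate, e.g. a single Peter1 random walk.
    avg_size:    rough average sub-tree size, known to ~10 percent.
    """
    estimates = []
    for k in range(MAX_SAMPLES):
        estimates.append(sample_once())
        mean = sum(estimates) / len(estimates)
        if mean < THRESHOLDS[min(k, len(THRESHOLDS) - 1)] * avg_size:
            break  # sub-tree looks small; stop sampling this move
    return estimates

Note that naively averaging the output of such a data-dependent stopping rule can bias the estimate, which is exactly the caveat petero2 raises below.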
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Summary of Monte Carlo results for perft(12) so far

Post by Michel »

I don't need advice,
Well, you asked repeatedly why your method is biased and whether your back-propagation method for the variance is correct.
That is why you suddenly started saying my method is biased, even after the gazillion sane results I posted for the limited and free splitting versions...
When you started posting spectacular results yesterday and ignored my repeated comments about the impossibly high z values, I got upset. That's all.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Summary of Monte Carlo results for perft(12) so far

Post by Daniel Shawul »

Again you misunderstand. My method will be biased if I don't use a second sequence of random numbers, but not otherwise. That is a fact. And Peter has also confirmed that he observed the _same thing_ and gave up because of it. OTOH I came up with two solutions that I have already explained here. Now a Peter2 is proposed, but it looks exactly like UCT. Where did that previous problem go? Peter himself observed it. Don't you find that at least worthy of an explanation? So where did it go? I clearly explained that I was forced to use a second sequence of random numbers. If you don't have an explanation, that is fine, but don't blame me for asking the question. That is not asking for advice, rather a legitimate inquiry into the inner workings of the method, which is all I am interested in anyway.
petero2
Posts: 685
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Summary of Monte Carlo results for perft(12) so far

Post by petero2 »

hgm wrote:The problem I have with adaptive methods is that there is no information available to adapt on, unless you allow on average more than one move per node to be accepted. And this would explode a deep perft tree very rapidly, using almost all nodes in the final plies, drawing nodes away from the early plies, which seems to defeat the purpose.

So the only place where I can see an application is when, at the first non-full-width level, you do have more than one move per node, but not enough to completely perft out all moves. Then you could use the first move using Peter1 to make a sub-tree size estimate, and if the estimate is significantly above average, do some extra sampling in that move. This assumes the average sizes are approximately known, but they could be known to within, say, 10 percent, which is good enough for this application, from very coarse initial sampling to determine all perft(N) up to the desired depth. If the desire is to do 3 samples per full-width leaf, and to make the number of samples in each node proportional to the sub-tree size, you could refrain from further sampling if the first path predicts a sub-tree value of 0-40% of average size. And otherwise take a second sample, and leave it at that if the average of the two predictions is below 80%. And then take one more to see if you get above 120%, etc.

That would reduce the sampling noise of that particular level. Which would be fine if you calculated the subsequent levels exactly. But it would not do anything for the sampling of those subsequent levels.
It may be possible to make something like this work, but you have to be careful not to introduce bias. As a very simplified example, suppose you try to estimate the mean of a random variable that is normally distributed with mean 0 and standard deviation 1. If your algorithm takes a number of samples, but for each positive sample, it decides to get "a second opinion" and use the average of the original value and the "second opinion", then your estimator would be a random variable Y, which can be written:

Code: Select all

Y = X1 + (X1>0)*(X2-X1)/2
Here (X1>0) should be regarded as 1 if the condition is true and 0 otherwise. If we calculate the mean of Y, using linearity of the expectation operator and the fact that X1 and X2 are independent with E(X1)=E(X2)=0, we get:

Code: Select all

E(Y) = E(X1) + 1/2*E((X1>0)*X2)  - 1/2*E((X1>0)*X1) =
     = E(X1) + 1/2*E(X1>0)*E(X2) - 1/2*E((X1>0)*X1) =
     = 0     + 1/2*E(X1>0)*0     - 1/2*E((X1>0)*X1) =
     = -1/2*E((X1>0)*X1) = 
     = -1/2*integrate(x/sqrt(2*pi)*exp(-x^2/2),x,0,inf) = -1/sqrt(8*pi)
So the estimator is biased. I think the same thing would happen with the method you described.
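
A quick numerical check of this derivation (a sketch of mine, not part of the post above): simulating Y directly should give a mean close to -1/sqrt(8*pi), i.e. about -0.1995.

Code: Select all

import math
import random

N = 2_000_000
total = 0.0
for _ in range(N):
    x1 = random.gauss(0.0, 1.0)
    if x1 > 0:
        # positive sample: take a "second opinion" and average it in
        x2 = random.gauss(0.0, 1.0)
        total += (x1 + x2) / 2
    else:
        total += x1

print("empirical E(Y):", total / N)                   # about -0.1995
print("-1/sqrt(8*pi) :", -1 / math.sqrt(8 * math.pi))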
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Summary of Monte Carlo results for perft(12) so far

Post by Michel »

I think I understand why the full-width searches are so effective. I thought of a full-width search as a uniform search, but that's of course not true. It spends most of its time in the large subtrees, which is precisely what you want for an MC perft search.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Summary of Monte Carlo results for perft(12) so far

Post by Michel »

Where did that previous problem go away ? Peter himself observed it. Don't you find that at least worthy of an explanation ?
But it has been explained many, many times. Most recently Peter wrote:
No, because the Peter2 method does a series of random walks from the root. The first part (up to the leaf node) is weighted, the second part is unweighted. But both parts have weight * probability = 1, so a single random walk will generate an unbiased estimate. The final result is just the average value of all random walk values, which is of course unbiased when all individual values are unbiased.

The fact that the weights change dynamically is not a problem, because as long as weight * probability = 1, the result is still unbiased.
What can possibly be unclear about this?
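
To make the weight * probability = 1 identity concrete, here is a minimal sketch of a generic weighted random walk estimator (a toy illustration with a hypothetical children interface, not Peter's actual implementation): at every step the walk picks a child with some probability p and multiplies its weight by 1/p, so each complete walk is an unbiased estimate of the leaf count.

Code: Select all

import random

def walk_estimate(node, children, depth):
    """One weighted random walk: an unbiased estimate of the number
    of leaves 'depth' plies below 'node'. 'children' maps a node to
    its list of successors (hypothetical interface)."""
    weight = 1.0
    for _ in range(depth):
        moves = children(node)
        if not moves:
            return 0.0  # dead end: no leaves along this path
        # Uniform selection: probability 1/len(moves) per child, so
        # multiplying the weight by len(moves) keeps
        # weight * probability == 1 at every step.
        node = random.choice(moves)
        weight *= len(moves)
    return weight

def perft_estimate(root, children, depth, n_walks):
    """The average of independent unbiased walk values is unbiased."""
    return sum(walk_estimate(root, children, depth)
               for _ in range(n_walks)) / n_walks

The selection probabilities may be non-uniform and may even change dynamically between walks (e.g. biased toward large sub-trees); each individual walk stays unbiased as long as its own weights satisfy weight * probability = 1, which is the point of the quoted explanation.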