## Perft(13) betting pool

Discussion of chess software programming and technical issues.

Moderators: hgm, Harvey Williamson, bob

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
Michel
Posts: 1966
Joined: Sun Sep 28, 2008 11:50 pm

### Re: Defect of a node?

What is wrong with calculating the weights based back propagated mean and variance instead of back propagating the weights directly (unless I misunderstood you) ?
Nothing. Back propagating the weights seemed more elegant. But apparently hard to do for a directed graph which is not a tree.

hgm
Posts: 22588
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

### Re: Defect of a node?

Michel wrote:This was based on a pure implementation of your method though. If I understood correctly you keep track in your computation of the number of accepted moves at each ply. I don't know how this affects the result.
Well, that makes a large difference, because it effectively changes the process into deciding which of the moves you pick, rather than whether you pick a move. So the variance of the single probe will become the intrinsic variance of the sub-tree sizes. While in the unnormalized case it would be the variance of the sum of N draws with 1/N acceptance probability, which is N times the variance of a single draw with distribution

P(0) = 1 - 1/N
P(i) = 1/N * P(x=i)

The latter is 1/N SUM_i { P(x=i)*i^2 } - (1/N SUM_i { P(x=i)*i } )^2
= 1/N ( SUM_i { P(x=i)*i^2 } - (SUM_i { P(x=i)*i } )^2} + (1 - 1/N) * (SUM_i { P(x=i)*i } )^2 )
= 1/N var(x) + (1/N - 1/N^2) E^2(x).

So we get var(x) + (1 - 1/N)*E^2(x), which for large N means the variance get dominated by E(x). So if the frontier nodes have 20-30 moves, i.e. SD = 5, E(x) = 25, the unnormalized result has a 5 times larger SD than the normalized result, for the ply leading to them.

Michel
Posts: 1966
Joined: Sun Sep 28, 2008 11:50 pm

### Re: Defect of a node?

Yes I had noticed that for the pure method the result you get is badly influenced by E(X).

So what do you do precisely?

Instead of multiplying the end count by 32 for each selective ply you multiply by (total moves/accepted moves) for every selective ply?

hgm
Posts: 22588
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller
Contact:

### Re: Defect of a node?

Exactly.

Daniel Shawul
Posts: 3510
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

### Re: Defect of a node?

I think it is easier to resolve this issue by running tests. Can you run a test one with your default acceptance probability of 1/32 ,and another with Peter's modification of selecting exactly one random move and multiply by the number of legal moves? The test should use a fixed time control selected in such a way that it is equivalent to the time it takes you to run 100 mil simulations. One test starting from depth=0 and another one from depth=5. Is that alright ?

Michel
Posts: 1966
Joined: Sun Sep 28, 2008 11:50 pm

### Re: Defect of a node?

Ok I see. Yes this makes a big difference.

It seems a little bit tricky to analyze this recursively when there are several selective plies in a row (because of the non local character of e.g. "accepted moves").

Daniel Shawul
Posts: 3510
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

### Re: Defect of a node?

Michel wrote:
What is wrong with calculating the weights based back propagated mean and variance instead of back propagating the weights directly (unless I misunderstood you) ?
Nothing. Back propagating the weights seemed more elegant. But apparently hard to do for a directed graph which is not a tree.
Yes it is very difficult since it has a square root term in there. I am relying on the central limit theorem when I back propagate variances. May be there is some thing that can be improved there so I am re-reading cg2006 paper of Remi.

Michel
Posts: 1966
Joined: Sun Sep 28, 2008 11:50 pm

### Re: Defect of a node?

Yes it is very difficult since it has a square root term in there.
No this is not relevant. The back propagated weight of a node can be taken to be the sum of
the weights of the children. Then the leaf nodes are selected with the correct frequencies
(this is not quite the same as recalculating the weight from the back propagated
mean and variance, but it works).

The problem is that a node may have several parents and _all_ the parents should be updated. But you have arrived at this node from a particular parent. So you don't know what the other parents are.

Daniel Shawul
Posts: 3510
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

### Re: Defect of a node?

Michel wrote:
Yes it is very difficult since it has a square root term in there.
No this is not relevant. The back propagated weight of a node can be taken to be the sum of
the weights of the children. Then the leaf nodes are selected with the correct frequencies
(this is not quite the same as recalculating the weight from the back propagated
mean and variance, but it works).

The problem is that a node may have several parents and _all_ the parents should be updated. But you have arrived at this node from a particular parent. So you don't know what the other parents are.
Well that is the second problem you have for updating weights _directly_. If you start breaking down the formula like updating (mean^2 + variance) instead of its square root, why not update mean and variance directly and do what you want with it later? Well we agree you need a tree data structure since we have multiple moves at a node. Any other doesn't make sense.

Daniel Shawul
Posts: 3510
Joined: Tue Mar 14, 2006 10:34 am
Location: Ethiopia
Contact:

### Re: Defect of a node?

Wait a second, are you saying the weights are also summed too ? In that case you shouldn't have said I did it right when testing your formula... Because those weights do not have that behavior.