Perft(13) betting pool

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Defect of a node?

Post by Daniel Shawul »

hgm wrote:
Daniel Shawul wrote:OTOH you prune many moves but still could have two or more moves to investigate at a ply. At least that is how I understood it. I don't pretend to know all about your method, so do care to explain if I misunderstood. So that makes it heavy, because it investigates more than one move at a ply.
You overlook the fact that I can also have zero moves per node, and that that probability is larger than having two or more. On average I search less than one move per node, because my acceptance probability is 1/32 and there are on average fewer than 32 moves per node. So I guess that in your terminology my method is actually lighter than all the others.
I don't know what you want me to say. It doesn't matter to me what N you currently use. You used 16 in the past... OK, so now you are saying your method could perform better than Peter's, which investigates exactly one move (pure MC). Really not my problem! I have already told you many times that it doesn't matter for my test which MC simulator I pick. Any comparison should use the _same_ MC simulator (the selective part, in your case). Lighter or heavier doesn't matter. It should also use the same number of simulations for whatever full-width search depth you pick. Your tests always had more simulations for the bigger depths.
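For reference, here is the acceptance-probability scheme hgm describes in rough code, run on a toy uniform tree rather than a real move generator. The branching factor and all the scaffolding below are illustrative assumptions, not his actual implementation; the only point is that scaling each accepted move by 1/P keeps the estimate unbiased.

Code: Select all

#include <cmath>
#include <cstdio>
#include <random>

// Toy tree: every node has BRANCH moves, so the true perft is BRANCH^depth.
// A real implementation would iterate over the engine's legal moves instead.
static const int    BRANCH = 20;
static const double P      = 1.0 / 32.0;  // acceptance probability per move

static std::mt19937_64 rng(12345);
static std::uniform_real_distribution<double> uni(0.0, 1.0);

// Unbiased estimator: each move is searched with probability P and its
// subtree estimate is scaled by 1/P, so the expectation is the true count.
double estimate(int depth) {
    if (depth == 0) return 1.0;
    double total = 0.0;
    for (int m = 0; m < BRANCH; m++)
        if (uni(rng) < P)
            total += estimate(depth - 1) / P;
    return total;
}

int main() {
    const int depth = 5, runs = 200000;
    double sum = 0.0;
    for (int i = 0; i < runs; i++) sum += estimate(depth);
    std::printf("average estimate %.4e, true value %.4e\n",
                sum / runs, std::pow((double)BRANCH, depth));
    return 0;
}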
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Defect of a node?

Post by hgm »

Well, like I said, I completely lost track of what you are doing. I was just curious what the errors you quote mean.
petero2
Posts: 684
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Defect of a node?

Post by petero2 »

Daniel Shawul wrote:This has become the new number one! It improved on the previous leader of both the limited-splitting version and the free-splitting version by the tiniest of margins possible :) But it clearly makes the point that the accepted notion of "pick the node with the biggest variance" is wrong. It should instead be defined as "pick the node that results in the biggest reduction in variance". The smallest node could be picked, depending on the number of visits. The improvement is minute, but it is still an improvement.

Limited split version:
Mean 6.284999E+16
StdDev 5.023038E+12
Error 4.908E+12
Free split version:
Mean 6.284786E+16
StdDev 4.227828E+12
Error 7.04E+12
For comparison, the previous best had 5.024237E+12 and 4.228939E+12, so you can see the improvements are only in the third significant digit in both cases! What matters is that it is consistent.
With my method, http://talkchess.com/forum/viewtopic.ph ... 25&t=39678, I get a standard deviation of 3.9826e+12 after 100e6 random walks. If I split the data in 100 chunks with 1e6 random walks in each chunk and compute the standard deviation for each chunk, I get:
[plot: per-chunk standard deviation over 100 chunks of 1e6 random walks]
This shows that the weights are still improving after 100e6 random walks. If I could make the algorithm learn the optimum weights faster, my standard deviation would be even lower.
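In rough code, the chunked bookkeeping looks something like this; the randomWalkEstimate() stub and its numbers are made-up stand-ins for a real weighted random walk, and only the chunking arithmetic is the point.

Code: Select all

#include <cmath>
#include <cstdio>
#include <random>

// Stand-in for one random walk's perft estimate; a real walk would return
// the product of the reweighting factors along a sampled path.
static std::mt19937_64 rng(1);
double randomWalkEstimate() {
    static std::normal_distribution<double> d(6.285e16, 3.5e16);  // illustrative values only
    return d(rng);
}

int main() {
    // 100 chunks of 1e6 walks were used above; smaller numbers here to keep it quick.
    const long chunks = 10, walksPerChunk = 100000;
    for (long c = 0; c < chunks; c++) {
        double sum = 0.0, sumSq = 0.0;
        for (long i = 0; i < walksPerChunk; i++) {
            double x = randomWalkEstimate();
            sum   += x;
            sumSq += x * x;
        }
        double mean = sum / walksPerChunk;
        double var  = (sumSq - sum * mean) / (walksPerChunk - 1);  // sample variance
        // Standard deviation of this chunk's mean estimate (its standard error).
        std::printf("chunk %2ld: mean %.6e  stddev %.4e\n",
                    c, mean, std::sqrt(var / walksPerChunk));
    }
    return 0;
}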
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Yet another piece of perft(13)

Post by sje »

And the second perft(11) subtotal:
[D]rnbqkbnr/1ppppppp/p7/8/8/2N5/PPPPPPPP/R1BQKBNR w KQkq - 0 2
perft(11): 2,458,021,022,756,805

That's two down and 398 to go.

Transposition table stores after 119 hours:

0 draft 12
2 draft 11
100 draft 10
1847 draft 9
28391 draft 8

Predicted completion date: December 2011
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Defect of a node?

Post by Daniel Shawul »

petero2 wrote:With my method, http://talkchess.com/forum/viewtopic.ph ... 25&t=39678, I get a standard deviation of 3.9826e+12 after 100e6 random walks. If I split the data in 100 chunks with 1e6 random walks in each chunk and compute the standard deviation for each chunk, I get:
This shows that the weights are still improving after 100e6 random walks. If I could make the algorithm learn the optimum weights faster, my standard deviation would be even lower.
First of all, thanks for the data and plot. It helps to know I am on the right track. I will try to extract the standard deviation from the log file I have and make a plot. I suppose your method is the one which uses the hash tables for storing weights in the MC part of the tree alone. But I think in your case you also don't use a full-width search depth (you start from the root), so you can get some of the same benefits.
Since the algorithms used are different, I think a time-based comparison would be more appropriate. But I understand that is difficult to do right now, and it is better to do the comparison with a fixed number of simulations. My method doesn't actually use all of the 100 million simulations. If you look at the detailed data, it only used 72 million simulations, with about 30% wasted. I may be able to avoid that wastage, because the only reason I did it was to avoid biasing the mean in the previous method. So if that works, it should get the stddev down to the 3.9 range, I guess. But I haven't tested it with the new one yet. Also, the splitting parameters are not tuned well, and the current max depth reached by all methods is <= 5. Anyway, the method is not finished yet and clearly I have some problems to solve.
I suppose you also have issues to resolve, such as the slowdown due to the HT. Have you compared it with the method that doesn't use a HT at a fixed time? Also, after the disastrous results I started getting by using the same sequence of random numbers for calculating the mean, I wonder if your method is affected by that too?
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Defect of a node?

Post by Daniel Shawul »

hgm wrote:Well, like I said, I completely lost track of what you are doing. I was just curious what the errors you quote mean.
I admit my posts are hard to read, since I post results as I get them. 100 million simulations take me hours here. Anyway, I think I have mentioned somewhere that the "StdDev" is the standard deviation of the sampling distribution (aka the standard error). And "Error" is just the deviation from the true mean (6.28549e16). The latter doesn't mean much, because sometimes I get surprisingly good estimates from methods with a higher stddev. I have it there just to keep an eye on the effect on the mean estimate, which can get tricky sometimes. Mean and stddev are calculated with a "moving" average formula for each leaf node and then backed up by summation over the leaf nodes.
I will try to summarize in a table all the results I have, including the ones I have abandoned now.
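For concreteness, the per-leaf bookkeeping amounts to something like the sketch below: a running ("moving") mean/variance update per leaf, with the root estimate backed up by summation. The Welford-style update, the toy numbers, and the assumption that the leaf estimates are independent are illustrative, not necessarily the exact formulas used.

Code: Select all

#include <cmath>
#include <cstdio>
#include <vector>

// Running ("moving") mean and variance per leaf, Welford-style.
struct LeafStats {
    long   n = 0;
    double mean = 0.0, m2 = 0.0;
    void add(double x) {            // incorporate one simulation result
        n++;
        double d = x - mean;
        mean += d / n;
        m2   += d * (x - mean);
    }
    double varOfMean() const {      // variance of this leaf's mean estimate
        return n > 1 ? m2 / (n - 1) / n : 0.0;
    }
};

int main() {
    // Back up leaf estimates to the root: the means add, and for independent
    // leaves the variances of the means add as well.
    std::vector<LeafStats> leaves(3);
    leaves[0].add(10); leaves[0].add(14);
    leaves[1].add(20); leaves[1].add(22);
    leaves[2].add(5);  leaves[2].add(7);
    double mean = 0.0, var = 0.0;
    for (const auto& l : leaves) { mean += l.mean; var += l.varOfMean(); }
    std::printf("root mean %.1f  stddev %.3f\n", mean, std::sqrt(var));
    return 0;
}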
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Defect of a node?

Post by Daniel Shawul »

I got the plot now, and it is surprisingly smooth. I guess that in your case the HT replacement is the cause of the erratic nature of the plot.
[plot: standard deviation vs. number of simulations]
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Defect of a node?

Post by Daniel Shawul »

Here is the all-in-one plot of the five free-splitting versions I tested. All of them follow a 1/sqrt(N) trend, but you can see that the uniform splitter stands out from the rest. And it has a "bump" somewhere around 4 million iterations for some reason. The rest, except the log-tree-size formula, pretty much overlap. I guess log-tree size is not appropriate, since it has a larger stddev in the earlier iterations, but it was giving a surprisingly low error from the actual mean for both the free-splitting and limited-splitting versions.
[plot: standard deviation vs. iterations for the five free-splitting versions]
petero2
Posts: 684
Joined: Mon Apr 19, 2010 7:07 pm
Location: Sweden
Full name: Peter Osterlund

Re: Defect of a node?

Post by petero2 »

Daniel Shawul wrote:
petero2 wrote:With my method, http://talkchess.com/forum/viewtopic.ph ... 25&t=39678, I get a standard deviation of 3.9826e+12 after 100e6 random walks. If I split the data in 100 chunks with 1e6 random walks in each chunk and compute the standard deviation for each chunk, I get:
This shows that the weights are still improving after 100e6 random walks. If I could make the algorithm learn the optimum weights faster, my standard deviation would be even lower.
First of all, thanks for the data and plot. It helps to know I am on the right track. I will try to extract the standard deviation from the log file I have and make a plot. I suppose your method is the one which uses the hash tables for storing weights in the MC part of the tree alone. But I think in your case you also don't use a full-width search depth (you start from the root), so you can get some of the same benefits.
Since the algorithms used are different, I think a time-based comparison would be more appropriate. But I understand that is difficult to do right now, and it is better to do the comparison with a fixed number of simulations. My method doesn't actually use all of the 100 million simulations. If you look at the detailed data, it only used 72 million simulations, with about 30% wasted. I may be able to avoid that wastage, because the only reason I did it was to avoid biasing the mean in the previous method. So if that works, it should get the stddev down to the 3.9 range, I guess. But I haven't tested it with the new one yet. Also, the splitting parameters are not tuned well, and the current max depth reached by all methods is <= 5. Anyway, the method is not finished yet and clearly I have some problems to solve.
I suppose you also have issues to resolve, such as the slowdown due to the HT. Have you compared it with the method that doesn't use a HT at a fixed time? Also, after the disastrous results I started getting by using the same sequence of random numbers for calculating the mean, I wonder if your method is affected by that too?
Actually I gave up on the hash table and now instead grow a tree from the root position. When I reach a leaf position, I first check if this is a good split candidate, and if it is, I split. When I reach a leaf node that I don't want to split, I do a random walk from there. Each tree node contains information which makes it possible to estimate "optimal weights", as described here http://talkchess.com/forum/viewtopic.ph ... 71&t=39678. When there is not enough information in the tree, I just use uniform sampling. This method has the advantage that it is unbiased even if I use the same walks to compute the sampling probabilities and to collect perft estimates.
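Roughly, the reweighting step looks like the sketch below: pick a child with some probability, divide the result from that subtree by the same probability, and the estimate stays unbiased for any positive weights, however they were learned. The toy tree, the fixed weights, and the leafEstimate() stand-in are made up for illustration, not the real implementation.

Code: Select all

#include <cstdio>
#include <random>
#include <vector>

// Sketch of one importance-sampled descent through an in-memory tree.
// Each child i is chosen with probability weights[i]; the estimate from the
// chosen subtree is divided by weights[i], which keeps the estimator unbiased.
struct Node {
    std::vector<Node>   children;
    std::vector<double> weights;    // learned selection probabilities (sum to 1)
    double leafEstimate() const { return 1.0; }  // stand-in for a random walk
};

static std::mt19937_64 rng(7);

double sample(const Node& n) {
    if (n.children.empty())
        return n.leafEstimate();    // below the tree: do a random walk
    std::discrete_distribution<int> pick(n.weights.begin(), n.weights.end());
    int i = pick(rng);
    return sample(n.children[i]) / n.weights[i];  // reweight for unbiasedness
}

int main() {
    // Two leaf children sampled with unequal weights; expectation is 1 + 1 = 2.
    Node root;
    root.children.resize(2);
    root.weights = {0.7, 0.3};
    double sum = 0.0;
    const int N = 100000;
    for (int i = 0; i < N; i++) sum += sample(root);
    std::printf("estimate %.3f (expected 2.0)\n", sum / N);
    return 0;
}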

Earlier I tried something which I think is similar to what you do, see http://talkchess.com/forum/viewtopic.ph ... 77&t=39678, but I also had the problem that my estimates were always far too low. I see you were able to overcome that problem by "wasting" 30% of the simulations. It would be interesting to see if that can be improved somehow.

How large do you allow your tree to grow? I assume your method gets better with a larger tree. I used 4 million tree nodes in my test.

I let my simulation run longer (600M walks) and the standard deviation over time looks like this:
[plot: per-1M-walk-chunk standard deviation over 600M walks]
It looks like it stabilizes after 200M walks, with a standard deviation of around 3.5e13 for each 1M-walk chunk. It would of course be nice if I could somehow learn those weights much faster, but maybe it is not possible. After 100M walks there have only been, on average, 25 visits to each leaf node in the tree.

Regarding time-based comparisons, I will optimize my Java implementation so that it at least doesn't compute Zobrist hash keys or update incremental evaluation terms. (I currently use unmodified code from my Java chess program.)
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Defect of a node?

Post by Daniel Shawul »

petero2 wrote:Actually I gave up on the hash table and now instead grow a tree from the root position. When I reach a leaf position, I first check if this is a good split candidate, and if it is, I split. When I reach a leaf node that I don't want to split, I do a random walk from there. Each tree node contains information which makes it possible to estimate "optimal weights", as described here http://talkchess.com/forum/viewtopic.ph ... 71&t=39678. When there is not enough information in the tree, I just use uniform sampling. This method has the advantage that it is unbiased even if I use the same walks to compute the sampling probabilities and to collect perft estimates.
Isn't what you describe exactly UCT? You first grow a tree from the root. Then you do a certain number of simulations, say 10, for each node (uniform sampling, that is) until you collect enough information. I do that too. Then you split based on the information you have, which is the key component of UCT again. I can only conclude that you probably have a new formula to select which node to split. UCT uses UCB by default, but I have already tested a dozen formulas.
You never split in your hash-table implementations, but tried to optimize weights from the data you had already calculated and stored. What exactly do you do differently in the upper part of the tree?
petero2 wrote:Earlier I tried something which I think is similar to what you do, see http://talkchess.com/forum/viewtopic.ph ... 77&t=39678, but I also had the problem that my estimates were always far too low. I see you were able to overcome that problem by "wasting" 30% of the simulations. It would be interesting to see if that can be improved somehow.
Yes, this I remember. And you said you were going back to your dynamic-weight method (the HT method) unless someone came up with the correct back-up method.
And my solution for it is not the wasted nodes, but to use a different sequence of random numbers than the one used for the calculation. I don't think "wasting" any simulations is necessary now. Earlier it was biasing the results, but after my improvement of using two parallel perfts, I don't need it anymore. Like I said, I will be testing it next.
Here is what I described as a UCT-like perft:
Here is the idea for this kickass algorithm. You can say it is UCT for perft, but I haven't actually implemented it, so it could all be for nothing if I made a mistake in planning somewhere.

Code: Select all

a) Start from the root and do 10 Monte Carlo simulations for each move
b) Pick the best node to expand using the UCB formula I provided above
c) Expand the selected node, which will be one of the biggest nodes, with the right formula
d) Do this recursively for the children of that big node
So this algorithm explores the bigger nodes more often, while the smaller nodes get their estimates from a small number of simulations. No need for a heuristic, no need for fixed-depth expansion, unlike what we are doing now, i.e. plies 3-6 fixed with uniform sampling.

If it works out the way I imagine, this algorithm should be the best of the lot. But I could be wrong somewhere. Opinions?
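Purely as an illustration of step b), here is the standard UCB1 shape applied to subtree-size estimates. The Child fields and the exploration constant are assumptions, and the actual formula referred to above may well differ; this only shows the general selection step.

Code: Select all

#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Step b) as a stand-in: score each child by its running subtree estimate
// plus an exploration bonus, and expand the highest scorer (plain UCB1 shape).
struct Child {
    double meanEstimate;  // running Monte Carlo estimate of the subtree size
    long   visits;        // simulations routed through this child so far
};

std::size_t selectChild(const std::vector<Child>& kids, long parentVisits, double c) {
    std::size_t best = 0;
    double bestScore = -1.0;
    for (std::size_t i = 0; i < kids.size(); i++) {
        // Big subtrees and rarely visited children both get priority.
        double bonus = c * std::sqrt(std::log((double)parentVisits) / (kids[i].visits + 1));
        double score = kids[i].meanEstimate + bonus;
        if (score > bestScore) { bestScore = score; best = i; }
    }
    return best;
}

int main() {
    // After the 10 warm-up simulations of step a), the largest child wins here.
    std::vector<Child> kids = { {3.0e15, 10}, {9.0e15, 10}, {1.0e15, 10} };
    std::printf("expand child %zu\n", selectChild(kids, 30, 1.0e15));
    return 0;
}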
petero2 wrote:How large do you allow your tree to grow? I assume your method gets better with a larger tree. I used 4 million tree nodes in my test.

I let my simulation run longer (600M walks) and the standard deviation over time looks like this:
It can grow to any depth. If I reduce the number of simulations before splitting, say from 1000 to 100, it will be more selective and go deeper for some of the moves.