Perft(13) betting pool


Daniel Shawul

Re: Defect of a node?

Post by Daniel Shawul »

hgm wrote:I have completely lost track of what you are doing. Which is OK, because I am only interested in comparing the outcome to my own results. But the standard deviations and errors you quote, how many games did you have to play to obtain those?
I am not sure what you want to compare. Well, what I am doing is comparing the effect of splitting using the different formulas available to me. AFAIK you expand to a fixed depth uniformly, so that will be "uniform" in my tests. All tests are done with a fixed number of simulations, so the actual number doesn't matter. Anyway, I have mentioned it many times already: 100 million.
Daniel Shawul

Re: Defect of a node?

Post by Daniel Shawul »

And here is the missing Log-tree size result with the free splitting versions. I think it has come in fourth place (i.e. after the new number one is added).

Code: Select all

Mean	6.286348E+16
StdDev	4.966971E+12
Error	8.583E+12
And detailed result

Code: Select all

Log-tree size	
0	h2h3	1.81E+15	+-	6.93E+11	1.81E+15	2939466	33797		4.80E+23	
	0	h2h4	2.61E+15	+-	8.55E+11	2.62E+15	3660969	58215		7.30E+23	
	0	g2g3	2.49E+15	+-	8.18E+11	2.50E+15	3566734	54204		6.68E+23	
	0	g2g4	2.21E+15	+-	7.32E+11	2.22E+15	3335250	42521		5.36E+23	
	0	f2f3	1.55E+15	+-	6.03E+11	1.55E+15	2636787	20806		3.63E+23	
	0	f2f4	2.04E+15	+-	7.52E+11	2.05E+15	3179819	39369		5.66E+23	
	0	e2e3	7.10E+15	+-	2.17E+12	7.16E+15	5629451	141422		4.70E+24	
	0	e2e4	7.20E+15	+-	2.35E+12	7.27E+15	5657350	141739		5.52E+24	
	0	d2d3	4.57E+15	+-	1.17E+12	4.59E+15	4763097	84516		1.37E+24	
	0	d2d4	6.28E+15	+-	1.65E+12	6.33E+15	5388945	125126		2.73E+24	
	0	c2c3	2.74E+15	+-	8.93E+11	2.75E+15	3756436	67732		7.97E+23	
	0	c2c4	3.10E+15	+-	1.00E+12	3.12E+15	4001560	72858		1.00E+24	
	0	b2b3	2.40E+15	+-	8.08E+11	2.41E+15	3494193	51282		6.52E+23	
	0	b2b4	2.40E+15	+-	8.00E+11	2.41E+15	3499493	52710		6.39E+23	
	0	a2a3	1.82E+15	+-	6.99E+11	1.83E+15	2951167	34338		4.89E+23	
	0	a2a4	2.56E+15	+-	8.42E+11	2.57E+15	3623677	57003		7.10E+23	
	0	g1h3	2.12E+15	+-	7.44E+11	2.13E+15	3257143	42625		5.54E+23	
	0	g1f3	2.69E+15	+-	8.31E+11	2.70E+15	3723154	53329		6.91E+23	
	0	b1c3	2.72E+15	+-	9.35E+11	2.73E+15	3741753	59205		8.75E+23	
	0	b1a3	2.09E+15	+-	7.69E+11	2.10E+15	3226556	42175		5.91E+23	
											
			6.252344E+16			6.286348E+16	76033000	1274972		2.467080E+25	
											
			Mean	6.286348E+16							
			StdDev	4.966971E+12							
			Error	8.583E+12							
sje

Yet another piece of perft(13)

Post by sje »

Yet another piece of perft(13), the first draft 11 result from the current run:
[D]rnbqkbnr/1ppppppp/p7/8/8/N7/PPPPPPPP/R1BQKBNR w KQkq - 0 2
perft(11): 1,865,423,942,798,492

One down, 399 to go.
hgm

Re: Defect of a node?

Post by hgm »

Daniel Shawul wrote:
hgm wrote:I have completely lost track of what you are doing. Which is OK, because I am only interested in comparing the outcome to my own results. But the standard deviations and errors you quote, how many games did you have to play to obtain those?
I am not sure what you want to compare. Well, what I am doing is comparing the effect of splitting using the different formulas available to me. AFAIK you expand to a fixed depth uniformly, so that will be "uniform" in my tests. All tests are done with a fixed number of simulations, so the actual number doesn't matter. Anyway, I have mentioned it many times already: 100 million.
So this would be comparable (in effort) to when I start from a 6-ply full-width search from the opening position, which is about 119M paths? The errors you quote all seem larger, which made me wonder.
Daniel Shawul

Re: Defect of a node?

Post by Daniel Shawul »

hgm wrote:
Daniel Shawul wrote:
hgm wrote:I have completely lost track of what you are doing. Which is OK, because I am only interested in comparing the outcome to my own results. But the standard deviations and errors you quote, how many games did you have to play to obtain those?
I am not sure what you want to compare. Well, what I am doing is comparing the effect of splitting using the different formulas available to me. AFAIK you expand to a fixed depth uniformly, so that will be "uniform" in my tests. All tests are done with a fixed number of simulations, so the actual number doesn't matter. Anyway, I have mentioned it many times already: 100 million.
So this would be comparable (in effort) to when I start from a 6-ply full-width search from the opening position, which is about 119M paths? The errors you quote all seem larger, which made me wonder.
No. Your MC part is heavy. Peter's (a.k.a. pure Monte Carlo) simulation considers only one move, so it is not comparable in effort. Also, I am not sure how you do your tests, but do you visit the perft(6) nodes only once to get your result? And the maximum depth of expansion for the uniform selector is 5, not 6 (i.e. the maximum depth of the nodes I store in memory). See the column second from last, which prints the number of nodes stored for each move.
Edit: I see that with the expansion method a certain number of simulations is required for a node before it is split. So the equivalent will be to visit each perft(5) node 100,000,000 / perft(5) ~= 20.55 times. Doing that with a simple MC should be the equivalent.


Anyway, you are missing the whole point. The MC part is a black box to me and I am only comparing different ways of splitting. Your version of MC does more work, but that doesn't mean it is better.
(a) Pure MC gave better results (lower variance) per _simulation time_, or so I read, and it is also easier to implement.
(b) If indeed you have a better MC simulator I can use that, and I should be able to get better results with one of the non-uniform splitters, because the uniform method, as you do it now, is consistently performing the worst.
(c) I have tested doing 2 Monte Carlo simulations instead of 1 for the leaf nodes and it gives lower variance, as expected. Obviously the more work you do in the MC part, the lower the variance.
Given all of the above, it is good to take the MC part out of the equation unless there is some interaction with the upper part.
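To be concrete about what I treat as the black box: a pure MC simulation in Peter's style is, as far as I understand it, a single random walk whose weight is the product of the move counts seen along the way. A minimal sketch (the Position/Move interface below is a made-up stand-in, not anyone's actual code):

Code: Select all

// Minimal sketch of one pure Monte Carlo perft simulation: walk one random
// path towards the target depth, multiplying the number of legal moves seen
// at each ply. The average over many such walks is an unbiased estimate of
// perft(depth). Position/Move are hypothetical stand-ins for engine types.
#include <cstddef>
#include <random>
#include <vector>

struct Move { /* engine-specific */ };

struct Position {                               // hypothetical interface
    std::vector<Move> generateMoves() const;    // all legal moves
    void doMove(const Move&);
    void undoMove(const Move&);
};

double oneSimulation(Position& pos, int depth, std::mt19937_64& rng) {
    double weight = 1.0;                        // product of branching factors
    std::vector<Move> path;                     // moves to undo afterwards
    for (int ply = 0; ply < depth; ++ply) {
        std::vector<Move> moves = pos.generateMoves();
        if (moves.empty()) { weight = 0.0; break; }     // dead end: estimate 0
        weight *= static_cast<double>(moves.size());    // factor for this ply
        if (ply == depth - 1) break;            // last ply: only count moves
        std::uniform_int_distribution<std::size_t> pick(0, moves.size() - 1);
        path.push_back(moves[pick(rng)]);
        pos.doMove(path.back());
    }
    for (auto it = path.rbegin(); it != path.rend(); ++it)
        pos.undoMove(*it);                      // restore the start position
    return weight;                              // one-sample perft estimate
}
The splitters I am comparing only decide how the 100 million such walks get distributed over the tree that is kept in memory.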
hgm

Re: Defect of a node?

Post by hgm »

Well, there is obviously more than one point I don't get. Like what you mean by "MC part" or "heavy". Also, with 1 move you cannot get perft(12) in any method. That needs at least 12 moves, and the std deviation would be huge (close to 100%, or even larger) in that case.

When I am talking about 119M paths I do indeed mean I visit every perft(6) leaf exactly once. But in fact the number of paths that make it all the way to 11 ply is only 62M (because the acceptance probability 1/32 is smaller than the inverse average branching factor). On the 12th ply I don't do any move at all, of course, but I count (and thus have to generate) moves in the 11-ply nodes. But I guess Peter's method has to do that too, to get the factor for that ply.

So it seems my effort is somewhere between 62M and 119M 12-ply games (the initial 5 plies being highly combined, and thus not really contributing to the effort). Why is that more effort than your generation of 100M games?
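Schematically, the selective continuation below the full-width part works as in the sketch below: every move is independently accepted with probability 1/32, and an accepted move carries 32 times the weight, so the expectation stays exact. (Just an illustration with a made-up Position/Move interface, not my actual code.)

Code: Select all

// Sketch of a 1/32-acceptance continuation below the full-width frontier.
// Each move is accepted independently with probability 1/32; an accepted
// move is searched with its weight multiplied by 32, which keeps the
// estimate unbiased. At the last ply only the number of moves is counted.
// Position/Move are hypothetical stand-ins for engine types.
#include <random>
#include <vector>

struct Move { /* engine-specific */ };

struct Position {                               // hypothetical interface
    std::vector<Move> generateMoves() const;
    void doMove(const Move&);
    void undoMove(const Move&);
};

double selectiveWalk(Position& pos, int pliesLeft, double weight,
                     std::mt19937_64& rng) {
    std::vector<Move> moves = pos.generateMoves();
    if (pliesLeft == 1)
        return weight * static_cast<double>(moves.size()); // count final ply
    std::bernoulli_distribution accept(1.0 / 32.0);
    double sum = 0.0;
    for (const Move& m : moves) {
        if (!accept(rng)) continue;             // most moves are rejected
        pos.doMove(m);
        sum += selectiveWalk(pos, pliesLeft - 1, weight * 32.0, rng);
        pos.undoMove(m);
    }
    return sum;                                 // unbiased subtree estimate
}
Calling this once from every 6-ply position with pliesLeft = 6 then gives an unbiased perft(12) estimate.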
Daniel Shawul

Re: Defect of a node?

Post by Daniel Shawul »

hgm wrote:Well, there is obviously more than one point I don't get. Like what you mean by "MC part" or "heavy". Also, with 1 move you cannot get perft(12) in any method. That needs at least 12 moves, and the std deviation would be huge (close to 100%, or even larger) in that case.
The MC part is the "selective" part in your case. Most of us consider only one move, so that makes it more like a conventional Monte Carlo. OTOH you prune many moves but could still have two or more moves to investigate at a ply. At least that is how I understood it. I don't pretend to know all about your method, so do care to explain if I misunderstood. So that makes it heavy, because it investigates more than one move at a ply. And it is not surprising if it gives lower variance. The point is whether it can perform better within a limited time, because that is the only comparison that makes sense in this case. From what I gathered, Peter's improvement is that considering only one move gave lower variance for a fixed time.
When I am talking about 119M paths I do indeed mean I visit every perft(6) leaf exactly once. But in fact the number of paths that make it all the way to 11 ply is only 62M (because the acceptance probability 1/32 is smaller than the inverse average branching factor). On the 12th ply I don't do any move at all, of course, but I count (and thus have to generate) moves in the 11-ply nodes. But I guess Peter's method has to do that too, to get the factor for that ply.

So it seems my effort is somewhere between 62M and 119M 12-ply games (the initial 5 plies being highly combined, and thus not really contributing to the effort). Why is that more effort than your generation of 100M games?
Well, the point is that your visits use a heavier selective part than the rest of us do. If I raise the value of N in your method (Peter decreased it to 1), then obviously I will get lower variance per visit. But the test will be meaningless if it is still based on reduction per visit, because you do more work per visit with higher N. Also, your old test results took longer when you started from a larger depth. Starting from depth=3 it took 54 sec and from depth=6 it took 1392 sec (23 min). Sure, the latter will give lower variance, but that is mainly due to the higher number of simulations. That is why I insisted on an equal number of simulations whichever depth you start selectivity from. For instance, it would be interesting to compare the variance for the above two tests by visiting the depth=3 test (perft(6) / perft(3)) ~= 13,375 times more. Both tests should then take a roughly equal amount of time.
Michel

Re: Defect of a node?

Post by Michel »

But the test will be meaningless if it is still based on reduction per visit, because you do more work per visit with higher N.
As I have been saying all the time: the only correct measure to compare methods is

work*variance

If you halve the variance for twice the work you have gained nothing since you could just as well repeat your original method twice.

The question is: how to measure work? Ideally it should be CPU time, but that depends on the particular computer architecture being used. So an approximation would be "nodes visited" or even "leaf nodes visited".

But this is also not ideal since it does not take into account the overhead associated with calling the move generator (less if you visit more children per node) or cache access.
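To spell out the standard argument behind this: averaging n independent runs multiplies the work by n and divides the variance by n, so the product does not move:

Code: Select all

(n * work) * (variance / n) = work * variance

Beating that product therefore requires a genuinely better estimator, not just more repetitions of the same one.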
hgm

Re: Defect of a node?

Post by hgm »

Daniel Shawul wrote:OTOH you prune many moves but could still have two or more moves to investigate at a ply. At least that is how I understood it. I don't pretend to know all about your method, so do care to explain if I misunderstood. So that makes it heavy, because it investigates more than one move at a ply.
You overlook the fact that I can also have zero moves per node, and that that probability is larger than the probability of having two or more. On average I search less than one move per node, because my acceptance probability is 1/32, and there are on average fewer than 32 moves per node. So I guess that in your terminology my method is actually lighter than all the others.
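To put rough numbers on that (taking, purely for illustration, an average of 30 legal moves per node):

Code: Select all

expected accepted moves   = 30/32                ~= 0.94
P(no move accepted)       = (31/32)^30           ~= 0.39
P(exactly one accepted)   = (30/32)*(31/32)^29   ~= 0.37
P(two or more accepted)                          ~= 0.24

So a node more often contributes nothing than it branches, and on average fewer than one move per node gets searched.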
Daniel Shawul

Re: Defect of a node?

Post by Daniel Shawul »

Michel wrote:
But the test will be meaningless if it is still based on reduction per visit, because you do more work per visit with higher N.
As I have been saying all the time: the only correct measure to compare methods is

work*variance

If you halve the variance for twice the work you have gained nothing since you could just as well repeat your original method twice.
Agreed.
The question is: how to measure work? Ideally it should be CPU time, but that depends on the particular computer architecture being used. So an approximation would be "nodes visited" or even "leaf nodes visited".

But this is also not ideal since it does not take into account the overhead associated with calling the move generator (less if you visit more children per node) or cache access.
I think that with the same type of algorithm, fixing the number of simulations is a good way. It is similar to a fixed-nodes test for engine-engine matches. If the algorithms used for the MC simulations are different, the comparison should be made by time instead. For that we have to use the _same_ program (Java and C++ perfts can't be compared). For example, no one has yet compared the gain from starting the MC simulation from depth=3 instead of depth=0 with the same number of simulations. [A hunch] I suspect the gain could be small for a uniform full-width and uniform MC selection. That is, if the tree is more or less uniform. [/hunch]