TT aging

crybotmark · Post by **crybotmark** » Mon Jul 10, 2017 10:10 pm

I see, that seems like a reasonable statement. But what's the advantage? Does self-play require fewer games to establish, with statistical evidence, that a change improves strength?

hgm · Post by **hgm** » Mon Jul 10, 2017 10:14 pm

I don't understand what you mean by 'bucket size'. If the bucket size is 1, how can you replace the entry with the lowest depth? Isn't there just a single entry, and no choice at all?

AlvaroBegue · Post by **AlvaroBegue** » Mon Jul 10, 2017 10:17 pm

crybotmark wrote: I see, that seems like a reasonable statement. But what's the advantage? Does self-play require fewer games to establish, with statistical evidence, that a change improves strength?

Yes, there is at least a factor of 4 reduction in the number of games you need to play to get the same error bars, and possibly more because of the effect I described earlier.
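One common back-of-the-envelope argument for a factor of 4 (an assumption here; it may not be the exact reasoning intended) is that a gauntlet splits the games between the two versions and then subtracts two noisy scores. A minimal sketch, assuming independent games and a per-game score standard deviation `sigma` (a hypothetical parameter):

```python
from math import sqrt

def se_self_play(total_games, sigma=0.5):
    """Standard error of the A-vs-B score when all games are A against B."""
    return sigma / sqrt(total_games)

def se_gauntlet_difference(total_games, sigma=0.5):
    """Standard error of (score of A) - (score of B) when each version
    plays half of the games against a common set of opponents."""
    per_version = total_games / 2
    se_each = sigma / sqrt(per_version)
    # The errors of two independent measurements add in quadrature.
    return sqrt(2) * se_each

ratio = se_gauntlet_difference(1000) / se_self_play(1000)
print(round(ratio, 3))  # error bar ratio at an equal number of games
```

Since the error shrinks with the square root of the number of games, an error bar twice as wide corresponds to four times as many games under these assumptions.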

crybotmark · Post by **crybotmark** » Mon Jul 10, 2017 10:21 pm

hgm wrote: I don't understand what you mean by 'bucket size'. If the bucket size is 1, how can you replace the entry with the lowest depth? Isn't there just a single entry, and no choice at all?

Indeed, you have no choice but to replace that entry. I wrote that because the structure of the TT didn't change; there is still a loop that ranges from 0 to bucketsize. It was just a slip on my part.
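A minimal sketch of such a store loop, assuming a depth-preferred scheme without aging; the class, field names and the modulo indexing are hypothetical and not taken from the engine discussed here. With `bucket_size = 1` the loop runs once and the single entry is always the victim.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    key: int = 0
    depth: int = -1  # -1 marks an empty slot
    score: int = 0

class TranspositionTable:
    def __init__(self, num_buckets, bucket_size):
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.slots = [Entry() for _ in range(num_buckets * bucket_size)]

    def _base(self, key):
        return (key % self.num_buckets) * self.bucket_size

    def store(self, key, depth, score):
        base = self._base(key)
        victim = base
        for i in range(base, base + self.bucket_size):
            e = self.slots[i]
            if e.depth >= 0 and e.key == key:
                victim = i  # same position: overwrite in place
                break
            if e.depth < self.slots[victim].depth:
                victim = i  # remember the shallowest entry seen so far
        self.slots[victim] = Entry(key, depth, score)

    def probe(self, key):
        base = self._base(key)
        for i in range(base, base + self.bucket_size):
            e = self.slots[i]
            if e.depth >= 0 and e.key == key:
                return e
        return None
```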

AlvaroBegue wrote:
crybotmark wrote: I see, that seems like a reasonable statement. But what's the advantage? Does self-play require fewer games to establish, with statistical evidence, that a change improves strength?
Yes, there is at least a factor of 4 reduction in the number of games you need to play to get the same error bars, and possibly more because of the effect I described earlier.

Thank you, that was very valuable information.

crybotmark · Post by **crybotmark** » Tue Jul 11, 2017 6:01 pm

I'd like to point out that, until now, all my tests did take draws into account (or did not, depending on your point of view)... I observed that just by ignoring draws (-X flag with ordo) the Elo ratings change dramatically, almost reversing the ranking.
My question is then: should I also ignore draws during gauntlet tests (as for self-play games, as pointed out by Alvaro)? And if so, I couldn't find any flag in cutechess to impose a fixed number of non-draw games to be played. Can anyone help?

hgm · Post by **hgm** » Tue Jul 11, 2017 7:25 pm

What is your hash size and average nodes per move in these tests?

Have you tried how long it takes to solve Fine #70 with the various replacement schemes? That is usually a very sensitive test for how well your TT works.

AlvaroBegue · Post by **AlvaroBegue** » Tue Jul 11, 2017 8:15 pm

crybotmark wrote: I'd like to point out that, until now, all my tests did take draws into account (or did not, depending on your point of view)... I observed that just by ignoring draws (-X flag with ordo) the Elo ratings change dramatically, almost reversing the ranking.
My question is then: should I also ignore draws during gauntlet tests (as for self-play games, as pointed out by Alvaro)? And if so, I couldn't find any flag in cutechess to impose a fixed number of non-draw games to be played. Can anyone help?

I was only suggesting ignoring draws for the specific purpose of determining with what confidence we can say that A is no weaker than B after playing a number of games between A and B.
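One way to make this concrete is a one-sided sign test on the decisive games only; this is a sketch under the assumption that every non-drawn game is an independent coin flip when A and B are equally strong, and the function name is hypothetical.

```python
from math import comb

def p_value_ignoring_draws(wins, losses):
    """Probability of A scoring at least `wins` wins out of the decisive
    games if A and B were equally strong. Draws do not enter at all."""
    n = wins + losses
    if n == 0:
        return 1.0  # no decisive games, so no evidence either way
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

# Example: 60 wins and 40 losses, with any number of draws.
print(p_value_ignoring_draws(60, 40))
```

A small value means the observed surplus of wins would be unlikely between equal engines; the number of draws changes nothing in this calculation.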

I would also like to have a cutechess-cli option to continue playing until a certain number of non-draws has been reached.
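Until such an option exists, the stopping rule itself is easy to sketch in a driver script. The `play_game` callback below is hypothetical (it stands for whatever runs one game and reports the result from A's point of view) and is not part of cutechess-cli.

```python
def play_until_decisive(play_game, target_decisive, max_games=100_000):
    """Keep playing until `target_decisive` non-drawn games are reached.

    play_game: hypothetical callback returning "win", "loss" or "draw".
    max_games: safety limit in case almost every game is drawn.
    """
    wins = losses = draws = 0
    while wins + losses < target_decisive and wins + losses + draws < max_games:
        result = play_game()
        if result == "win":
            wins += 1
        elif result == "loss":
            losses += 1
        else:
            draws += 1
    return wins, losses, draws
```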

AlvaroBegue · Post by **AlvaroBegue** » Tue Jul 11, 2017 8:16 pm

hgm wrote:What is your hash size and average nodes per move in these tests?

Have you tried how long it takes to solve Fine #70 with the various replacement schemes? That is usually a very sensitive test for how well your TT works.

FEN: 8/k7/3p4/p2P1p2/P2P1P2/8/8/K7 w - -

What does "solve" mean to you? Picking Kb1 with some decent score (say 200 cp)?

hgm · Post by **hgm** » Tue Jul 11, 2017 8:36 pm

Indeed, I think it sees the point once it sees the way to gain another Pawn. Fairy-Max sees this at 30 ply, but then loses the solution again (so it was probably through hash grafting), and only finds it permanently at 35 ply (57 sec).

 35	+2.29 	90.7M  	0:57.92	a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
 34	+0.46 	36.4M  	0:22.34	a1b1 a7b7 b1a2 b7b8 a2b2 b8c8 b2c2 c8b8 c2b1
 33	+0.45 	36.2M  	0:22.20	a1b1 a7b7 b1b2 b7b8
 33	+0.44 	36.2M  	0:22.20	a1b2 a7b8 b2a2 b8c8 a2b3 c8c7 b3a2
 32	+0.86 	32.5M  	0:19.79	a1b2 a7b6 b2c2 b6c7 c2d3 c7b7 d3e2 b7c8 e2d2 c8c7 d2d1
 32	+0.85 	31.6M  	0:19.29	a1b1 a7b7 b1c1 b7c7 c1d2 c7b6 d2d3 b6c7 d3c3 c7b6 c3c4
 31	+2.23 	18.2M  	0:11.07	a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
 30	+2.18 	17.9M  	0:10.87	a1b1 a7a8 b1b2 a8a7 b2b3 a7b7 b3c3 b7c7 c3d3 c7b6 d3d2 b6b7 d2c1
 30	+0.91 	17.8M  	0:10.82	a1b2 a7a8 b2b3 a8a7 b3c2 a7b8 c2c3
 29	+0.88 	16.6M  	0:10.03	a1b2 a7b8 b2c3 b8b7 c3d2 b7c8 d2d3
 28	+0.88 	7.79M  	0:04.59	a1b2 a7b8 b2c3 b8b7 c3d2 b7c8 d2c2 c8b8 c2d2
 27	+0.88 	3.83M  	0:02.31	a1b2 a7b8 b2c3 b8b7 c3d2 b7c8 d2c2 c8b8 c2d2
 26	+0.91 	3.17M  	0:01.92	a1b2 a7a8 b2b3 a8a7 b3c2 a7b8 c2d2 b8c8 d2c3 c8b7 c3c4 b7b6 c4c3
 25	+0.91 	828360	0:00.50	a1b2 a7a8 b2b3 a8a7 b3c2 a7b8 c2d2 b8c8 d2c3 c8b7

Joker finds it permanently at 24 ply (0.32 sec), although it switches to Kb1 already 3 ply earlier.

 31	+3.16 	24.4M  	0:20.18	a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b6
 31	+3.16 	24.3M  	0:20.09	* a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2
 30	+2.96 	17.6M  	0:14.50	a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b7
 30	+2.96 	17.5M  	0:14.45	* a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b7
 29	+2.82 	7.37M  	0:06.17	a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
 29	+2.82 	7.35M  	0:06.15	* a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
 28	+2.82 	3.66M  	0:03.09	a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b6
 28	+2.82 	3.61M  	0:03.06	* a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b6 c2c3 b6c7 c3d3
 27	+2.81 	2.49M  	0:02.10	a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b7 c2c3 b7a7
 27	+2.81 	2.47M  	0:02.09	* a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b6
 26	+2.79 	1.10M  	0:00.93	a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8 c2d2 c8d8 d2c3 d8c7 c3d3 c7b7
 26	+2.79 	1.09M  	0:00.93	* a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8 c2d2
 25	+2.66 	584989	0:00.50	a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
 25	+2.66 	564927	0:00.48	* a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8 c2d2
 24	+2.66 	383600	0:00.32	a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8
 24	+2.66 	372049	0:00.32	* a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8 c2d2
 23	+0.53 	266239	0:00.23	a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8
 23	+0.53 	262185	0:00.21	* a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8
 22	+0.48 	201474	0:00.17	a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8
 22	+0.48 	196305	0:00.17	* a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8 c2d2
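The "finds it permanently" depth can be read off such listings mechanically. The sketch below assumes that "solved" means Kb1 (a1b1) with at least the 200 cp suggested earlier in the thread, held in every deeper iteration; the function name and the threshold default are hypothetical. The input is the final line of each Fairy-Max iteration from the first listing, in increasing depth.

```python
def permanent_solution_depth(iterations, move="a1b1", min_score_cp=200):
    """iterations: (depth, score_cp, best_move), one per depth, shallowest first.
    Returns the depth from which the solution is never lost again, or None."""
    solved_from = None
    for depth, score_cp, best_move in iterations:
        if best_move == move and score_cp >= min_score_cp:
            if solved_from is None:
                solved_from = depth
        else:
            solved_from = None  # solution lost again, start over
    return solved_from

fairy_max = [
    (25, 91, "a1b2"), (26, 91, "a1b2"), (27, 88, "a1b2"), (28, 88, "a1b2"),
    (29, 88, "a1b2"), (30, 218, "a1b1"), (31, 223, "a1b1"), (32, 86, "a1b2"),
    (33, 45, "a1b1"), (34, 46, "a1b1"), (35, 229, "a1b1"),
]
print(permanent_solution_depth(fairy_max))  # 35
```

The temporary find at 30 and 31 ply is discarded because depth 32 falls back to a1b2, which matches the description of the solution being lost again.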

crybotmark · Post by **crybotmark** » Tue Jul 11, 2017 9:10 pm

hgm wrote:What is your hash size and average nodes per move in these tests?

Have you tried how long it takes to solve Fine #70 with the various replacement schemes? That is usually a very sensitive test for how well your TT works.

The hash size is 256 MB, but I don't know how I would get the average nodes per move using cutechess.

I have never actually tried Fine #70. All my machines are busy right now, but I'll post the results as soon as I have them.
