TT aging

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

crybotmark
Posts: 37
Joined: Thu May 09, 2013 9:06 pm

Re: TT aging

Post by crybotmark »

I see, that seems a reasonable statement. But what's the advantage? Does self-play require fewer games to establish, with statistical confidence, that a change improves strength?
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: TT aging

Post by hgm »

I don't understand what you mean by 'bucket size'. If the bucket size is 1, how can you replace the entry with the lowest depth? Isn't there just a single entry, and no choice at all?
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: TT aging

Post by AlvaroBegue »

crybotmark wrote:I see, that seems a reasonable statement. But what's the advantage? Does self-play require fewer games to establish, with statistical confidence, that a change improves strength?
Yes, there is at least a factor of 4 reduction in the number of games you need to play to get the same error bars, and possibly more because of the effect I described earlier.
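One simple model that yields a factor of 4 (a back-of-the-envelope sketch, not necessarily the exact reasoning intended here): assume each game's score has the same standard deviation σ in both setups, that scores stay near 50% so a score difference maps to an Elo difference with the same slope, and that the gauntlet pool's strength is fixed. Let Δ be the estimated strength difference between versions A and B, and T the total number of games played.

```latex
\begin{aligned}
&\text{Self-play, $T$ games of A vs.\ B:} &
\operatorname{SE}(\hat{\Delta}) &= \frac{\sigma}{\sqrt{T}} \\
&\text{Gauntlet, $T/2$ games each for A and B vs.\ the same pool:} &
\operatorname{SE}(\hat{\Delta}) &= \sqrt{\frac{\sigma^2}{T/2} + \frac{\sigma^2}{T/2}} = \frac{2\sigma}{\sqrt{T}}
\end{aligned}
```

The gauntlet error bar is twice as large for the same budget, so a gauntlet needs 4T games to match the error bar that self-play reaches with T games.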
crybotmark

Re: TT aging

Post by crybotmark »

hgm wrote:I don't understand what you mean by 'bucket size'. If the bucket size is 1, how can you replace the entry with the lowest depth? Isn't there just a single entry, and no choice at all?
Indeed, you have no choice but to replace that entry. I wrote it that way because the structure of the TT didn't change: there's still a loop from 0 to bucketsize. But that was just a slip on my part.
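For reference, here is a minimal sketch of a bucket loop that combines depth-preferred replacement with aging. The entry layout, `kBucketSize = 4`, and the name `choose_victim` are hypothetical illustrations, not crybotmark's code; the priority order (same position, empty slot, stale entry from an earlier search, then shallowest entry) is one common scheme among several. With a bucket size of 1 the loop always returns slot 0, which is the "no choice" case hgm points out.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical TT entry layout; real engines pack these fields differently.
struct TTEntry {
    std::uint64_t key = 0;    // hash key of the stored position (0 = empty slot)
    std::int16_t  score = 0;
    std::int8_t   depth = -1; // search depth of the stored result
    std::uint8_t  age = 0;    // search counter when the entry was written
};

constexpr std::size_t kBucketSize = 4; // entries per bucket (assumption)

// Pick the slot of `bucket` to overwrite when storing position `key`.
// Lower cost = better victim:
//   same position    -> returned immediately (overwrite in place)
//   empty slot       -> cost -1000
//   stale entry      -> cost = depth (written by an earlier search)
//   current entry    -> cost = depth + 256, so any stale entry goes first
std::size_t choose_victim(const TTEntry* bucket, std::uint64_t key,
                          std::uint8_t current_age) {
    std::size_t victim = 0;
    int best = 1 << 30;
    for (std::size_t i = 0; i < kBucketSize; ++i) {
        const TTEntry& e = bucket[i];
        if (e.key == key) return i;
        int cost = (e.key == 0) ? -1000
                                : e.depth + (e.age == current_age ? 256 : 0);
        if (cost < best) { best = cost; victim = i; }
    }
    return victim;
}
```

The +256 offset exceeds any depth an `int8_t` can hold, so an entry from the current search is only evicted when every slot in the bucket belongs to the current search.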
AlvaroBegue wrote:
crybotmark wrote:I see, that seems a reasonable statement. But what's the advantage? Does self-play require fewer games to establish, with statistical confidence, that a change improves strength?
Yes, there is at least a factor of 4 reduction in the number of games you need to play to get the same error bars, and possibly more because of the effect I described earlier.
Thank you, that is very useful information.
crybotmark

Re: TT aging

Post by crybotmark »

I'd like to point out that, until now, all my tests have counted draws (or not, depending on your point of view)... I observed that when I simply ignore draws (the -X flag in Ordo), the Elo ratings change dramatically, almost reversing the ranking.
My question, then, is: should I also ignore draws in gauntlet tests (as Alvaro suggested for self-play games)? And if so, I couldn't find any cutechess option to play until a fixed number of non-draw games is reached. Could anyone help?
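To see how much dropping draws can stretch a single head-to-head result, here is a sketch using the standard logistic Elo model. This is not what Ordo computes (Ordo fits all games of all engines jointly), and by itself it does not explain a ranking reversal; it only shows the size of the effect for one pairing. The function names are hypothetical.

```cpp
#include <cmath>

// Logistic Elo model: expected score s <-> Elo difference d,
//   s = 1 / (1 + 10^(-d/400))   =>   d = -400 * log10(1/s - 1).
// Valid for 0 < s < 1.
double elo_from_score(double s) {
    return -400.0 * std::log10(1.0 / s - 1.0);
}

// Draws counted as half a point each.
double elo_with_draws(int wins, int losses, int draws) {
    double n = wins + losses + draws;
    return elo_from_score((wins + 0.5 * draws) / n);
}

// Draws ignored: only decisive games enter the score.
// Requires wins + losses > 0.
double elo_ignoring_draws(int wins, int losses) {
    return elo_from_score(static_cast<double>(wins) / (wins + losses));
}
```

For example, +30 -20 =50 gives about +34.9 Elo when draws count as half points, but about +70.4 Elo when the 50 draws are dropped.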
hgm

Re: TT aging

Post by hgm »

What is your hash size and average nodes per move in these tests?

Have you tried how long it takes to solve Fine #70 with the various replacement schemes? That is usually a very sensitive test for how well your TT works.
AlvaroBegue

Re: TT aging

Post by AlvaroBegue »

crybotmark wrote:I'd like to point out that, until now, all my tests have counted draws (or not, depending on your point of view)... I observed that when I simply ignore draws (the -X flag in Ordo), the Elo ratings change dramatically, almost reversing the ranking.
My question, then, is: should I also ignore draws in gauntlet tests (as Alvaro suggested for self-play games)? And if so, I couldn't find any cutechess option to play until a fixed number of non-draw games is reached. Could anyone help?
I was only suggesting ignoring draws for the specific purpose of determining how confidently we can say that A is no weaker than B after playing a number of games between A and B.
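A common way to quantify "how confident are we that A is no weaker than B" is the likelihood of superiority (LOS) under a normal approximation, in which draws drop out of the formula entirely. The sketch below is that textbook formula; the function name is hypothetical and this is not necessarily the exact statistic Alvaro uses.

```cpp
#include <cmath>

// Likelihood of superiority of A over B from a head-to-head match,
// normal approximation:
//   LOS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L))))
// The number of draws does not appear anywhere.
double likelihood_of_superiority(int wins, int losses) {
    const double decisive = wins + losses;
    if (decisive == 0) return 0.5; // no decisive games: no evidence either way
    return 0.5 * (1.0 + std::erf((wins - losses) / std::sqrt(2.0 * decisive)));
}
```

For instance, 60 wins and 40 losses give a LOS of about 0.977, whether those games came with 0 draws or 500.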

I would also like to have a cutechess-cli option to continue playing until a certain number of non-draws has been reached.
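Until such an option exists, the stopping rule could live in an external driver. The sketch below assumes a hypothetical `play_one_game` callback (for example, one that runs a single cutechess-cli game and parses its result); neither the callback nor the driver is a cutechess-cli feature.

```cpp
#include <functional>

enum class Result { Win, Loss, Draw };

struct Tally { int wins = 0, losses = 0, draws = 0; };

// Keep requesting games until `target_decisive` non-draw games are recorded.
// `play_one_game` is a hypothetical callback returning one game's result
// from A's point of view.
Tally play_until_decisive(int target_decisive,
                          const std::function<Result()>& play_one_game) {
    Tally t;
    while (t.wins + t.losses < target_decisive) {
        switch (play_one_game()) {
            case Result::Win:  ++t.wins;   break;
            case Result::Loss: ++t.losses; break;
            case Result::Draw: ++t.draws;  break;
        }
    }
    return t;
}
```

Because the target is a fixed count of decisive games, not a threshold on W-L, stopping this way does not bias the win/loss split.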
AlvaroBegue

Re: TT aging

Post by AlvaroBegue »

hgm wrote:What is your hash size and average nodes per move in these tests?

Have you tried how long it takes to solve Fine #70 with the various replacement schemes? That is usually a very sensitive test for how well your TT works.
FEN: 8/k7/3p4/p2P1p2/P2P1P2/8/8/K7 w - -

What does "solve" mean for you? Picking Kb1 with some decent score (say 200 cp)?
hgm

Re: TT aging

Post by hgm »

Indeed, I think it sees the point when it sees the way to gain another Pawn. Fairy-Max sees this at 30 ply, but then loses the solution again (so it was probably found through hash grafting), and only finds it permanently at 35 ply (57 sec).

 35	+2.29 	90.7M  	0:57.92	a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
 34	+0.46 	36.4M  	0:22.34	a1b1 a7b7 b1a2 b7b8 a2b2 b8c8 b2c2 c8b8 c2b1
 33	+0.45 	36.2M  	0:22.20	a1b1 a7b7 b1b2 b7b8
 33	+0.44 	36.2M  	0:22.20	a1b2 a7b8 b2a2 b8c8 a2b3 c8c7 b3a2
 32	+0.86 	32.5M  	0:19.79	a1b2 a7b6 b2c2 b6c7 c2d3 c7b7 d3e2 b7c8 e2d2 c8c7 d2d1
 32	+0.85 	31.6M  	0:19.29	a1b1 a7b7 b1c1 b7c7 c1d2 c7b6 d2d3 b6c7 d3c3 c7b6 c3c4
 31	+2.23 	18.2M  	0:11.07	a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
 30	+2.18 	17.9M  	0:10.87	a1b1 a7a8 b1b2 a8a7 b2b3 a7b7 b3c3 b7c7 c3d3 c7b6 d3d2 b6b7 d2c1
 30	+0.91 	17.8M  	0:10.82	a1b2 a7a8 b2b3 a8a7 b3c2 a7b8 c2c3
 29	+0.88 	16.6M  	0:10.03	a1b2 a7b8 b2c3 b8b7 c3d2 b7c8 d2d3
 28	+0.88 	7.79M  	0:04.59	a1b2 a7b8 b2c3 b8b7 c3d2 b7c8 d2c2 c8b8 c2d2
 27	+0.88 	3.83M  	0:02.31	a1b2 a7b8 b2c3 b8b7 c3d2 b7c8 d2c2 c8b8 c2d2
 26	+0.91 	3.17M  	0:01.92	a1b2 a7a8 b2b3 a8a7 b3c2 a7b8 c2d2 b8c8 d2c3 c8b7 c3c4 b7b6 c4c3
 25	+0.91 	828360	0:00.50	a1b2 a7a8 b2b3 a8a7 b3c2 a7b8 c2d2 b8c8 d2c3 c8b7
Joker finds it permanently at 24 ply (0.32 sec), although it already switches to Kb1 3 ply earlier.

 31	+3.16 	24.4M  	0:20.18	a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b6
 31	+3.16 	24.3M  	0:20.09	* a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2
 30	+2.96 	17.6M  	0:14.50	a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b7
 30	+2.96 	17.5M  	0:14.45	* a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b7
 29	+2.82 	7.37M  	0:06.17	a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
 29	+2.82 	7.35M  	0:06.15	* a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
 28	+2.82 	3.66M  	0:03.09	a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b6
 28	+2.82 	3.61M  	0:03.06	* a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b6 c2c3 b6c7 c3d3
 27	+2.81 	2.49M  	0:02.10	a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b7 c2c3 b7a7
 27	+2.81 	2.47M  	0:02.09	* a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b6
 26	+2.79 	1.10M  	0:00.93	a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8 c2d2 c8d8 d2c3 d8c7 c3d3 c7b7
 26	+2.79 	1.09M  	0:00.93	* a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8 c2d2
 25	+2.66 	584989	0:00.50	a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
 25	+2.66 	564927	0:00.48	* a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8 c2d2
 24	+2.66 	383600	0:00.32	a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8
 24	+2.66 	372049	0:00.32	* a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8 c2d2
 23	+0.53 	266239	0:00.23	a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8
 23	+0.53 	262185	0:00.21	* a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8
 22	+0.48 	201474	0:00.17	a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8
 22	+0.48 	196305	0:00.17	* a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8 c2d2
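The "solved" criterion from this exchange (Kb1, i.e. a1b1, with a decent score such as 200 cp, found permanently) can be checked mechanically. A sketch, assuming the engine output has already been parsed into one record per completed depth (the last line printed at that depth); the `Iteration` struct and `permanent_solve_depth` are hypothetical names, and parsing is not shown.

```cpp
#include <string>
#include <vector>

// One completed iteration: depth, score in centipawns, best move.
struct Iteration {
    int depth;
    int score_cp;
    std::string best_move;
};

// Depth from which every later iteration picks `solution` with a score of at
// least `threshold_cp` ("found permanently"), or -1 if the last iteration
// fails the criterion. `iters` must be sorted by increasing depth.
int permanent_solve_depth(const std::vector<Iteration>& iters,
                          const std::string& solution, int threshold_cp) {
    int solved_from = -1;
    for (const Iteration& it : iters) {
        bool ok = it.best_move == solution && it.score_cp >= threshold_cp;
        if (!ok) solved_from = -1;                     // solution lost again
        else if (solved_from < 0) solved_from = it.depth;
    }
    return solved_from;
}
```

Applied to the final line of each depth in the two logs above with a 200 cp threshold, this gives 35 ply for Fairy-Max (the 30 and 31 ply finds are reset at 32 ply) and 24 ply for Joker, matching the depths stated in the post.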
crybotmark

Re: TT aging

Post by crybotmark »

hgm wrote:What is your hash size and average nodes per move in these tests?

Have you tried how long it takes to solve Fine #70 with the various replacement schemes? That is usually a very sensitive test for how well your TT works.
The hash size is 256 MB, but I don't know how I would get the average nodes per move using cutechess.
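A plausible reason for asking about both numbers (hgm doesn't say): replacement and aging decisions only matter much once the search tree overflows the table. A rough sketch of that comparison, assuming a 16-byte entry (an assumption; entry sizes vary between engines) and hypothetical function names:

```cpp
#include <cstdint>

// Number of entries in a table of `hash_mb` megabytes (1 MB = 2^20 bytes),
// for a fixed entry size in bytes.
std::uint64_t tt_entries(std::uint64_t hash_mb, std::uint64_t entry_bytes) {
    return hash_mb * 1024 * 1024 / entry_bytes;
}

// Rough table load: nodes searched per move divided by table capacity.
// Heuristic only (not every node is stored): well below 1, replacement
// rarely matters; well above 1, the table is overwritten many times per move.
double tt_load(double nodes_per_move, std::uint64_t entries) {
    return nodes_per_move / static_cast<double>(entries);
}
```

With 256 MB and 16-byte entries the table holds 16,777,216 entries, so differences between replacement schemes would be expected to show up mainly when a move searches well over about 16M nodes.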

I have never actually tried Fine #70. All my machines are busy right now, but I'll post the results as soon as I have them.