TT aging
-
- Posts: 37
- Joined: Thu May 09, 2013 9:06 pm
Re: TT aging
I see, that seems a reasonable statement. But what's the advantage? Does self-play require fewer games to show, with statistical evidence, that a change improves strength?
-
- Posts: 27790
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: TT aging
I don't understand what you mean by 'bucket size'. If the bucket size is 1, how can you replace the entry with the lowest depth? Isn't there just a single entry, and no choice at all?
-
- Posts: 931
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: TT aging
crybotmark wrote: I see, that seems a reasonable statement. But what's the advantage? Does self-play require fewer games to show, with statistical evidence, that a change improves strength?
Yes, there is at least a factor of 4 reduction in the number of games you need to play to get the same error bars, and possibly more because of the effect I described earlier.
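One way to see where a factor of 4 can come from, as a rough sketch only: assume every game contributes the same per-game standard deviation `sigma` to the measured score fraction, whoever the opponent is (an idealization that is not stated in the thread). A direct match measures the A-versus-B difference in one go, while two separate gauntlets require subtracting two noisy scores. The function names are made up for illustration.

```python
import math


def se_direct(games: int, sigma: float = 0.5) -> float:
    """Standard error of the score fraction in a direct A-vs-B match."""
    return sigma / math.sqrt(games)


def se_gauntlet_difference(games_per_version: int, sigma: float = 0.5) -> float:
    """Standard error of score(A) - score(B) when each version plays its own
    gauntlet of `games_per_version` games; the two variances add up."""
    return sigma * math.sqrt(2.0 / games_per_version)


def gauntlet_total_for_same_error(direct_games: int) -> int:
    """Total gauntlet games (both versions) matching a direct match's error.

    Equal errors need 2/n = 1/N, so n = 2N games per version, 4N in total.
    """
    return 2 * (2 * direct_games)


if __name__ == "__main__":
    n_direct = 1000
    print(se_direct(n_direct))
    print(se_gauntlet_difference(2 * n_direct))
    print(gauntlet_total_for_same_error(n_direct))
```

Under these assumptions, 1000 self-play games give the same error bar on the difference as 2000 gauntlet games per version, that is 4000 games in total.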
-
- Posts: 37
- Joined: Thu May 09, 2013 9:06 pm
Re: TT aging
hgm wrote: I don't understand what you mean by 'bucket size'. If the bucket size is 1, how can you replace the entry with the lowest depth? Isn't there just a single entry, and no choice at all?
In fact you have no choice but to replace that entry. I wrote that because the structure of the TT didn't change; there's still a loop that ranges from 0 to bucketsize. But that was just a slip on my part.
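For readers following the bucket discussion, here is a minimal sketch of a bucketed table whose store loop runs over the bucket and picks a victim. It is not crybotmark's actual code: the class and field names, the modulo indexing, and the exact rule (prefer entries from an earlier search, then the lowest depth) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    key: int
    depth: int
    score: int
    age: int  # value of the search counter when the entry was stored


class TranspositionTable:
    def __init__(self, num_buckets: int, bucket_size: int) -> None:
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.slots: List[Optional[Entry]] = [None] * (num_buckets * bucket_size)
        self.current_age = 0

    def new_search(self) -> None:
        """Call once per root search so older entries count as stale."""
        self.current_age += 1

    def _bucket(self, key: int) -> range:
        base = (key % self.num_buckets) * self.bucket_size
        return range(base, base + self.bucket_size)

    def probe(self, key: int) -> Optional[Entry]:
        for i in self._bucket(key):
            entry = self.slots[i]
            if entry is not None and entry.key == key:
                return entry
        return None

    def store(self, key: int, depth: int, score: int) -> None:
        bucket = self._bucket(key)
        victim = bucket[0]
        for i in bucket:
            entry = self.slots[i]
            if entry is None or entry.key == key:
                victim = i  # empty slot or same position: always use it
                break
            current = self.slots[victim]
            # Stale entries (False sorts first) are replaced before fresh
            # ones; among equals, the lowest depth is replaced.
            if (entry.age == self.current_age, entry.depth) < (
                current.age == self.current_age,
                current.depth,
            ):
                victim = i
        self.slots[victim] = Entry(key, depth, score, self.current_age)
```

With `bucket_size=1` the loop body runs once and the single entry is always overwritten, which is the "no choice at all" case hgm describes.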
AlvaroBegue wrote: Yes, there is at least a factor of 4 reduction in the number of games you need to play to get the same error bars, and possibly more because of the effect I described earlier.
Thank you, that was valuable information.
-
- Posts: 37
- Joined: Thu May 09, 2013 9:06 pm
Re: TT aging
I'd like to point out that, until now, all my tests took draws into account (or did not, depending on your point of view)... I observed that simply ignoring draws (the -X flag in ordo) changes the Elo ratings dramatically, almost reversing the ranking.
My question is then: should I also ignore draws during gauntlet tests (as Alvaro suggested for self-play games)? And if so, I couldn't find any flag in cutechess to force a fixed number of non-draw games to be played. Could anyone help?
-
- Posts: 27790
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: TT aging
What is your hash size and average nodes per move in these tests?
Have you tried how long it takes to solve Fine #70 with the various replacement schemes? That is usually a very sensitive test for how well your TT works.
-
- Posts: 931
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: TT aging
crybotmark wrote: I'd like to point out that, until now, all my tests took draws into account (or did not, depending on your point of view)... I observed that simply ignoring draws (the -X flag in ordo) changes the Elo ratings dramatically, almost reversing the ranking. My question is then: should I also ignore draws during gauntlet tests (as Alvaro suggested for self-play games)? And if so, I couldn't find any flag in cutechess to force a fixed number of non-draw games to be played. Could anyone help?
I was only suggesting ignoring draws for the specific purpose of determining with what confidence we can say that A is no weaker than B after playing a number of games between A and B.
I would also like to have a cutechess-cli option to continue playing until a certain number of non-draws has been reached.
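As an illustration of a confidence estimate that depends only on decisive games, here is a sketch of the commonly used normal-approximation "likelihood of superiority". It is not necessarily the exact calculation Alvaro has in mind; the function name is an assumption, and draws simply never enter the formula.

```python
import math


def likelihood_of_superiority(wins: int, losses: int) -> float:
    """Approximate probability that A is stronger than B, given A's wins and
    losses against B. Draws are ignored: only decisive games are counted."""
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, no evidence either way
    z = (wins - losses) / math.sqrt(2.0 * decisive)
    return 0.5 * (1.0 + math.erf(z))


if __name__ == "__main__":
    # The same 60-40 split of decisive games gives the same answer
    # whether 0 or 500 draws were played alongside them.
    print(likelihood_of_superiority(60, 40))
```

Because the estimate is driven by the number of decisive games, stopping after a fixed number of non-draws would give roughly comparable confidence from one test to the next.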
-
- Posts: 931
- Joined: Tue Mar 09, 2010 3:46 pm
- Location: New York
- Full name: Álvaro Begué (RuyDos)
Re: TT aging
hgm wrote: What is your hash size and average nodes per move in these tests? Have you tried how long it takes to solve Fine #70 with the various replacement schemes? That is usually a very sensitive test for how well your TT works.
[D] 8/k7/3p4/p2P1p2/P2P1P2/8/8/K7 w - -
What does "solve" mean for you? Picking Kb1 with some decent score (say 200 cp)?
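Since the diagram does not survive as plain text, the position can be recovered from the FEN string in the post. The small helper below (its name is an assumption) expands the piece-placement field into eight text rows, rank 8 first, with `.` for empty squares.

```python
from typing import List

FINE_70 = "8/k7/3p4/p2P1p2/P2P1P2/8/8/K7 w - -"


def fen_to_rows(fen: str) -> List[str]:
    """Expand the piece-placement field of a FEN string into 8 rows of 8
    characters, rank 8 first; digits become runs of '.' (empty squares)."""
    placement = fen.split()[0]
    rows = []
    for rank in placement.split("/"):
        row = ""
        for ch in rank:
            row += "." * int(ch) if ch.isdigit() else ch
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in fen_to_rows(FINE_70):
        print(row)
```

Uppercase letters are White's pieces and lowercase are Black's, so the White king starts on a1 and the Black king on a7.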
-
- Posts: 27790
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: TT aging
Indeed, I think it sees the point once it sees the way to gain another Pawn. Fairy-Max sees this at 30 ply, but then loses the solution again (so it was probably found through hash grafting), and only finds it permanently at 35 ply (57 sec).
Joker finds it permanently at 24 ply (0.32 sec), although it switches to Kb1 already 3 ply earlier.
Code (Fairy-Max):
35 +2.29 90.7M 0:57.92 a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
34 +0.46 36.4M 0:22.34 a1b1 a7b7 b1a2 b7b8 a2b2 b8c8 b2c2 c8b8 c2b1
33 +0.45 36.2M 0:22.20 a1b1 a7b7 b1b2 b7b8
33 +0.44 36.2M 0:22.20 a1b2 a7b8 b2a2 b8c8 a2b3 c8c7 b3a2
32 +0.86 32.5M 0:19.79 a1b2 a7b6 b2c2 b6c7 c2d3 c7b7 d3e2 b7c8 e2d2 c8c7 d2d1
32 +0.85 31.6M 0:19.29 a1b1 a7b7 b1c1 b7c7 c1d2 c7b6 d2d3 b6c7 d3c3 c7b6 c3c4
31 +2.23 18.2M 0:11.07 a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
30 +2.18 17.9M 0:10.87 a1b1 a7a8 b1b2 a8a7 b2b3 a7b7 b3c3 b7c7 c3d3 c7b6 d3d2 b6b7 d2c1
30 +0.91 17.8M 0:10.82 a1b2 a7a8 b2b3 a8a7 b3c2 a7b8 c2c3
29 +0.88 16.6M 0:10.03 a1b2 a7b8 b2c3 b8b7 c3d2 b7c8 d2d3
28 +0.88 7.79M 0:04.59 a1b2 a7b8 b2c3 b8b7 c3d2 b7c8 d2c2 c8b8 c2d2
27 +0.88 3.83M 0:02.31 a1b2 a7b8 b2c3 b8b7 c3d2 b7c8 d2c2 c8b8 c2d2
26 +0.91 3.17M 0:01.92 a1b2 a7a8 b2b3 a8a7 b3c2 a7b8 c2d2 b8c8 d2c3 c8b7 c3c4 b7b6 c4c3
25 +0.91 828360 0:00.50 a1b2 a7a8 b2b3 a8a7 b3c2 a7b8 c2d2 b8c8 d2c3 c8b7
Code (Joker):
31 +3.16 24.4M 0:20.18 a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b6
31 +3.16 24.3M 0:20.09 * a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2
30 +2.96 17.6M 0:14.50 a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b7
30 +2.96 17.5M 0:14.45 * a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b7
29 +2.82 7.37M 0:06.17 a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
29 +2.82 7.35M 0:06.15 * a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
28 +2.82 3.66M 0:03.09 a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b6
28 +2.82 3.61M 0:03.06 * a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b6 c2c3 b6c7 c3d3
27 +2.81 2.49M 0:02.10 a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b7 c2c3 b7a7
27 +2.81 2.47M 0:02.09 * a1b1 a7a8 b1b2 a8a7 b2b3 a7a6 b3c2 a6b6
26 +2.79 1.10M 0:00.93 a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8 c2d2 c8d8 d2c3 d8c7 c3d3 c7b7
26 +2.79 1.09M 0:00.93 * a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8 c2d2
25 +2.66 584989 0:00.50 a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
25 +2.66 564927 0:00.48 * a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8 c2d2
24 +2.66 383600 0:00.32 a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8
24 +2.66 372049 0:00.32 * a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8 c2d2
23 +0.53 266239 0:00.23 a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8
23 +0.53 262185 0:00.21 * a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8
22 +0.48 201474 0:00.17 a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8
22 +0.48 196305 0:00.17 * a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8 c2d2
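The "finds it permanently at N ply" reading of the listings above can be automated. The sketch below assumes the column layout seen in the two listings (depth, score in pawns, nodes, time, an optional `*` marker, then the PV) and takes "solved" to mean a1b1 with a score of at least 2.00, following the 200 cp suggestion earlier in the thread. The function names and the threshold default are assumptions; the sample lines are copied from the listings.

```python
from typing import Dict, Iterable, Optional, Tuple


def parse_line(line: str) -> Tuple[int, float, str]:
    """Return (depth, score_in_pawns, first_pv_move) for one output line."""
    fields = line.split()
    depth = int(fields[0])
    score = float(fields[1])
    # fields[2] is the node count and fields[3] the time; skip a '*' marker.
    pv = [f for f in fields[4:] if f != "*"]
    return depth, score, pv[0]


def permanent_solve_depth(
    lines: Iterable[str], move: str = "a1b1", min_score: float = 2.00
) -> Optional[int]:
    """Smallest depth from which every reported line plays `move` with a
    score of at least `min_score`, or None if the deepest depth fails."""
    ok_at: Dict[int, bool] = {}
    for line in lines:
        if not line.strip():
            continue
        depth, score, first = parse_line(line)
        good = first == move and score >= min_score
        ok_at[depth] = ok_at.get(depth, True) and good
    solved_from = None
    for depth in sorted(ok_at, reverse=True):  # deepest first
        if not ok_at[depth]:
            break
        solved_from = depth
    return solved_from


FAIRY_MAX = """\
35 +2.29 90.7M 0:57.92 a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
34 +0.46 36.4M 0:22.34 a1b1 a7b7 b1a2 b7b8 a2b2 b8c8 b2c2 c8b8 c2b1
31 +2.23 18.2M 0:11.07 a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
30 +0.91 17.8M 0:10.82 a1b2 a7a8 b2b3 a8a7 b3c2 a7b8 c2c3
""".splitlines()

JOKER = """\
25 +2.66 584989 0:00.50 a1b1 a7b7 b1c1 b7c7 c1d1 c7d7 d1c2 d7c8
24 +2.66 383600 0:00.32 a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8
24 +2.66 372049 0:00.32 * a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8 c2d2
23 +0.53 266239 0:00.23 a1b1 a7b7 b1c1 b7c7 c1d1 c7d8 d1c2 d8c8
""".splitlines()

if __name__ == "__main__":
    print(permanent_solve_depth(FAIRY_MAX))
    print(permanent_solve_depth(JOKER))
```

On these excerpts the function reports 35 for Fairy-Max and 24 for Joker, matching the depths quoted in the post. A different score threshold would change what counts as "solved".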
-
- Posts: 37
- Joined: Thu May 09, 2013 9:06 pm
Re: TT aging
hgm wrote: What is your hash size and average nodes per move in these tests? Have you tried how long it takes to solve Fine #70 with the various replacement schemes? That is usually a very sensitive test for how well your TT works.
The hash size is 256 MB, but I don't know how to get the average nodes per move using cutechess.
I have never actually tried Fine #70. All my machines are busy right now, but I'll post the results as soon as I have them.