bob wrote:Uri Blass wrote:jhellis3 wrote:Well, the gain from SF DD is ~50 - 55 Elo. And the cumulative gains from better TM are ~5 Elo, IIRC. That is 10%.
But, yeah, writing it as 10% certainly makes it sound more impressive than "5 Elo."
My quick estimate is more than 5 elo, based on the following calculation (changes 10-15, which I mentioned in my previous post, were made after Stockfish DD; of course there were also changes before Stockfish DD that I do not include):
change 10 gives 1 elo
change 11 gives 3 elo
change 12 gives 0 elo (get rid of easy move)
change 13 gives 1 elo
change 14 gives 1 elo
change 15 gives 2 elo
Total elo advantage from time management changes: 8 elo rating points.
None of those elo gains are real. To get to a +/- 1 elo error bar you need 170K games, which they are not using.
Examples from my testing:
1 Stockfish 1.8 64bit 2810 1 1 171200
The error bar reached +/- 1 somewhere after 170,000 games; at 170,000 it was still +/- 2...
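As a rough check on how many games a given error bar takes, here is a minimal sketch. It assumes a normal approximation, a 95% two-sided interval (z = 1.96), two engines of nearly equal strength, and a fixed draw ratio. The function name and its parameters are made up for illustration, and rating tools may compute their error bars differently.

```python
import math

# Slope of the logistic Elo curve at a 50% score (Elo per unit of score).
ELO_PER_SCORE = 1600 / math.log(10)


def games_for_margin(margin_elo, draw_ratio, z=1.96):
    """Approximate games needed for a +/- margin_elo error bar.

    Hypothetical helper: assumes equal-strength engines, so wins and
    losses are equally likely and only the draw ratio matters.
    """
    if margin_elo <= 0:
        raise ValueError("margin_elo must be positive")
    if not 0.0 <= draw_ratio < 1.0:
        raise ValueError("draw_ratio must be in [0, 1)")
    # Per-game score variance: a win or loss is 0.5 away from the mean
    # score of 0.5, a draw is exactly on it.
    variance = (1.0 - draw_ratio) / 4.0
    # margin = z * ELO_PER_SCORE * sqrt(variance / n), solved for n.
    return math.ceil((z * ELO_PER_SCORE) ** 2 * variance / margin_elo ** 2)


if __name__ == "__main__":
    for draw_ratio in (0.0, 0.5, 0.6, 0.7):
        print(draw_ratio, games_for_margin(1.0, draw_ratio))
```

Under these assumptions a draw ratio of roughly 60-65% gives a figure in the neighbourhood of 170K games for +/- 1 elo, and the number drops as the draw ratio rises.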
I did not claim a +/- 1 error in my estimate. My estimate was slightly pessimistic relative to the performance in the tests, because I know that SPRT tends to give a slightly too optimistic value when it passes.
For example, I translated change 14 and change 15 into improvements of 1 elo and 2 elo respectively, when in practice the improvement may well be more than 1 and 2 elo.
Here are the test results of 14 and 15 again:
14) Tested with simplification mode SPRT[-4, 0]
Passed both short TC
LLR: 2.95 (-2.94,2.94) [-4.00,0.00]
Total: 34102 W: 6184 L: 6144 D: 21774
And long TC
LLR: 2.96 (-2.94,2.94) [-4.00,0.00]
Total: 16518 W: 2647 L: 2545 D: 11326
And also 40/10 TC
LLR: 2.95 (-2.94,2.94) [-4.00,0.00]
Total: 22406 W: 4390 L: 4312 D: 13704
15) Passed both short TC:
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 18907 W: 3475 L: 3322 D: 12110
And long TC:
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 19044 W: 2997 L: 2811 D: 13236
bench: 8430785
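For anyone who wants to turn the W/L/D totals above into numbers, here is a minimal sketch. It computes the plain logistic elo from the score, with a 95% interval from a normal approximation. The function name is made up for illustration, and this scale need not be the same one the SPRT bounds are expressed in.

```python
import math


def elo_from_wld(wins, losses, draws, z=1.96):
    """Return (elo, low, high) from game counts (hypothetical helper).

    Uses the logistic Elo formula on the score and a normal
    approximation for the interval; z=1.96 is an assumed 95% level.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * score ** 2) / n
    half_width = z * math.sqrt(variance / n)

    def to_elo(s):
        return -400.0 * math.log10(1.0 / s - 1.0)

    return to_elo(score), to_elo(score - half_width), to_elo(score + half_width)


# W/L/D totals copied from the test results quoted above.
results = {
    "14 short TC": (6184, 6144, 21774),
    "14 long TC": (2647, 2545, 11326),
    "14 40/10 TC": (4390, 4312, 13704),
    "15 short TC": (3475, 3322, 12110),
    "15 long TC": (2997, 2811, 13236),
}

for name, (w, l, d) in results.items():
    elo, low, high = elo_from_wld(w, l, d)
    print(f"{name}: {elo:+.2f} elo, 95% interval [{low:+.2f}, {high:+.2f}]")
```

With this approximation each single test comes out with an interval a few elo wide on each side, so the per-change figures are only rough estimates.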