Time Management

Uri Blass · Post by **Uri Blass** » Fri May 30, 2014 11:11 pm

jhellis3 wrote:Well, the gain from SF DD is ~50 - 55 Elo. And the cumulative gains from better TM are ~5 Elo, IIRC. That is 10%.

But, yeah, writing it as 10% certainly makes it sound more impressive than "5 Elo."

My quick estimate is more than 5 Elo, based on the following calculation (changes 10-15, which I mentioned in a previous post, came after Stockfish DD; there were of course also changes before Stockfish DD that I do not include):

change 10 gives 1 elo
change 11 gives 3 elo
change 12 gives 0 elo(get rid of easy move)
change 13 gives 1 elo
change 14 gives 1 elo
change 15 gives 2 elo

Total elo advantage 8 elo rating points
from time management changes.

bob · Post by **bob** » Fri May 30, 2014 11:37 pm

syzygy wrote:
bob wrote:
jhellis3 wrote:Well, the gain from SF DD is ~50 - 55 Elo. And the cumulative gains from better TM are ~5 Elo, IIRC. That is 10%.

But, yeah, writing it as 10% certainly makes it sound more impressive than "5 Elo."
I think 5 Elo is the UPPER bound on TM gains. When Don and I were talking, I was testing many ideas for both of us. And NONE of them varied by more than 5 Elo, unless I introduced a bug by accident.
The upper bound will depend on your starting point. There is certainly no lower bound on TM loss.

Stating that for an arbitrary engine there is at most 5 Elo to gain from improving TM makes little sense imho. But obviously for any engine there is a limit to what one can extract from TM improvements.

I'm talking about a decent working TM. Which stockfish certainly has had for years.

Wish there was a magic +10 or +20 Elo idea, but I've not seen one. Don and I tested literally dozens of different ideas, most sounded good but were no better than default.

bob · Post by **bob** » Fri May 30, 2014 11:44 pm

Uri Blass wrote:
jhellis3 wrote:Well, the gain from SF DD is ~50 - 55 Elo. And the cumulative gains from better TM are ~5 Elo, IIRC. That is 10%.

But, yeah, writing it as 10% certainly makes it sound more impressive than "5 Elo."
My fast estimate is more than 5 elo based on the following calculation(changes 10-15 that I mentioned in previous post are changes after Stockfish DD and of course there were also changes earlier to stockfish DD that I do not include):

change 10 gives 1 elo
change 11 gives 3 elo
change 12 gives 0 elo(get rid of easy move)
change 13 gives 1 elo
change 14 gives 1 elo
change 15 gives 2 elo

Total elo advantage 8 elo rating points
from time management changes.

None of those Elo gains are real. To get down to a +/- 1 Elo error bar you need about 170K games, which they are not playing.

Examples from my testing:

Rank 1, Stockfish 1.8 64bit: Elo 2810, error bar +1 / -1, 171200 games

The error bar only reached +/- 1 somewhere after 170,000 games; up to that point it was still +/- 2...
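The relationship between game count and error bar can be sketched as follows. This is a rough normal approximation, not the method used for the rating list above: it assumes two equal-strength engines, a 95% interval (z = 1.96), and a draw ratio of 0.6, all of which are illustrative assumptions. The function name `games_for_margin` is hypothetical.

```python
import math

# Elo points per unit of score near a 50% score: 400 / (ln(10) * 0.25).
ELO_PER_SCORE_AT_EVEN = 400.0 / (math.log(10) * 0.25)


def games_for_margin(margin_elo, draw_ratio, z=1.96):
    """Games needed so the z-sigma error margin shrinks to margin_elo.

    Assumes equal-strength engines, so wins and losses each occur with
    probability (1 - draw_ratio) / 2 and the per-game score variance is
    (1 - draw_ratio) / 4.
    """
    variance = (1.0 - draw_ratio) / 4.0
    # margin = z * slope * sqrt(variance / n), solved for n.
    return variance * (z * ELO_PER_SCORE_AT_EVEN / margin_elo) ** 2


if __name__ == "__main__":
    for margin in (4.0, 2.0, 1.0):
        # 0.6 is an assumed draw ratio, not a measured one.
        n = games_for_margin(margin, draw_ratio=0.6)
        print(f"+/- {margin} Elo needs roughly {n:,.0f} games")
```

Because the margin shrinks with the square root of the game count, halving the error bar costs four times as many games, which is why the last step from +/- 2 to +/- 1 is so expensive.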

Uri Blass · Post by **Uri Blass** » Fri May 30, 2014 11:50 pm

bob wrote:
syzygy wrote:
bob wrote:
jhellis3 wrote:Well, the gain from SF DD is ~50 - 55 Elo. And the cumulative gains from better TM are ~5 Elo, IIRC. That is 10%.

But, yeah, writing it as 10% certainly makes it sound more impressive than "5 Elo."
I think 5 Elo is the UPPER bound on TM gains. When Don and I were talking, I was testing many ideas for both of us. And NONE of them varied by more than 5 Elo, unless I introduced a bug by accident.
The upper bound will depend on your starting point. There is certainly no lower bound on TM loss.

Stating that for an arbitrary engine there is at most 5 Elo to gain from improving TM makes little sense imho. But obviously for any engine there is a limit to what one can extract from TM improvements.
I'm talking about a decent working TM. Which stockfish certainly has had for years.

Wish there was a magic +10 or +20 Elo idea, but I've not seen one. Don and I tested literally dozens of different ideas, most sounded good but were no better than default.

Note that all Stockfish testing uses ponder-off games, and TCEC also used ponder off.

The advantage may be smaller with ponder on, but I think that, at least for ponder-off games, Stockfish gained more than 10 Elo from time management improvements in 2013-2014 (from Stockfish 3 to the latest version).

Uri Blass · Post by **Uri Blass** » Sat May 31, 2014 12:10 am

bob wrote:
Uri Blass wrote:
jhellis3 wrote:Well, the gain from SF DD is ~50 - 55 Elo. And the cumulative gains from better TM are ~5 Elo, IIRC. That is 10%.

But, yeah, writing it as 10% certainly makes it sound more impressive than "5 Elo."
My fast estimate is more than 5 elo based on the following calculation(changes 10-15 that I mentioned in previous post are changes after Stockfish DD and of course there were also changes earlier to stockfish DD that I do not include):

change 10 gives 1 elo
change 11 gives 3 elo
change 12 gives 0 elo(get rid of easy move)
change 13 gives 1 elo
change 14 gives 1 elo
change 15 gives 2 elo

Total elo advantage 8 elo rating points
from time management changes.
None of those elo gains are real. To get to +/- 1 elo error bar you need 170K games which they are not using.

Examples from my testing:

1 Stockfish 1.8 64bit 2810 1 1 171200

error bar reached +/- 1 somewhere after 170,000 where error bar was +/- 2...

I did not claim a +/- 1 error in my estimate. My estimate was slightly pessimistic relative to the performance in the tests, because I know that SPRT tends to give a slightly too optimistic value when a test passes.

For example, I counted changes 14 and 15 as 1 Elo and 2 Elo improvements respectively, when in practice the improvement may well be more than 1 and 2 Elo.

Here are the test results of 14 and 15 again:

14) Tested in simplification mode, SPRT[-4, 0]

Passed both short TC
LLR: 2.95 (-2.94,2.94) [-4.00,0.00]
Total: 34102 W: 6184 L: 6144 D: 21774

And long TC
LLR: 2.96 (-2.94,2.94) [-4.00,0.00]
Total: 16518 W: 2647 L: 2545 D: 11326

And also 40/10 TC
LLR: 2.95 (-2.94,2.94) [-4.00,0.00]
Total: 22406 W: 4390 L: 4312 D: 13704

15) Passed both short TC:
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 18907 W: 3475 L: 3322 D: 12110

And long TC:
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 19044 W: 2997 L: 2811 D: 13236

bench: 8430785
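The "(-2.94,2.94)" pair printed next to each LLR above is consistent with Wald's SPRT stopping thresholds for error rates alpha = beta = 0.05. That the testing framework uses exactly these rates is an assumption here, and the helper names below are hypothetical. A minimal sketch of the thresholds and the stop rule:

```python
import math


def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald's SPRT stopping thresholds for the log-likelihood ratio (LLR)."""
    lower = math.log(beta / (1.0 - alpha))  # accept H0 (elo0) at or below this
    upper = math.log((1.0 - beta) / alpha)  # accept H1 (elo1) at or above this
    return lower, upper


def sprt_decision(llr, alpha=0.05, beta=0.05):
    """Classify a running LLR as 'pass', 'fail', or 'continue'."""
    lower, upper = sprt_bounds(alpha, beta)
    if llr >= upper:
        return "pass"
    if llr <= lower:
        return "fail"
    return "continue"


if __name__ == "__main__":
    lower, upper = sprt_bounds()
    print(f"bounds: ({lower:.2f}, {upper:.2f})")
    print(sprt_decision(2.96))  # LLR reported for change 15 at long TC
```

A test stops as soon as the LLR crosses either threshold, so a pass says the data favour elo1 over elo0 at the chosen error rates. It does not by itself give a precise Elo value for the change.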

bob · Post by **bob** » Sat May 31, 2014 2:03 am

Uri Blass wrote:
bob wrote:
Uri Blass wrote:
jhellis3 wrote:Well, the gain from SF DD is ~50 - 55 Elo. And the cumulative gains from better TM are ~5 Elo, IIRC. That is 10%.

But, yeah, writing it as 10% certainly makes it sound more impressive than "5 Elo."
My fast estimate is more than 5 elo based on the following calculation(changes 10-15 that I mentioned in previous post are changes after Stockfish DD and of course there were also changes earlier to stockfish DD that I do not include):

change 10 gives 1 elo
change 11 gives 3 elo
change 12 gives 0 elo(get rid of easy move)
change 13 gives 1 elo
change 14 gives 1 elo
change 15 gives 2 elo

Total elo advantage 8 elo rating points
from time management changes.
None of those elo gains are real. To get to +/- 1 elo error bar you need 170K games which they are not using.

Examples from my testing:

1 Stockfish 1.8 64bit 2810 1 1 171200

error bar reached +/- 1 somewhere after 170,000 where error bar was +/- 2...
I did not claim a +/- 1 error in my estimate. My estimate was slightly pessimistic relative to the performance in the tests, because I know that SPRT tends to give a slightly too optimistic value when a test passes.

For example
I translated change 14 and change 15 to 1 elo and 2 elo improvement respectively when practically the improvement may be also more than 1 and 2 elo.

Here are the test results of 14 and 15 again:

14)Tested with simplification mode SPRT[-4, 0]

Passed both short TC
LLR: 2.95 (-2.94,2.94) [-4.00,0.00]
Total: 34102 W: 6184 L: 6144 D: 21774

And long TC
LLR: 2.96 (-2.94,2.94) [-4.00,0.00]
Total: 16518 W: 2647 L: 2545 D: 11326

And also 40/10 TC
LLR: 2.95 (-2.94,2.94) [-4.00,0.00]
Total: 22406 W: 4390 L: 4312 D: 13704

15) Passed both short TC:
LLR: 2.97 (-2.94,2.94) [-1.50,4.50]
Total: 18907 W: 3475 L: 3322 D: 12110

And long TC:
LLR: 2.96 (-2.94,2.94) [0.00,6.00]
Total: 19044 W: 2997 L: 2811 D: 13236

bench: 8430785

The problem I have with your conclusions is that ALL of the above have an error bar of +/-4 or MORE. Running the same test a second time will likely produce a significantly different Elo number...

It is REALLY easy to take those numbers as absolute, but that can lead to really bad conclusions. From experience.
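To put rough numbers on the error bars being argued about, the W/L/D totals quoted above can be turned into an Elo estimate with an approximate 95% margin. This sketch uses the plain logistic Elo formula and a normal approximation, which is an assumption and not necessarily the model behind either poster's figures (a BayesElo-style rating list can report different bars). The function name `elo_with_margin` is hypothetical.

```python
import math


def elo_with_margin(wins, losses, draws, z=1.96):
    """Return (elo, margin) from a W/L/D record using a normal approximation."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score (win = 1, draw = 0.5, loss = 0).
    variance = (wins + 0.25 * draws) / n - score ** 2
    std_err = math.sqrt(variance / n)
    elo = 400.0 * math.log10(score / (1.0 - score))
    # Delta method: d(elo)/d(score) = 400 / (ln(10) * score * (1 - score)).
    slope = 400.0 / (math.log(10) * score * (1.0 - score))
    return elo, z * slope * std_err


if __name__ == "__main__":
    # W, L, D totals copied from the test results quoted in this thread.
    results = {
        "14 short TC": (6184, 6144, 21774),
        "14 long TC": (2647, 2545, 11326),
        "14 40/10 TC": (4390, 4312, 13704),
        "15 short TC": (3475, 3322, 12110),
        "15 long TC": (2997, 2811, 13236),
    }
    for name, (w, l, d) in results.items():
        elo, margin = elo_with_margin(w, l, d)
        print(f"{name}: {elo:+.2f} +/- {margin:.2f} Elo")
```

Under these assumptions each margin comes out at a few Elo, the same order of magnitude as the gains being summed, which is the core of the disagreement above.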
