Due to the title length limit, I posted an incorrect title.
The correct question is:
Does the ELO error bar convergence speed depend on the time control?
For example, if two engines play at 1'+0", I see a lot of fluctuations in the first 1000 games, and the estimated ELO converges to the real ELO quite slowly.
Perhaps if I play at 40+40, I already have a good ELO estimate of the two engines after 500 games.
The rationale could be that what counts for a realistic idea of engine strength is the number of nodes/positions processed by the two engines when they play each other.
If I have tested the two engines for, say, 700 million nodes/positions searched, I will have a good estimate of their relative strength independently of the time control. This means I need to play, say, 2000 games at 1+0 or just 200 games at 40+40.
This could explain why in CCRL / CEGT we have a good idea of an engine's strength after just a few hundred games, while testing at blitz time controls requires thousands of games.
Has anybody ever measured the convergence speed versus the time control used?
Thanks
Marco
Does time control influences ELO error bar?
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Does time control influences ELO error bar?
I asked a similar question not too long ago. The result is that there is no evidence to suggest that the time control changes the shape of the results distribution.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Does time control influences ELO error bar?
krazyken wrote:I asked a similar question not too long ago. The result is that there is no evidence to suggest that the time control changes the shape of the results distribution.
What it does is increase the number of draws, somewhat proportional to the time limit increase. More draws at longer time controls...
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Does time control influences ELO error bar?
Could you please post some link on the past discussions?
I am not able to find any info on this subject.
Thanks
Marco
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Does time control influences ELO error bar?
mcostalba wrote:Could you please post some link on the past discussions?
I am not able to find any info on this subject.
Thanks
Marco
My comment is not part of a previous discussion. It is an observation that I see daily.
Here are some samples from a current test:
3 Crafty-23.1R02-1 2623 4 4 31128 53% 2601 22%
9 Crafty-23.1R02-3 2587 4 4 31128 48% 2601 26%
12 Crafty-23.1R02-4 2579 4 4 31128 47% 2601 28%
-
- Posts: 718
- Joined: Fri Mar 20, 2009 8:59 pm
Re: Does time control influences ELO error bar?
I would think that since signal increases linearly and noise increases with sqrt(games), they would approach equality given a sufficient number of games. Essentially, each successive game offers less information than the one before it, so as the number of games approaches infinity, the useful information from an additional game approaches zero. Given the increased draw percentage at longer time controls, I'd think there would be a point at which longer games yield more useful information than shorter ones in terms of error bars... I suppose one could calculate it out.
I think there's another issue though. One could think of strength as a function of both search efficiency and eval accuracy, but the coefficients change based on time control. So by changing the time control, you're also changing the strength of the engine you're trying to measure...
I'm starting to think some randomness is called for in the time control around the one you're testing for, just to eliminate the possibility of strange results caused by time management in the engines you're testing against (or your own). For instance, rather than testing 3 0 games only, throw in some 2 1, 2 2, 1 3, 1 4, 4 0, 3 1, etc. Odds are the results would end up the same, but if the time it takes is the same anyway, it seems a bit safer.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Does time control influences ELO error bar?
krazyken wrote:here.
That's not exactly the correct subject. He asked if time control influenced the error bar, not the overall Elo rating...
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Does time control influences ELO error bar?
krazyken wrote:here.
bob wrote:That's not exactly the correct subject. He asked if time control influenced the error bar, not the overall Elo rating...
Yes, I was asking about convergence speed. To put it in simple words:
If I need 2000 games at 1+0 to verify that engine A is stronger than engine B, how many games are needed if I play at 40+40?
This is somewhat different from the ELO difference, because it is entirely possible that at 1+0 engine A comes out 10 ELO stronger than B, while at 40+40 it comes out 5 ELO weaker. But this should not affect the fact that I can reliably verify these two different results using, for example, 2000 games in the first case and 400 games in the second.
I was thinking about the higher number of draws at longer time controls. I am not an expert in statistics, and perhaps someone here more versed in mathematics could help, but my feeling is that if longer time controls produce more draws, then fluctuations should also be smaller, because streaks of won and/or lost games should be statistically less frequent. In other words, more draws could translate into lower variance for longer games and faster convergence. But, again, a mathematician is needed here to confirm this idea.
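The intuition about draws can be checked with a small simulation. The sketch below is not from the thread: it assumes two equally strong engines, independent games, and a fixed draw rate. The function name `simulate_match_scores` and the match sizes are made up for illustration.

```python
import random
import statistics

def simulate_match_scores(games, draw_rate, trials, seed=1):
    """Score fractions of `trials` simulated matches between two equal engines."""
    rng = random.Random(seed)
    # Assumption: equal strength, so wins and losses split what is left after draws.
    win_rate = (1.0 - draw_rate) / 2.0
    scores = []
    for _ in range(trials):
        points = 0.0
        for _ in range(games):
            r = rng.random()
            if r < win_rate:
                points += 1.0          # win
            elif r < win_rate + draw_rate:
                points += 0.5          # draw
            # otherwise a loss, worth 0 points
        scores.append(points / games)
    return scores

for draw_rate in (0.0, 0.35):
    scores = simulate_match_scores(games=400, draw_rate=draw_rate, trials=1000)
    simulated = statistics.pstdev(scores)
    # At a 50% score the per-game variance is (1 - draw_rate) / 4.
    theory = ((1.0 - draw_rate) / (4 * 400)) ** 0.5
    print(f"draws {draw_rate:.0%}: simulated sd {simulated:.4f}, theory {theory:.4f}")
```

Under these assumptions, a higher draw rate shrinks the spread of match scores by a factor of sqrt(1 - draw_rate) for the same number of games, which is consistent with the idea that more draws mean less variance.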
-
- Posts: 718
- Joined: Fri Mar 20, 2009 8:59 pm
Re: Does time control influences ELO error bar?
Plugging those numbers in and assuming an overall 50% score, and holding testing time constant, you'd get 95% confidence error bars of:
32,000 games at 28 seconds/game - 0.49%
8,960 games at 100 seconds/game - 0.91% (+0.42%)
4,480 games at 200 seconds/game - 1.27% (+0.77%)
If we double the test time:
64,000 games at 28 seconds/game - 0.35%
17,920 games at 100 seconds/game - 0.64% (+0.29%)
8,960 games at 200 seconds/game - 0.90% (+0.55%)
So it seems the increased draw % isn't enough to counteract the significantly fewer games, but the differences do decrease as testing time increases.
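The post does not show how these error bars were computed. A minimal sketch that seems to reproduce them is given below, under several assumptions that are not stated in the thread: the draw rates are the 22%, 26% and 28% from the last column of the Crafty sample rows posted earlier, paired with the 28, 100 and 200 seconds/game settings; the score is exactly 50%; and the error bar is 2 standard deviations (the posted figures appear to match 2 rather than 1.96). The name `error_bar` is made up for illustration.

```python
import math

def error_bar(games, draw_rate, sigmas=2.0):
    """Half-width of the score interval, as a fraction, for two equal engines."""
    # Per-game variance of the result (1, 0.5 or 0 points); valid only at a 50% score.
    per_game_variance = (1.0 - draw_rate) / 4.0
    return sigmas * math.sqrt(per_game_variance / games)

# Total testing time implied by the first row of the posted figures.
BUDGET_SECONDS = 32_000 * 28

# (seconds per game, draw rate). The pairing is an assumption, see the text above.
settings = [(28, 0.22), (100, 0.26), (200, 0.28)]

for seconds_per_game, draw_rate in settings:
    games = BUDGET_SECONDS // seconds_per_game
    bar = 100 * error_bar(games, draw_rate)
    print(f"{games:>6,} games at {seconds_per_game:>3} s/game: +/- {bar:.2f}%")
```

With a fixed time budget the number of games drops in proportion to game length, while the draw rate only trims the per-game variance slightly, so the shorter games come out ahead in this calculation.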
Edit: Marco, from what Dr. Muller said, that is indeed the case -- higher draw % reduces error bars more when all else is equal
Testing 51% vs 49% gauntlet score with 0% draws requires about 5000 games.
Testing 51% vs 49% gauntlet score with 35% draws requires about 3250 games.
But if those 3250 games take 2 hours apiece instead of 2 minutes apiece, you'd still hit the confidence interval faster testing 1 minute games...
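The post does not say how the 5000 and 3250 game counts were derived. One reading that reproduces them, offered here only as an assumption, is that two engines each play the same gauntlet of N games, both scores are near 50%, and the 2% gap between them must equal 2 standard deviations of the difference of the two scores. The name `games_needed` and the pairing of draw rates with game lengths are illustrative.

```python
def games_needed(score_gap, draw_rate, sigmas=2.0):
    """Games per engine so that `score_gap` equals `sigmas` standard deviations."""
    # Per-game variance at a score near 50% (assumption).
    per_game_variance = (1.0 - draw_rate) / 4.0
    # The difference of two independent gauntlet scores has variance 2 * var / N.
    # Solving sigmas * sqrt(2 * var / N) = score_gap for N:
    return round(2.0 * per_game_variance * (sigmas / score_gap) ** 2)

# (draw rate, minutes per game). The pairing is illustrative, not from the thread.
for draw_rate, minutes_per_game in ((0.0, 2), (0.35, 120)):
    n = games_needed(score_gap=0.02, draw_rate=draw_rate)
    hours = n * minutes_per_game / 60
    print(f"draws {draw_rate:.0%}: {n} games, "
          f"about {hours:.0f} hours at {minutes_per_game} min/game")
```

On this reading the required number of games scales with (1 - draw_rate), so 35% draws cut the count by 35%, far less than the factor by which the longer games increase the total testing time.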