Debate: testing at fast time controls
Moderator: Ras
-
- Posts: 620
- Joined: Fri Feb 08, 2008 10:44 am
- Location: Madrid - Spain
Re: Debate: testing at fast time controls
And when you repeat tests at fast time controls, do you see the results repeat? I have noticed that when I repeat tests at fast time controls, the results change more than when I repeat them at slow ones. Have you noticed something similar?
-
- Posts: 518
- Joined: Sun Mar 19, 2006 4:12 am
- Full name: Kirill Kryukov
Re: Debate: testing at fast time controls
Dann Corbit wrote:
> There is something else to be said for the longer time controls:
> If we are going to find improvements and novelties, we are much more likely to discover them in long time control games. Game-in-one-second matches will show which engine is stronger, but who has the stomach to even look at the trash output?
> Of course, it's pretty well known that I'm not much of a blitz fan.
Dear Dann,
This is all fair and true. I think it's important to clarify the objectives before discussing the means. When I run engine-engine tournaments - I am not looking to advance chess theory, I am not looking for improvements or novelties. I am simply trying to evaluate how strong different engines are, under my specific testing conditions.
When I analyze a position, or play a correspondence game, I am likely to let an engine think for hours or sometimes days, but that is totally different from running engine-engine matches.
Do you seriously look for useful novelties in engine-engine games, even in long time controls?
All the best,
Kirill
-
- Posts: 10895
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Debate: testing at fast time controls
Kirill Kryukov wrote:
> bob wrote:
> > Kirill Kryukov wrote:
> > > Kempelen wrote:
> > > > For testing purposes where the majority of us test our engine in fast controls like 1m-1s or 1m-0s, do you think that if the engine improves in that fast control it will improve in slow/normal games? Is there a correlation?
> > > > If not, what do you (assuming you test in fast games) do to take care of normal games?
> > > I think the general assumption is that results of fast games correlate with the results of slow games very well. This is what we expect. I am yet to see any convincing data showing that it is not the case. I mean any example of an engine that performs significantly differently depending on time control. I think such an example is necessary before this discussion can get anywhere. One more thing: the engines selected for such a tournament should be as close to each other in playing strength as possible.
> > > Best,
> > > Kirill
> > Suppose I post some test results between Crafty and Fruit, at say 8000 game matches from the same starting positions. And I show you that at _very_ fast time controls, Crafty is +300 Elo better. At fast (but slower than previous run) Crafty is +100 better. And at long time controls, Crafty is about +40 better. Would that be convincing?
> > I won't claim whether Crafty gets worse with longer time controls, or more likely, Fruit gets worse at faster time controls. But that is an absolute statement of fact that has been measured by me on my cluster. I can take this to the extreme where Fruit hardly wins a single game, losing most on time, to give you a hint of what is going on. Time utilization is very important in very fast games, less so in longer games. That is just one difference in the two programs.
> I should have clarified that I did not mean _very_ fast time controls. I can expect any random effects happening at such time controls. I had in mind CCRL 40/4 time control specifically, and I do believe that results in this control correlate with results in any longer time control very well.
> The question probably is which is the fastest time control that correlates well with the long time controls. I would probably say it's CEGT 40/4 (which is about 2.5 times slower than CCRL 40/4), or maybe 1+1. That is intuition, as I have not seen substantial data showing a difference between these time controls and long time controls.
> About the comparison based on a single pair of engines - yes, it would be a convincing result for that single pair. But I am more curious whether that difference will show up in the context of a larger multi-engine comparison. If, for example, 20 engines were taken, and a round-robin tournament was played, for example with 300 circles. This would give each engine 5700 games. Suppose this tournament was repeated in fast (not _very_ fast) time control and long time control (for example, 10 times slower than the fast one). Then if some engine performed 60 Elo points better in one time control than in another one - this would constitute a substantial example.
> Best,
> Kirill
I find a difference of more than 60 elo for the baron1.7.0, and it may be interesting to find all the programs where there is a difference of more than 60 elo.
The baron1.7.0 did not play enough games at 40/40, so the fact that the difference is bigger than 60 elo may be because of a statistical error.
ccrl 40/4
The Baron 1.7.0 2560 +16 −16 47.0% +17.0 23.2% 1631
ccrl 40/40
The Baron 1.7.0 2648 +33 −33 47.8% +13.8 30.6% 301
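The round-robin arithmetic in Kirill's quoted proposal (20 engines, 300 circles, 5700 games per engine) can be checked with a short sketch. It assumes that one "circle" means every pair of engines meets exactly once; the function names are made up for illustration.

```python
def games_per_engine(num_engines: int, cycles: int) -> int:
    """Games one engine plays when it meets every other engine once per cycle."""
    return (num_engines - 1) * cycles


def total_games(num_engines: int, cycles: int) -> int:
    """Games in the whole tournament: every unordered pairing, once per cycle."""
    pairings = num_engines * (num_engines - 1) // 2
    return pairings * cycles


if __name__ == "__main__":
    print(games_per_engine(20, 300))  # 19 opponents * 300 cycles = 5700
    print(total_games(20, 300))       # 190 pairings * 300 cycles = 57000
```

Under that assumption the proposal amounts to 57000 games per time control, which gives a feel for the cost of running it at both a fast and a ten-times-slower control.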
-
- Posts: 9773
- Joined: Wed Mar 08, 2006 8:44 pm
- Location: Amman,Jordan
Re: Debate: testing at fast time controls
Uri Blass wrote:
> {snip}
> I find a difference of more than 60 elo for the baron1.7.0, and it may be interesting to find all the programs where there is a difference of more than 60 elo.
> The baron1.7.0 did not play enough games at 40/40, so the fact that the difference is bigger than 60 elo may be because of a statistical error.
> ccrl 40/4
> The Baron 1.7.0 2560 +16 −16 47.0% +17.0 23.2% 1631
> ccrl 40/40
> The Baron 1.7.0 2648 +33 −33 47.8% +13.8 30.6% 301
No, it's not a statistical error; actually Baron and Delfi are good examples of the many engines that benefit from long time controls....
-
- Posts: 10895
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Debate: testing at fast time controls
Dr.Wael Deeb wrote:
> {snip}
> No, it's not a statistical error; actually Baron and Delfi are good examples of the many engines that benefit from long time controls....
Baron gains from long time controls, but it still may be a statistical error, because the real difference may be less than 60 elo; note that the difference for later versions of the baron is smaller than 60 elo.
When I look at the CCRL ratings of different versions of Delfi, I find only a small difference of less than 30 elo for long time control.
Uri
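One rough way to look at the "statistical error" question is to combine the two error margins quoted from the CCRL lists for The Baron 1.7.0 (2560 +/- 16 at 40/4, 2648 +/- 33 at 40/40). The sketch below assumes the +/- values are comparable confidence margins, that the two lists are statistically independent, and that ratings from the two lists can be subtracted directly. None of that is established in this thread, and `rating_gap` is a made-up helper.

```python
import math


def rating_gap(rating_a, margin_a, rating_b, margin_b):
    """Return (difference, combined margin) for two independently measured ratings.

    Assumption: independent errors add in quadrature.
    """
    diff = rating_b - rating_a
    combined = math.sqrt(margin_a ** 2 + margin_b ** 2)
    return diff, combined


if __name__ == "__main__":
    diff, margin = rating_gap(2560, 16, 2648, 33)
    low, high = diff - margin, diff + margin
    print(f"difference: {diff} +/- {margin:.1f} Elo")
    print(f"plausible range: {low:.1f} to {high:.1f} Elo")
    print("60 Elo inside range:", low < 60 < high)
```

Under those assumptions the 88-point gap carries a margin of roughly 37 Elo, so a true difference below 60 is not ruled out by these two lines alone.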
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Debate: testing at fast time controls
Dirt wrote:
> bob wrote:
> > There's the problem. 0:30 was not a part of the winboard protocol when I started. If it has been added, I can easily make the change, but at the moment, it will certainly not work...
> I haven't looked at the protocol, but that's the only way I know for a user to specify times of under a minute in the Winboard GUI. If I enter a number with a decimal point the fractional part is ignored. When I first used Winboard, back when Tim Mann was still working on it, I first tried something like 0.5 to get 30 seconds, and it didn't work then either. It took me awhile to figure out that I could use minutes:seconds.
The oldish engine interface document I have doesn't mention 0:30. I think I would _prefer_ the 0.5 minute approach rather than 0:30 as there is an ambiguity there unless it is extended to hh:mm:ss instead. You can't set a real chess clock to fractions of a minute, which is probably what led to the minute unit rather than seconds, although we have had digital chess clocks for something like 25 years now...
I'll make crafty accept that, but it would help if I knew exactly what the options are. For example is mm:ss or mm the only two choices, or is hh:mm:ss also an option?
Re: Debate: testing at fast time controls
bob wrote:
> {snip}
> The oldish engine interface document I have doesn't mention 0:30. I think I would _prefer_ the 0.5 minute approach rather than 0:30 as there is an ambiguity there unless it is extended to hh:mm:ss instead. You can't set a real chess clock to fractions of a minute, which is probably what led to the minute unit rather than seconds, although we have had digital chess clocks for something like 25 years now...
> I'll make crafty accept that, but it would help if I knew exactly what the options are. For example is mm:ss or mm the only two choices, or is hh:mm:ss also an option?
This page suggests that it is mm[:ss] only.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Debate: testing at fast time controls
krazyken wrote:
> {snip}
> This page suggests that it is mm[:ss] only.
That I can fix...
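For reference, accepting a base time written as mm or mm:ss, as discussed above, could look roughly like the following. This is only a sketch in Python, not Crafty's code; `parse_base_time` is a hypothetical name, and rejecting decimal fractions and hh:mm:ss is an assumption based on the "mm[:ss] only" reading quoted above.

```python
def parse_base_time(field: str) -> int:
    """Convert a base-time field written as mm or mm:ss into seconds.

    Anything else (for example "0.5" or "1:05:00") is rejected.
    """
    minutes, sep, seconds = field.strip().partition(":")
    if not minutes.isdigit() or (sep and not seconds.isdigit()):
        raise ValueError(f"expected mm or mm:ss, got {field!r}")
    total = int(minutes) * 60
    if sep:
        total += int(seconds)
    return total


if __name__ == "__main__":
    for text in ("5", "0:30", "1:30"):
        print(text, "->", parse_base_time(text), "seconds")
```

Keeping the result in whole seconds sidesteps the fractional-minute question raised in the thread; an engine that stores time in other units would convert from there.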
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Debate: testing at fast time controls
Kempelen wrote:
> And when you repeat tests at fast time controls, do you see the results repeat? I have noticed that when I repeat tests at fast time controls, the results change more than when I repeat them at slow ones. Have you noticed something similar?
That only means you are not playing enough games. To get to the +/- 4 Elo level, you need to play 40,000 games or so, and +/- 4 gives a significant margin for error even with that many games...
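How the error margin shrinks with the number of games can be sketched with a simple normal approximation. The assumptions here are mine, not from the post: two equally strong engines, independent games, a fixed draw ratio (30% by default), and a 95% interval. Real rating tools compute margins differently, so the exact game count for a given margin may not match the figure quoted above; `elo_margin` is a hypothetical helper.

```python
import math


def elo_margin(games: int, draw_ratio: float = 0.3, z: float = 1.96) -> float:
    """Approximate +/- Elo margin for a match between two equal engines."""
    # Per-game variance of the score (win=1, draw=0.5, loss=0) around 0.5:
    # only decisive games contribute, each with squared deviation 0.25.
    variance = (1.0 - draw_ratio) / 4.0
    std_error = math.sqrt(variance / games)
    # Slope of the Elo curve with respect to score, evaluated at a 50% score.
    slope = 400.0 / (math.log(10) * 0.25)
    return z * slope * std_error


if __name__ == "__main__":
    for n in (100, 1000, 10000, 40000):
        print(f"{n:>6} games: +/- {elo_margin(n):.1f} Elo")
```

The useful takeaway is the square-root behaviour: halving the margin takes four times as many games, which is why small match runs scatter so much from one repetition to the next.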
-
- Posts: 12792
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: Debate: testing at fast time controls
Kirill Kryukov wrote:
> {snip}
> Do you seriously look for useful novelties in engine-engine games, even in long time controls?
Yes.
Found some, too.