Debate: testing at fast time controls

Kempelen
Posts: 620
Joined: Fri Feb 08, 2008 10:44 am
Location: Madrid - Spain

Re: Debate: testing at fast time controls

Post by Kempelen »

And when you repeat tests at fast time controls, do you see the results repeat? I have noticed that when I repeat tests at fast time controls, the results change more than when I repeat them at slow ones. Have you noticed something similar?
Fermin Serrano
Author of 'Rodin' engine
http://sites.google.com/site/clonfsp/
Kirill Kryukov
Posts: 518
Joined: Sun Mar 19, 2006 4:12 am
Full name: Kirill Kryukov

Re: Debate: testing at fast time controls

Post by Kirill Kryukov »

Dann Corbit wrote:There is something else to be said for the longer time controls:

If we are going to find improvements and novelties, we are much more likely to discover them in long time control games. Game-in-one-second matches will show which engine is stronger, but who has the stomach to even look at the trash output?

Of course, it's pretty well known that I'm not much of a blitz fan.
:roll:
Dear Dann,

This is all fair and true. I think it's important to clarify the objectives before discussing the means. When I run engine-engine tournaments - I am not looking to advance chess theory, I am not looking for improvements or novelties. I am simply trying to evaluate how strong different engines are, under my specific testing conditions.

When I analyze a position, or play a correspondence game, I am likely to let an engine think for hours or sometimes days, but that is totally different from running engine-engine matches.

Do you seriously look for useful novelties in engine-engine games, even in long time controls?

All the best,
Kirill
Uri Blass
Posts: 10895
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Debate: testing at fast time controls

Post by Uri Blass »

Kirill Kryukov wrote:
bob wrote:
Kirill Kryukov wrote:
Kempelen wrote:For testing purposes, where the majority of us test our engines at fast controls like 1m-1s or 1m-0s, do you think that if an engine improves at that fast control it will also improve in slow/normal games? Is there a correlation?

If not, what do you (assuming you test in fast games) do to take care of normal games?
I think the general assumption is that results of fast games correlate very well with the results of slow games. This is what we expect. I have yet to see any convincing data showing that this is not the case, meaning any example of an engine that performs significantly differently depending on time control. I think such an example is necessary before this discussion can get anywhere. One more thing: the engines selected for such a tournament should be as close to each other in playing strength as possible.

Best,
Kirill
Suppose I post some test results between Crafty and Fruit, at say 8000 game matches from the same starting positions. And I show you that at _very_ fast time controls, Crafty is +300 Elo better. At fast (but slower than previous run) Crafty is +100 better. And at long time controls, Crafty is about +40 better. Would that be convincing?

I won't claim whether Crafty gets worse with longer time controls or, more likely, Fruit gets worse at faster time controls. But that is an absolute statement of fact that has been measured by me on my cluster. I can take this to the extreme where Fruit hardly wins a single game, losing most on time, to give you a hint of what is going on. Time utilization is very important in very fast games, less so in longer games. That is just one difference between the two programs.
I should have clarified that I did not mean _very_ fast time controls. I would expect all kinds of random effects at such time controls. I had the CCRL 40/4 time control specifically in mind, and I do believe that results at this control correlate very well with results at any longer time control.

The question probably is which is the fastest time control that correlates well with the long time controls. I would probably say it's CEGT 40/4 (which is about 2.5 times slower than CCRL 40/4), or maybe 1+1. This is only intuition, as I have not seen substantial data showing a difference between these time controls and long time controls.

About the comparison based on a single pair of engines: yes, it would be a convincing result for that single pair. But I am more curious whether that difference will show up in the context of a larger multi-engine comparison. Suppose, for example, 20 engines were taken and a round-robin tournament was played with, say, 300 cycles. This would give each engine 5700 games (19 opponents × 300 cycles). Suppose this tournament was repeated at a fast (not _very_ fast) time control and at a long time control (for example, 10 times slower than the fast one). Then if some engine performed 60 Elo points better at one time control than at the other, this would constitute a substantial example.

Best,
Kirill
I find a difference of more than 60 Elo for The Baron 1.7.0, and it may be interesting to find all the programs for which there is a difference of more than 60 Elo.
The Baron 1.7.0 did not play enough games at 40/40, so the fact that the difference is bigger than 60 Elo may be due to statistical error.

List        Engine           Elo   +    −    Score  AvOp   Draws  Games
CCRL 40/4   The Baron 1.7.0  2560  +16  −16  47.0%  +17.0  23.2%  1631
CCRL 40/40  The Baron 1.7.0  2648  +33  −33  47.8%  +13.8  30.6%   301
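
As a rough check on the figures above, here is a minimal Python sketch that combines the two error bars. It assumes the +/− columns are symmetric intervals at the same confidence level, that the two lists are independent, and that their rating scales are directly comparable. None of that is confirmed in this thread, and the function name is only illustrative.

```python
import math

def difference_with_margin(rating_a, margin_a, rating_b, margin_b):
    """Return (difference, combined margin) for two independent rating estimates.

    Assumption: the margins are symmetric intervals at the same confidence
    level, so they combine in quadrature like independent standard errors.
    """
    diff = rating_b - rating_a
    margin = math.hypot(margin_a, margin_b)  # sqrt(margin_a**2 + margin_b**2)
    return diff, margin

# The Baron 1.7.0: CCRL 40/4 (2560 +/- 16) versus CCRL 40/40 (2648 +/- 33)
diff, margin = difference_with_margin(2560, 16, 2648, 33)
low, high = diff - margin, diff + margin
print(f"difference: {diff} Elo, plausible range: {low:.1f} to {high:.1f}")
```

Under those assumptions the lower end of the range falls below 60 Elo, so the observed 88-point gap does not by itself rule out a true difference smaller than 60.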
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: Debate: testing at fast time controls

Post by Dr.Wael Deeb »

No, it's not a statistical error; actually, Baron and Delfi are good examples of the many engines that benefit from long time controls...
No one can hit as hard as life. But it ain’t about how hard you can hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward...
Uri Blass
Posts: 10895
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Debate: testing at fast time controls

Post by Uri Blass »

Baron gains from long time controls, but it may still be a statistical error, because the real difference may be less than 60 Elo. Note that the difference for later versions of The Baron is smaller than 60 Elo.

When I look at the CCRL ratings of different versions of Delfi, I find only a small difference, less than 30 Elo, for long time control.

Uri
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Debate: testing at fast time controls

Post by bob »

Dirt wrote:
bob wrote:There's the problem. 0:30 was not a part of the winboard protocol when I started. If it has been added, I can easily make the change, but at the moment, it will certainly not work...
I haven't looked at the protocol, but that's the only way I know for a user to specify times of under a minute in the Winboard GUI. If I enter a number with a decimal point the fractional part is ignored. When I first used Winboard, back when Tim Mann was still working on it, I first tried something like 0.5 to get 30 seconds, and it didn't work then either. It took me a while to figure out that I could use minutes:seconds.
The oldish engine interface document I have doesn't mention 0:30. I think I would _prefer_ the 0.5 minute approach rather than 0:30, as there is an ambiguity there unless it is extended to hh:mm:ss instead. You can't set a real chess clock to fractions of a minute, which is probably what led to the minute unit rather than seconds, although we have had digital chess clocks for something like 25 years now...

I'll make Crafty accept that, but it would help if I knew exactly what the options are. For example, are mm:ss and mm the only two choices, or is hh:mm:ss also an option?
krazyken

Re: Debate: testing at fast time controls

Post by krazyken »

This page suggests that it is mm[:ss] only.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Debate: testing at fast time controls

Post by bob »

That I can fix...
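
For illustration only, here is a minimal Python sketch of a parser for the mm[:ss] form mentioned above. It is not Crafty's code (Crafty is written in C), the function name is hypothetical, and rejecting fractional minutes such as 0.5 is a choice made here to match that form rather than something the protocol text in this thread spells out.

```python
def parse_base_time(field):
    """Convert a base-time field of the form 'mm' or 'mm:ss' to seconds.

    Hypothetical helper: anything else (for example '0.5' or 'h:mm:ss')
    is rejected with ValueError instead of being guessed at.
    """
    minutes, sep, seconds = field.partition(":")
    if not minutes.isdecimal():
        raise ValueError(f"expected mm or mm:ss, got {field!r}")
    total = int(minutes) * 60
    if sep:
        # A colon was present, so a seconds part in the range 0..59 is required.
        if not seconds.isdecimal() or int(seconds) >= 60:
            raise ValueError(f"expected mm or mm:ss, got {field!r}")
        total += int(seconds)
    return total

print(parse_base_time("0:30"))  # 30 seconds
print(parse_base_time("1"))     # 60 seconds
```

Because the seconds part is only read after an explicit colon, a bare number always means whole minutes, which avoids the ambiguity raised earlier in the thread.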
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Debate: testing at fast time controls

Post by bob »

Kempelen wrote:And when you repeat tests at fast time controls, do you see the results repeat? I have noticed that when I repeat tests at fast time controls, the results change more than when I repeat them at slow ones. Have you noticed something similar?
That only means you are not playing enough games. To get to the +/- 4 Elo level, you need to play 40,000 games or so. And +/- 4 gives a significant margin for error even with that many games...
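
To illustrate how the error margin shrinks with the number of games, here is a back-of-the-envelope Python sketch. It uses a plain normal approximation for a match between two equally strong engines, with an assumed draw ratio of 0.3 and a 95% interval. This is not the method behind the figures quoted above, and rating tools may well report different margins.

```python
import math

def elo_margin(games, draw_ratio=0.3, z=1.96):
    """Approximate +/- Elo margin for a match between two equal engines.

    Assumptions: independent games, a 50% expected score, a fixed draw
    ratio, and a normal approximation with multiplier z.
    """
    # Per-game score variance at a 50% expected score: wins and losses each
    # occur with probability (1 - draw_ratio) / 2 and lie 0.5 from the mean.
    variance = (1.0 - draw_ratio) / 4.0
    score_se = math.sqrt(variance / games)
    # Slope of the logistic Elo curve at a 50% score (Elo per unit of score).
    elo_per_score = 1600.0 / math.log(10)
    return z * score_se * elo_per_score

for n in (1000, 10000, 40000):
    print(n, round(elo_margin(n), 1))
```

With these assumptions the margin at 40,000 games comes out at a few Elo, and quadrupling the number of games only halves it, which is why short repeated runs can disagree noticeably.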
Dann Corbit
Posts: 12792
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Debate: testing at fast time controls

Post by Dann Corbit »

Kirill Kryukov wrote: {snip}
Do you seriously look for useful novelties in engine-engine games, even in long time controls?
Yes.
Found some, too.