Debate: testing at fast time controls

bob · Post by **bob** » Thu Dec 18, 2008 12:38 am

Dirt wrote:
bob wrote:
Dirt wrote:
bob wrote: I have not found a time control where Crafty loses to Fruit since the recent versions came out, at least on the 3981 starting positions I use for my testing.
At least on my system, Fruit 2.1 wins every time against Crafty 22.8 if the time control is set to 30 seconds for 40 moves, repeating, with no increment.
What kind of book? I am playing from 3891 unique starting positions, taken at move 16 with White to move; duplicates were removed, positions that occurred in only a single game were dropped, and the most popular positions were chosen. I play every position twice, alternating colors. This is with Fruit 2.1 running on Linux. I am not sure whether there are sub-versions or anything, but no opening books are used anywhere, and there are no endgame tables and no learning, insofar as these can be turned off.

I've posted my results here many times...
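bob's setup plays every starting position twice with the colors reversed, so an opening position that favors one side counts once for each engine. A minimal Python sketch of such a schedule; the function name and the placeholder position IDs are hypothetical, not taken from bob's cluster scripts:

```python
from typing import Iterator, List, Tuple


def paired_schedule(positions: List[str], engine_a: str,
                    engine_b: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (position, white, black): each position is played twice,
    once with each engine on the White side."""
    for pos in positions:
        yield pos, engine_a, engine_b  # engine_a plays White
        yield pos, engine_b, engine_a  # same position, colors reversed


if __name__ == "__main__":
    # Placeholder IDs; a real run would read the opening positions (e.g. FENs) from a file.
    openings = ["pos-001", "pos-002", "pos-003"]
    games = list(paired_schedule(openings, "Crafty", "Fruit"))
    print(len(games))  # 6: two games per starting position
```

With N positions this yields 2N games.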
I'm using performance.bin for Fruit, and I think Crafty is using Peter's book, but it doesn't matter. Crafty never uses more than one second for the entire game. If I increase the time to one minute, Crafty plays normally. If, as I expect, you never play without increment, you won't see this problem.

I play without increment all the time, just not in my testing. One second for the entire game means something is broken in the GUI somewhere. Crafty plays 1 0 games on ICC all the time with absolutely no problems. I play all the way down to game in 1 second for stress-testing...

Can you (a) send me a log file from such a game and (b) send the crafty.rc/.craftyrc file? Something is definitely up that needs looking into. No doubt it would do badly if it uses no time; the question is, what would cause that to happen in your games???

I just tried a couple of games with no increment and a repeating time control, and didn't see anything odd at all. The first question is how you are setting the initial clock value to 30 seconds. Looking at the level command, it only supports an initial clock time specified in minutes, so that might be the problem: setting that to zero would be a bit of a problem...

Dirt · Post by **Dirt** » Thu Dec 18, 2008 1:09 am

bob wrote: I play without increment all the time, just not in my testing. One second for the entire game means something is broken in the GUI somewhere. Crafty plays 1 0 games on ICC all the time with absolutely no problems. I play all the way down to game in 1 second for stress-testing...

Can you (a) send me a log file from such a game and (b) send the crafty.rc/.craftyrc file? Something is definitely up that needs looking into. No doubt it would do badly if it uses no time; the question is, what would cause that to happen in your games???

I find one minute, no increment and not repeating works ok for me too. From the log it looks like Crafty thinks a time of 0:30 is zero.

Batch file:

winboard /mg 1 /cp /tc 0:30 /mps 40 /fd=c:\programs\crafty /sd=c:\programs\polyglot /fcp='crafty "ponder off log"' /scp="polyglot fruit21.ini" /xponder

crafty.rc:

hash=128M
hashp=16M
log=off
tbpath d:\tb
egtb
cache=16M
ponder off
swindle off
noise 0
#timebook 400 2
#skill 1

Top of log:

White(1): random
White(1): level 40 0:30 0
40 moves/0 minutes primary time control
40 moves/0 minutes secondary time control
White(1): post
White(1): hard
pondering enabled.
White(1): easy
pondering disabled.
White(1): ping 3
pong 3
White(1): force
White(1): computer
playing a computer!
White(1): name Fruit 2.1
Crafty 22.8 JA vs Fruit
White(1): time 3000
White(1): otim 3000
White(1): c4
Black(1): go
               book moves {Nf6, e5, e6, c5, g6, c6, f5}
               book   0.0s     33%    Nf6 Nc3
Black(1): Nf6
              time used:   0.02
White(2): time 2995
White(2): otim 3000
White(2): d4
              time used:   0.00
               book moves {e6, g6, c5}
               book   0.0s     44%    e6 Nc3
Black(2): e6
              time used:   0.00
White(3): time 2995
White(3): otim 3000
White(3): Nc3
              time used:   0.01
               book moves {Bb4, d5}
               book   0.0s     79%    Bb4 e3
Black(3): Bb4
              time used:   0.00
White(4): time 2995
White(4): otim 3000
White(4): Qc2
              time used:   0.01
               book moves {O-O, c5, d5, Nc6}
               book   0.0s     41%    O-O a3
Black(4): O-O
              time used:   0.00
White(5): time 2990
White(5): otim 3000
White(5): a3
              time used:   0.03
               book moves {Bxc3+}
               book   0.0s     99%    Bxc3+ Qxc3
Black(5): Bxc3+
              time used:   0.02
White(6): time 2990
White(6): otim 3000
White(6): Qxc3
              time used:   0.00
               book moves {b6, d6}
               book   0.0s     77%    b6 Bg5
Black(6): b6
              time used:   0.00
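The clock lines in this log fit Dirt's reading. Under the CECP (xboard/WinBoard) engine protocol, `time N` sets the engine's own clock and `otim N` the opponent's, both in centiseconds. So `time 3000` is a 30-second clock, even though the `level` line was echoed back as 0 minutes. A small Python sketch that pulls the engine's own clock out of lines like these. The regex is written for this log's format only and is an assumption about other logs:

```python
import re
from typing import List

# Matches lines such as "White(2): time 2995" in this Crafty log.
# CECP "time"/"otim" values are in centiseconds (1/100 s).
CLOCK_LINE = re.compile(r"^(?:White|Black)\(\d+\): (time|otim) (\d+)$")


def own_clock_seconds(log_lines: List[str]) -> List[float]:
    """Return the engine's own clock after each 'time' command, in seconds."""
    clocks = []
    for line in log_lines:
        m = CLOCK_LINE.match(line.strip())
        if m and m.group(1) == "time":
            clocks.append(int(m.group(2)) / 100.0)
    return clocks


if __name__ == "__main__":
    excerpt = [
        "White(1): time 3000",
        "White(1): otim 3000",
        "White(2): time 2995",
        "White(5): time 2990",
    ]
    clocks = own_clock_seconds(excerpt)
    print(clocks)  # [30.0, 29.95, 29.9]
    print(f"used so far: {clocks[0] - clocks[-1]:.2f} s")  # used so far: 0.10 s
```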

Kirill Kryukov · Post by **Kirill Kryukov** » Thu Dec 18, 2008 2:06 am

bob wrote:
Kirill Kryukov wrote:
Kempelen wrote: For testing purposes, where the majority of us test our engines at fast controls like 1m-1s or 1m-0s, do you think that if the engine improves at that fast control it will also improve in slow/normal games? Is there a correlation?

If not, what do you (assuming you test in fast games) do to take care of normal games?
I think the general assumption is that results of fast games correlate very well with results of slow games. This is what we expect. I have yet to see any convincing data showing that this is not the case, that is, any example of an engine that performs significantly differently depending on the time control. I think such an example is necessary before this discussion can get anywhere. One more thing: the engines selected for such a tournament should be as close to each other in playing strength as possible.

Best,
Kirill
Suppose I post some test results between Crafty and Fruit, from, say, 8000-game matches played from the same starting positions. And I show you that at _very_ fast time controls Crafty is +300 Elo better, at fast (but slower than the previous run) time controls Crafty is +100 better, and at long time controls Crafty is about +40 better. Would that be convincing?
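To put bob's figures in match-score terms, the usual logistic Elo formula converts a rating difference into an expected score. This is a textbook sketch in Python, not the rating tool bob used:

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score: float) -> float:
    """Inverse of expected_score; the score must lie strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    for diff in (300, 100, 40):
        print(f"+{diff} Elo -> expected score {expected_score(diff):.3f}")
    # +300 Elo -> expected score 0.849
    # +100 Elo -> expected score 0.640
    # +40 Elo -> expected score 0.557
```

Under this model, +300 means scoring roughly 85% over the match, while +40 is only about 56%.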

I won't claim whether Crafty gets worse at longer time controls or, more likely, Fruit gets worse at faster ones. But the numbers themselves are a statement of fact, measured by me on my cluster. I can take this to the extreme where Fruit hardly wins a single game, losing most on time, to give you a hint of what is going on. Time utilization is very important in very fast games and less so in longer games. That is just one difference between the two programs.
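One way to see why time utilization weighs more at very fast controls is that any fixed overhead eats a larger share of a small clock. Below is a deliberately naive allocator in Python with a hypothetical 0.5 s safety reserve. It is not how Crafty or Fruit actually budget their time:

```python
def per_move_budget(remaining_s: float, moves_to_go: int,
                    reserve_s: float = 0.5) -> float:
    """Naive allocation: hold back a fixed reserve (for GUI/communication lag),
    then split the rest evenly over the moves left in this time control."""
    usable = max(0.0, remaining_s - reserve_s)
    return usable / max(1, moves_to_go)


if __name__ == "__main__":
    for label, clock in (("40 moves in 30 s", 30.0), ("40 moves in 4 min", 240.0)):
        budget = per_move_budget(clock, 40)
        print(f"{label}: {budget:.3f} s/move, reserve = {0.5 / clock:.1%} of the clock")
```

The same 0.5 s reserve is about 1.7% of a 30-second clock but only about 0.2% of a 4-minute one. If the engine believes its base time is zero, the budget collapses to nothing regardless.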

I should have clarified that I did not mean _very_ fast time controls. I would expect all kinds of random effects at such time controls. I had the CCRL 40/4 time control specifically in mind, and I do believe that results at this control correlate very well with results at any longer time control.

The question, probably, is which is the fastest time control that correlates well with long time controls. I would probably say it's CEGT 40/4 (which is about 2.5 times slower than CCRL 40/4), or maybe 1+1. This is intuition, as I have not seen substantial data showing a difference between these time controls and long time controls.
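Comparing controls such as CCRL 40/4, CEGT 40/4 and 1+1 needs a common yardstick. Below is a rough Python sketch using average seconds per move. Two things are assumptions: the 40-move horizon for increment games is a common rule of thumb, and the hardware `speed_factor` is a free parameter. The 2.5 ratio between the two rating lists is Kirill's estimate, not something this code derives:

```python
from typing import Optional


def seconds_per_move(base_min: float, inc_s: float = 0.0,
                     moves: Optional[int] = None, horizon: int = 40,
                     speed_factor: float = 1.0) -> float:
    """Rough average thinking time per move.

    - Repeating control (e.g. 40/4): base spread over `moves` moves.
    - Base + increment (e.g. 1+1): base spread over `horizon` moves, plus the increment.
    speed_factor rescales for hardware: 2.0 means a machine twice as fast as the reference.
    """
    base_s = base_min * 60.0
    per_move = base_s / moves if moves else base_s / horizon + inc_s
    return per_move * speed_factor


if __name__ == "__main__":
    print(seconds_per_move(4, moves=40))  # 40/4 -> 6.0
    print(seconds_per_move(1, inc_s=1))   # 1+1  -> 2.5
```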

As for the comparison based on a single pair of engines: yes, it would be a convincing result for that single pair. But I am more curious whether that difference would show up in the context of a larger multi-engine comparison. For example, if 20 engines were taken and a round-robin tournament was played with 300 cycles, each engine would get 5700 games. Suppose this tournament was repeated at a fast (not _very_ fast) time control and at a long time control (for example, 10 times slower than the fast one). Then if some engine performed 60 Elo points better at one time control than at the other, that would constitute a substantial example.
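Kirill's game count checks out (19 opponents × 300 cycles = 5700 games). A quick error estimate suggests why a 60 Elo shift would be meaningful at that sample size. The Python sketch below uses a normal approximation to the score. It ignores draws and the fact that ratings come from a multi-engine fit, so treat the margins as rough:

```python
import math


def games_per_engine(n_engines: int, cycles: int) -> int:
    """Round robin: each engine meets every other engine once per cycle."""
    return (n_engines - 1) * cycles


def elo_margin_95(n_games: int, score: float = 0.5) -> float:
    """Approximate 95% margin, in Elo, on a performance measured over n_games.

    Treats every game as a win or a loss; draws lower the per-game variance,
    so with real draw rates this overstates the noise.
    """
    se_score = math.sqrt(score * (1.0 - score) / n_games)
    # Slope of the logistic Elo curve at this score (Elo per unit of score).
    slope = 400.0 / (math.log(10.0) * score * (1.0 - score))
    return 1.96 * se_score * slope


if __name__ == "__main__":
    n = games_per_engine(20, 300)
    single = elo_margin_95(n)
    print(n)                                # 5700
    print(f"+/- {single:.1f} Elo per run")  # +/- 9.0 Elo per run
    print(f"+/- {single * math.sqrt(2):.1f} Elo between two runs")
```

Under these assumptions the margin is about ±9 Elo for one run and about ±13 Elo when comparing two independent runs. A 60 Elo difference would therefore sit well outside sampling noise.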

Best,
Kirill

Dann Corbit · Post by **Dann Corbit** » Thu Dec 18, 2008 3:03 am

There is something else to be said for the longer time controls:

If we are going to find improvements and novelties, we are much more likely to discover them in long-time-control games. Game-in-one-second matches will show which engine is stronger, but who has the stomach to even look at the trash output?

Of course, it's pretty well known that I'm not much of a blitz fan.

Graham Banks · Post by **Graham Banks** » Thu Dec 18, 2008 3:10 am

Dann Corbit wrote: Of course, it's pretty well known that I'm not much of a blitz fan.

You and me both.

swami · Post by **swami** » Thu Dec 18, 2008 4:45 am

Kirill Kryukov wrote: The question, probably, is which is the fastest time control that correlates well with long time controls. I would probably say it's CEGT 40/4 (which is about 2.5 times slower than CCRL 40/4), or maybe 1+1. This is intuition, as I have not seen substantial data showing a difference between these time controls and long time controls.

Agreed, Kirill.

I have a rating list from 1/1 games, and it very nearly matches many of the longer-time-control rating lists out there.

The only exceptions are a very few engines (say 5%) that have poor time management and play weaker as a result.
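Whether two rating lists "match" can be put in numbers with a rank correlation over the engines they share. Below is a Python sketch of Spearman's rho. The ratings in the example are invented to exercise the function, not taken from any real list:

```python
from typing import Dict, List


def ranks(values: List[float]) -> List[float]:
    """1-based ranks with the highest value ranked 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def spearman(list_a: Dict[str, float], list_b: Dict[str, float]) -> float:
    """Spearman rank correlation over engines present in both rating lists,
    computed as the Pearson correlation of their ranks."""
    names = sorted(set(list_a) & set(list_b))
    ra = ranks([list_a[n] for n in names])
    rb = ranks([list_b[n] for n in names])
    mean = (len(names) + 1) / 2.0  # average rank, with or without ties
    cov = sum((x - mean) * (y - mean) for x, y in zip(ra, rb))
    var_a = sum((x - mean) ** 2 for x in ra)
    var_b = sum((y - mean) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("need at least two distinct ratings in each list")
    return cov / (var_a * var_b) ** 0.5


if __name__ == "__main__":
    # Made-up ratings, only to exercise the function.
    fast = {"A": 2800, "B": 2750, "C": 2700, "D": 2650}
    slow = {"A": 2810, "B": 2700, "C": 2720, "D": 2640}
    print(f"{spearman(fast, slow):.2f}")  # 0.80
```

Engines with poor time management, as swami describes, would appear as the entries whose ranks move most between the fast and slow lists.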

bob · Post by **bob** » Thu Dec 18, 2008 4:49 am

Dirt wrote:

White(1): random
White(1): level 40 0:30 0


There's the problem. 0:30 was not part of the WinBoard protocol when I started. If it has been added, I can easily make the change, but at the moment it will certainly not work...
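The log fits this diagnosis. The GUI sends `level 40 0:30 0`, and Crafty echoes "40 moves/0 minutes". Below is a Python sketch of a parser that accepts BASE either as whole minutes or as minutes:seconds, next to a minutes-only reading that reproduces the zero. The function names are hypothetical and this is not Crafty's actual code:

```python
import re
from typing import Tuple


def parse_level(command: str) -> Tuple[int, int, float]:
    """Parse 'level MPS BASE INC' into (moves per control, base seconds, increment seconds).
    BASE may be whole minutes ('5') or minutes:seconds ('0:30')."""
    parts = command.split()
    if len(parts) != 4 or parts[0] != "level":
        raise ValueError(f"not a level command: {command!r}")
    mps, base, inc = int(parts[1]), parts[2], float(parts[3])
    if ":" in base:
        minutes, seconds = base.split(":", 1)
        base_s = int(minutes) * 60 + int(seconds)
    else:
        base_s = int(base) * 60
    return mps, base_s, inc


def minutes_only_base(command: str) -> int:
    """Reading BASE as an integer that stops at the first non-digit,
    as C's atoi() would, turns '0:30' into 0 minutes."""
    return int(re.match(r"\d+", command.split()[2]).group())


if __name__ == "__main__":
    cmd = "level 40 0:30 0"
    print(parse_level(cmd))        # (40, 30, 0.0)
    print(minutes_only_base(cmd))  # 0
```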



Kirill Kryukov · Post by **Kirill Kryukov** » Thu Dec 18, 2008 5:02 am

Kirill Kryukov wrote: The question, probably, is which is the fastest time control that correlates well with long time controls. I would probably say it's CEGT 40/4 (which is about 2.5 times faster than CCRL 40/4), or maybe 1+1. This is intuition, as I have not seen substantial data showing a difference between these time controls and long time controls.

Typo corrected (slower -> faster).

Dirt · Post by **Dirt** » Thu Dec 18, 2008 5:11 am

bob wrote: There's the problem. 0:30 was not part of the WinBoard protocol when I started. If it has been added, I can easily make the change, but at the moment it will certainly not work...

I haven't looked at the protocol, but that's the only way I know for a user to specify times of under a minute in the WinBoard GUI. If I enter a number with a decimal point, the fractional part is ignored. When I first used WinBoard, back when Tim Mann was still working on it, I tried something like 0.5 to get 30 seconds, and it didn't work then either. It took me a while to figure out that I could use minutes:seconds.
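Dirt's experience suggests that minutes:seconds, rather than a decimal, is the way to express sub-minute controls. Below is a small Python helper that formats a base time in that style, plus the decimal truncation he describes. It only illustrates the two formats and is not WinBoard's own code:

```python
def format_base(total_seconds: int) -> str:
    """Render a base time as whole minutes ('1') or minutes:seconds ('0:30')."""
    minutes, seconds = divmod(total_seconds, 60)
    return str(minutes) if seconds == 0 else f"{minutes}:{seconds:02d}"


def truncated_minutes(text: str) -> int:
    """What happens if '0.5' is read as a whole number of minutes:
    the fractional part is simply dropped."""
    return int(float(text))


if __name__ == "__main__":
    print(format_base(30), format_base(60), format_base(90))  # 0:30 1 1:30
    print(truncated_minutes("0.5"))                          # 0
```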

Dr.Wael Deeb · Post by **Dr.Wael Deeb** » Thu Dec 18, 2008 10:29 am

Graham Banks wrote:
Dann Corbit wrote: Of course, it's pretty well known that I'm not much of a blitz fan.

You and me both.

Count me in
