Debate: testing at fast time controls

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Debate: testing at fast time controls

Post by bob »

hgm wrote:I think one of the problems is that techniques like null-move pruning lead to more and more 'unnatural' positions as the depth increases. Preferentially the engines try to refute everything through null moves, which means that the side playing the cut-nodes gets too lazy to cash in on any winning captures he can do, while the side playing the all-nodes plays nonsense most of the time, and in most branches simply does not get to cashing in on good captures. So the number of good captures on both sides tends to increase with depth, and this puts a heavier and heavier burden on the implementation of QS. In very fast games an inefficient QS probably hurts much less than at long TC, where you really have to select carefully which one of a variety of captures you are going to try first.
There are also more subtle issues in play. For example, at fast time controls, depth is all-important. Which means speed is all-important as well. Tiny changes to the code can make a significant change to the program's speed, and can actually cause you to win more games at very fast time controls because you are a tiny bit faster, while at longer time controls the tiny speedup is not important and the eval term actually begins to exert more influence on the outcome and the result can be worse.

This is a dangerous part of testing in that there are lots of unknown-to-most-users performance issues that are completely random (memory placement is a biggie with cache-line aliasing issues being very important) that can affect very fast games far more than they affect long games...
bob

Post by bob »

Kirill Kryukov wrote:
Kempelen wrote:For testing purposes where the majority of us test our engine in fast controls like 1m-1s or 1m-0s, do you think that if the engine improve in that fast control it will improve in slow/normal games? is there a correlation?

If not, what do you (assuming you test in fast games) do to take care of normal games?
I think that general assumption is that results of fast games correlate with the results of slow games very well. This is what we expect. I am yet to see any convincing data showing that it is not the case. I mean any example of engine that performs significantly different depending on time control. I think such example is necessary before this discussion can reach anywhere.

Best,
Kirill
Suppose I post some test results between Crafty and Fruit, at say 8000 game matches from the same starting positions. And I show you that at _very_ fast time controls, Crafty is +300 Elo better. At fast (but slower than previous run) Crafty is +100 better. And at long time controls, Crafty is about +40 better. Would that be convincing?

I won't claim whether Crafty gets worse with longer time controls or, more likely, Fruit gets worse at faster time controls. But it is a fact that I have measured on my cluster. I can take this to the extreme, where Fruit hardly wins a single game and loses most on time, to give you a hint of what is going on. Time utilization is very important in very fast games, less so in longer games. That is just one difference between the two programs.
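
To put the Elo gaps mentioned above (+300, +100, +40) in terms of match scores, here is a minimal sketch using the standard logistic Elo formula. The function names are illustrative only and are not taken from any tool used in this thread.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score fraction (wins + draws/2) / games for a given Elo lead."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # The three gaps quoted in the post above.
    for diff in (300, 100, 40):
        print(f"+{diff} Elo -> expected score {expected_score(diff):.3f}")
```

Under this formula, +300 Elo corresponds to scoring roughly 85% of the points, +100 to roughly 64%, and +40 to roughly 56%.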
bob

Post by bob »

F. Bluemers wrote:
Kirill Kryukov wrote:
Kempelen wrote:For testing purposes where the majority of us test our engine in fast controls like 1m-1s or 1m-0s, do you think that if the engine improve in that fast control it will improve in slow/normal games? is there a correlation?

If not, what do you (assuming you test in fast games) do to take care of normal games?
I think that general assumption is that results of fast games correlate with the results of slow games very well. This is what we expect. I am yet to see any convincing data showing that it is not the case. I mean any example of engine that performs significantly different depending on time control. I think such example is necessary before this discussion can reach anywhere.

Best,
Kirill
What it's about is this:
Can we tune an engine at long time controls and assume it will do fine at short TCs as well, and vice versa?
(edit: I mean, it might not be the same)
Best
Fonzy
I would feel far more confident that tuning at long time controls carries over to shorter time controls than the reverse. But even then it is risky. At very fast time controls, very large king safety scores can lead you into speculative attacks that the opponent's shallow search depths can't deal with. At long time controls, they just make you give up material for nothing...

Humans adapt their search strategy between long and short time controls. They evaluate differently, search differently, etc. The programs do not and that is somewhat problematic.
Uri Blass
Posts: 10979
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Post by Uri Blass »

bob wrote:
Kirill Kryukov wrote:
Kempelen wrote:For testing purposes where the majority of us test our engine in fast controls like 1m-1s or 1m-0s, do you think that if the engine improve in that fast control it will improve in slow/normal games? is there a correlation?

If not, what do you (assuming you test in fast games) do to take care of normal games?
I think that general assumption is that results of fast games correlate with the results of slow games very well. This is what we expect. I am yet to see any convincing data showing that it is not the case. I mean any example of engine that performs significantly different depending on time control. I think such example is necessary before this discussion can reach anywhere.

Best,
Kirill
Suppose I post some test results between Crafty and Fruit, at say 8000 game matches from the same starting positions. And I show you that at _very_ fast time controls, Crafty is +300 Elo better. At fast (but slower than previous run) Crafty is +100 better. And at long time controls, Crafty is about +40 better. Would that be convincing?
This is dependent on the definition of very fast time control.
People suggested 1 minute per game+1 second per move as very fast time control in this thread and the idea is that testing at slower time control is usually a waste of time.

It may be interesting if you can show a case when Crafty is 100 elo better than fruit2.1 at 1+1 and also weaker than fruit2.1 at slower time control when you use the same type of time control(X minutes per game+X seconds per move)

Note that a very fast time control of 1 second per game against Fruit is not interesting, because in that case it is possible that Fruit loses most of its games on time.

An increment is important to avoid losses on time.

You can try 6 seconds per game + 0.1 second per move, but the main problem is that I am not sure programs use time rationally in this case, for the simple reason that testers usually do not test at this time control, and programmers may use a safety margin that is too big and tell their program to assume it has 0.3 seconds less time.

This is not going to cause big changes at the 1+1 time control, but it can cause big changes at faster time controls, because at some point the program may search to depth 1 in spite of having enough time to reach depth 4, because the program tries not to take risks.

Uri
bob

Post by bob »

Uri Blass wrote:
bob wrote:
Kirill Kryukov wrote:
Kempelen wrote:For testing purposes where the majority of us test our engine in fast controls like 1m-1s or 1m-0s, do you think that if the engine improve in that fast control it will improve in slow/normal games? is there a correlation?

If not, what do you (assuming you test in fast games) do to take care of normal games?
I think that general assumption is that results of fast games correlate with the results of slow games very well. This is what we expect. I am yet to see any convincing data showing that it is not the case. I mean any example of engine that performs significantly different depending on time control. I think such example is necessary before this discussion can reach anywhere.

Best,
Kirill
Suppose I post some test results between Crafty and Fruit, at say 8000 game matches from the same starting positions. And I show you that at _very_ fast time controls, Crafty is +300 Elo better. At fast (but slower than previous run) Crafty is +100 better. And at long time controls, Crafty is about +40 better. Would that be convincing?
This is dependent on the definition of very fast time control.
People suggested 1 minute per game+1 second per move as very fast time control in this thread and the idea is that testing at slower time control is usually a waste of time.

It may be interesting if you can show a case when Crafty is 100 elo better than fruit2.1 at 1+1 and also weaker than fruit2.1 at slower time control when you use the same type of time control(X minutes per game+X seconds per move)

Note that a very fast time control of 1 second per game against Fruit is not interesting, because in that case it is possible that Fruit loses most of its games on time.

An increment is important to avoid losses on time.

You can try 6 seconds per game + 0.1 second per move, but the main problem is that I am not sure programs use time rationally in this case, for the simple reason that testers usually do not test at this time control, and programmers may use a safety margin that is too big and tell their program to assume it has 0.3 seconds less time.

This is not going to cause big changes at the 1+1 time control, but it can cause big changes at faster time controls, because at some point the program may search to depth 1 in spite of having enough time to reach depth 4, because the program tries not to take risks.

Uri
I have not found a time control where Crafty loses to fruit since the recent versions came out, at least on the 3981 starting positions I use for my testing.

But the point remains the same... 1+1 might be fast to you. I do a lot of 20sec +0.1sec games (20 seconds on clock, 0.1 second increment) so that I can play a 40K game match in an hour or so. With Crafty, and Glaurung, I can play ridiculously fast games (1 sec on clock, .01 sec increment) and not see any difference in results compared with 1+1 or 5+5, which is why I speculated that the case of Fruit suggests a time allocation issue that faster games highlight.

But today, 1+1 might be the same as .1 + .1 in a couple of years. And the problem shows up again.

I was pointing out that I can easily dig up the data to show this kind of behaviour, which some refuse to believe happens. And it happens over tens of thousands of games, not just over a hundred or so as most are using for testing...

All of my testing is done with increments. I play 20sec+.1sec for quick tests, 1+1 (takes about 12 hours for 32,000+ games) and 5+5 (takes 2 days for 32,000 games, roughly) and I have run 30+30 and 60+60 but they take an extended amount of time and I cut the number of positions significantly so that I am not sitting around for weeks. Some programs just can't cope with fast time controls. I had problems with Arasan in that regard. Fruit doesn't lose those fast games on time, but it loses them badly. Toga2, on the other hand, seems to not be affected nearly as much although there is a difference between fast and slow time control performance.
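
To illustrate why a hundred games and tens of thousands of games are not comparable, here is a rough sketch of the 95% error margin on a measured Elo difference. It uses a normal approximation, and the 30% draw ratio in the demo is purely an assumed placeholder, not a figure from these test runs.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction strictly between 0 and 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(score: float, draw_ratio: float, games: int, z: float = 1.96):
    """Approximate confidence interval (low, high) in Elo for a match result.

    score      -- observed score fraction, (wins + draws/2) / games
    draw_ratio -- fraction of games drawn (an assumption in the demo below)
    z          -- normal quantile; 1.96 gives roughly a 95% interval
    """
    wins = score - draw_ratio / 2.0
    losses = 1.0 - wins - draw_ratio
    if wins < 0.0 or losses < 0.0:
        raise ValueError("score and draw_ratio are inconsistent")
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draw_ratio * (0.5 - score) ** 2
                + losses * score ** 2)
    half_width = z * math.sqrt(variance / games)
    return elo_from_score(score - half_width), elo_from_score(score + half_width)


if __name__ == "__main__":
    for n in (100, 1000, 32000):
        low, high = elo_interval(score=0.5, draw_ratio=0.3, games=n)
        print(f"{n:>6} games: {low:+.1f} .. {high:+.1f} Elo")
```

With these assumed inputs, the interval is several tens of Elo wide on either side at 100 games and only a few Elo at 32,000 games, so a 100-game match cannot separate a +40 engine from an equal one.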
Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Post by Dirt »

bob wrote:I have not found a time control where Crafty loses to fruit since the recent versions came out, at least on the 3981 starting positions I use for my testing.
At least on my system, Fruit 2.1 wins every time against Crafty 22.8 if the time control is set to 30 seconds for 40 moves, repeating, with no increment. :(
Uri Blass

Post by Uri Blass »

bob wrote:
Uri Blass wrote:
bob wrote:
Kirill Kryukov wrote:
Kempelen wrote:For testing purposes where the majority of us test our engine in fast controls like 1m-1s or 1m-0s, do you think that if the engine improve in that fast control it will improve in slow/normal games? is there a correlation?

If not, what do you (assuming you test in fast games) do to take care of normal games?
I think that general assumption is that results of fast games correlate with the results of slow games very well. This is what we expect. I am yet to see any convincing data showing that it is not the case. I mean any example of engine that performs significantly different depending on time control. I think such example is necessary before this discussion can reach anywhere.

Best,
Kirill
Suppose I post some test results between Crafty and Fruit, at say 8000 game matches from the same starting positions. And I show you that at _very_ fast time controls, Crafty is +300 Elo better. At fast (but slower than previous run) Crafty is +100 better. And at long time controls, Crafty is about +40 better. Would that be convincing?
This is dependent on the definition of very fast time control.
People suggested 1 minute per game+1 second per move as very fast time control in this thread and the idea is that testing at slower time control is usually a waste of time.

It may be interesting if you can show a case when Crafty is 100 elo better than fruit2.1 at 1+1 and also weaker than fruit2.1 at slower time control when you use the same type of time control(X minutes per game+X seconds per move)

Note that a very fast time control of 1 second per game against Fruit is not interesting, because in that case it is possible that Fruit loses most of its games on time.

An increment is important to avoid losses on time.

You can try 6 seconds per game + 0.1 second per move, but the main problem is that I am not sure programs use time rationally in this case, for the simple reason that testers usually do not test at this time control, and programmers may use a safety margin that is too big and tell their program to assume it has 0.3 seconds less time.

This is not going to cause big changes at the 1+1 time control, but it can cause big changes at faster time controls, because at some point the program may search to depth 1 in spite of having enough time to reach depth 4, because the program tries not to take risks.

Uri
I have not found a time control where Crafty loses to fruit since the recent versions came out, at least on the 3981 starting positions I use for my testing.

But the point remains the same... 1+1 might be fast to you. I do a lot of 20sec +0.1sec games (20 seconds on clock, 0.1 second increment) so that I can play a 40K game match in an hour or so. With Crafty, and Glaurung, I can play ridiculously fast games (1 sec on clock, .01 sec increment) and not see any difference in results compared with 1+1 or 5+5, which is why I speculated that the case of Fruit suggests a time allocation issue that faster games highlight.

But today, 1+1 might be the same as .1 + .1 in a couple of years. And the problem shows up again.

I was pointing out that I can easily dig up the data to show this kind of behaviour, which some refuse to believe happens. And it happens over tens of thousands of games, not just over a hundred or so as most are using for testing...

All of my testing is done with increments. I play 20sec+.1sec for quick tests, 1+1 (takes about 12 hours for 32,000+ games) and 5+5 (takes 2 days for 32,000 games, roughly) and I have run 30+30 and 60+60 but they take an extended amount of time and I cut the number of positions significantly so that I am not sitting around for weeks. Some programs just can't cope with fast time controls. I had problems with Arasan in that regard. Fruit doesn't lose those fast games on time, but it loses them badly. Toga2, on the other hand, seems to not be affected nearly as much although there is a difference between fast and slow time control performance.

If the problem is time management then it means that today 1+1 may not be the same as .1+.1 in a couple of years.

The problem is that a program may simply subtract 0.2 seconds from the remaining time in its time-management calculation to avoid the risk of losing on time.

That is not going to cause significant problems at 200 seconds per game + 1 second per move, but at 20 sec + 0.1 second per move the program can be outsearched in the endgame, because it is going to assume it has no time left and play immediately.

Uri
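
The effect described above can be sketched with a toy time allocator. This is a hypothetical illustration, not the time management of Fruit, Crafty, or any other engine; the `moves_to_go` horizon and the clock readings in the demo are assumptions, and only the 0.2 second margin comes from the post.

```python
def move_budget(remaining_s: float, increment_s: float,
                safety_margin_s: float, moves_to_go: int = 20) -> float:
    """Seconds a toy engine allows itself for the next move.

    The fixed safety margin is subtracted from the clock first, then the
    rest is spread over an assumed number of remaining moves.
    """
    usable = max(0.0, remaining_s - safety_margin_s)
    budget = usable / moves_to_go + increment_s
    # Never plan to use more than what is left after the safety margin.
    return min(budget, usable)


if __name__ == "__main__":
    margin = 0.2  # the fixed reduction mentioned in the post
    # Same late-endgame situation at two time controls, scaled by 10.
    fast = move_budget(remaining_s=0.2, increment_s=0.1, safety_margin_s=margin)
    slow = move_budget(remaining_s=2.0, increment_s=1.0, safety_margin_s=margin)
    print(f"20s+0.1s, 0.2s left on clock: {fast:.3f}s per move")
    print(f"200s+1s,  2.0s left on clock: {slow:.3f}s per move")
```

The time control is scaled by a factor of ten but the margin is not, so in this sketch the fast game leaves the engine with no thinking time at all while the slow game is barely affected.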
bob

Post by bob »

Dirt wrote:
bob wrote:I have not found a time control where Crafty loses to fruit since the recent versions came out, at least on the 3981 starting positions I use for my testing.
At least on my system, Fruit 2.1 wins every time against Crafty 22.8 if the time control is set to 30 seconds for 40 moves, repeating, with no increment. :(
What kind of book? I am playing from 3891 unique starting positions, taken at move 16 with white to move, with duplicates removed and the most popular positions chosen, single-game positions removed. I play every position twice, alternating colors. This is with Fruit 2.1 running on Linux. I am not sure if there are sub-versions or anything, but there are no opening books used anywhere, no endgame tables, and no learning, insofar as it can be turned off.

I've posted my results here many times...
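
The match layout described above (unique positions, each played twice with colors swapped) could be generated along these lines. This is only a sketch: the position strings and engine names are placeholders, and the dictionary layout is an assumption rather than the format of any real match runner.

```python
def build_schedule(positions, engine_a, engine_b):
    """Return a list of games: every unique position once with each color."""
    # Drop duplicate positions while keeping the original order.
    unique_positions = list(dict.fromkeys(positions))
    games = []
    for pos in unique_positions:
        games.append({"position": pos, "white": engine_a, "black": engine_b})
        games.append({"position": pos, "white": engine_b, "black": engine_a})
    return games


if __name__ == "__main__":
    # Placeholder identifiers standing in for real starting positions.
    sample = ["pos-001", "pos-002", "pos-001", "pos-003"]
    schedule = build_schedule(sample, "Crafty", "Fruit")
    print(len(schedule), "games from", len(set(sample)), "unique positions")
```

Playing both colors from the same position cancels out any advantage the position itself gives to one side, and it means the number of games is twice the number of unique positions.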
bob

Post by bob »

Uri Blass wrote:
bob wrote:
Uri Blass wrote:
bob wrote:
Kirill Kryukov wrote:
Kempelen wrote:For testing purposes where the majority of us test our engine in fast controls like 1m-1s or 1m-0s, do you think that if the engine improve in that fast control it will improve in slow/normal games? is there a correlation?

If not, what do you (assuming you test in fast games) do to take care of normal games?
I think that general assumption is that results of fast games correlate with the results of slow games very well. This is what we expect. I am yet to see any convincing data showing that it is not the case. I mean any example of engine that performs significantly different depending on time control. I think such example is necessary before this discussion can reach anywhere.

Best,
Kirill
Suppose I post some test results between Crafty and Fruit, at say 8000 game matches from the same starting positions. And I show you that at _very_ fast time controls, Crafty is +300 Elo better. At fast (but slower than previous run) Crafty is +100 better. And at long time controls, Crafty is about +40 better. Would that be convincing?
This is dependent on the definition of very fast time control.
People suggested 1 minute per game+1 second per move as very fast time control in this thread and the idea is that testing at slower time control is usually a waste of time.

It may be interesting if you can show a case when Crafty is 100 elo better than fruit2.1 at 1+1 and also weaker than fruit2.1 at slower time control when you use the same type of time control(X minutes per game+X seconds per move)

Note that a very fast time control of 1 second per game against Fruit is not interesting, because in that case it is possible that Fruit loses most of its games on time.

An increment is important to avoid losses on time.

You can try 6 seconds per game + 0.1 second per move, but the main problem is that I am not sure programs use time rationally in this case, for the simple reason that testers usually do not test at this time control, and programmers may use a safety margin that is too big and tell their program to assume it has 0.3 seconds less time.

This is not going to cause big changes at the 1+1 time control, but it can cause big changes at faster time controls, because at some point the program may search to depth 1 in spite of having enough time to reach depth 4, because the program tries not to take risks.

Uri
I have not found a time control where Crafty loses to fruit since the recent versions came out, at least on the 3981 starting positions I use for my testing.

But the point remains the same... 1+1 might be fast to you. I do a lot of 20sec +0.1sec games (20 seconds on clock, 0.1 second increment) so that I can play a 40K game match in an hour or so. With Crafty, and Glaurung, I can play ridiculously fast games (1 sec on clock, .01 sec increment) and not see any difference in results compared with 1+1 or 5+5, which is why I speculated that the case of Fruit suggests a time allocation issue that faster games highlight.

But today, 1+1 might be the same as .1 + .1 in a couple of years. And the problem shows up again.

I was pointing out that I can easily dig up the data to show this kind of behaviour, which some refuse to believe happens. And it happens over tens of thousands of games, not just over a hundred or so as most are using for testing...

All of my testing is done with increments. I play 20sec+.1sec for quick tests, 1+1 (takes about 12 hours for 32,000+ games) and 5+5 (takes 2 days for 32,000 games, roughly) and I have run 30+30 and 60+60 but they take an extended amount of time and I cut the number of positions significantly so that I am not sitting around for weeks. Some programs just can't cope with fast time controls. I had problems with Arasan in that regard. Fruit doesn't lose those fast games on time, but it loses them badly. Toga2, on the other hand, seems to not be affected nearly as much although there is a difference between fast and slow time control performance.

If the problem is time management then it means that today 1+1 may not be the same as .1+.1 in a couple of years.

The problem is that a program may simply subtract 0.2 seconds from the remaining time in its time-management calculation to avoid the risk of losing on time.

That is not going to cause significant problems at 200 seconds per game + 1 second per move, but at 20 sec + 0.1 second per move the program can be outsearched in the endgame, because it is going to assume it has no time left and play immediately.

Uri
I am not sure whether it is time management or something else. Arasan 10 was losing most games on time at very fast time controls. With Fruit, out of 8,000 games I may see 3-4 that are lost on time, but the rest are lost by a big margin. That is the reason I have settled on the 20sec+.1sec control: Fruit does about as well there as it does at 1+1 or 5+5 in my testing. But I have some 15-minute test runs I can do, and at the kind of time control needed to complete 32,000 games in 15 minutes using 256 processors, things go much worse for Fruit. Whether that is what you are describing or not, I have not tried to determine...

However, for any bug that one program could have at 20sec+0.1sec, another program could just as easily have a similar bug at 1min+1sec...
Dirt

Post by Dirt »

bob wrote:
Dirt wrote:
bob wrote:I have not found a time control where Crafty loses to fruit since the recent versions came out, at least on the 3981 starting positions I use for my testing.
At least on my system, Fruit 2.1 wins every time against Crafty 22.8 if the time control is set to 30 seconds for 40 moves, repeating, with no increment. :(
What kind of book? I am playing from 3891 unique starting positions, taken at move 16 with white to move, with duplicates removed and the most popular positions chosen, single-game positions removed. I play every position twice, alternating colors. This is with Fruit 2.1 running on Linux. I am not sure if there are sub-versions or anything, but there are no opening books used anywhere, no endgame tables, and no learning, insofar as it can be turned off.

I've posted my results here many times...
I'm using performance.bin for Fruit, and I think Crafty is using Peter's book, but it doesn't matter. Crafty never uses more than one second for the entire game. If I increase the time to one minute Crafty plays normally. If, as I expect, you never play without increment you won't see this problem.