CCRL 40/4 lists updated (11th August 2012)

geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: CCRL 40/4 lists updated (11th August 2012)

Post by geots »

Uri Blass wrote:
geots wrote:
Adam Hair wrote:
lkaufman wrote:
Adam Hair wrote:Here are all of the relevant details that I can think of:

OS: Windows XP 64-bit
CPU: Intel QX6700 at 3.05 GHz
Time Control: 40/3'
GUI: cutechess-cli
Hash: 128 MB
EGTB: None
Starting Positions: PGN of ~17,900 positions 4 moves deep
Resign: off
Draws: game adjudicated as a draw if both engines' score is within 50 centipawns after 250 moves. I do not remember if cutechess uses the 50 moves rule (I think it does).
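As an illustration only, the draw rule in the settings above might be expressed as the check below. The function name, the argument layout, and the reading of "within 50 centipawns" as each engine's own evaluation lying within ±50 of zero are assumptions made for this sketch; it is not cutechess-cli's code.

```python
def adjudicate_draw(scores_cp, move_number, min_move=250, threshold_cp=50):
    """Return True if the game would be adjudicated a draw.

    scores_cp: the latest evaluation reported by each engine, in centipawns
               (hypothetical layout: one value per engine).
    move_number: current full-move number of the game.
    min_move, threshold_cp: the 250-move and 50-centipawn limits from the post.
    """
    if move_number < min_move:
        return False
    # Both engines must report a score close to zero.
    return all(abs(score) <= threshold_cp for score in scores_cp)


print(adjudicate_draw((20, -35), 250))   # both scores inside the window
print(adjudicate_draw((20, -80), 300))   # one engine still sees an advantage
```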



lkaufman wrote: Two comments:


1. I believe your cpu is pre-sse4. Since Komodo really suffers on non-sse4 machines (compared to other engines), that probably accounts for the bulk of the 20 elo. Do your other testers have sse4 machines or not?
Yes, two 40/4 testers have SSE4 CPUs. And our results for Komodo 4 showed no measurable difference between non-SSE4 and SSE4. Though, if we played 20,000 games, it is possible that a statistically significant difference would be found.
lkaufman wrote: 2. We learned that it is very important for testers to use the 50 move rule. If they do not, engines may make ridiculous moves when they think the 50 move rule is about to apply. You should verify that it does use the 50 move rule and switch if it does not.

Thanks for your answers and your testing!
I have confirmed that cutechess does use the 50 move limit. I was 99% certain before; now I am 100% certain since at least 1 game was adjudicated as a draw because of the 50 move limit.

With Ilari's post, I am 110% certain :)



Anyone who thinks SSE would add 20 elo had better take a long look in the mirror. It would be next to impossible for it to ever add a double-digit elo gain. I'm thinking 3 or 4 elo tops, maybe 6 elo in an extreme case, but 10 to 20? Either pure bullshit, or someone is chasing rainbows- you pick.


george


PS: One other thing everyone should keep in mind. If you are beta testing an engine for a future release, at least 50% of your testing should be at time controls that the most prominent testing groups use. Beta testing with no "repeating" time controls, then seeing it rated with nothing BUT repeating controls will make more of an elo difference than SSE could ever think of making. (This whole set of threads is a long journey to nowhere!)
I do not understand your confidence that 10 elo difference is impossible.

CCRL did not play enough games for Komodo to have a statistical error that is lower than 10 elo so the fact that they see no difference between SSE and not SSE proves nothing.


It is simple Uri. You don't test engines and see the results "with this time control and with that time control" and "with sse and without" 365 days a year like I do. I'm speaking from experience and you are guessing.


george
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CCRL 40/4 lists updated (11th August 2012)

Post by lkaufman »

George, I'll ask you, do you have any reason to think that Komodo performs better or worse (against Houdini, Critter, Ivanhoe, and Stockfish) at repeating time controls as compared to increment time controls? We don't test at repeating time controls (except for very fast tests on rare occasions when working on time control) because they are a big waste of time, increment play is clearly superior for testing. But you are right, it is possible that Komodo is weaker at repeating controls. I'm asking if you have reason to believe this is actually the case.

I'll just point out that although our CCRL and CEGT blitz ratings are lower than what we get with our increment testing, the CCRL and CEGT ratings at intermediate levels (40/40 and 40/20) for Komodo seem about right relative to Houdini, and they are also repeating controls.
Graham Banks
Posts: 41464
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: CCRL 40/4 lists updated (11th August 2012)

Post by Graham Banks »

lkaufman wrote:......increment play is clearly superior for testing.
I think that the right word here is convenient, not superior.
gbanksnz at gmail.com
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Re: CCRL 40/4 lists updated (11th August 2012)

Post by geots »

lkaufman wrote:George, I'll ask you, do you have any reason to think that Komodo performs better or worse (against Houdini, Critter, Ivanhoe, and Stockfish) at repeating time controls as compared to increment time controls? We don't test at repeating time controls (except for very fast tests on rare occasions when working on time control) because they are a big waste of time, increment play is clearly superior for testing. But you are right, it is possible that Komodo is weaker at repeating controls. I'm asking if you have reason to believe this is actually the case.

I'll just point out that although our CCRL and CEGT blitz ratings are lower than what we get with our increment testing, the CCRL and CEGT ratings at intermediate levels (40/40 and 40/20) for Komodo seem about right relative to Houdini, and they are also repeating controls.

No, I really don't have any basis in fact to know that Komodo performs better or worse against said engines at incremental controls or repeating controls. Telling Uri that the controls could make more difference than sse vs no-sse I still will stand by. They very well "could", because the sse vs no-sse difference is so minimal anyway. I believe Jean Paul will agree with me on that.

But I am at a disadvantage here- because I don't follow what the point is. I'll go back to what Joe Garagiola said about Ted Williams. "He is a pure hitter, and he doesn't care about the conditions. He could hit the ball at midnight in a wind tunnel." That's the same with the Number 1 engine in the world. If you were playing Houdini for the championship, I doubt Robert would care if it was long control, short control, repeating or incremental. The programmer of the number 1 engine never cares. All he wants is to play.

So personally, I don't think the control used is going to have a thing to do with who is Number 1 and who is Number 2. I am probably the only person who has mixed repeating controls and incremental controls this evenly in testing Komodo against Houdini. Probably 50-50. And I have not seen it make a difference one way or the other. And I was a bit surprised to see Houdini do a little better at 40/40 than at 40/4. My guess would be 5 to 8 elo. I say "guess", because I don't have the games to back that up yet.

I switched once from 4m+2s to 40/3 repeating, because some engines were having a lot of time losses. And generally speaking, I never went back, when just testing for myself. As for beta testing- I just follow orders- which is the way it should be.

But it doesn't matter if it is Vas, Robert, Richard, you and Don, whoever- if it makes a difference to the author what type of controls are used- he needs to go back to the drawing board. The person with a true Number 1 engine doesn't care.



Best,

george




Larry,
Modern Times
Posts: 3550
Joined: Thu Jun 07, 2012 11:02 pm

Re: CCRL 40/4 lists updated (11th August 2012)

Post by Modern Times »

Uri Blass wrote:
I do not understand your confidence that 10 elo difference is impossible.

CCRL did not play enough games for Komodo to have a statistical error that is lower than 10 elo so the fact that they see no difference between SSE and not SSE proves nothing.
True, but equally Larry's assertion that there *is* a 10 Elo difference is also impossible to prove, and until he proves it I don't believe it. His conclusions from the CEGT results to back up his 1.3 Elo per percentage point improvement are flawed because of the error margins on that list (and ours)
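For a sense of scale on the error margins being argued about here, a rough sketch follows. It assumes independent games, the usual logistic Elo model, a normal approximation, and an illustrative 40% draw rate between evenly matched engines; none of these figures come from the CCRL or CEGT data, and the function names are made up for the example.

```python
import math


def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_margin_95(games, draw_ratio=0.4, score=0.5):
    """Approximate 95% half-width, in Elo, of a match result over `games` games.

    draw_ratio and score are illustrative assumptions, not measured values.
    """
    wins = score - draw_ratio / 2.0
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = wins * 1.0 + draw_ratio * 0.25 - score ** 2
    stderr = math.sqrt(variance / games)
    low = elo_from_score(score - 1.96 * stderr)
    high = elo_from_score(score + 1.96 * stderr)
    return (high - low) / 2.0


for n in (1000, 5000, 20000):
    print(n, "games: about +/-", round(elo_margin_95(n), 1), "elo")
```

Under these assumptions the margin shrinks with the square root of the number of games, so it takes a few thousand games before a 10 elo effect can be resolved at all, and a comparison between two separate runs (SSE and non-SSE) carries a wider margin than either run alone.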
Modern Times
Posts: 3550
Joined: Thu Jun 07, 2012 11:02 pm

Re: CCRL 40/4 lists updated (11th August 2012)

Post by Modern Times »

There are other fatal flaws in this line of reasoning:

- doubling the speed of a 2800 engine say, will yield different gains from doubling the speed of a 2950 engine (law of diminishing returns)

- a 90 Elo increase from doubling the speed may not be a linear progression. It could well be that a 7% speed increase achieves zero Elo, and any improvement comes later.
Uri Blass
Posts: 10309
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: CCRL 40/4 lists updated (11th August 2012)

Post by Uri Blass »

Modern Times wrote:There are other fatal flaws in this line of reasoning:

- doubling the speed of a 2800 engine say, will yield different gains from doubling the speed of a 2950 engine (law of diminishing returns)

- a 90 Elo increase from doubling the speed may not be a linear progression. It could well be that a 7% speed increase achieves zero Elo, and any improvement comes later.
1) I do not think the gain from doubling differs much between the two engines.

diminishing returns is about the same engine.

When we talk about different engines my opinion is that in most cases the stronger engine earns more elo from doubling if you start from the same playing strength.

2) A 90 elo increase from doubling the speed does not have to be linear, but usually it is very close to linear.

There are diminishing returns, but they have only a little influence.

If an engine earns 90 elo from doubling, then it may earn only 80 elo from another doubling, but not 50 elo (I assume no serious bug), so it may be 1.2 elo for a 1% improvement instead of 1.3 elo for a 1% improvement (assuming the 90 elo is correct).
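The arithmetic behind the "1.3 elo per 1%" figure can be sketched as below. It assumes that Elo gain scales with the base-2 logarithm of the speed-up, which is exactly the near-linear assumption under dispute in this thread; the function name and the default of 90 elo per doubling are taken from the discussion for illustration, not from any measurement.

```python
import math


def elo_gain(speedup_percent, elo_per_doubling=90.0):
    """Estimated Elo gain for a given speed-up, assuming a fixed gain per doubling."""
    doublings = math.log2(1.0 + speedup_percent / 100.0)
    return elo_per_doubling * doublings


print(round(elo_gain(1), 2))          # 1% speed-up at 90 elo per doubling
print(round(elo_gain(1, 80.0), 2))    # 1% speed-up at 80 elo per doubling
print(round(elo_gain(7), 2))          # the 7% speed-up discussed in the thread
```

On this model a 7% speed-up is worth a little under 9 elo at 90 elo per doubling. If the gain is not smooth in the speed-up, the model simply does not apply.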
Modern Times
Posts: 3550
Joined: Thu Jun 07, 2012 11:02 pm

Re: CCRL 40/4 lists updated (11th August 2012)

Post by Modern Times »

Uri Blass wrote: 2)90 elo increase from doubling the speed does not have to be linear but usually it is very close to be linear.
I don't agree with that. I'd say it is anything but linear. The extra speed is most beneficial if it causes the engine to go deeper. If the engine is on the threshold of taking that next step, then a 7% speed increase could be very beneficial. If it is not, then it may make no difference at all.

The only way to know is to test SSE and Non-SSE on the same machine. But you can't do that, because Komodo will use SSE if the machine is capable. Since I have seen no evidence of what a 7% speed-up does for Komodo (whether from SSE or not), my assumption is zero Elo gain until shown otherwise.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CCRL 40/4 lists updated (11th August 2012)

Post by lkaufman »

Graham Banks wrote:
lkaufman wrote:......increment play is clearly superior for testing.
I think that the right word here is convenient, not superior.
Maybe the best wording of all is "more efficient". You get more games of the same average quality per hour with increment testing.
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: CCRL 40/4 lists updated (11th August 2012)

Post by lkaufman »

geots wrote:
lkaufman wrote:George, I'll ask you, do you have any reason to think that Komodo performs better or worse (against Houdini, Critter, Ivanhoe, and Stockfish) at repeating time controls as compared to increment time controls? We don't test at repeating time controls (except for very fast tests on rare occasions when working on time control) because they are a big waste of time, increment play is clearly superior for testing. But you are right, it is possible that Komodo is weaker at repeating controls. I'm asking if you have reason to believe this is actually the case.

I'll just point out that although our CCRL and CEGT blitz ratings are lower than what we get with our increment testing, the CCRL and CEGT ratings at intermediate levels (40/40 and 40/20) for Komodo seem about right relative to Houdini, and they are also repeating controls.

No, I really don't have any basis in fact to know that Komodo performs better or worse against said engines at incremental controls or repeating controls. Telling Uri that the controls could make more difference than sse vs no-sse I still will stand by. They very well "could", because the sse vs no-sse difference is so minimal anyway. I believe Jean Paul will agree with me on that.

But I am at a disadvantage here- because I don't follow what the point is. I'll go back to what Joe Garagiola said about Ted Williams. "He is a pure hitter, and he doesn't care about the conditions. He could hit the ball at midnight in a wind tunnel." That's the same with the Number 1 engine in the world. If you were playing Houdini for the championship, I doubt Robert would care if it was long control, short control, repeating or incremental. The programmer of the number 1 engine never cares. All he wants is to play.

So personally, I don't think the control used is going to have a thing to do with who is Number 1 and who is Number 2. I am probably the only person who has mixed repeating controls and incremental controls this evenly in testing Komodo against Houdini. Probably 50-50. And I have not seen it make a difference one way or the other. And I was a bit surprised to see Houdini do a little better at 40/40 than at 40/4. My guess would be 5 to 8 elo. I say "guess", because I don't have the games to back that up yet.

I switched once from 4m+2s to 40/3 repeating, because some engines were having a lot of time losses. And generally speaking, I never went back, when just testing for myself. As for beta testing- I just follow orders- which is the way it should be.

But it doesn't matter if it is Vas, Robert, Richard, you and Don, whoever- if it makes a difference to the author what type of controls are used- he needs to go back to the drawing board. The person with a true Number 1 engine doesn't care.


Best,

george


Larry,

Of course it can make a large elo difference which type of time control you use, because one engine may have a seriously bad time algorithm for one or the other. This also implies that one might have the best engine but it might test as worse at some type of time control.
However as things actually stand now I don't think any of the top few engines has a seriously bad time algorithm, and so at the present time you are right to say the best engine should win at any type of time control. But I think this does not mean at any level; very fast time controls like 1' bullet chess may not be representative of normal chess as things stand now. In a couple more years that may no longer be true, but right now bullet chess favors Ippo and all derivatives and relatives of it. CCRL and CEGT blitz levels (40/3' on various hardware, mostly a bit old) are still too close to bullet chess to be good predictors of results at their intermediate levels, I think.