CEGT - rating lists August 12th 2012

Discussion of computer chess matches and engine tournaments.

Moderator: Ras

lkaufman
Posts: 6258
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CEGT - rating lists August 12th 2012

Post by lkaufman »

carldaman wrote: Larry, I'd also be curious as to the GUI used for testing. As I noted in another thread, Komodo5 using the default drawscore of -7 suffers when tested on the Fritz GUI. I had to set the drawscore to 0 to ensure that Komodo did not make obviously weak moves.

Regards,
Carl
Do you happen to know whether the odd behavior applies to the Fritz 11 GUI as well as the Fritz 12 GUI? Also, is it repeatable: can you get the suspicious moves to be played every time from the given position? It seems to me that a negative drawscore should make the engine play more actively to avoid a repetition, unless the interface somehow reversed the sign, which seems unlikely. So it's quite puzzling.
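For illustration only, the sign argument can be sketched as a toy move choice. This is not Komodo's actual code: `pick_move` and its parameters are hypothetical names, scores are in centipawns from the engine's point of view, and the only assumption is that a repetition is scored with the configured drawscore.

```python
def pick_move(play_on_eval_cp, draw_score_cp):
    """Toy choice between repeating the position and playing on.

    play_on_eval_cp: engine's evaluation of the best non-repeating line
                     (hypothetical value, centipawns).
    draw_score_cp:   value the engine assigns to a draw by repetition
                     (the "drawscore" setting discussed in this thread).
    """
    candidates = {
        "repeat": draw_score_cp,      # repetition is worth the drawscore
        "play_on": play_on_eval_cp,   # otherwise trust the normal evaluation
    }
    # The engine simply prefers the higher-scoring option.
    return max(candidates, key=candidates.get)


# Slightly worse position (-3 cp) with a repetition available:
print(pick_move(-3, -7))  # negative drawscore: repetition looks worse than playing on
print(pick_move(-3, 0))   # neutral drawscore: repetition looks better
print(pick_move(-3, 7))   # sign reversed by mistake: repetition looks attractive
```

If a GUI really did flip the sign, the engine would start steering toward repetitions instead of away from them, which is the opposite of the intended effect.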
carldaman
Posts: 2287
Joined: Sat Jun 02, 2012 2:13 am

Re: CEGT - rating lists August 12th 2012

Post by carldaman »

lkaufman wrote: Do you happen to know whether the odd behavior applies to Fritz 11 gui as well as Fritz 12 gui? Also, is it repeatable, can you get the suspicious moves to be played every time from the given position?


Repeatable? Most certainly. I'm positive about that, as I've run multiple tests to rule out any freak occurrence.

I can't comment on Fritz 11, as I currently only have Fritz 12 installed. It would be nice if someone could try to replicate this behavior using Fritz 11-12-13, to see if there is any difference.

I've also observed that Komodo4 does not exhibit the strange behavior when tested with Fritz 12 GUI. Its default drawscore is -5.

What's also interesting is that K5's default drawscore is not an issue for the other GUIs I normally test with (Arena, Winboard, ChessGUI). The bad/strange moves only showed up within the Fritz 12 GUI environment. Setting the drawscore to zero did get rid of the problem, however, which was a relief. Such a step was not necessary for the other GUIs mentioned.

CL
lkaufman
Posts: 6258
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CEGT - rating lists August 12th 2012

Post by lkaufman »

carldaman wrote:Do you happen to know whether the odd behavior applies to Fritz 11 gui as well as Fritz 12 gui? Also, is it repeatable, can you get the suspicious moves to be played every time from the given position?


Repeatable? Most certainly. I'm positive about that, as I've run multiple tests to rule out any freak occurrence.

I can't comment on Fritz 11, as I currently only have Fritz 12 installed. It would be nice if someone could try to replicate this behavior using Fritz 11-12-13, to see if there is any difference.

I've also observed that Komodo4 does not exhibit the strange behavior when tested with Fritz 12 GUI. Its default drawscore is -5.

What's also interesting is that K5's default drawscore is not an issue for the other GUIs I normally test with (Arena, Winboard, ChessGUI). The bad/strange moves only showed up within the Fritz 12 GUI environment. Setting the drawscore to zero did get rid of the problem, however, which was a relief. Such a step was not necessary for the other GUIs mentioned.

CL
I'm running an overnight match using the Fritz 11 gui to see if the results are way out of line with what we get on our own tester.
Werner
Posts: 2993
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

Re: CEGT - rating lists August 12th 2012

Post by Werner »

lkaufman wrote: Question: What is the actual hardware and actual time limit used for most of the 40/4 games now? What time limit on some brand new Intel computer, say at 3 GHz, would be closest to what you actually use on average? We're trying to find out why our results for Komodo are consistently better at blitz than those reported by both CCRL and CEGT, even with our opening book modified to be more typical of others. Also, is it possible to see whether the ratings of the top few single-core engines would be much different if only pairings among them were rated?

We appreciate all your hard work.

Thanks,
Larry for Komodo
Hi Larry,
Sorry, Wolfgang is on holiday, and perhaps Gerhard too. I think Wolfgang posted his hardware here a few months ago.
Intel i5-2400 @3.10GHz / 4GB RAM
Intel Q-6600 @2.60GHz / 4GB RAM
Intel Q-8200 @2.33GHz / 4GB RAM
AMD X-4 @3.00GHz / 6GB RAM
These may be his PCs, but I am not sure. He runs games at 40/3, not at faster time controls. I am also sure he uses the CB GUI only for games with the Fritz engine. Both use the Shredder GUI and Arena.
Werner
lkaufman
Posts: 6258
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CEGT - rating lists August 12th 2012

Post by lkaufman »

Thanks. My overnight results did not confirm any problem with the Fritz 11 GUI (I didn't try 12), and anyway you don't usually use it. There is about a 20 Elo gap between CEGT/CCRL blitz results and our own (in terms of the relative rating of Houdini and Komodo), and I'm running out of theories to explain it. We use increment rather than repeating time controls, so this could be a contributing factor, but we haven't found that our results are noticeably worse when we do try repeating time controls. We don't use tablebases (TBs) in our tests, but it is widely reported that they don't help Elo. Our current test level is probably roughly equivalent to yours. Our books were modified to include longer lines and should be more like yours. I wonder what could account for the 20 Elo? There are too many games to attribute it to sample error.
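As a rough check on the sample-error point, here is a minimal sketch of how the 95% error margin on an Elo difference shrinks with the number of games. It assumes the standard logistic Elo formula and a normal approximation; the win/draw/loss counts are made up for illustration and are not CEGT, CCRL, or Komodo test data.

```python
import math


def elo_from_score(score):
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_with_margin(wins, draws, losses):
    """Return (elo_difference, approximate 95% margin) for one match result."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (wins * 1.0 + draws * 0.25) / n - score ** 2
    std_err = math.sqrt(variance / n)
    # Convert the score interval to Elo and take half its width.
    low = elo_from_score(score - 1.96 * std_err)
    high = elo_from_score(score + 1.96 * std_err)
    return elo_from_score(score), (high - low) / 2.0


# Hypothetical results with the same proportions but ten times the games.
for w, d, l in [(300, 400, 300), (3000, 4000, 3000)]:
    elo, margin = elo_with_margin(w, d, l)
    print(f"{w + d + l} games: {elo:+.1f} Elo, margin about {margin:.1f}")
```

With thousands of games the margin falls well below 20 Elo, so under these assumptions a gap of that size would need some other explanation.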
MM
Posts: 766
Joined: Sun Oct 16, 2011 11:25 am

Re: CEGT - rating lists August 12th 2012

Post by MM »

lkaufman wrote:Thanks. My overnight results did not confirm any problem with Fritz 11 gui (didn't try 12), and anyway you don't usually use it. There is about a 20 elo gap between CEGT/CCRL blitz results and our own (in terms of relative rating of Houdini and Komodo), and I'm running out of theories to explain it. We use increment rather than repeating time controls, so this could be a contributing factor, but we haven't found that our results are noticeably worse when we do try repeating time controls. We don't use TBs in our tests, but it is widely reported that they don't help elo. Our current test level is probably roughly equivalent to yours. Our books were modified to include longer lines and should be more like yours. I wonder what could account for the 20 elo? There are too many games to attribute it to sample error.
Hi Larry, you said (about the book) "more like yours".
I think the difference between "more like yours" and "identical" could be worth several Elo.

As you know, some engines are sensitive to certain kinds of positions.

I can prepare a book with the same number of plies (on average) as the CCRL and CEGT books, but that doesn't guarantee anything.

IMO what really matters is the kind of position that arises once the book ends. I'm sure you understand what I mean.

Best Regards
MM
lkaufman
Posts: 6258
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CEGT - rating lists August 12th 2012

Post by lkaufman »

MM wrote:
lkaufman wrote:Thanks. My overnight results did not confirm any problem with Fritz 11 gui (didn't try 12), and anyway you don't usually use it. There is about a 20 elo gap between CEGT/CCRL blitz results and our own (in terms of relative rating of Houdini and Komodo), and I'm running out of theories to explain it. We use increment rather than repeating time controls, so this could be a contributing factor, but we haven't found that our results are noticeably worse when we do try repeating time controls. We don't use TBs in our tests, but it is widely reported that they don't help elo. Our current test level is probably roughly equivalent to yours. Our books were modified to include longer lines and should be more like yours. I wonder what could account for the 20 elo? There are too many games to attribute it to sample error.
Hi Larry, you said (about the book) "more like yours".
I think the difference between "more like yours" and "identical" could be worth several Elo.

As you know, some engines are sensitive to certain kinds of positions.

I can prepare a book with the same number of plies (on average) as the CCRL and CEGT books, but that doesn't guarantee anything.

IMO what really matters is the kind of position that arises once the book ends. I'm sure you understand what I mean.

Best Regards
Yes, but I assumed (perhaps wrongly) that most books used would choose a cross section of openings typically seen in human play.
lkaufman
Posts: 6258
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CEGT - rating lists August 12th 2012

Post by lkaufman »

Werner wrote:
lkaufman wrote: Question: What is the actual hardware and actual time limit used for most of the 40/4 games now? What time limit on some brand new Intel computer, say at 3 GHz, would be closest to what you actually use on average? We're trying to find out why our results for Komodo are consistently better at blitz than those reported by both CCRL and CEGT, even with our opening book modified to be more typical of others. Also, is it possible to see whether the ratings of the top few single-core engines would be much different if only pairings among them were rated?

We appreciate all your hard work.

Thanks,
Larry for Komodo
Hi Larry,
Sorry, Wolfgang is on holiday, and perhaps Gerhard too. I think Wolfgang posted his hardware here a few months ago.
Intel i5-2400 @3.10GHz / 4GB RAM
Intel Q-6600 @2.60GHz / 4GB RAM
Intel Q-8200 @2.33GHz / 4GB RAM
AMD X-4 @3.00GHz / 6GB RAM
These may be his PCs, but I am not sure. He runs games at 40/3, not at faster time controls. I am also sure he uses the CB GUI only for games with the Fritz engine. Both use the Shredder GUI and Arena.
I believe that at least the Q-6600 does not support SSE4; I'm not sure about the others. Can you estimate what percentage of the games use SSE4? This is starting to look like the main culprit, in CCRL as well.
Also, are the longer time control (non-blitz) games played on the same machines, or do they perhaps all use SSE4?
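For anyone wanting to check a test machine, here is a minimal sketch that pulls the SSE4-related flags out of Linux `/proc/cpuinfo`-style text. The function name `sse4_flags` and the sample strings are made up for illustration, and it only reports what the CPU advertises, not which engine binary was actually run.

```python
def sse4_flags(cpuinfo_text):
    """Return the set of SSE4-family flags listed in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Linux lists the flags as sse4_1, sse4_2 (and sse4a on some AMD CPUs).
            return {flag for flag in value.split() if flag.startswith("sse4")}
    return set()


# Made-up sample lines, not taken from any tester's machine.
with_sse4 = "model name : Example CPU\nflags : fpu sse2 ssse3 sse4_1 sse4_2 popcnt"
without_sse4 = "model name : Older CPU\nflags : fpu sse2 ssse3"

print(sorted(sse4_flags(with_sse4)))
print(sorted(sse4_flags(without_sse4)))
```

On a real Linux box the text would come from reading `/proc/cpuinfo`; on Windows a different tool would be needed.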
ThatsIt
Posts: 992
Joined: Thu Mar 09, 2006 2:11 pm

Re: CEGT - rating lists August 12th 2012

Post by ThatsIt »

Hi Larry !
lkaufman wrote: [..snip...]
Can you estimate what percentage of the games use SSE4?
[...snip...]
All the Komodo 5 x64 games for the CEGT 40/4 list were played using SSE4.

Best wishes,
G.S.
(CEGT member)
Modern Times
Posts: 3748
Joined: Thu Jun 07, 2012 11:02 pm

Re: CEGT - rating lists August 12th 2012

Post by Modern Times »

ThatsIt wrote: All the Komodo 5 x64 games for the CEGT 40/4 were played by using SSE4.

Best wishes,
G.S.
(CEGT member)
Well now, that blows Larry's theory out of the water.