CEGT - rating lists August 12th 2012

lkaufman · Post by **lkaufman** » Wed Aug 15, 2012 3:53 pm

ThatsIt wrote:Hi Larry !

lkaufman wrote: [..snip...]
Can you estimate what percentage of the games use SSE4?
[...snip...]
All the Komodo 5 x64 games for the CEGT 40/4 were played by using SSE4.

Best wishes,
G.S.
(CEGT member)

Maybe I misunderstood, but I thought a previous post mentioned using the Q6600, which I believe is too old to have SSE4. What is my mistake?

ThatsIt · Post by **ThatsIt** » Thu Aug 16, 2012 8:23 am

lkaufman wrote:
ThatsIt wrote:Hi Larry !

lkaufman wrote: [..snip...]
Can you estimate what percentage of the games use SSE4?
[...snip...]
All the Komodo 5 x64 games for the CEGT 40/4 were played by using SSE4.

Best wishes,
G.S.
(CEGT member)
Maybe I misunderstood, but I thought a previous post mentioned using the Q6600, which I believe is too old to have SSE4. What is my mistake?

If an engine is ready for SSE4, we try to use only
SSE4-capable hardware for the tests. Wolfgang used his AMD X-4's
and I used the Intel i5-2400 for the Komodo 5 x64 tests.
Best wishes,
G.S.

lkaufman · Post by **lkaufman** » Thu Aug 16, 2012 4:49 pm

ThatsIt wrote:
lkaufman wrote:
ThatsIt wrote:Hi Larry !

lkaufman wrote: [..snip...]
Can you estimate what percentage of the games use SSE4?
[...snip...]
All the Komodo 5 x64 games for the CEGT 40/4 were played by using SSE4.

Best wishes,
G.S.
(CEGT member)
Maybe I misunderstood, but I thought a previous post mentioned using the Q6600, which I believe is too old to have SSE4. What is my mistake?
If an engine is ready for SSE4, we try to use only
SSE4-capable hardware for the tests. Wolfgang used his AMD X-4's
and I used the Intel i5-2400 for the Komodo 5 x64 tests.
Best wishes,
G.S.

I see, thanks. I want to mention that it is not only Komodo where our data do not agree very well with either CCRL or CEGT. We get lower ratings for Stockfish and higher ratings for Critter (relative to Houdini) than CCRL and CEGT, and we have 10,000 game samples for most engines at three different levels. Most likely it is due to the fact that we do increment testing rather than repeating time controls, but I don't think this is the whole story, as we also observe the same discrepancy with IPON, which does use increment testing. I guess it will take some time to learn what is the cause of these discrepancies, which are a bit too large to blame on sample error.
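
For a rough sense of what "sample error" means at this sample size, here is a minimal sketch of a 95% error margin on an Elo difference. It assumes the usual logistic Elo formula and a normal approximation for the mean score. The function names and the win/draw/loss split in the usage note are hypothetical illustrations, not figures from any of the rating lists discussed here.

```python
import math

def elo_from_score(score):
    """Convert an average score (strictly between 0 and 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, low, high) for a match result.

    Uses a normal approximation for the mean score; z=1.96 gives
    roughly a 95% interval. Assumes the interval stays inside (0, 1).
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the score around its mean.
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    std_error = math.sqrt(variance / n)
    low = elo_from_score(score - z * std_error)
    high = elo_from_score(score + z * std_error)
    return elo_from_score(score), low, high

if __name__ == "__main__":
    # Hypothetical 10,000-game sample with 40% draws.
    elo, low, high = elo_with_margin(3000, 4000, 3000)
    print(f"Elo {elo:+.1f} (95% interval {low:+.1f} to {high:+.1f})")
```

With a hypothetical even result and 40% draws over 10,000 games, the interval comes out to only a few Elo on either side, which is the scale against which a discrepancy between lists would have to be judged.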

Dan Honeycutt · Post by **Dan Honeycutt** » Thu Aug 16, 2012 5:15 pm

lkaufman wrote:I guess it will take some time to learn what is the cause of these discrepancies, which are a bit too large to blame on sample error.

Do your book or starting positions differ?

Best
Dan H.

lkaufman · Post by **lkaufman** » Thu Aug 16, 2012 8:12 pm

Dan Honeycutt wrote:
lkaufman wrote:I guess it will take some time to learn what is the cause of these discrepancies, which are a bit too large to blame on sample error.
Do your book or starting positions differ?

Best
Dan H.

Well, they would have to, since the different testing organizations and even different testers within them use different books. Our book now is of highly variable depth and includes positions frequently seen in human tournament play. I don't know whether that is a fair description of the most popular books in use by the testing groups. This could account for a few Elo of discrepancy, I think.

ThatsIt · Post by **ThatsIt** » Fri Aug 17, 2012 8:49 am

lkaufman wrote: Well, they would have to, since the different testing organizations and even different testers within them use different books. Our book now is of highly variable depth and includes positions frequently seen in human tournament play. I don't know whether that is a fair description of the most popular books in use by the testing groups. This could account for a few Elo of discrepancy, I think.

Perhaps you are testing against too few different opponents?
Best wishes,
G.S.

lkaufman · Post by **lkaufman** » Fri Aug 17, 2012 4:00 pm

ThatsIt wrote:
lkaufman wrote: Well, they would have to, since the different testing organizations and even different testers within them use different books. Our book now is of highly variable depth and includes positions frequently seen in human tournament play. I don't know whether that is a fair description of the most popular books in use by the testing groups. This could account for a few Elo of discrepancy, I think.
Perhaps you are testing against too few different opponents?
Best wishes,
G.S.

That's true, we only use Houdini 1.5, Critter, and Stockfish on the distributed test, as we are limited to free engines for this and don't think it's worthwhile to test against engines that are more than 150 Elo below us. But would your results be much different if you limited opponents to Stockfish level and above? Of course, then your sample size would be too small.
