CEGT - rating lists December 27th 2015

Werner
Posts: 3018
Joined: Wed Mar 08, 2006 10:09 pm
Location: Germany
Full name: Werner Schüle

CEGT - rating lists December 27th 2015

Post by Werner »

Hi all, :D

Our current rating lists are online and can be found at the links below.

40 / 20:
New games: 812; 10 different engines
Total: 890,508

NEW Engines
-

UPDATES
Gull 3.0 x64 4CPU: 3193 - 4492 games (-1)
29 Fritz 15 x64 4CPU: 3116 - 1339 games (+-0)
2 Komodo 9.3 x64 4CPU: 3322 - 1174 games (-6)

40 / 120:
There was an update on December 16th. Now 21,210 games and 80 engines. We are testing Komodo 9.3 x64.

5'+3'' pb=on
We tested Fritz 15 x64 (3020 / 2600 games) against the remaining engines of the full list.

40 / 4
No update.
We are testing AL Chess 1.84, Andscacs 0.84 x64 1CPU (+36), and Laser 1.0.

A big „Thank you“ to all testers as usual!!

Links

40/20: http://www.husvankempen.de/nunn/rating.htm
Blitz: http://www.husvankempen.de/nunn/blitz.htm
40/120: http://www.husvankempen.de/nunn/rating120.htm
Tester: http://www.husvankempen.de/nunn/testers/testers.htm
40/20 pb=on: http://www.husvankempen.de/nunn/rating4020PBON.htm
5+3 pb=on: http://www.husvankempen.de/nunn/rating5plus3pbon.htm
Games of the week: http://www.husvankempen.de/nunn/40_40%2 ... on/gow.jpg

We wish you all the best for 2016.
Werner Schüle
CEGT-Team
Modern Times
Posts: 3807
Joined: Thu Jun 07, 2012 11:02 pm

Re: CEGT - rating lists December 27th 2015

Post by Modern Times »

So on 40/20, you have Komodo 9.3 x64 4CPU 19 Elo weaker than Komodo 9.2 x64 4CPU?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: CEGT - rating lists December 27th 2015

Post by bob »

Modern Times wrote:So on 40/20, you have Komodo 9.3 x64 4CPU 19 Elo weaker than Komodo 9.2 x64 4CPU ?
Look at the error bars...

Can't conclude anything about a 19 Elo difference with a +/- 16 Elo error bar...
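
As a rough illustration of the point above, the uncertainty of a rating difference can be estimated by combining the two error bars. This is only a sketch: it assumes both Komodo versions carry the same +/- 16 Elo bar at the same confidence level and that their errors are independent, which a rating list built from a shared pool of games does not strictly guarantee. The function name is made up for this example.

```python
import math

def diff_margin(margin_a: float, margin_b: float) -> float:
    """Margin of the difference of two independent ratings (root-sum-square)."""
    return math.hypot(margin_a, margin_b)

elo_gap = 19.0   # difference quoted in the thread
margin = 16.0    # error bar quoted in the thread, assumed here for both engines

combined = diff_margin(margin, margin)
print(f"gap = {elo_gap:.0f} Elo, margin of the gap = +/- {combined:.1f} Elo")
print("distinguishable" if elo_gap > combined else "not distinguishable")
```

Under these assumptions the margin on the gap is larger than the gap itself, so the two versions cannot be ranked from this data alone.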
Modern Times
Posts: 3807
Joined: Thu Jun 07, 2012 11:02 pm

Re: CEGT - rating lists December 27th 2015

Post by Modern Times »

Very true. That was the trouble with testing 9.3 - with a claimed +15 Elo, the testing groups would never be able to verify it with the number of games they usually play.
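
To put a number on how many games it takes to reach a given error bar, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: a 40% draw ratio, near-equal opponents (score close to 50%), a 95% confidence level (z = 1.96), and the standard logistic Elo curve. The constants and function names are hypothetical and are not what the rating programs compute internally.

```python
import math

Z_95 = 1.96          # assumed two-sided 95% confidence level
DRAW_RATIO = 0.40    # assumed share of drawn games (not from the thread)

# Elo per unit of score near a 50% result, from the logistic Elo curve.
ELO_PER_SCORE = 400.0 / (math.log(10) * 0.25)

def elo_margin(games: int, draw_ratio: float = DRAW_RATIO) -> float:
    """Approximate +/- Elo margin after `games` games near a 50% score."""
    # Per-game standard deviation of the score (win=1, draw=0.5, loss=0).
    sigma = math.sqrt(0.25 - draw_ratio / 4.0)
    return Z_95 * sigma / math.sqrt(games) * ELO_PER_SCORE

def games_needed(target_margin: float, draw_ratio: float = DRAW_RATIO) -> int:
    """Smallest game count whose margin is at most `target_margin` Elo."""
    sigma = math.sqrt(0.25 - draw_ratio / 4.0)
    return math.ceil((Z_95 * sigma * ELO_PER_SCORE / target_margin) ** 2)

for target in (15, 10, 5):
    print(f"+/- {target} Elo needs about {games_needed(target)} games")
```

The game count grows with the inverse square of the margin, so halving the error bar costs roughly four times as many games.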
Frank Quisinsky
Posts: 7237
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: The moon is a good place for errorbar ...

Post by Frank Quisinsky »

Hi Ray,

About 75% right, I think!

With more opponents and the same number of games, you can kick the error bar to the moon.

None of the calculation programs we have takes the number of opponents into account, and this is completely wrong.

That means ...
2,000 games vs. 40 opponents is more accurate than
2,000 games vs. 10 opponents

But the error bar comes out the same in each of the calculation programs, no matter whether there are 1, 10, 50, or 100 opponents.

Can't be right!

Best
Frank
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The moon is a good place for errorbar ...

Post by bob »

Unfortunately, statistics don't buy into that. # of games is all that matters, based purely on sampling theory. There is no way to shortchange the number of games without a corresponding loss of accuracy / increase in the error margin.
Graham Banks
Posts: 45337
Joined: Sun Feb 26, 2006 10:52 am
Location: Auckland, NZ

Re: The moon is a good place for errorbar ...

Post by Graham Banks »

If all of the rating lists show similar results for an engine, that is probably the best overall guide.
gbanksnz at gmail.com
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: The moon is a good place for errorbar ...

Post by bob »

Not sure what you mean. He mentioned fewer games against more opponents gave a more accurate rating. That's not how Elo and sampling theory work. To get a specific error bar, you have to play the right number of games. There is no way to replace N thousand games with N hundred games and get the same accuracy.

Everybody wants to cheat the statistical gods that control the error bar. It won't ever happen, however. In his example, 2000 games give a specific error bar, and the number of opponents doesn't have any effect on that. I.e., 10 games vs. one opponent or 1 game vs. each of 10 opponents: you get the same error bar.
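
The sampling argument can be checked with a few lines of arithmetic. The sketch below splits a fixed number of games evenly over k opponents, computes the variance of each per-opponent mean score, and then the standard error of their average. It assumes independent games and the same per-game spread (sigma = 0.39, an illustrative value) against every opponent, and it only covers sampling error. The function name is hypothetical.

```python
import math

def pooled_std_error(total_games: int, opponents: int, sigma: float = 0.39) -> float:
    """Std. error of the overall score with games split evenly over opponents."""
    games_each = total_games / opponents
    # Variance of each per-opponent mean score.
    var_each = sigma ** 2 / games_each
    # Variance of the average of `opponents` independent means, then its root.
    return math.sqrt(opponents * var_each) / opponents

for k in (1, 10, 40):
    print(f"2000 games vs. {k:2d} opponents: {pooled_std_error(2000, k):.5f}")

# Cutting the games by a factor of 10 widens the error by sqrt(10).
ratio = pooled_std_error(200, 10) / pooled_std_error(2000, 10)
print(f"200 vs. 2000 games: error grows by a factor of {ratio:.2f}")
```

Under these assumptions the result depends only on the total number of games: the k cancels out, and only a change in the game count moves the error bar.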