CEGT - rating lists February 12th 2012


lkaufman
Posts: 6258
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CEGT - rating lists February 12th 2012

Post by lkaufman »

IWB wrote:
lkaufman wrote:Regarding the renormalizing of the list to DS 12 (1 core) = 2800, I think it is a good change, although I think it is also clear that DS 12 on 1 core would crush Carlsen, Aronian, Kasparov, or Anand in a match. The reason I think it is a good change despite this is that engine vs engine ratings show larger differences than would be obtained if they only played top human players, so the result is that while 2800 engines will be underrated now in human terms, the ones at the very top will have ratings that are fairly close to what they would get in human tournaments, in my opinion.
I agree that S12 1core is rated too low compared to humans. Everything else is guesswork, as we do not have any reliable numbers ...

Anyhow, if CEGT changes to Bayes as well AND wants to raise the rating to something else, I am open to suggestions and willing to go along with them up to a reasonable point ...

Bye
Ingo

PS: Hello CEGT: combine S12 x64 and S12 w32. The engines ARE identical, and you would have a much better base then!
I have overwhelming evidence, based on a study of the Swedish computer rating list over a 20-year span, that these lists "expand" rating differences from human competition by about a 4/3 ratio. Therefore it is certain that if DS 12 were rated "correctly" (let's say 3000 or so), the top engines would be rated too high in human terms. So I think you are both quite reasonable in choosing 2800 for DS12 to make the top ratings reasonable. Maybe I would have chosen 2850, but it should certainly not be higher than 2900, even though I believe it would actually rate near 3000 against humans.
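To make the 4/3 argument concrete: if engine-list gaps are about 4/3 of the corresponding human-scale gaps (Larry's estimate from the Swedish list, treated here as an exact constant purely for illustration), a list gap maps back to a human-scale gap by multiplying by 3/4. The function name and the 3000 anchor below are hypothetical; 3097 and 2800 are the 40/20 figures for Houdini 2.0c x64 4CPU and Deep Shredder 12 x64 1CPU given in the CEGT announcement in this thread.

```python
# Minimal sketch, assuming a fixed 4/3 expansion factor between engine-list
# rating gaps and human-scale gaps (an estimate, not a measured constant).
EXPANSION = 4 / 3


def human_scale_rating(list_rating: float, ref_list_rating: float,
                       ref_human_rating: float,
                       expansion: float = EXPANSION) -> float:
    """Estimate a human-scale rating from an engine-list rating.

    ref_list_rating is the reference engine's rating on the list and
    ref_human_rating is the (assumed) rating it would have against humans.
    The gap to the reference is shrunk by the expansion factor.
    """
    list_gap = list_rating - ref_list_rating
    return ref_human_rating + list_gap / expansion


if __name__ == "__main__":
    # Houdini 2.0c x64 4CPU: 3097 on the 40/20 list, where DS12 1CPU = 2800.
    naive = 3000 + (3097 - 2800)                   # shift only: gap kept at 297
    scaled = human_scale_rating(3097, 2800, 3000)  # gap shrunk to about 223
    print(naive, round(scaled))
```

Simply re-anchoring DS12 at 3000 would put Houdini at 3297, whereas shrinking the gap gives roughly 3223; that surplus at the top is the overrating Larry is warning about.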
Regarding s12 x64 and x32: if they are really identical, it would seem fraudulent to sell x64 as somehow better than x32. My impression is that x64 is supposed to be at least a tiny bit better for some reason, though I'm sure you are right about combining them for rating purposes. Does anyone know what (if any) is the difference between s12 x64 and x32? There must be something, as I don't believe the author would intentionally deceive his customers; his reputation is too good for that.
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Post by IWB »

lkaufman wrote:...Does anyone know what (if any) is the difference between s12 x64 and x32? There must be something, as I don't believe the author would intentionally deceive his customers; his reputation is too good for that.
What deception is there? You buy the Deep version and you get the 64-bit version, which can allocate more hash, for free. Why does it have to be better, and where is it advertised as such?

I don't get what you are trying to imply here.

Bye
Ingo
lkaufman
Posts: 6258
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Post by lkaufman »

You answered my question. The 64 bit version allows you to use more hash, so for serious analysis it would be stronger, without being any stronger at the time limits most testers use. I knew there had to be some difference, I just didn't know what it was. Thanks.
ThatsIt
Posts: 992
Joined: Thu Mar 09, 2006 2:11 pm

Post by ThatsIt »

Hi Ingo!
IWB wrote: Ahhh, I was thinking about changing this and you changed first - very good.
Now switch to Bayes (I can provide a batch file if needed), clean up the 40/20 list a bit (remove engines with a low number of games), and things would be perfect.
How could things be perfect?
We're testing with ponder=off.

;-)

Best wishes,
G.S.
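The "Bayes" Ingo suggests is presumably Rémi Coulom's BayesElo, which fits all ratings jointly from the complete set of game results. BayesElo itself also models draws, the white advantage and a prior; the sketch below leaves all of that out and is only a plain maximum-likelihood Bradley-Terry fit (Hunter's MM iteration, with draws counted as half a point), re-anchored so a chosen reference engine sits at 2800. The function name and input format are hypothetical, and the iteration assumes every engine has scored at least one point and dropped at least one point (otherwise its maximum-likelihood rating is infinite).

```python
import math
from collections import defaultdict


def fit_ratings(games, reference, reference_rating=2800.0, iterations=1000):
    """Maximum-likelihood Bradley-Terry ratings (simplified, not BayesElo).

    games: iterable of (white, black, white_score), white_score in {1, 0.5, 0}.
    Returns a dict of ratings with `reference` pinned at `reference_rating`.
    """
    points = defaultdict(float)       # total score per engine
    pair_games = defaultdict(int)     # games per unordered pair of engines
    for white, black, score in games:
        points[white] += score
        points[black] += 1.0 - score
        pair_games[frozenset((white, black))] += 1

    players = sorted(points)
    gamma = {p: 1.0 for p in players}  # Bradley-Terry strengths
    for _ in range(iterations):
        # MM update: gamma_i = points_i / sum_j n_ij / (gamma_i + gamma_j)
        new_gamma = {}
        for i in players:
            denom = 0.0
            for pair, n in pair_games.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (gamma[i] + gamma[j])
            new_gamma[i] = points[i] / denom
        # Only ratios matter, so rescale to a geometric mean of 1.
        log_mean = sum(math.log(g) for g in new_gamma.values()) / len(players)
        gamma = {p: g / math.exp(log_mean) for p, g in new_gamma.items()}

    raw = {p: 400.0 * math.log10(g) for p, g in gamma.items()}
    offset = reference_rating - raw[reference]
    return {p: r + offset for p, r in raw.items()}


if __name__ == "__main__":
    # Hypothetical mini-match: A scores 3 out of 4 against reference engine B.
    games = [("A", "B", 1), ("A", "B", 1), ("A", "B", 0.5), ("B", "A", 0.5)]
    print({p: round(r, 1) for p, r in fit_ratings(games, "B").items()})
```

A 75% score corresponds to a gap of 400 * log10(3), about 191 Elo, so A lands near 2991 with B pinned at 2800.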
lucasart
Posts: 3241
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Post by lucasart »

ThatsIt wrote:Hi Ingo!
IWB wrote: Ahhh, I was thinking about changing this and you changed first - very good.
Now switch to Bayes (I can provide a batch file if needed), clean up the 40/20 list a bit (remove engines with a low number of games), and things would be perfect.
How could things be perfect?
We're testing with ponder=off.

;-)

Best wishes,
G.S.
That makes perfect sense! Please don't change it.
Uri Blass
Posts: 10887
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Post by Uri Blass »

I do not like reducing the start Elo of all programs.

I see no reason to try to adjust the ratings of top programs to human ratings when humans do not compete under CEGT conditions (not from the opening position), so we have no data on the ratings of humans under these conditions.

It would be interesting to have tournaments between humans under CEGT conditions regarding the opening position, and ratings for humans under the same conditions (I would not be surprised if some humans turned out to be 200 Elo stronger or 200 Elo weaker relative to normal chess), but unfortunately I guess that is not going to happen.
Jouni
Posts: 3648
Joined: Wed Mar 08, 2006 8:15 pm
Full name: Jouni Uski

Post by Jouni »

In the SSDF list:

19 Shredder 12 256MB A1200 MHz 2980

And here the level was originally based on real games against humans! The level has already been adjusted downward a couple of times.

Jouni
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Post by IWB »

Hi Gerhard,
ThatsIt wrote:
How could things be perfect?
We're testing with ponder=off.

;-)

Best wishes,
G.S.
Yes, "perfect" is not meant in general, of course (and POFF, ponder off, is not my main concern here :-) ); more in the sense of a better analysis of the results.

Bye
Ingo
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Post by geots »

Werner wrote:Hi all, :D

our current rating lists are online and can be found under the links below. We have adjusted our lists: the new reference engine is Deep Shredder 12 x64 1CPU at 2800 points. The change in start Elo was -181 points here in our 40/20 list.

40 / 20:
New games: 1862 ; 52 different engines
Total: 573,179

NEW Engines

785 DanaSah 4.88: 2379 - 6000 games (-1 to version 4.66 here - and +26 at the moment in our blitz-tests)

UPDATES
2 Houdini 2.0c x64 4CPU: 3097 - 2642 games (+2)
88 Deep Junior 13 x64 4CPU: 2867 - 252 games (+10 and +5 to version 12.5)
191 Deep Junior 13 x64 1CPU: 2767 - 1048 games (-14 and +5 to version 12.5)

40 / 4:
New games: 8600
All games now: 978,710
The new start Elo here is 2588 (-204). The new reference engine, at 2800 points, is Deep Shredder 12 x64 1CPU!

New Engines
202 Deep Junior 13 x64 4CPU : 2744 - 1200 games (+9 to version 12.5)
263 Deep Junior 13 w32 1CPU : 2693 - 800 games (+-0 to version 12.5)
542 Cheng 3 v1.07 x64: 2496 - 1000 games (+23 to v. 1.06)
766 DanaSah 4.88: 2388 - 1000 games (+26 to v. 4.66)
807 GreKo 9.0 x64: 2362 - 1000 games (+5 to v. 8.2 here)
1036 EveAnn 1.67: 2154 - 800 games (+45 to v. 1.66)
1074 Waxmann 2011: 2086 - 800 games (-13 to v. 2010)

Updates
3 Critter 1.4 x64 4CPU : 3063 - 2100 games (+-0)
746 Arasan 13.4 w32 1CPU : 2397 - 1100 games (+15)
750 Tornado 4.25 w32 1CPU: 2395 - 1200 games (+1)
857 Murka 2.0 x64 : 2330 - 1400 games (-2)
1029 ECE 12.01: 2165 - 900 games (+11)

40/120
See here our new single-list:
http://www.husvankempen.de/nunn//40120n ... liste.html

A big „Thank you“ to all testers as usual!!

Links

40/20: http://www.husvankempen.de/nunn/rating.htm
Blitz: http://www.husvankempen.de/nunn/blitz.htm
40/120: http://www.husvankempen.de/nunn/rating120.htm
Tester: http://www.husvankempen.de/nunn/testers/testers.htm
Elo-comparison: http://www.husvankempen.de/nunn/Replay/ ... arison.htm
Games of the week: http://www.husvankempen.de/nunn/40_40%2 ... on/gow.jpg

Werner Schuele
CEGT-Team


I have been studying your 40/4 lists to help me with my 40/3 engine match choices. Very comprehensive testing, with results and ratings for just about any engine version in existence. You already have the listing and games for Junior 13. Your list has been invaluable to me. I am very impressed with the work.


Best,

george
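As far as the CEGT announcement describes it, changing the start Elo is a uniform shift: every engine moves by the same amount (-181 in the 40/20 list, -204 in 40/4), so all rating differences, and therefore all expected scores, stay the same. A minimal sketch of that re-anchoring follows; the function name is hypothetical, and the "old" 40/20 values (2981 and 3278) are not published figures but were reconstructed here by adding 181 back to the new ones.

```python
def reanchor(ratings: dict[str, float], reference: str,
             target: float = 2800.0) -> dict[str, float]:
    """Shift every rating by the same offset so `reference` lands on `target`."""
    offset = target - ratings[reference]
    return {name: rating + offset for name, rating in ratings.items()}


if __name__ == "__main__":
    # Hypothetical "old" 40/20 values: new list values + 181 (reconstructed).
    old = {"Deep Shredder 12 x64 1CPU": 2981.0, "Houdini 2.0c x64 4CPU": 3278.0}
    print(reanchor(old, "Deep Shredder 12 x64 1CPU"))
    # {'Deep Shredder 12 x64 1CPU': 2800.0, 'Houdini 2.0c x64 4CPU': 3097.0}
```

Because only a constant is added, the 297-point gap between the two engines is the same before and after the change.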
geots
Posts: 4790
Joined: Sat Mar 11, 2006 12:42 am

Post by geots »

Werner wrote:Hi all, :D




To add one point: I had run two different 40/3 repeating matches and was a bit shocked at the results. I was sitting there wondering what I did wrong, and I happened to notice the threads with your blitz list. I checked the ratings of the engines in question, and they were right in the center of the margin of error in your list. My results reflected your ratings perfectly. So I was the problem, not the engines. I know not all results will fall in that perfectly, but this time it certainly answered every question I had. Thank you.


george
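George's check against the list's margin of error can be reproduced roughly for a single head-to-head match: turn the match score into a performance Elo difference and put an approximate 95% interval around it. The sketch below uses a normal approximation on the per-game score, and the function names are hypothetical; it is not necessarily how CEGT computes its own error margins, and it only makes sense for matches long enough that the score is not close to 0% or 100%.

```python
import math


def elo_diff(score: float) -> float:
    """Elo difference implied by an expected score strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


def match_estimate(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high): performance difference and ~95% interval."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    if not 0.0 < score < 1.0:
        raise ValueError("a 0% or 100% score has no finite Elo difference")
    # Variance of a single game's result (1, 0.5 or 0 points), from the match.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    half_width = z * math.sqrt(var / n)
    # Clamp the score interval so the logarithm stays defined.
    eps = 1e-9
    low = max(score - half_width, eps)
    high = min(score + half_width, 1.0 - eps)
    return elo_diff(score), elo_diff(low), elo_diff(high)


if __name__ == "__main__":
    # Hypothetical 100-game match: +30 =40 -30.
    elo, low, high = match_estimate(30, 40, 30)
    print(f"{elo:+.0f} Elo, 95% interval about {low:+.0f} to {high:+.0f}")
```

For the +30 =40 -30 example the interval comes out at roughly ±53 Elo, which is why a 100-game match on its own cannot reliably separate engines only a few dozen points apart.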