Komodo 4 running for the IPON

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Komodo 4 running for the IPON

Post by Laskos »

ernest wrote:
Laskos wrote: P.S. You are correct that the non-linearity of the Elo curve is the culprit for everything,
Remember Rémi's answer (Posted: Sun Sep 04, 2011): :)

http://www.talkchess.com/forum/viewtopi ... 320#422320
I remember it; I answered in the same thread. While correct, his answer basically just says that taking the average is wrong, which was not a secret, I think.

Kai
IGarcia
Posts: 543
Joined: Mon Jul 05, 2010 10:27 pm

Re: Komodo 4 running for the IPON

Post by IGarcia »

If you consider only the games against Houdini 2, the rating will be more than 3020. As you include more engines, the calculation becomes more complex.

An engine (let's call it Anti-Houdini) can win against Houdini and perform very badly against all other engines, so its final Elo will be low, because the rating takes into account the performance against the whole group of best engines, not only the one at the top.

When the Houdini IPON test was done, it managed to beat all opponents, and it really crushed the engines rated under 2800. The Komodo 4 performance is more like the Anti-Houdini example.

(This is just to exaggerate things and show why it is possible to score more than 50% against every opponent and still not get to the top.)
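
As a rough check of the "more than 3020" figure, the performance rating from a single match can be sketched with the standard logistic Elo curve. This is only an illustration: the helper name `performance_rating` is made up here, the 51.5 : 48.5 score and Houdini's 3019 rating come from the IPON listing posted further down in this thread, and the rating tool behind the IPON list may use a different model.

```python
import math

def performance_rating(opponent_rating: float, score_fraction: float) -> float:
    """Rating at which the logistic Elo curve predicts `score_fraction`
    against a single opponent (hypothetical helper, 400-point scale)."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score_fraction must be strictly between 0 and 1")
    odds = score_fraction / (1.0 - score_fraction)
    return opponent_rating + 400.0 * math.log10(odds)

# Komodo 4 against Houdini 2.0 alone: 51.5 points from 100 games
# versus an opponent listed at 3019.
komodo_vs_houdini = performance_rating(3019, 51.5 / 100)
print(round(komodo_vs_houdini))
```

Under these assumptions the single-match figure lands roughly 10 Elo above Houdini's own rating, which is consistent with "more than 3020" and says nothing about the other opponents.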
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Komodo 4 running for the IPON

Post by Sven »

Uri Blass wrote: Sven, I agree that the arithmetic average is wrong, but even the median seems to be always higher than the rating estimate, and this is not to be expected.

Something is clearly wrong in the calculation of the rating.

Note that Komodo beat Houdini, and later Critter also has 50% against Houdini; I wonder what better results Houdini had that cause Houdini 2 to get a higher rating.
Ernest has provided a link to a statement by Rémi. Have you read it, and also the subsequent posts in that thread?

I think that match performance ratings between pairs of engines are almost irrelevant for any estimate of total ratings. Neither the arithmetic mean nor the median says anything here. The overall score from all games is most probably a better indicator of the overall rating than any number derived from match performances.

Sven
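
To make the point about averaging concrete, here is a minimal sketch with made-up numbers (two hypothetical opponents, equal game counts). It compares the mean of two per-match performance ratings with the single rating whose expected total score matches the actual total, assuming the standard logistic Elo curve and opponent ratings held fixed. The function names are illustrative only, and this is neither EloStat's nor BayesElo's actual procedure.

```python
import math

def expected_score(rating: float, opponent: float) -> float:
    """Logistic Elo expectation on the usual 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def match_performance(opponent: float, score: float) -> float:
    """Performance rating from one match; score is a fraction in (0, 1)."""
    return opponent + 400.0 * math.log10(score / (1.0 - score))

def overall_rating(results: list[tuple[float, float]]) -> float:
    """Rating whose expected total score equals the actual total,
    with opponent ratings held fixed; solved by bisection."""
    total = sum(score for _, score in results)
    lo, hi = 0.0, 6000.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if sum(expected_score(mid, opp) for opp, _ in results) < total:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical data: (opponent rating, score fraction).
results = [(3000.0, 0.50), (2600.0, 0.95)]
mean_of_performances = sum(match_performance(o, s) for o, s in results) / len(results)
from_total_score = overall_rating(results)
print(round(mean_of_performances), round(from_total_score))
```

With these invented numbers the mean of the match performances comes out noticeably higher than the rating derived from the total score, because lopsided results sit on the flat part of the curve, where a small change in score corresponds to a large change in Elo.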
Ron Langeveld
Posts: 140
Joined: Tue Jan 05, 2010 8:02 pm

Re: Komodo 4 running for the IPON

Post by Ron Langeveld »

Uri Blass wrote: Sven, I agree that the arithmetic average is wrong, but even the median seems to be always higher than the rating estimate, and this is not to be expected.

Something is clearly wrong in the calculation of the rating.

Note that Komodo beat Houdini, and later Critter also has 50% against Houdini; I wonder what better results Houdini had that cause Houdini 2 to get a higher rating.
Why is it so difficult to understand that if Komodo scores 70% against the rest of the pack (everyone except Houdini) and Houdini scores 80% against the same pack, Houdini still has a higher rating even though it loses a head-to-head match against Komodo?

Sounds to me that some readers here are in need of a basic statistics course.
Frank Quisinsky
Posts: 7047
Joined: Wed Nov 18, 2009 7:16 pm
Location: Gutweiler, Germany
Full name: Frank Quisinsky

Re: Komodo 4 running for the IPON

Post by Frank Quisinsky »

+1
That's a good idea!
It seems that many computer chess people need such a course. I often think I do too :-)
Uri Blass
Posts: 10890
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Komodo 4 running for the IPON

Post by Uri Blass »

Ron Langeveld wrote:
Uri Blass wrote: Sven, I agree that the arithmetic average is wrong, but even the median seems to be always higher than the rating estimate, and this is not to be expected.

Something is clearly wrong in the calculation of the rating.

Note that Komodo beat Houdini, and later Critter also has 50% against Houdini; I wonder what better results Houdini had that cause Houdini 2 to get a higher rating.
Why is it so difficult to understand that if Komodo scores 70% against the rest of the pack (everyone except Houdini) and Houdini scores 80% against the same pack, Houdini still has a higher rating even though it loses a head-to-head match against Komodo?

Sounds to me that some readers here are in need of a basic statistics course.
Of course I understand it; my last question was which better results Houdini got.
Unfortunately I cannot see the results of old matches.
A reply to my question could be, for example:
Houdini-Crafty 90-10, Komodo-(same Crafty) 88-12, and the same for other programs.
I wanted to see the Houdini and Komodo results side by side, and the same for Critter now.
IGarcia
Posts: 543
Joined: Mon Jul 05, 2010 10:27 pm

Re: Komodo 4 running for the IPON

Post by IGarcia »

Some opponents changed, but several engines are the same. Here you are:

1 Houdini 2.0 STD          3019 2800.0 (2227.5 : 572.5)
                                   100.0 ( 48.5 :  51.5) Komodo 4 SSE42           2979
                                   100.0 ( 57.0 :  43.0) Komodo64 3 SSE42         2965
                                   100.0 ( 57.5 :  42.5) Deep Rybka 4.1 SSE42     2955
                                   100.0 ( 57.5 :  42.5) Deep Rybka 4             2954
                                   100.0 ( 54.5 :  45.5) Critter 1.2              2952
                                   100.0 ( 63.5 :  36.5) Stockfish 2.1.1 JA       2941
                                   100.0 ( 77.5 :  22.5) Chiron 1.1a              2834
                                   100.0 ( 75.5 :  24.5) Naum 4.2                 2826
                                   100.0 ( 85.5 :  14.5) Fritz 13 32b             2819
                                   100.0 ( 74.5 :  25.5) Deep Shredder 12         2800
                                   100.0 ( 83.5 :  16.5) Gull 1.2                 2795
                                   100.0 ( 79.5 :  20.5) Deep Sjeng c't 2010 32b  2787
                                   100.0 ( 79.0 :  21.0) Spike 1.4 32b            2783
                                   100.0 ( 83.0 :  17.0) Deep Fritz 12 32b        2779
                                   100.0 ( 86.0 :  14.0) Protector 1.4.0          2759
                                   100.0 ( 85.5 :  14.5) Hannibal 1.1             2757
                                   100.0 ( 86.5 :  13.5) spark-1.0 SSE42          2755
                                   100.0 ( 84.5 :  15.5) HIARCS 13.2 MP 32b       2748
                                   100.0 ( 81.5 :  18.5) Deep Junior 12.5         2731
                                   100.0 ( 89.0 :  11.0) Zappa Mexico II          2716
                                   100.0 ( 88.5 :  11.5) Deep Onno 1-2-70         2684
                                   100.0 ( 89.0 :  11.0) Toga II 1.4 beta5c BB    2672
                                   100.0 ( 93.0 :   7.0) Strelka 2.0 B            2671
                                   100.0 ( 94.5 :   5.5) Umko 1.2 SSE42           2663
                                   100.0 ( 90.0 :  10.0) Loop 2007                2620
                                   100.0 ( 96.5 :   3.5) Jonny 4.00 32b           2613
                                   100.0 ( 92.0 :   8.0) Tornado 4.80             2607
                                   100.0 ( 94.5 :   5.5) Crafty 23.3 JA           2598

 Komodo 4 SSE42           2979 2300.0 (1777.5 : 522.5)
                                   100.0 ( 51.5 :  48.5) Houdini 2.0 STD          3019
                                   100.0 ( 51.5 :  48.5) Deep Rybka 4.1 SSE42     2955
                                   100.0 ( 53.5 :  46.5) Critter 1.2              2952
                                   100.0 ( 52.5 :  47.5) Stockfish 2.1.1 JA       2941
                                   100.0 ( 65.5 :  34.5) Chiron 1.1a              2834
                                   100.0 ( 69.5 :  30.5) Naum 4.2                 2826
                                   100.0 ( 68.0 :  32.0) Deep Shredder 12         2800
                                   100.0 ( 75.0 :  25.0) Gull 1.2                 2795
                                   100.0 ( 79.0 :  21.0) Deep Sjeng c't 2010 32b  2787
                                   100.0 ( 77.0 :  23.0) Spike 1.4 32b            2783
                                   100.0 ( 78.5 :  21.5) Protector 1.4.0          2759
                                   100.0 ( 80.0 :  20.0) Hannibal 1.1             2757
                                   100.0 ( 85.0 :  15.0) spark-1.0 SSE42          2755
                                   100.0 ( 87.5 :  12.5) HIARCS 13.2 MP 32b       2748
                                   100.0 ( 83.5 :  16.5) Deep Junior 12.5         2731
                                   100.0 ( 88.5 :  11.5) Zappa Mexico II          2716
                                   100.0 ( 90.5 :   9.5) Deep Onno 1-2-70         2684
                                   100.0 ( 90.5 :   9.5) Strelka 2.0 B            2671
                                   100.0 ( 87.5 :  12.5) Umko 1.2 SSE42           2663
                                   100.0 ( 88.0 :  12.0) Loop 2007                2620
                                   100.0 ( 89.5 :  10.5) Jonny 4.00 32b           2613
                                   100.0 ( 93.0 :   7.0) Tornado 4.80             2607
                                   100.0 ( 92.5 :   7.5) Crafty 23.3 JA           2598
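
For the side-by-side view asked for earlier in the thread, the two listings can be lined up over the 22 opponents that both engines faced. The sketch below only re-enters the scores from the listings above (100 games per opponent); the variable names are illustrative, and it compares raw points rather than computing any rating.

```python
# opponent -> (Houdini 2.0 score, Komodo 4 score), each out of 100 games,
# copied from the two listings above. The head-to-head match and the
# opponents only Houdini played are left out.
common = {
    "Deep Rybka 4.1 SSE42": (57.5, 51.5),
    "Critter 1.2": (54.5, 53.5),
    "Stockfish 2.1.1 JA": (63.5, 52.5),
    "Chiron 1.1a": (77.5, 65.5),
    "Naum 4.2": (75.5, 69.5),
    "Deep Shredder 12": (74.5, 68.0),
    "Gull 1.2": (83.5, 75.0),
    "Deep Sjeng c't 2010 32b": (79.5, 79.0),
    "Spike 1.4 32b": (79.0, 77.0),
    "Protector 1.4.0": (86.0, 78.5),
    "Hannibal 1.1": (85.5, 80.0),
    "spark-1.0 SSE42": (86.5, 85.0),
    "HIARCS 13.2 MP 32b": (84.5, 87.5),
    "Deep Junior 12.5": (81.5, 83.5),
    "Zappa Mexico II": (89.0, 88.5),
    "Deep Onno 1-2-70": (88.5, 90.5),
    "Strelka 2.0 B": (93.0, 90.5),
    "Umko 1.2 SSE42": (94.5, 87.5),
    "Loop 2007": (90.0, 88.0),
    "Jonny 4.00 32b": (96.5, 89.5),
    "Tornado 4.80": (92.0, 93.0),
    "Crafty 23.3 JA": (94.5, 92.5),
}

games = 100 * len(common)
houdini_total = sum(h for h, _ in common.values())
komodo_total = sum(k for _, k in common.values())
houdini_better = sum(1 for h, k in common.values() if h > k)

print(f"Houdini: {houdini_total} / {games}")
print(f"Komodo:  {komodo_total} / {games}")
print(f"Houdini scored more against {houdini_better} of {len(common)} common opponents")
```

Against this common pool Houdini collected 1807 points and Komodo 1726 out of 2200, and Houdini had the better score against 18 of the 22 opponents, so a 3-point deficit in the head-to-head match is small by comparison.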
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Komodo 4 running for the IPON

Post by Laskos »

Ron Langeveld wrote:
Uri Blass wrote: Sven, I agree that the arithmetic average is wrong, but even the median seems to be always higher than the rating estimate, and this is not to be expected.

Something is clearly wrong in the calculation of the rating.

Note that Komodo beat Houdini, and later Critter also has 50% against Houdini; I wonder what better results Houdini had that cause Houdini 2 to get a higher rating.
Why is it so difficult to understand that if Komodo scores 70% against the rest of the pack (everyone except Houdini) and Houdini scores 80% against the same pack, Houdini still has a higher rating even though it loses a head-to-head match against Komodo?

Sounds to me that some readers here are in need of a basic statistics course.
Yes, then do it: first by hand, using the approach you presented as obvious, and second with EloStat.

Engines 1, 2, 3

Engine 1-2: 500:500
Engine 1-3: 970:30
Engine 2-3: 999:1

Compute the ratings and the error bars. Assume these are binomial (win/loss only), not trinomial (with draws), to make the computation easier.

LOL

Kai
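
One possible way to work through the three-engine exercise is a maximum-likelihood fit of a Bradley-Terry (logistic Elo) model, sketched below with the usual iterative update. This is an assumption-laden illustration, not what EloStat does: the function name `bradley_terry_elo` is made up, draws are ignored as the post suggests, ratings are centred on zero, and error bars are not computed.

```python
import math

# wins[i][j] = games engine i+1 won against engine j+1 (no draws).
wins = [
    [0, 500, 970],   # Engine 1: 500 vs Engine 2, 970 vs Engine 3
    [500, 0, 999],   # Engine 2: 500 vs Engine 1, 999 vs Engine 3
    [30, 1, 0],      # Engine 3: 30 vs Engine 1, 1 vs Engine 2
]

def bradley_terry_elo(wins, iterations=1000):
    """Fit Bradley-Terry strengths by repeated fixed-point updates and
    return mean-centred ratings on the 400-point Elo scale."""
    n = len(wins)
    strength = [1.0] * n
    total_wins = [sum(row) for row in wins]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Games played against j, weighted by the current strengths.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strength[i] + strength[j])
                for j in range(n) if j != i
            )
            updated.append(total_wins[i] / denom)
        scale = sum(updated)
        strength = [s / scale for s in updated]
    elo = [400.0 * math.log10(s) for s in strength]
    mean = sum(elo) / n
    return [e - mean for e in elo]

ratings = bradley_terry_elo(wins)
for number, rating in enumerate(ratings, start=1):
    print(f"Engine {number}: {rating:+.0f}")
```

Under this model Engine 2 ends up roughly 10 Elo above Engine 1 despite the 500:500 head-to-head result, purely because of its better score against Engine 3. How much weight such lopsided matches deserve, and how wide the error bars are, is exactly what the exercise is probing.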