CEGT 5'+3" pb=on ratinglist // Komodo 7.0a x64 included

ThatsIt
Posts: 992
Joined: Thu Mar 09, 2006 2:11 pm

CEGT 5'+3" pb=on ratinglist // Komodo 7.0a x64 included

Post by ThatsIt »

Hi to all !

The test run with Komodo 7.0a x64 is complete.

The "all versions list" so far (created by ELO-Stat 1.3):

   Program                         Elo    +   -  Games    Score   Av.Op.  Draws
01 Houdini 4.0 x64                3108   15  15   1700    78.3 %   2885   29.9 %
02 Komodo 7.0a x64 (NEW)          3070   14  14   1600    75.2 %   2877   35.1 %
03 Stockfish DD x64               3065   14  13   1700    73.6 %   2887   37.3 %
04 Houdini 3.0 x64                3062   15  15   1700    73.5 %   2885   30.6 %
05 Komodo TCEC x64                3046   14  14   1600    72.6 %   2877   35.2 %
06 Komodo 6.0 x64                 3043   14  14   1600    72.2 %   2877   37.1 %
07 Gull 2.8 beta x64              3016   13  13   1700    67.4 %   2890   39.3 %
08 Komodo 5.1r2 x64               3016   14  14   1600    68.9 %   2877   37.1 %
09 Stockfish 4.0 x64              3015   13  13   1700    67.6 %   2887   41.3 %
10 Equinox 3.00 x64               2988   12  12   1700    63.7 %   2891   44.2 %
11 Critter 1.6 x64                2984   12  12   1750    62.7 %   2893   43.1 %
12 Equinox 2.01 x64               2975   13  13   1700    61.9 %   2891   42.4 %
13 Gull 2.2 x64                   2975   13  13   1700    62.0 %   2890   42.1 %
14 Rybka 4.1 x64                  2947   12  12   1750    57.5 %   2894   42.5 %
15 BlackMamba 1.4 x64             2916   12  12   1750    53.0 %   2895   42.6 %
16 Deep Fritz 14 x64              2902   12  12   1750    50.9 %   2896   45.3 %
17 Chiron 2.0 x64                 2894   12  12   1700    49.5 %   2897   42.9 %
18 Protector 1.6.0 x64            2875   12  12   1700    46.7 %   2898   44.8 %
19 Hannibal 1.4a x64              2854   13  13   1750    43.8 %   2897   40.3 %
20 Chiron 1.5 x64                 2850   13  13   1700    43.2 %   2897   40.1 %
21 Senpai 1.0 x64                 2838   13  13   1750    41.5 %   2897   39.6 %
22 Loop 2010-x x64                2838   12  12   1750    41.5 %   2897   43.4 %
23 Protector 1.5.0 x64            2837   13  13   1700    41.3 %   2898   37.9 %
24 Hiarcs 14                      2830   13  13   1750    40.4 %   2898   39.1 %
25 Naum 4.2 x64                   2820   13  13   1750    38.9 %   2898   37.2 %
26 Fritz 13                       2815   13  13   1750    38.2 %   2898   40.1 %
27 Deep Shredder 12 x64           2800   13  13   1750    36.1 %   2898   38.2 %
28 Deep Sjeng ct 2010 w32         2797   13  13   1750    35.7 %   2899   37.9 %
29 Texel 1.03 x64                 2791   13  13   1750    35.0 %   2899   37.7 %
30 Jonny 6.00 x64                 2788   13  13   1750    34.6 %   2899   34.7 %
31 Spike 1.4                      2774   13  13   1750    32.7 %   2899   35.5 %
32 Deep Junior 13.3 x64           2762   14  14   1750    31.2 %   2900   31.9 %
33 Spark 1.0 x64                  2760   13  14   1750    30.9 %   2900   35.3 %
34 DiscoCheck 5.2 x64             2755   14  14   1750    30.3 %   2900   34.3 %
35 Booot 5.2.0 x64                2750   14  14   1750    29.7 %   2900   34.2 %
36 Quazar 0.4 x64                 2742   14  14   1750    28.7 %   2900   33.0 %
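
The Elo, Score and Av.Op. columns above appear to be linked by the standard logistic Elo curve. Below is a minimal Python sketch that checks this on a few rows. The function name `performance_rating` is made up for illustration, and this is only an observation about the published numbers, not a description of how ELO-Stat works internally.

```python
import math

def performance_rating(avg_opponent: float, score: float) -> float:
    """Rating implied by a score fraction (0 < score < 1) against an
    average opponent, using the logistic Elo curve:
    diff = 400 * log10(score / (1 - score))."""
    return avg_opponent + 400.0 * math.log10(score / (1.0 - score))

# (name, Av.Op., Score) copied from the table above
rows = [
    ("Houdini 4.0 x64", 2885, 0.783),
    ("Komodo 7.0a x64", 2877, 0.752),
    ("Quazar 0.4 x64", 2900, 0.287),
]

for name, avg_op, score in rows:
    print(f"{name:18s} {performance_rating(avg_op, score):7.1f}")
```

The printed values should land within about a point of the listed Elo (3108, 3070 and 2742); the small residual comes from the Score column being rounded to 0.1 %.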


CEGT 5'+3" list (all versions) ---> http://www.husvankempen.de/nunn/5Plus3R ... liste.html
CEGT 5'+3" list (pure list) ---> http://www.husvankempen.de/nunn/5Plus3R ... liste.html

Current test (Stockfish 5.0 x64 1CPU) ---> http://cegt.forumieren.com/t153-testing ... ish-50-x64

Some more stats ---> http://www.husvankempen.de/nunn/5Plus3R ... /stats.htm

Best wishes,
G.S.
(CEGT team)
Modern Times
Posts: 3803
Joined: Thu Jun 07, 2012 11:02 pm

Re: CEGT 5'+3" pb=on ratinglist // Komodo 7.0a x64 incl

Post by Modern Times »

Thanks for the update !

So:

CEGT 5+3 has Houdini 4 as +44 Elo above Komodo 7 on the pure list
(I think these were all Intel games)

IPON 5+3 has Houdini 4 as +23 Elo above Komodo 7
(AMD games)

It doesn't seem to me that AMD disadvantages Komodo. On the AMD list (IPON) it is closer to Houdini than on the Intel list (CEGT). Of course there are some other differences apart from the hardware.

What I only just noticed is that IPON is not totally the same hardware now, but still all AMD. The web page says

CPU : 3.2 GHz Phenom2 + 4GHz FX-8350
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: CEGT 5'+3" pb=on ratinglist // Komodo 7.0a x64 incl

Post by IWB »

Modern Times wrote:
CEGT 5+3 has Houdini 4 as +44 Elo above Komodo 7 on the pure list
(I think these were all Intel games)

IPON 5+3 has Houdini 4 as +23 Elo above Komodo 7
(AMD games)
Looking at the error bars it is hard to see any difference at all ...
Modern Times wrote:
What I only just noticed is that IPON is not totally the same hardware now, but still all AMD. The web page says

CPU : 3.2 GHz Phenom2 + 4GHz FX-8350
Yes, one 8350. Theoretically 18% of the games are played on the 8350.


In the past I ran a few performance tests of different engines on the AMD HW. And yes, engines behave differently, and yes, Komodo was a bit on the bad side among all engines (I did not check K7 in particular), but the difference was usually in the low single-digit percent range. So, to prove a difference because of HW you have to play a LOT of games under identical conditions.
I personally would blame the differences on openings, different opponents or luck - just because proving a HW difference is close to impossible.
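
To put a rough number on "a LOT of games", here is a back-of-the-envelope Python sketch of the 95% error bar on an Elo estimate and of the games needed to shrink it to a given margin. It assumes independent games, a normal approximation, the logistic Elo curve and a fixed opponent rating; the function names and the default 40% draw rate are illustrative assumptions, not taken from any rating tool.

```python
import math

def elo_slope(score: float) -> float:
    """Elo points per unit of score fraction on the logistic Elo curve."""
    return 400.0 / (math.log(10) * score * (1.0 - score))

def per_game_sd(score: float, draw_rate: float) -> float:
    """Standard deviation of a single game result (win=1, draw=0.5, loss=0).
    Requires draw_rate <= 2*score and draw_rate <= 2*(1 - score)."""
    wins = score - draw_rate / 2.0
    variance = wins + draw_rate / 4.0 - score ** 2
    return math.sqrt(variance)

def error_bar_95(score: float, draw_rate: float, games: int) -> float:
    """Approximate 95% half-width, in Elo, after `games` games."""
    return 1.96 * elo_slope(score) * per_game_sd(score, draw_rate) / math.sqrt(games)

def games_needed(elo_margin: float, score: float = 0.5, draw_rate: float = 0.4) -> int:
    """Games needed for the 95% half-width to shrink to `elo_margin` Elo."""
    per_game = 1.96 * elo_slope(score) * per_game_sd(score, draw_rate)
    return math.ceil((per_game / elo_margin) ** 2)

# Komodo 7.0a row of the CEGT table: 75.2 % score, 35.1 % draws, 1600 games
print(round(error_bar_95(0.752, 0.351, 1600), 1))
# Games needed to pin an even match down to +/- 5 Elo
print(games_needed(5.0))
```

Under these assumptions the first figure comes out at about 14 Elo, in line with the error bars in the CEGT table, and the second is on the order of 11,000 games. A hardware effect of only a few Elo is therefore far below what 1600 or 1700 games per engine can resolve.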

There is/was one exception. All Fritzes are unusually bad on my AMDs. Why there is a 20%+ performance gap is unknown to me. (And good old Junior, developed on Intel, was always overperforming a bit :-) )

Bye
Ingo
Modern Times
Posts: 3803
Joined: Thu Jun 07, 2012 11:02 pm

Re: CEGT 5'+3" pb=on ratinglist // Komodo 7.0a x64 incl

Post by Modern Times »

IWB wrote: I personally would blame the differences on openings, different opponents or luck - just because proving a HW difference is close to impossible.
Indeed, that is my opinion also.
IWB wrote: There is/was one exception. All Fritzes are unusually bad on my AMDs. Why there is a 20%+ performance gap is unknown to me. (And good old Junior, developed on Intel, was always overperforming a bit :-) )
I've heard this but not tested it myself.

Anyway, thanks for continuing work on your list !
lkaufman
Posts: 6284
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: CEGT 5'+3" pb=on ratinglist // Komodo 7.0a x64 incl

Post by lkaufman »

Modern Times wrote:Thanks for the update !

So:

CEGT 5+3 has Houdini 4 as +44 Elo above Komodo 7 on the pure list
(I think these were all Intel games)

IPON 5+3 has Houdini 4 as +23 Elo above Komodo 7
(AMD games)

It doesn't seem to me that AMD disadvantages Komodo. On the AMD list (IPON) it is closer to Houdini than on the Intel list (CEGT). Of course there are some other differences apart from the hardware.

What I only just noticed is that IPON is not totally the same hardware now, but still all AMD. The web page says

CPU : 3.2 GHz Phenom2 + 4GHz FX-8350
Yes, it does appear that AMD didn't play much of a role here; perhaps the 10% or so differences I measured were with different or older machines than those now in use. The reason for the lower rating of Komodo (and also Stockfish) relative to Houdini 4 on the CEGT list compared to the IPON list is probably that IPON has a higher cutoff level for inclusion. Houdini gets far fewer draws due to its high contempt factor, which is very helpful on the CEGT and CCRL lists as they run mostly matches with large rating differences.
Leaving Komodo out of it since I'm not unbiased about it, let's compare Stockfish 5 with Houdini 4. On the IPON list, Stockfish 5 is five Elo weaker. The CEGT run has just begun, but if it shows the same forty Elo gain as it did on IPON, Stockfish 5 will also be a few Elo behind Houdini 4 there. Yet, as far as I know, Stockfish 5 has beaten Houdini 4 in every match at every time control reported here, usually rather decisively. Stockfish 5 also scores much higher against Komodo than does Houdini 4. I think it is fair to say that Stockfish 5 is clearly stronger than Houdini 4 at any time control, regardless of books, hardware, number of cores, etc. Yet somehow it ends up behind or about equal with Houdini 4 on these two 5'+3" rating lists. Obviously it is because Houdini 4 makes far fewer draws against the "weakies". A contributing factor is the use of BayesElo rather than Ordo, because, I believe, BayesElo gives more weight to the mismatched games.
Bottom line: ratings are more accurate when large mismatches are avoided.
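
A small Python sketch of why draws against much weaker opponents matter so much. It uses the logistic Elo curve and two made-up engines that never lose to a given weak opponent but draw 35% and 20% of their games against it. The draw rates and function names are purely hypothetical, and the sketch does not model what BayesElo or Ordo actually do.

```python
import math

def implied_elo_gap(score: float) -> float:
    """Rating gap implied by a score fraction on the logistic Elo curve."""
    return 400.0 * math.log10(score / (1.0 - score))

def gap_when_unbeaten(draw_rate: float) -> float:
    """Implied gap for an engine that never loses and draws `draw_rate`
    of its games, i.e. scores 1 - draw_rate / 2."""
    return implied_elo_gap(1.0 - draw_rate / 2.0)

# Hypothetical draw rates against the same weak opponent
for draw_rate in (0.35, 0.20):
    gap = gap_when_unbeaten(draw_rate)
    print(f"draws {draw_rate:.0%}: implied gap {gap:.0f} Elo")
```

With these illustrative numbers the implied gaps differ by more than 100 Elo against that one opponent, purely because of the draw rate. The curve is steep at lopsided scores, so a few extra draws in mismatched pairings move a rating far more than the same number of draws between near-equals.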
Dr.Wael Deeb
Posts: 9773
Joined: Wed Mar 08, 2006 8:44 pm
Location: Amman,Jordan

Re: CEGT 5'+3" pb=on ratinglist // Komodo 7.0a x64 incl

Post by Dr.Wael Deeb »

lkaufman wrote:
Bottom line: ratings are more accurate when large mismatches are avoided.
Rating lists are more accurate when you calculate the Elo ratings manually....

I for one use Chess Calculator in my private rating list....

Google it and see what I am talking about. Regards,
Dr.D
No one can hit as hard as life. But it ain’t about how hard you can hit. It’s about how hard you can get hit and keep moving forward. How much you can take and keep moving forward….