IPON results for Houdini 4

IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: IPON results for Houdini 4

Post by IWB »

oreopoulos wrote: As a player who just uses engines for analysis, and not like horses that compete, my observations are...
Having played engines for years, I agree that we (humans) form an impression of a playing style. Unfortunately, I usually get disappointed sooner or later, because trying to "understand" a top engine's "playing style" is in vain ...

Bye
Ingo
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: IPON results for Houdini 4

Post by Laskos »

Houdini wrote:
Houdini wrote: Ingo Bauer was kind enough to run the IPON rating list matches for the new Houdini 4.

The result is a +39 Elo increase compared to Houdini 3.
The new complete IPON rating list is given below.
People at the Rybka Forum asked for the individual match results, so I might as well publish them here too.
Note that in the following list the Elo scale is calibrated differently, setting Houdini 3 at 3000 points.

Houdini 4 - Komodo 6 (2971)                    89.5  -  60.5    59.67%    Perf=3039
Houdini 4 - Stockfish 4 (2947)                 88.5  -  61.5    59.00%    Perf=3010
Houdini 4 - Gull 2.2 (2908)                   106.5  -  43.5    71.00%    Perf=3063
Houdini 4 - Critter 1.4a (2907)               106.5  -  43.5    71.00%    Perf=3062
Houdini 4 - Deep Rybka 4.1 (2882)             110.5  -  39.5    73.67%    Perf=3060
Houdini 4 - Hannibal 1.4a (2798)              121.0  -  29.0    80.67%    Perf=3046
Houdini 4 - Chiron 1.5 (2779)                 125.5  -  24.5    83.67%    Perf=3062
Houdini 4 - Protector 1.5.0 (2771)            129.0  -  21.0    86.00%    Perf=3086
Houdini 4 - Naum 4.2 (2768)                   131.5  -  18.5    87.67%    Perf=3108
Houdini 4 - HIARCS 14 WCSC 32b (2747)         126.0  -  24.0    84.00%    Perf=3035
Houdini 4 - Deep Shredder 12 (2730)           135.5  -  14.5    90.33%    Perf=3118
Houdini 4 - Jonny 6.00 (2729)                 130.0  -  20.0    86.67%    Perf=3054
Houdini 4 - Deep Sjeng c't 2010 32b (2713)    126.0  -  24.0    84.00%    Perf=3001
Houdini 4 - Spike 1.4 32b (2707)              131.5  -  18.5    87.67%    Perf=3047
Houdini 4 - spark-1.0 (2695)                  138.5  -  11.5    92.33%    Perf=3127
Houdini 4 - Deep Junior 13.3 (2677)           137.0  -  13.0    91.33%    Perf=3086
Houdini 4 - Booot 5.2.0 (2674)                138.5  -  11.5    92.33%    Perf=3106
Houdini 4 - Quazar 0.4 (2665)                 139.5  -  10.5    93.00%    Perf=3114
Houdini 4 - Zappa Mexico II (2652)            139.0  -  11.0    92.67%    Perf=3092
Houdini 4 - Toga II 3.0 32b (2643)            138.5  -  11.5    92.33%    Perf=3075
                                             2488.5  - 511.5    82.95%    Perf=3042
Thanks, it seems Houdini 4 is consistently beating everything at this TC. It is tested at Contempt=1, right?
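For readers who want to check the Perf column in the table above: the thread does not say how it was computed, but the figures are consistent with the usual logistic Elo formula, i.e. opponent rating plus the Elo difference implied by the match score. The sketch below assumes that formula; the function name `performance_rating` is made up for illustration, and the results agree with the listed values only to within about one point (the listed opponent ratings are rounded).

```python
import math

def performance_rating(opponent_elo: float, points: float, games: int) -> float:
    """Opponent rating plus the logistic Elo difference implied by the score.

    Assumption: the standard logistic Elo model; the original post does not
    state which formula was used for the Perf column.
    """
    score = points / games
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return opponent_elo + 400.0 * math.log10(score / (1.0 - score))

# Three rows taken from the table (150 games per match).
rows = [
    ("Komodo 6", 2971, 89.5),
    ("Stockfish 4", 2947, 88.5),
    ("Gull 2.2", 2908, 106.5),
]
for name, elo, points in rows:
    print(f"{name}: {performance_rating(elo, points, 150):.1f}")
```

A score of exactly 0% or 100% has no finite performance rating under this model, which is why the function rejects it.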
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: IPON results for Houdini 4

Post by IWB »

Laskos wrote:
Thanks, it seems Houdini 4 is beating consistently everything at this TC. It is tested at Contempt=1, right?
Everything is default except the IPON-specific settings.

Ingo
Wolfgang
Posts: 895
Joined: Sat May 13, 2006 1:08 am

Re: IPON results for Houdini 4

Post by Wolfgang »

lkaufman wrote:.... while at standard chess they are too close to call.
Considering our 5'+3" results I doubt that. Actually Houdini 4 is rated 3117 while the performance of StockyDD is ~3060-3070. So around +40-50 points in favor of Houdini, which is definitely not "too close to call", especially at this high level...

I admit that with longer TC (40/20, 40/40 or even 40/120) the differences will decrease.
Best
Wolfgang
CEGT-Team
www.cegt.net
www.cegt.forumieren.com
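To put the "+40-50 points" figure from the post above into game terms, a rating gap can be converted into an expected score. This is a minimal sketch assuming the standard logistic Elo model; `expected_score` is an illustrative name, not something taken from CEGT's tooling.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (0..1) for the side that is `elo_diff` points stronger.

    Assumption: the standard logistic Elo curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

for diff in (40, 50):
    print(f"+{diff} Elo -> {expected_score(diff):.1%}")
```

Under that model a 40 to 50 point edge corresponds to an expected score of roughly 56 to 57 percent.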
Martin Thoresen
Posts: 1833
Joined: Thu Jun 22, 2006 12:07 am

Re: IPON results for Houdini 4

Post by Martin Thoresen »

IGarcia wrote: Martin, nTCEC2 was a great competition but, no offense, it is not statistically representative.
TCEC is a different beast than any of the rating lists. Lots of cores, lots of time to think for the engines.

Closer to the user behavior when they use an engine for analysis? Yes.
Statistically comparable to blitz rating lists? No.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: IPON results for Houdini 4

Post by Milos »

lkaufman wrote: Based on the results posted here so far, the fairest thing to say about Houdini 4 and StockfishDD (or any November version) is that Houdini is clearly stronger in blitz, while at standard chess they are too close to call. A. Huerga's new LTC list, based on at least 600 games per engine, has Stockfish 5 Elo points ahead of Houdini 4, well within the error margin. These results do show that Stockfish scales better than Houdini, so I would probably use SF over Houdini for analysis (i.e. for a second opinion after Komodo!), but the jury is still out.
There is no standard chess. TC without hardware is meaningless.
5'+3'' on a fast overclocked 6-core machine, for example, is longer than CCRL 40/40 on a single core.
What is bullet today was LTC (FIDE) less than 8 years ago in terms of quality. Likewise, what is LTC today will be bullet in 10 years.
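The quoted claim that a 5 Elo lead over about 600 games is "well within the error margin" can be sanity-checked with a rough calculation. The sketch below is not how any rating list computes its error bars; it treats the 600 games as a single match between roughly equal engines and assumes a 50% draw rate, neither of which comes from the thread. The name `elo_margin_95` is made up for illustration.

```python
import math

def elo_margin_95(games: int, draw_rate: float, score: float = 0.5) -> float:
    """Approximate 95% half-width, in Elo, for a match score near `score`.

    Assumptions: independent games, a normal approximation, and the
    logistic Elo curve linearised around `score`.
    """
    win_rate = score - draw_rate / 2.0
    loss_rate = 1.0 - win_rate - draw_rate
    if min(win_rate, loss_rate) < 0.0 or not 0.0 < score < 1.0:
        raise ValueError("inconsistent score and draw rate")
    # Per-game variance of a result scored 1 / 0.5 / 0.
    variance = win_rate + draw_rate / 4.0 - score ** 2
    std_err = math.sqrt(variance / games)
    # Slope of the logistic Elo curve converts score error into Elo error.
    elo_per_score = 400.0 / (math.log(10) * score * (1.0 - score))
    return 1.96 * std_err * elo_per_score

# Hypothetical inputs: 600 games, half of them drawn.
print(f"+/- {elo_margin_95(600, 0.5):.1f} Elo")
```

With these assumed inputs the margin comes out at about 20 Elo either way, so a 5 Elo difference would indeed be far inside it. A higher draw rate or more games narrows the margin.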
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: IPON results for Houdini 4

Post by Vinvin »

IWB wrote:
oreopoulos wrote: As a player who just uses engines for analysis, and not like horses that compete, my observations are...
Having played engines for years, I agree that we (humans) form an impression of a playing style. Unfortunately, I usually get disappointed sooner or later, because trying to "understand" a top engine's "playing style" is in vain ...

Bye
Ingo
Hello Ingo, is it planned to test Stockfish DD at 5m+3s?
lkaufman
Posts: 5960
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA

Re: IPON results for Houdini 4

Post by lkaufman »

Wolfgang wrote:
lkaufman wrote:.... while at standard chess they are too close to call.
Considering our 5'+3" results I doubt that. Actually Houdini 4 is rated 3117 while the performance of StockyDD is ~3060-3070. So around +40-50 points in favor of Houdini, which is definitely not "too close to call", especially at this high level...

I admit that with longer TC (40/20, 40/40 or even 40/120) the differences will decrease.
By standard chess I mean time limits that FIDE would rate, such as 90' + 30" or longer. It is very clear that Houdini's superiority over Komodo and Stockfish at blitz does not translate to superiority at these much longer time limits. If you believe that the time limit doesn't matter (other than for the spread of the ratings), why would CEGT and CCRL bother to test at both short and long time controls?
IWB
Posts: 1539
Joined: Thu Mar 09, 2006 2:02 pm

Re: IPON results for Houdini 4

Post by IWB »

Vinvin wrote: Hello Ingo, is it planned to test Stockfish DD at 5m+3s?
It WAS planned after H4!

Bye
Inog
IGarcia
Posts: 543
Joined: Mon Jul 05, 2010 10:27 pm

Re: IPON results for Houdini 4

Post by IGarcia »

Martin Thoresen wrote: TCEC is a different beast than any of the rating lists. Lots of cores, lots of time to think for the engines.

Closer to the user behavior when they use an engine for analysis? Yes.
Statistically comparable to blitz rating lists? No.
Sure, it's more like you say, and it's great fun and interesting.

The point, to be clear (and without anyone being offended), is that the tournament was not conclusive. There are some people out there claiming that SF DD and Komodo nTCEC2 are stronger than Houdini 4, using your tournament as proof.


IWB wrote:
Vinvin wrote: Hello Ingo, is it planned to test Stockfish DD at 5m+3s?
It WAS planned after H4!
Was it planned and is it being tested now, or... does "WAS" mean it is now canceled?
Bye
Inog
Inog? :lol: Funny, I often type my name as Igancio. Nasty bug; seems I'm not the only one.

Ignacio