IPON results for Houdini 4

IWB · Post by **IWB** » Tue Dec 03, 2013 11:39 am

oreopoulos wrote:As a player who just uses engines for analysis, and not as horses that compete, my observations are...

Having played engines for years, I agree that we (humans) form an impression of a playing style. Unfortunately, I usually get disappointed sooner or later, because trying to "understand" a top engine's "playing style" is in vain...

Bye
Ingo

Laskos · Post by **Laskos** » Tue Dec 03, 2013 12:54 pm

Houdini wrote:Ingo Bauer was so kind as to run the IPON rating list matches for the new Houdini 4.

The result is a +39 Elo increase compared to Houdini 3.
The new complete IPON rating list is given below.

People at the Rybka Forum asked for the individual match results, so I might as well publish them here too.
Note that in the following list the Elo scale is calibrated differently, setting Houdini 3 at 3000 points.

Houdini 4 - Komodo 6 (2971)                    89.5  -  60.5    59.67%    Perf=3039
Houdini 4 - Stockfish 4 (2947)                 88.5  -  61.5    59.00%    Perf=3010
Houdini 4 - Gull 2.2 (2908)                   106.5  -  43.5    71.00%    Perf=3063
Houdini 4 - Critter 1.4a (2907)               106.5  -  43.5    71.00%    Perf=3062
Houdini 4 - Deep Rybka 4.1 (2882)             110.5  -  39.5    73.67%    Perf=3060
Houdini 4 - Hannibal 1.4a (2798)              121.0  -  29.0    80.67%    Perf=3046
Houdini 4 - Chiron 1.5 (2779)                 125.5  -  24.5    83.67%    Perf=3062
Houdini 4 - Protector 1.5.0 (2771)            129.0  -  21.0    86.00%    Perf=3086
Houdini 4 - Naum 4.2 (2768)                   131.5  -  18.5    87.67%    Perf=3108
Houdini 4 - HIARCS 14 WCSC 32b (2747)         126.0  -  24.0    84.00%    Perf=3035
Houdini 4 - Deep Shredder 12 (2730)           135.5  -  14.5    90.33%    Perf=3118
Houdini 4 - Jonny 6.00 (2729)                 130.0  -  20.0    86.67%    Perf=3054
Houdini 4 - Deep Sjeng c't 2010 32b (2713)    126.0  -  24.0    84.00%    Perf=3001
Houdini 4 - Spike 1.4 32b (2707)              131.5  -  18.5    87.67%    Perf=3047
Houdini 4 - spark-1.0 (2695)                  138.5  -  11.5    92.33%    Perf=3127
Houdini 4 - Deep Junior 13.3 (2677)           137.0  -  13.0    91.33%    Perf=3086
Houdini 4 - Booot 5.2.0 (2674)                138.5  -  11.5    92.33%    Perf=3106
Houdini 4 - Quazar 0.4 (2665)                 139.5  -  10.5    93.00%    Perf=3114
Houdini 4 - Zappa Mexico II (2652)            139.0  -  11.0    92.67%    Perf=3092
Houdini 4 - Toga II 3.0 32b (2643)            138.5  -  11.5    92.33%    Perf=3075
                                             2488.5  - 511.5    82.95%    Perf=3042
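The Perf column appears to follow the usual logistic Elo relation, Perf = opponent rating + 400·log10(p/(1 − p)), where p is the match score as a fraction. The post does not say how Perf was computed, so the sketch below assumes that formula (`performance` is a hypothetical helper) and re-derives a few rows from the table; for these rows the results land within one point of the listed values, which suggests the list rounds or truncates the final figure.

```python
import math


def performance(opp_rating: float, points: float, games: int) -> float:
    """Hypothetical helper: match performance under the logistic Elo model (assumed)."""
    p = points / games
    if not 0.0 < p < 1.0:
        # A 0% or 100% score gives an infinite performance under this model.
        raise ValueError("score must be strictly between 0% and 100%")
    return opp_rating + 400.0 * math.log10(p / (1.0 - p))


# Rows copied from the table: (opponent, rating, points, games, listed Perf)
rows = [
    ("Komodo 6", 2971, 89.5, 150, 3039),
    ("Stockfish 4", 2947, 88.5, 150, 3010),
    ("Gull 2.2", 2908, 106.5, 150, 3063),
    ("Deep Shredder 12", 2730, 135.5, 150, 3118),
    ("Toga II 3.0 32b", 2643, 138.5, 150, 3075),
]

for name, opp, pts, n, listed in rows:
    perf = performance(opp, pts, n)
    print(f"{name:20s} computed={perf:7.1f} listed={listed}")
```

A 50% score reproduces the opponent's rating exactly, and each extra step in the win/loss ratio adds a roughly constant number of points, which is why the gains flatten out against the weaker engines.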

Thanks, it seems Houdini 4 is consistently beating everything at this TC. It is tested at Contempt=1, right?

IWB · Post by **IWB** » Tue Dec 03, 2013 1:34 pm

Laskos wrote:
Thanks, it seems Houdini 4 is consistently beating everything at this TC. It is tested at Contempt=1, right?

Everything is at default except the IPON-specific settings.

Ingo

Wolfgang · Post by **Wolfgang** » Tue Dec 03, 2013 1:55 pm

lkaufman wrote:.... while at standard chess they are too close to call.

Considering our 5'+3" results, I doubt that. Houdini 4 is rated 3117, while the performance of StockyDD is ~3060-3070. That is around +40-50 points in favor of Houdini, which is definitely not "too close to call", especially at this high level...

I admit that with longer TC (40/20, 40/40 or even 40/120) the differences will decrease.
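To put the +40-50 figure in perspective: under the standard logistic Elo model (an assumption here, since rating lists may fit ratings differently), a gap of 40-50 Elo corresponds to an expected score of roughly 56-57% for the stronger engine. A minimal sketch, with `expected_score` as a hypothetical helper:

```python
def expected_score(elo_diff: float) -> float:
    """Expected score of the higher-rated side under the logistic Elo model (assumed)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


for gap in (40, 50):
    print(gap, round(expected_score(gap), 3))  # 40 -> 0.557, 50 -> 0.571
```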

Martin Thoresen · Post by **Martin Thoresen** » Tue Dec 03, 2013 2:02 pm

IGarcia wrote: Martin, nTCEC2 was a great competition, but, without offense, it is not statistically representative.

TCEC is a different beast than any of the rating lists. Lots of cores, lots of time to think for the engines.

Closer to the user behavior when they use an engine for analysis? Yes.
Statistically comparable to blitz rating lists? No.

Milos · Post by **Milos** » Tue Dec 03, 2013 2:06 pm

lkaufman wrote:Based on the results posted here so far, the fairest thing to say about Houdini 4 and StockfishDD (or any November version) is that Houdini is clearly stronger in blitz, while at standard chess they are too close to call. A. Huerga's new LTC list, based on at least 600 games per engine, has Stockfish 5 Elo points ahead of Houdini 4, well within the error margin. These results do show that Stockfish scales better than Houdini, so I would probably use SF over Houdini for analysis (i.e. for a second opinion after Komodo!), but the jury is still out.

There is no standard chess. TC without hardware is meaningless.
For example, 5'+3'' on a fast overclocked 6-core machine is longer than CCRL 40/40 on a single core.
What is bullet today was LTC (FIDE) less than 8 years ago in terms of quality. Also, what is LTC today will be bullet in 10 years.
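lkaufman's remark quoted above, that a 5-point gap over at least 600 games per engine is "well within the error margin", can be sanity-checked with a normal approximation. The sketch below is an estimate under loudly stated assumptions (hypothetical helper `elo_margin_95`, independent games, a known draw ratio, logistic Elo model); it is not how Huerga's list computes its error bars.

```python
import math


def elo_margin_95(games: int, score: float = 0.5, draw_ratio: float = 0.0) -> float:
    """Approximate 95% half-width, in Elo, of a rating estimated from `games` games.

    Assumptions: independent games, logistic Elo model. The per-game score
    variance is p(1-p) - d/4 for score p and draw ratio d, so draws shrink it.
    """
    var = score * (1.0 - score) - draw_ratio / 4.0
    if var <= 0.0:
        raise ValueError("inconsistent score/draw_ratio combination")
    se_score = math.sqrt(var / games)
    # Derivative of Elo(p) = 400*log10(p/(1-p)) with respect to p.
    slope = 400.0 / (math.log(10.0) * score * (1.0 - score))
    return 1.96 * slope * se_score


print(round(elo_margin_95(600), 1))                  # worst case: no draws
print(round(elo_margin_95(600, draw_ratio=0.6), 1))  # illustrative 60% draw rate
```

With no draws the margin comes out at roughly ±28 Elo, and an illustrative 60% draw rate brings it to roughly ±18. When comparing two engines each rated from their own games, the uncertainty on the difference is larger still (about √2 times, if the two estimates are independent), so a 5-point gap is indeed far inside the noise.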

Vinvin · Post by **Vinvin** » Tue Dec 03, 2013 3:56 pm

IWB wrote:
oreopoulos wrote:As a player who just uses engines for analysis, and not as horses that compete, my observations are...
Having played engines for years, I agree that we (humans) form an impression of a playing style. Unfortunately, I usually get disappointed sooner or later, because trying to "understand" a top engine's "playing style" is in vain...

Bye
Ingo

Hello Ingo, is it planned to test Stockfish DD at 5m+3s?

lkaufman · Post by **lkaufman** » Tue Dec 03, 2013 5:18 pm

Wolfgang wrote:
lkaufman wrote:.... while at standard chess they are too close to call.
Considering our 5'+3" results, I doubt that. Houdini 4 is rated 3117, while the performance of StockyDD is ~3060-3070. That is around +40-50 points in favor of Houdini, which is definitely not "too close to call", especially at this high level...

I admit that with longer TC (40/20, 40/40 or even 40/120) the differences will decrease.

By standard chess I mean time limits that FIDE would rate, such as 90' + 30" or longer. It is very clear that Houdini's superiority over Komodo and Stockfish at blitz does not translate to superiority at these much longer time limits. If you believed that time limit doesn't matter (other than the spread of the ratings), why would CEGT and CCRL bother to test at short and long time controls?

IWB · Post by **IWB** » Tue Dec 03, 2013 7:30 pm

Vinvin wrote: Hello Ingo, is it planned to test Stockfish DD at 5m+3s?

It WAS planned after H4!

Bye
Inog

IGarcia · Post by **IGarcia** » Tue Dec 03, 2013 8:22 pm

Martin Thoresen wrote: TCEC is a different beast than any of the rating lists. Lots of cores, lots of time to think for the engines.

Closer to the user behavior when they use an engine for analysis? Yes.
Statistically comparable to blitz rating lists? No.

Sure, it's more like you say, and great fun and interesting.

The point, to be clear (without anyone being offended), is that the tournament was not conclusive. There are some people out there claiming that SF DD and Komodo nTCEC2 are stronger than Houdini 4, using your tournament as proof.

IWB wrote:
Vinvin wrote: Hello Ingo, is it planned to test Stockfish DD at 5m+3s?
It WAS planned after H4!

Was it planned and is it being tested now, or does "WAS" mean it is now canceled?

Bye
Inog

Inog?

Funny, I often type my name as Igancio; nasty bug. It seems I'm not the only one.

Ignacio
