
Re: IPON results for Houdini 4

Posted: Tue Dec 03, 2013 11:39 am
by IWB
oreopoulos wrote: As a player who just uses engines for analysis, rather than racing them like horses, my observations are...
Having played against engines for years, I agree that we (humans) form an impression of a playing style. Unfortunately, sooner or later I usually get disappointed, because trying to "understand" the "playing style" of top engines is in vain...

Bye
Ingo

Re: IPON results for Houdini 4

Posted: Tue Dec 03, 2013 12:54 pm
by Laskos
Houdini wrote:
Houdini wrote: Ingo Bauer was kind enough to run the IPON rating list matches for the new Houdini 4.

The result is a +39 Elo increase compared to Houdini 3.
The new complete IPON rating list is given below.
People at the Rybka Forum asked for the individual match results, so I might as well publish them here too.
Note that in the following list the Elo scale is calibrated differently, setting Houdini 3 at 3000 points.


Houdini 4 - Komodo 6 (2971)                    89.5  -  60.5    59.67%    Perf=3039
Houdini 4 - Stockfish 4 (2947)                 88.5  -  61.5    59.00%    Perf=3010
Houdini 4 - Gull 2.2 (2908)                   106.5  -  43.5    71.00%    Perf=3063
Houdini 4 - Critter 1.4a (2907)               106.5  -  43.5    71.00%    Perf=3062
Houdini 4 - Deep Rybka 4.1 (2882)             110.5  -  39.5    73.67%    Perf=3060
Houdini 4 - Hannibal 1.4a (2798)              121.0  -  29.0    80.67%    Perf=3046
Houdini 4 - Chiron 1.5 (2779)                 125.5  -  24.5    83.67%    Perf=3062
Houdini 4 - Protector 1.5.0 (2771)            129.0  -  21.0    86.00%    Perf=3086
Houdini 4 - Naum 4.2 (2768)                   131.5  -  18.5    87.67%    Perf=3108
Houdini 4 - HIARCS 14 WCSC 32b (2747)         126.0  -  24.0    84.00%    Perf=3035
Houdini 4 - Deep Shredder 12 (2730)           135.5  -  14.5    90.33%    Perf=3118
Houdini 4 - Jonny 6.00 (2729)                 130.0  -  20.0    86.67%    Perf=3054
Houdini 4 - Deep Sjeng c't 2010 32b (2713)    126.0  -  24.0    84.00%    Perf=3001
Houdini 4 - Spike 1.4 32b (2707)              131.5  -  18.5    87.67%    Perf=3047
Houdini 4 - spark-1.0 (2695)                  138.5  -  11.5    92.33%    Perf=3127
Houdini 4 - Deep Junior 13.3 (2677)           137.0  -  13.0    91.33%    Perf=3086
Houdini 4 - Booot 5.2.0 (2674)                138.5  -  11.5    92.33%    Perf=3106
Houdini 4 - Quazar 0.4 (2665)                 139.5  -  10.5    93.00%    Perf=3114
Houdini 4 - Zappa Mexico II (2652)            139.0  -  11.0    92.67%    Perf=3092
Houdini 4 - Toga II 3.0 32b (2643)            138.5  -  11.5    92.33%    Perf=3075
                                             2488.5  - 511.5    82.95%    Perf=3042
Thanks, it seems Houdini 4 consistently beats everything at this TC. Was it tested at Contempt=1?
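The Perf column in the table appears consistent with the standard logistic Elo model: the performance is the opponent's rating plus the rating gap that would produce the observed score. The sketch below reproduces a few rows under that assumption; the function name `performance` is hypothetical, not part of any IPON tool. The listed values seem to be truncated rather than rounded (e.g. 71.00% against Gull 2.2 works out to about 3063.5, listed as 3063).

```python
import math

def performance(points_for: float, points_against: float, opp_rating: float) -> float:
    """Performance rating for one match under the logistic Elo model (assumed).

    Undefined for a 0% or 100% score, because the log diverges."""
    p = points_for / (points_for + points_against)
    # Rating gap D that solves p == 1 / (1 + 10 ** (-D / 400)).
    gap = 400.0 * math.log10(p / (1.0 - p))
    return opp_rating + gap

# Rows from the table: (points for, points against, opponent rating).
rows = [(89.5, 60.5, 2971), (106.5, 43.5, 2908), (135.5, 14.5, 2730)]
for pf, pa, opp in rows:
    perf = performance(pf, pa, opp)
    print(f"{opp}: {perf:.1f} -> truncated {math.floor(perf)}")
```

The truncated values match the table's 3039, 3063 and 3118 for Komodo 6, Gull 2.2 and Deep Shredder 12.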

Re: IPON results for Houdini 4

Posted: Tue Dec 03, 2013 1:34 pm
by IWB
Laskos wrote:
Thanks, it seems Houdini 4 consistently beats everything at this TC. Was it tested at Contempt=1?
Everything is at default settings except the IPON-specific ones.

Ingo

Re: IPON results for Houdini 4

Posted: Tue Dec 03, 2013 1:55 pm
by Wolfgang
lkaufman wrote: ... while at standard chess they are too close to call.
Considering our 5'+3" results, I doubt that. Houdini 4 is rated 3117, while StockyDD's performance is ~3060-3070. So Houdini is around 50 points ahead, which is definitely not "too close to call", especially at this high level...

I admit that with longer TC (40/20, 40/40 or even 40/120) the differences will decrease.

Re: IPON results for Houdini 4

Posted: Tue Dec 03, 2013 2:02 pm
by Martin Thoresen
IGarcia wrote: Martin, nTCEC2 was a great competition but, no offense, it is not statistically representative.
TCEC is a different beast from any of the rating lists: lots of cores, lots of thinking time for the engines.

Closer to the user behavior when they use an engine for analysis? Yes.
Statistically comparable to blitz rating lists? No.
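The statistical point can be made concrete with an approximate 95% confidence interval for the Elo difference implied by a match score. The sketch below uses a normal approximation and a draw rate you supply; draw counts are not given anywhere in this thread, so `draw_rate=0` (the widest, most conservative interval) is an assumption, as are the hypothetical names `elo_diff` and `elo_interval`.

```python
import math

def elo_diff(p: float) -> float:
    """Elo gap implied by an expected score p, 0 < p < 1 (logistic model)."""
    return 400.0 * math.log10(p / (1.0 - p))

def elo_interval(points: float, games: int, draw_rate: float = 0.0, z: float = 1.96):
    """Approximate (low, high) Elo interval for a match score (normal approximation)."""
    p = points / games
    win_rate = p - draw_rate / 2.0  # from p = wins + draws / 2
    if win_rate < 0 or win_rate + draw_rate > 1:
        raise ValueError("draw_rate is inconsistent with the score")
    # Per-game score variance with win = 1, draw = 0.5, loss = 0.
    var = win_rate + 0.25 * draw_rate - p * p
    se = math.sqrt(var / games)
    return elo_diff(p - z * se), elo_diff(p + z * se)

for games in (150, 600, 3000):
    lo, hi = elo_interval(games / 2, games)  # even score, no draws assumed
    print(f"{games:5d} games: {lo:+.1f} .. {hi:+.1f} Elo")
```

With no draws assumed, an even score gives roughly +/-56 Elo over 150 games, +/-28 over 600 and +/-12 over 3000; a realistic draw rate narrows these, while a short event with few games widens them considerably.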

Re: IPON results for Houdini 4

Posted: Tue Dec 03, 2013 2:06 pm
by Milos
lkaufman wrote: Based on the results posted here so far, the fairest thing to say about Houdini 4 and StockfishDD (or any November version) is that Houdini is clearly stronger in blitz, while at standard chess they are too close to call. A. Huerga's new LTC list, based on at least 600 games per engine, has Stockfish 5 Elo points ahead of Houdini 4, well within the error margin. These results do show that Stockfish scales better than Houdini, so I would probably use SF over Houdini for analysis (i.e. for a second opinion after Komodo!), but the jury is still out.
There is no standard chess. TC without hardware is meaningless.
For example, 5'+3'' on a fast, overclocked 6-core machine is longer than CCRL 40/40 on a single core.
In terms of quality, what is bullet today was LTC (FIDE) less than 8 years ago, and what is LTC today will be bullet in 10 years.
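Milos's comparison can be sketched as a rough thinking-time budget per move. The sketch below assumes a 60-move game for the increment control, treats 40/40 as 40 minutes for 40 moves, and scales time linearly by core count and a per-core speed factor relative to a reference machine. The 60 moves and the speed factor 1.5 for an overclocked machine are hypothetical values, and real SMP speedup is well below linear, so the core-seconds figure is an optimistic upper bound rather than a measurement.

```python
def seconds_per_move_increment(base_s: float, inc_s: float, moves: int) -> float:
    """Average time per move for base + increment, assuming `moves` moves per side."""
    return base_s / moves + inc_s

def seconds_per_move_repeating(minutes: float, moves: int) -> float:
    """Average time per move for a moves/minutes control such as 40/40."""
    return minutes * 60.0 / moves

def core_seconds(seconds: float, cores: int, speed_factor: float = 1.0) -> float:
    """Naive budget: time x cores x relative per-core speed.
    Linear in cores, so it overstates what a parallel search actually gains."""
    return seconds * cores * speed_factor

blitz = seconds_per_move_increment(300, 3, moves=60)  # 5'+3", 60 moves assumed
ccrl = seconds_per_move_repeating(40, 40)             # 40/40
print(blitz, ccrl)
print(core_seconds(blitz, cores=6, speed_factor=1.5), core_seconds(ccrl, cores=1))
```

Under these assumptions the blitz game gets 8 s per move against 60 s for 40/40, but 72 naive core-seconds against 60; how much of that survives depends on the real SMP efficiency.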

Re: IPON results for Houdini 4

Posted: Tue Dec 03, 2013 3:56 pm
by Vinvin
IWB wrote:
oreopoulos wrote: As a player who just uses engines for analysis, rather than racing them like horses, my observations are...
Having played against engines for years, I agree that we (humans) form an impression of a playing style. Unfortunately, sooner or later I usually get disappointed, because trying to "understand" the "playing style" of top engines is in vain...

Bye
Ingo
Hello Ingo, are you planning to test Stockfish DD at 5m+3s?

Re: IPON results for Houdini 4

Posted: Tue Dec 03, 2013 5:18 pm
by lkaufman
Wolfgang wrote:
lkaufman wrote: ... while at standard chess they are too close to call.
Considering our 5'+3" results, I doubt that. Houdini 4 is rated 3117, while StockyDD's performance is ~3060-3070. So Houdini is around 50 points ahead, which is definitely not "too close to call", especially at this high level...

I admit that with longer TC (40/20, 40/40 or even 40/120) the differences will decrease.
By standard chess I mean time limits that FIDE would rate, such as 90' + 30" or longer. It is very clear that Houdini's superiority over Komodo and Stockfish at blitz does not translate to superiority at these much longer time limits. If time limits didn't matter (other than for the spread of the ratings), why would CEGT and CCRL bother to test at both short and long time controls?

Re: IPON results for Houdini 4

Posted: Tue Dec 03, 2013 7:30 pm
by IWB
Vinvin wrote: Hello Ingo, are you planning to test Stockfish DD at 5m+3s?
It WAS planned after H4!

Bye
Inog

Re: IPON results for Houdini 4

Posted: Tue Dec 03, 2013 8:22 pm
by IGarcia
Martin Thoresen wrote: TCEC is a different beast from any of the rating lists: lots of cores, lots of thinking time for the engines.

Closer to the user behavior when they use an engine for analysis? Yes.
Statistically comparable to blitz rating lists? No.
Sure, it's more like you say, and it's great fun and interesting.

The point, to be clear (and without anyone being offended), is that the tournament was not conclusive. Some people out there are claiming that SF DD and Komodo nTCEC2 are stronger than Houdini 4, using your tournament as proof.


IWB wrote:
Vinvin wrote: Hello Ingo, are you planning to test Stockfish DD at 5m+3s?
It WAS planned after H4!
Was it planned and is it being tested now, or does "WAS" mean it is now canceled?
Bye
Inog
Inog? :lol: Funny, I often type my own name as "Igancio". Nasty bug; it seems I'm not the only one.

Ignacio