IPON results for Houdini 4

Discussion of computer chess matches and engine tournaments.

Moderators: bob, hgm, Harvey Williamson

ouachita
Posts: 454
Joined: Tue Jan 15, 2013 3:33 pm
Location: Ritz-Carlton, NYC
Full name: Bobby Johnson

Re: IPON results for Houdini 4

Post by ouachita » Tue Dec 03, 2013 1:16 am

Finally, a rational post, and I agree.
SIM, PhD, MBA, PE

IGarcia
Posts: 543
Joined: Mon Jul 05, 2010 8:27 pm

Re: IPON results for Houdini 4

Post by IGarcia » Tue Dec 03, 2013 1:44 am

Daniel Shawul wrote:
The tests at TCEC or currently underway at this site do not support your rants.
Stuck on stupid? Houdini's latest version (9601) got disqualified from TCEC in a rather pathetic manner, 3 losses out of 6 to Stockfish, and in 30 moves or so like a patzer :) Once you comprehend that, look at the threads below you for Houdini 3 results, which are even more hilarious.
Daniel, look at the nTCEC final, games 25-26, 27-28 and 29-30.
There you see 3 games won by Komodo and 3 draws. One "set" was won by Komodo with the same opening, first as Black and then as White.

Does this prove Stockfish DD is a patzer?

Those 6 games between Stockfish and Houdini are more the exception than the rule. Also remember that Stockfish came close to being knocked out of the competition in stage 3.

If you don't understand this simple point about statistics and the need for a big sample, then the only one stuck on stupid is you.

Ryan Benitez
Posts: 719
Joined: Thu Mar 09, 2006 12:21 am
Location: Portland Oregon

Re: IPON results for Houdini 4

Post by Ryan Benitez » Tue Dec 03, 2013 1:44 am

Daniel Shawul wrote:
not statistically representative.
That is not what was said when Houdini won it last time! Besides, there are many other tests that show Stockfish, a free engine, murders a commercial Houdini.
Why would one throw away money like that when there is a superior free engine?
Maybe some people want to collect every version of Ippolit no matter the price?

Ryan Benitez
Posts: 719
Joined: Thu Mar 09, 2006 12:21 am
Location: Portland Oregon

Re: IPON results for Houdini 4

Post by Ryan Benitez » Tue Dec 03, 2013 1:49 am

IGarcia wrote:
Daniel Shawul wrote:
The tests at TCEC or currently underway at this site do not support your rants.
Stuck on stupid? Houdini's latest version (9601) got disqualified from TCEC in a rather pathetic manner, 3 losses out of 6 to Stockfish, and in 30 moves or so like a patzer :) Once you comprehend that, look at the threads below you for Houdini 3 results, which are even more hilarious.
Daniel, look at the nTCEC final, games 25-26, 27-28 and 29-30.
There you see 3 games won by Komodo and 3 draws. One "set" was won by Komodo with the same opening, first as Black and then as White.

Does this prove Stockfish DD is a patzer?

Those 6 games between Stockfish and Houdini are more the exception than the rule. Also remember that Stockfish came close to being knocked out of the competition in stage 3.

If you don't understand this simple point about statistics and the need for a big sample, then the only one stuck on stupid is you.
I think you completely missed his point. People are feeding the Houdini hype machine and it is absurd considering that people are actually paying for a tuned version of Robbolito.

ouachita
Posts: 454
Joined: Tue Jan 15, 2013 3:33 pm
Location: Ritz-Carlton, NYC
Full name: Bobby Johnson

Re: IPON results for Houdini 4

Post by ouachita » Tue Dec 03, 2013 2:03 am

people are actually paying for a tuned version of Robbolito.
Robert will have to confirm whether this statement is true or not. However, like me, most players simply want the best engine(s) regardless of their actual or alleged distant origins. Personally, I couldn't care less, as ICCF makes no engine origin distinction in their win/loss columns.
SIM, PhD, MBA, PE

Uri Blass
Posts: 8729
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: IPON results for Houdini 4

Post by Uri Blass » Tue Dec 03, 2013 4:02 am

Daniel Shawul wrote: It seems you either can't read, or you do read but can't comprehend. Houdini 4 was just released, so all we have had so far is the closest version, 9601, which was murdered in TCEC! There is no reason to expect otherwise, since v4 is virtually the same version that played there. Changing the version number does not change anything, since one could release daily updates as versions 4.1, 4.2 to evade the inevitable, which is that Stockfish owns Houdini. As for the results of the previous version, Houdini 3, against Stockfish, all you have to do is look at the threads below.
This is not correct.

Houdini 4 is 44 Elo above Stockfish DD,
based on the post from Mon Dec 02, 2013 6:36 pm
http://www.talkchess.com/forum/viewtopi ... 7&start=20

I do not claim that it is better than Stockfish at longer time controls, but TCEC does not have enough games to prove anything about the superiority of Stockfish.

It seems that Stockfish DD and Houdini 4 are close in strength at long time controls, and the fact that Stockfish was stronger tactically in some TCEC games suggests that Stockfish also has some weaknesses relative to Houdini (otherwise they could not be close in strength).

I did not buy Houdini 4, but I find it unfair to claim that Stockfish is better, and I am sure there are positions where Houdini 4 is better.

IWB
Posts: 1539
Joined: Thu Mar 09, 2006 1:02 pm

Re: IPON results for Houdini 4

Post by IWB » Tue Dec 03, 2013 7:05 am

Houdini wrote:
Houdini wrote:Ingo Bauer was so kind to run the IPON rating list matches for the new Houdini 4.

The result is a +39 Elo increase compared to Houdini 3.
The new complete IPON rating list is given below.
People at the Rybka Forum asked for the individual match results, I might as well also publish them here.
Note that in the following list the Elo scale is calibrated differently, setting Houdini 3 at 3000 points.
...
Not just the scaling was different, the calculated result was just an "estimation" as well ...

This is the correct one:


   1 Houdini 4                 3117 3000.0 (2488.5 : 511.5)
                                     150.0 ( 89.5 :  60.5) Komodo 6                   3044
                                     150.0 ( 88.5 :  61.5) Stockfish 4                3019
                                     150.0 (106.5 :  43.5) Critter 1.4a               2981
                                     150.0 (106.5 :  43.5) Gull 2.2                   2979
                                     150.0 (110.5 :  39.5) Deep Rybka 4.1             2956
                                     150.0 (121.0 :  29.0) Hannibal 1.4a              2866
                                     150.0 (125.5 :  24.5) Chiron 1.5                 2849
                                     150.0 (129.0 :  21.0) Protector 1.5.0            2840
                                     150.0 (131.5 :  18.5) Naum 4.2                   2829
                                     150.0 (126.0 :  24.0) HIARCS 14 WCSC 32b         2815
                                     150.0 (130.0 :  20.0) Jonny 6.00                 2801
                                     150.0 (135.5 :  14.5) Deep Shredder 12           2800
                                     150.0 (126.0 :  24.0) Deep Sjeng c't 2010 32b    2787
                                     150.0 (131.5 :  18.5) Spike 1.4 32b              2780
                                     150.0 (138.5 :  11.5) spark-1.0                  2759
                                     150.0 (137.0 :  13.0) Deep Junior 13.3           2743
                                     150.0 (138.5 :  11.5) Booot 5.2.0                2742
                                     150.0 (139.5 :  10.5) Quazar 0.4                 2735
                                     150.0 (138.5 :  11.5) Toga II 3.0 32b            2715
                                     150.0 (139.0 :  11.0) Zappa Mexico II            2713
But I still have no idea why there is interest in a one-on-one match with just 150 games.

Bye
Ingo
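To put a number on why a single 150-game pairing says little, here is a rough Python sketch of the error margin on one row of the table above (Houdini 4 vs. Stockfish 4, 88.5 : 61.5). It rests on two assumptions that do not come from the thread: the standard logistic Elo expectation, and a binomial standard error that ignores draws. Draws can only shrink the per-game variance, so the interval is on the wide side; the function names are made up for illustration.

```python
import math

def elo_diff_from_score(p: float) -> float:
    """Elo difference implied by score fraction p (logistic Elo model, assumed)."""
    if not 0.0 < p < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)

def match_elo_interval(points: float, games: int, z: float = 1.96):
    """Return (low, mid, high) Elo estimates for one head-to-head match.

    Uses sqrt(p * (1 - p) / n) as the standard error of the score fraction.
    This treats every game as win/loss; with draws the true spread is smaller,
    so the interval is a conservative (too wide) approximation.
    """
    p = points / games
    se = math.sqrt(p * (1.0 - p) / games)
    return (
        elo_diff_from_score(p - z * se),
        elo_diff_from_score(p),
        elo_diff_from_score(p + z * se),
    )

# Houdini 4 vs. Stockfish 4 row from the table: 88.5 points out of 150 games.
sf4_lo, sf4_mid, sf4_hi = match_elo_interval(88.5, 150)
print(f"Elo gap estimate: {sf4_mid:.0f} (approx. 95% range {sf4_lo:.0f} to {sf4_hi:.0f})")
```

Under these assumptions the range for a single 150-game match spans more than a hundred Elo, which is far wider than the gaps between the top engines being argued about in this thread.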

oreopoulos
Posts: 109
Joined: Fri Apr 25, 2008 8:56 pm

Re: IPON results for Houdini 4

Post by oreopoulos » Tue Dec 03, 2013 9:55 am

As a player who just uses engines for analysis, not as racehorses that compete, my observations are:

a) Stockfish is a much better attacker. Sometimes on the optimistic side of things, but I like that personally.

b) Houdini is always on the safe side. Sometimes you think (if you analyse with Houdini) that everything is a draw. I feel that it should explore attacking chances more.

c) Houdini's style is better suited to fast play. It's on the safe side. Hard to beat. At slower time controls, not being able to single out the good lines (since everything seems the same) hurts it a lot.

d) Komodo 6 is a great positional evaluator. The best of all. It lacks tactics, but in positions where static features are important, it is just very accurate.

e) Stockfish is the best endgame player. Especially in positions with repetitions, where you see Houdini suggesting back-and-forth moves in the PV... Without TBs it's too optimistic sometimes.

It's a pity we don't have Rybka's response to that...

Houdini
Posts: 1471
Joined: Mon Mar 15, 2010 11:00 pm

Re: IPON results for Houdini 4

Post by Houdini » Tue Dec 03, 2013 10:01 am

IWB wrote: But I still have no idea why there is interest in a one-on-one match with just 150 games.

Bye
Ingo
To be honest, I like to see these results as well - it gives a nice idea about the general consistency of the individual matches.

Looking at the overall result of nearly 83% that Houdini 4 scores, it becomes clear that we're reaching a limit of the rating list. The next version should score over 85% against the same opponents, and that's becoming too much...

Robert
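The "nearly 83%" figure can be checked from the match totals in Ingo's table (2488.5 points out of 3000 games). The Python sketch below also converts a score fraction into an Elo difference using the standard logistic Elo expectation. That formula is an assumption here, since the thread does not say which rating model IPON uses, so the Elo numbers are only indicative. The function names are hypothetical.

```python
import math

def score_fraction(points: float, games: float) -> float:
    """Fraction of available points scored (a draw counts as half a point)."""
    return points / games

def elo_diff_from_score(p: float) -> float:
    """Elo difference implied by score fraction p under the logistic Elo model.

    Assumption: expected score = 1 / (1 + 10 ** (-diff / 400)).
    Only defined for 0 < p < 1.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / p - 1.0)

# Totals from the Houdini 4 row of the table: 2488.5 points in 3000 games.
houdini4_score = score_fraction(2488.5, 3000.0)
houdini4_gap = elo_diff_from_score(houdini4_score)
gap_at_85 = elo_diff_from_score(0.85)

print(f"Houdini 4 overall score: {houdini4_score:.1%}")
print(f"Implied average Elo gap to the field: {houdini4_gap:.0f}")
print(f"Implied gap at an 85% score: {gap_at_85:.0f}")
```

Under this model the score-to-Elo curve gets steeper as the score approaches 100%, so each extra percentage point against the same field corresponds to a larger Elo step, and a small error in the measured score turns into a larger error in the rating.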

IWB
Posts: 1539
Joined: Thu Mar 09, 2006 1:02 pm

Re: IPON results for Houdini 4

Post by IWB » Tue Dec 03, 2013 10:36 am

Houdini wrote:
Looking at the overall result of nearly 83% that Houdini 4 scores, it becomes clear that we're reaching a limit of the rating list. The next version should score over 85% against the same opponents, and that's becoming too much...
Yes, I see problems with my "20 best" setup emerging as well.
Looking at the last entries in my opponent list, I have been waiting for years for something better to come up. There are a few, but for several reasons they are not reliable enough to run in my test setup.

Most likely I will increase the number of starting positions and decrease the number of opponents to keep the same number of games ...

Bye
Ingo
